layout | title | date | cover_image | authors
---|---|---|---|---
post | Bybit incident report and mitigating controls | 2025-03-20 | /assets/images/whale_shark.jpg |
The Bybit incident is an example of a nation state actor using a series of sophisticated attacks to compromise high value targets. When the value at stake justifies buying 0-days (in some cases several), combining them into elaborate exploit chains, attacking multiple layers of the tech stack, highly targeted social engineering, compromise of individuals, planting of moles or even physical attacks, the threat model that must be assumed to adequately address the risks needs to be extreme.
Threat Model Assumptions
The assumptions we make about nation state actors at Distrust:
- All screens are visible to the adversary
- All keyboards are logging to the adversary
- Any firmware/boot-loaders not verified on every boot are compromised
- Any host OS with network access is compromised
- Any guest OS used for any purpose other than production access is compromised
- At least one member of the Production Team is always compromised
- At least one maintainer of third party code used in the system is compromised
- At least one member of any third party system used is compromised
- Physical attacks are viable and likely
- Side-channel attacks are viable and likely
The mitigating controls suggested in this report consist of tools which we developed to address exactly this type of threat model, and which are at varying levels of maturity. The good news is that the reference designs and concepts are available to you today, but some of the tooling needs more work - so if you care about these issues and want to help us complete the work on the missing pieces, please talk to us.
The Method
This report highlights the major single points of failure, each of which relies on a single individual and/or computer, thus creating an opportunity for compromise. Blockchains benefit from the security of the network via strong cryptography and decentralization. More "traditional" parts of the infrastructure historically have not had the ability to distribute trust, but with some clever tactics we can achieve a decentralization of trust which helps us ensure that no single individual or computer can compromise a system.
Root Cause Analysis and Mitigating Controls
I. Developer Workstation Compromise
Earliest known malicious activity was identified, when a developer’s Mac OS workstation was compromised, likely through social engineering. (Sygnia report)
Primary Mitigation
Day-to-day work machines should not be used for production access / managing tokens for production access. This is an operational security shortcoming, as any interaction with production systems, whether via an API token, or web interface should be done via a dedicated computer or highly isolated environment (hardware-based virtualization like QubesOS/Xen preferred) with minimal dependencies only used for carrying out production tasks. Any interactions outside of production related tasks create opportunities for the system to be compromised - downloading and opening files, downloading and running software libraries (such as Docker which was the source of malware in this case), visiting websites (yes, the browser sandbox can be broken) etc.
Advanced Mitigation
Another way to mitigate this risk is to use a hardened server, such as a secure enclave, which is immutable and can remotely attest to the code it's running. Setting up that server to only deploy code that's signed by x trusted PGP keys (or keys for another signing scheme) can achieve a state where no single individual has the ability to modify the infrastructure.
-
Use EnclaveOS - a minimal and immutable operating system for running security critical software with high accountability on secure enclaves. EnclaveOS can also be extended to support multi-party management of secrets such that no person can control them alone. This can be used to set up a secure enclave which acts as the deployment system. EnclaveOS is a reference implementation, but we are happy to help invest energy into making this tool easier to use for everyone.
-
Use Bootproof alongside EnclaveOS to prove which software booted on a given system by leveraging platform hardware or firmware remote attestation technologies. This tool is designed but not yet in development. Currently EnclaveOS can be used with Nitro VMs on AWS with some work to achieve remote attestation - and several Distrust clients are using this setup in production today. Our team would be happy to invest energy to develop this tooling if anyone is willing to help fund it. It would unlock use of general hardware like TPMs and other remote attestation technologies to allow deploying remote attestation setups to different cloud platforms for more security via diversity.
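The "signed by x trusted keys" rule can be sketched as a small policy check. This is a minimal illustration, not how EnclaveOS or Bootproof implement it: the key fingerprints, the `Approval` record and the threshold are hypothetical, and the `valid` flag stands in for real signature verification (for example with a PGP library), which is assumed to happen elsewhere.

```python
from dataclasses import dataclass

# Hypothetical fingerprints of the keys trusted to approve deployments.
TRUSTED_SIGNERS = frozenset({"alice-key", "bob-key", "carol-key"})
THRESHOLD = 2  # the "x" in "signed by x trusted keys" (assumed value)


@dataclass(frozen=True)
class Approval:
    signer: str           # fingerprint of the key that produced the signature
    artifact_digest: str  # digest of the artifact the signature covers
    valid: bool           # outcome of real signature verification, done elsewhere


def quorum_met(artifact_digest, approvals,
               trusted=TRUSTED_SIGNERS, threshold=THRESHOLD):
    """Return True when enough distinct trusted keys validly signed this digest."""
    signers = {
        a.signer
        for a in approvals
        if a.valid and a.signer in trusted and a.artifact_digest == artifact_digest
    }
    return len(signers) >= threshold


approvals = [
    Approval("alice-key", "sha256:abc", True),
    Approval("alice-key", "sha256:abc", True),    # same signer counts only once
    Approval("mallory-key", "sha256:abc", True),  # untrusted key is ignored
]
print(quorum_met("sha256:abc", approvals))  # False: only one trusted signer
```

Counting distinct signers rather than signatures is the important design choice: one compromised key cannot satisfy the threshold by signing twice.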
Additional notes
-
This isn't the first time an attack like this has happened. Those who have been around for a while will remember the Axie Infinity Hack, which also happened due to the compromise of a developer who used their day-to-day machine for managing cryptographic material and accessing production systems.
-
The use of tools like Mobile Device Management on systems for production access is not recommended, as they create a single point of failure. Most MDM solutions mean that a third party has complete access to the fleet of computers it's "protecting", and even a self-hosted MDM creates a large single point of failure which is challenging to mitigate to a reasonable degree. Instead, the approach should rely on making the surface area for attack so minimal that introducing anything else introduces more risk than benefit. For illustrative purposes, imagine a hardware-based virtual machine which only has a minimal operating system, the CLI tool for the preferred cloud platform, and a network interface which has a firewall configuration permitting only connections to a specific production asset. If this system is only used for accessing that specific asset, the introduction of anything additional, including an MDM or anti-malware/anti-virus software, actually increases the surface area for attack. Of course, this is a stepping stone to improve controls around accessing production systems until better mitigating controls can be put in place, making it impossible to directly interact with and change production systems as an individual.
-
Additional resiliency can be achieved by deploying the deployment system across multiple accounts with different ownership, or even across different cloud platforms. This is out of scope for this report, which focuses on the mitigating controls where most companies should start their journey to improve their supply chain security.
-
It is also worth noting that it appears a Docker container with network connectivity was used to compromise a developer's machine initially. This points to an often overlooked issue, which is that Docker is not a secure containerization technology, as by design it makes it fairly trivial to move files across the container boundary. This is useful for some use cases but not for strict isolation - which should instead rely on hardware-based virtualization.
II. JavaScript Code Tampering
Preliminary incident reports by both Sygnia and Verichains were shared by Bybit’s CEO, Ben Zhou, in his X post. Both reports highlighted the same attack vector – the modification of JavaScript resources directly on the S3 bucket serving the domain app.safe[.]global. (Sygnia report)
Primary Mitigation
Ensure that the bucket / server serving the website cannot be modified by a single individual. Set up immutable infrastructure by deploying software using a hardened server, such as an enclave, that only serves software reproduced across multiple systems and signed by a set of trusted parties. The software is then deployed to an immutable server or bucket for secure delivery to clients. The main risk to mitigate here is the "root" access account controlling the infrastructure. However, secure enclaves and remote attestation can effectively reduce this risk (EnclaveOS + Bootproof).
Advanced Mitigation
-
Leverage bit-for-bit reproducibility to ensure that the software being delivered has not been tampered with. In the case of JS code, which is not compiled but interpreted, the source code can be reviewed and then hashed to provide a way of checking the integrity of the code. This hashing should be done in trusted isolated environments, and ideally on multiple machines to ensure that no single computer has the ability to tamper with the code.
- This video (4:30-6:30) explains how reproducibility helps protect the integrity of software. For those new to reproduction and determinism, it's advised to watch the whole video.
-
This attack vector actually extends to all underlying software used in the build environment, such as the different libraries, as well as the compiler. To maximally mitigate this risk, a bootstrapped compiler should be used, and all software including the compiler itself should be built deterministically to close off tampering attack vectors across the whole foundation of software used in build environments. This allows one to reproduce the identical bit-for-bit binary in diverse environments (different OS, different chipset, different cloud platform, different access etc.), and ensure that the result is still exactly the same - proving there has been no tampering.
- Use StageX to reproduce your software and close off compiler and environment risks. StageX is a minimalism and security first repository of reproducible and multi-signed OCI images of common open source software toolchains full-source bootstrapped from Stage 0 all the way up. It's currently actively being used by Talos Linux, Mysten Labs (SUI) and Turnkey to name a few of the widely known projects.
- Use ReprOS to help with reproduction. It's a bare-bones immutable OS designed for securely reproducing and signing software. Each build is executed in a one-time use environment, eliminating persistent risks. This project is currently in beta.
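The hashing and cross-checking described above can be sketched as follows. This is an illustration under stated assumptions, not how StageX or ReprOS work internally: `tree_digest` and `builds_agree` are hypothetical helper names, and the builder names in the usage note are made up. Each independent machine hashes its own copy of the build output, and the release is only accepted when every machine reports the same digest.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Hash every file under root in a fixed order, covering paths and contents."""
    root = Path(root)
    # Sort by POSIX-style relative path so the order is the same on every OS.
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    outer = hashlib.sha256()
    for rel, path in files:
        file_hash = hashlib.sha256(path.read_bytes()).hexdigest()
        outer.update(f"{file_hash}  {rel}\n".encode())
    return outer.hexdigest()


def builds_agree(digests):
    """Given {builder_name: digest}, require two or more builders and one digest."""
    return len(digests) >= 2 and len(set(digests.values())) == 1
```

In use, each builder would run `tree_digest("dist/")` on its own output and publish the result, and a deploy step would call something like `builds_agree({"builder-1": d1, "builder-2": d2})` before serving anything. Any single modified byte in any file changes the digest, so one tampered build environment is detected as a disagreement.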
Additional Notes
All third party code should be manually reviewed. Currently most companies rely on Static Application Security Testing tools. This is not enough, as SAST tools are unable to detect novel exploits. The cost of using open source code, at a minimum, should be to review every line of code manually. If companies are so stringent about having developers review their first party code, why do companies choose not to apply the same principles to third party code? It is burdensome, but necessary for high risk targets. If you're unfamiliar, a good example of what's possible with supply chain attacks is the xz backdoor.
- Distrust's answer to this is SigRev, which helps harness the power of nerds to create a repository of signed reports for reviews of open source software. The idea is that companies can come together to fund review of common open source software, to save money, and simultaneously help secure open source software. SigRev has been designed, but is not yet in development and is seeking funding.
III. Compromise of WebUI
Bybit initiated a transaction from the targeted cold wallet using Safe{Wallet}’s web interface. The transaction was manipulated, and the attackers siphoned the funds from the cold wallets. (Sygnia report)
Primary Mitigation
Initiating transactions from a WebUI leaves a lot of surface area for attack, as browsers are known for being difficult to protect. This is due to the nature of what a browser is - a window into the open internet. Additionally, the V8 engine, which is the backbone of most browsers, is an immensely complex and difficult surface area to defend, resulting in frequent 0-day vulnerabilities, as well as supply chain issues.
-
Do not sign transactions involving large sums in a browser.
-
Use offline trusted environments for signing, to protect key material, and mitigate the risk of a compromised UI displaying incorrect information. In the case of the Bybit hack in particular, preventing the JS tampering would have mitigated this risk, but other supply chain attack vectors which can achieve the same outcome remain (extensions, V8 engine 0-day exploits etc.). By using a minimal set of CLI tools to sign transactions offline, the WebUI compromise would have been avoided.
- Use AirgapOS, which is an immutable, diskless OS used for offline secret management and operations. It is a Swiss Army knife which essentially turns a laptop into a hardware wallet. Some modifications to the laptop are required, such as removing its radio cards. Inside it are Keyfork and Icepick, which are tools for generating and managing entropy from which keys for different cryptographic algorithms can be derived, as well as for cryptographic signing operations. Keyfork and Icepick are both extremely minimal and written in Rust. They currently support Solana, Pyth, Cosmos, Kyve and Seda, as we received funding to implement those, but can be extended to support other chains. We are currently working on Bitcoin, and would be happy to add support for Ethereum as well. Again, this is not a political decision; we just had individual sponsors fund support for those blockchains first. These three tools are all being used in production today by multiple clients, and have been audited by several security firms whose reports can be found in the respective repositories.
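The idea of not trusting what a UI displays can be sketched as a check run on the offline machine before signing. This is not how Keyfork or Icepick work; the field names (`to`, `value`, `data`) and both helper functions are hypothetical, and the digest here is a plain SHA-256 over canonical JSON for human comparison, not any chain's actual signing hash. The payload to be signed is compared field by field against the intent that was approved out of band.

```python
import hashlib
import json

_MISSING = object()  # sentinel so an absent field never equals an explicit None


def canonical_digest(tx):
    """SHA-256 over a canonical JSON encoding, so every signer derives the same value."""
    encoded = json.dumps(tx, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def mismatches(approved_intent, tx_to_sign):
    """List every field where the payload differs from the approved intent."""
    keys = sorted(set(approved_intent) | set(tx_to_sign))
    return [
        k for k in keys
        if approved_intent.get(k, _MISSING) != tx_to_sign.get(k, _MISSING)
    ]


# Hypothetical example: the payload's recipient was swapped after approval.
intent = {"to": "0xCOLD", "value": "100", "data": "0x"}
payload = {"to": "0xATTACKER", "value": "100", "data": "0x"}
print(mismatches(intent, payload))  # ['to']
```

Signers can also read `canonical_digest(payload)` aloud to each other over a separate channel; if any signer's machine was shown a different payload, the digests will not match.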
Extras
We have noticed that many companies still neglect basic security hygiene practices that apply to everyone and could meaningfully improve the security of systems with relatively little effort.
-
Adopt FIDO2 as MFA wherever possible and avoid using SMS, TOTP, Yubico OTP, email codes and push notifications. If your provider doesn't offer FIDO2, you should ask them why as it's objectively the best type of MFA currently available.
-
Use smart cards for FIDO2, and for managing PGP keys which can be used for signing commits and for SSH access. We built tooling and guides which make it easy to provision PGP keys and load them onto smart cards. Signing commits is helpful as it can help protect against modification of code via attacks like commit spoofing, and keeping the SSH key securely inside of a smart card is akin to keeping seed phrases safely stored in HSMs.
Summary
The Distrust team has helped build and secure some of the highest risk systems in the world, such as the vaulting systems at BitGo, Unit410, and Turnkey, as well as helping electrical grid operators, industrial control system operators and others. Through working with companies that are exposed to the most sophisticated known attackers, where all attacks are viable, Distrust developed a methodology to help mitigate this level of threat. We are now using our hard learned lessons to help everyone improve their security posture by open sourcing all our learnings and creating open source tooling everyone can benefit from.