---
layout: post
title: Safe{Wallet}/Bybit incident report and mitigating controls
date: 2025-03-20
---

The Safe{Wallet}/Bybit incident is an example of a nation-state actor executing a series of sophisticated, multi-layered attacks on high-value targets. When the potential gain is significant, attackers can justify investing in multiple 0-day vulnerabilities and chaining them into elaborate exploit sequences. These campaigns often span multiple layers of the tech stack and can involve precision-targeted social engineering, insider compromise, or even physical infiltration.

As such, the threat model required to defend against this level of adversary must be extreme. It demands that defenders adopt a much more rigorous set of assumptions about attacker capabilities and invest time in implementing controls that typical organizations may not need. When protecting high-value assets, the game changes.

### Threat Model Assumptions

At Distrust, we operate under the assumption that nation-state actors are persistent, highly resourced, and capable of compromising nearly any layer of the system. Accordingly, our threat model assumes:

* All screens are visible to the adversary
* All keyboard input is being logged by the adversary
* Any firmware or bootloader not verified on every boot is considered compromised
* Any host OS with network access is compromised
* Any guest OS used for non-production purposes is compromised
* At least one member of the Production Team is compromised
* At least one maintainer of third party code used in the system is compromised
* At least one member of any third party system used in production is compromised
* Physical attacks are viable and likely
* Side-channel attacks are viable and likely

These assumptions drive the design strategies and tooling outlined in this report. The controls we've developed are built specifically to address this elevated threat model. Many of the tools are ready to use today, some are reference designs, and others require further development. If you care about these issues and want to help us push this work forward, [talk to us](https://distrust.co/contact.html).

### Summary

This report identifies critical single points of failure, cases where trust is placed in a single individual or computer, which create opportunities for compromise. In contrast, blockchains offer stronger security properties through cryptography and decentralized trust models.

Traditional infrastructure has historically lacked mechanisms to distribute trust, but this limitation can be addressed. By applying targeted design strategies, it's possible to distribute trust across systems and reduce the risk of a single compromised actor undermining the integrity of the entire system.

---

## Root Cause Analysis and Mitigating Controls

In our opinion, the primary causes of this incident stem from two key issues identified in the [Sygnia report](https://www.sygnia.co/blog/sygnia-investigation-bybit-hack/):

* > ... a developer’s Mac OS workstation was compromised, likely through social engineering.

* > ... the modification of JavaScript resources directly on the S3 bucket serving the domain app.safe[.]global.

These findings highlight both endpoint compromise and weak controls around cloud infrastructure. The following sections focus on how such risks could be mitigated through architectural decisions and more rigorous threat modeling.

## Introduction

The compromise occurred due to several key factors, already documented in other reports. This report focuses on how the incident **could have been prevented** through a stronger, first-principles approach to infrastructure design.

While many security teams reach for quick wins, like access token rotation, stricter IAM policies, or improved monitoring, these are often reactive measures. They may help, but they're the equivalent of "plugging holes on a sinking ship" rather than rebuilding the hull from stronger material.

For example, improving access control to the S3 bucket used to serve JavaScript resources, or adding better monitoring, are good steps. But they rely on trust placed in individuals or cloud platforms, which remain vulnerable to compromise.

> At the core of this breach lies a recurring theme: single points of failure.

To explore this from first principles, consider the deployment pipeline. In most companies, one individual, an admin or developer, has the ability to modify critical infrastructure or code. That person becomes a single point of failure.

Even if the pipeline is hardened, the risk shifts rather than disappears. There's always one super-admin who has full access. Most cloud platforms encourage this pattern, and the industry has come to accept it.

But this isn't about distrusting your team; it's about designing systems where **trust is distributed**. In the blockchain space, this is already accepted practice. So the question becomes:

> *Does it make sense for a single individual to hold the integrity of an entire system in their hands?*

Those who've worked with decentralized systems would say: absolutely not.

#### Mitigation Principles

To adequately defend against the risks outlined in the Distrust threat model, it is critical to distinguish between **cold** and **hot** wallets. The following principles are drawn from practical experience building secure systems at BitGo, Unit410, and Turnkey, as well as from diligence work conducted across leading custodial and vaulting solutions.

* A **cold cryptographic key management system** is one where all components can be built, operated, and verified offline. If any part of the system requires trusting a networked component, it becomes a **hot** system by definition. For example, if a wallet relies on internet-connected components, it should be considered a hot wallet, regardless of how it's marketed. While some systems make trade-offs for user experience, these often come at the cost of real security guarantees.

* Cold cryptographic key management systems that leverage truly random entropy sources are **not susceptible to remote attacks**, and are only exposed to localized threats such as physical access or side-channel attacks.

* A common misconception is that simply keeping a key offline makes a system cold and secure. But an attacker doesn't always need to steal the key. They only need to cause the key to perform an operation, on their behalf, on data of their choosing.

* **All software in the stack must be open source**, built deterministically (to support reproduction), and compiled using a fully bootstrapped toolchain. Otherwise, the system remains exposed to single points of failure, especially via supply chain compromise.

#### Mitigations and Reference Designs

We propose two high-level design strategies that can eliminate the types of vulnerabilities exploited in the Safe{Wallet}/Bybit attack. Both approaches offer similar levels of security assurance, but differ significantly in implementation complexity and effort.

In our view, **when billions of dollars are at stake**, it is worth investing in proven low-level mitigations, even if they are operationally harder to deploy. The accounting is simple: **invest in securing your system up front**, rather than gambling on the assumption that you won't be targeted.

State-funded actors are highly motivated, and when digital assets are involved, it's game theory at work. The cost of compromising a weak system is often far less than the potential gain.

We've seen this playbook used in previous incidents, including Axie Infinity, and we will see it again. Attackers are increasingly exploiting both human and technical single points of failure, while defenders often under-invest in securing this surface area.

#### Strategy 1 - Run Everything Locally

This strategy can be implemented without major adjustments to the existing system. The goal is to move the component currently introducing risk (the one effectively making the wallet "hot") into an offline component, upgrading the system to a fully cold solution.

The idea centers on extracting the **signing** component from the application (which currently operates in the UI) and converting it into an offline application. A practical example of this approach would be using a tool like **Electrum**.

However, simply making a component offline does not eliminate all single points of failure. The security of this approach requires that individuals build the application themselves from source, using a fully bootstrapped compiler and a **deterministic build process**.

We've developed open-source tooling for this under **[StageX](https://codeberg.org/stagex/stagex)**. To learn more about the importance of reproducible builds, check out [this video](https://antonlivaja.com/videos/2024-incyber-stagex-talk.mp4), where one of our co-founders explains how the SolarWinds incident unfolded and how it could have been prevented.

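
The practical payoff of a deterministic build is that independent builders can compare their results byte for byte. The Python sketch below illustrates that comparison only. The function names, the builder names, and the threshold are hypothetical and are not part of StageX.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def artifact_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a build artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def reproduced_digest(reports: Dict[str, str], threshold: int) -> Optional[str]:
    """Return the agreed digest, or None if the build was not reproduced.

    `reports` maps a (hypothetical) builder name to the digest that builder
    obtained from the same source. Any disagreement is treated as a failure,
    since a deterministic build should yield identical bytes for everyone.
    """
    counts = Counter(reports.values())
    if len(counts) != 1:
        return None  # builders disagree: do not trust any of the artifacts
    digest, count = next(iter(counts.items()))
    # Require enough independent builders so no single person is trusted.
    return digest if count >= threshold else None
```

With a threshold of two or more, a single compromised engineer or build machine cannot get a tampered artifact accepted, because their digest will not match the others.
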
##### Reference Design

This reference design focuses on the Safe{Wallet} team, but applies to any team trying to build an offline component with minimal single points of failure.

1. All system administrators are provided with dedicated offline laptops
    * Radio cards are removed (Bluetooth, Wi-Fi)
    * The machine has never been connected to the internet

2. All engineers provision and distribute their own personal signing keys (PGP)
    * Use smart cards such as NitroKey or YubiKey
    * Only perform signing operations with these keys on the personal offline system
    * Distrust has created open source tooling that simplifies secure provisioning: [Trove](https://trove.distrust.co/generated-documents/all-levels/pgp-key-provisioning.html)

3. An offline signing application is deterministically compiled, verified, and signed by multiple engineers
    * Includes all necessary tools to carry out offline key operations
    * Distrust also developed [AirgapOS](https://git.distrust.co/public/airgap), a custom Linux OS meant for managing secret material offline. It has been audited by a third party and is used in production by several major digital asset companies.

4. All sensitive operations are fully verified offline before any cryptographic operations take place

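
Step 4 can be pictured as a gate that refuses to produce anything signable unless the proposed operation matches what the operator independently intended. The sketch below is a minimal illustration under stated assumptions: the `Transaction` fields are a simplified, hypothetical subset, and SHA-256 over canonical JSON stands in for whatever hashing scheme the real wallet uses.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical, simplified fields; a real wallet transaction has more.
    to: str
    value: int
    data: str
    operation: int


def verify_then_digest(proposed: Transaction, intended: Transaction) -> str:
    """Return the digest to sign only if every field matches the intent."""
    mismatched = [
        name
        for name, want in asdict(intended).items()
        if getattr(proposed, name) != want
    ]
    if mismatched:
        # Fail closed: nothing is handed to the signing key on a mismatch.
        raise ValueError(f"refusing to sign, fields differ: {mismatched}")
    # Stand-in hashing (assumption): SHA-256 over sorted-key JSON.
    canonical = json.dumps(asdict(proposed), sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()
```

The point of the design is that the comparison runs on the offline machine, so a compromised web interface cannot alter what the signer sees without the mismatch being caught before the key is used.
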
#### Strategy 2 - Use Remotely Verified Service

This strategy preserves a user experience nearly identical to the current one, but requires significantly more engineering effort to add verifiability at key points of the system. The tooling needed to execute this design easily is not yet fully built (but we are working on it).

##### Reference Design

This design focuses on leveraging secure enclaves to create servers that are immutable, deterministic, and can cryptographically attest to the software they are running. While this design gets close to the fully cold design from the previous strategy, it will always remain exposed to the attack surface of browsers, such as 0-day exploits, browser extensions, and host operating system compromise.

1. Rewrite the application to run in a secure enclave
    * TLS termination inside the enclave
    * Web interface served from inside the enclave
    * Nothing outside the enclave is trusted

2. Create a deterministic OS image with remote attestation (TPM2, Nitro Enclave, or similar)
    * The whole stack is built reproducibly using a full-source bootstrapped compiler

3. One engineer deploys a new enclave with new code

4. A different engineer proves that the remote code matches the reviewed code in the VCS repository

5. Clients are issued a service worker on first load that pins keys, allowing remote attestation verification on all subsequent loads
    * The user has the option to verify and download the application locally for full offline operations
    * The user is also encouraged to build the application themselves and match the published signed hash

Implementing these strategies can be challenging, and this is a high-level overview of the type of problems we work on. Depending on the chosen approach, context, and available resources, implementation can take anywhere from a few weeks to a few years.

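
Step 4 of the Strategy 2 reference design, where a second engineer proves that the deployed code matches the reviewed code, comes down to comparing measurements. The sketch below is a simplified illustration with hypothetical function names. It assumes the attestation document's signature has already been verified with platform-specific tooling, and it models the measurement as a plain SHA-384 of the image, which real platforms compute differently.

```python
import hashlib
import hmac


def expected_measurement(image: bytes) -> str:
    """Stand-in measurement (assumption): SHA-384 of the locally rebuilt image."""
    return hashlib.sha384(image).hexdigest()


def matches_reviewed_code(attested: str, rebuilt_image: bytes) -> bool:
    """Check an attested measurement against one computed from a local rebuild.

    `attested` is the hex measurement taken from an already-verified
    attestation document; `rebuilt_image` is the image the second engineer
    built reproducibly from the reviewed source.
    """
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(attested.lower(), expected_measurement(rebuilt_image))
```
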
## About Distrust

The Distrust team has helped build and secure some of the highest-risk systems in the world, such as the vaulting systems at BitGo, Unit410, and Turnkey, and has helped electrical grid operators, industrial control system operators, and others secure their mission-critical systems. Distrust has also conducted security due diligence probes on most major custodians. Through working with companies that are exposed to the most sophisticated known attackers, where all attacks are viable, Distrust developed a methodology and open source tooling to help mitigate this level of threat. We are now using our hard-learned lessons to help everyone improve their security posture by sharing what we learned and creating open source tooling everyone can benefit from.