
# Threat Model

QKM is designed according to a high-assurance threat model which errs on the
side of making exaggerated, rather than conservative, assumptions in order to
build a resilient system.

The assumption is made that attackers who target QKM are extremely
sophisticated, well funded and patient, and as such, the full arsenal of
attacks is on the table. This means that the attacker can purchase and
weaponize multiple 0day vulnerabilities, execute physical attacks or deploy
moles, target the supply chains of the software, firmware and hardware used,
and generally attack the system using an array of known and unknown attacks.

One of the guiding principles in the design is the elimination of Single Points
of Failure (SPOFs). The design relies on a number of different control
mechanisms which help reduce the risk of any single individual or component
compromising the system, whether that is a maintainer of the software used in
the system, the firmware that is used, or the individuals or locations that
hold the secret material which is the backbone of the system.

To achieve this, QKM focuses on reducing risk by:

* Only using fully open source software and firmware to allow full verification
of their security properties

* Creating custom, purpose-specific tooling which eliminates dependencies in
order to reduce exposure to supply chain attacks, and adds desirable security
properties

* Using a fully bootstrapped and deterministically built compiler for building
all software that's used

* Building all of the software and firmware deterministically

* Using computers which either have a hard switch for disabling networking or
which have had their radio networking cards (Bluetooth, Wi-Fi, etc.) removed

* Leveraging smart cards (personal HSMs) to protect cryptographic material

* Leveraging sharding in order to physically separate cryptographic material

* Leveraging tamper evident controls for components related to the system

* Leveraging emanation shielding methods such as TEMPEST (Telecommunications Electronics Materials Protected from Emanating Spurious Transmissions) and soundproofing

## General Threat Model Assumptions

Some additional assumptions are made to help contextualize the threat model:

* All screens are visible to an adversary

* All keyboard input is logged by an adversary

* Any firmware/boot-loaders not verified on every boot are compromised

* Any host OS with network access is compromised

* Any guest OS used for any purpose other than prod access is compromised

* At least one member of the Production Team is always compromised

* At least one maintainer of third party code used in the system is compromised

* Physical attacks are viable and likely

* Side-channel attacks are viable and likely

## Threat Model Levels

Different threat model levels allow an organization to start benefiting from the security properties of the QKM system immediately, with a clear path to upgrading over time as resources and time become available.

Each subsequent level assumes all threats and mitigations from the previous level, and introduces more sophisticated attacks and mitigations. As such, the levels should, for the most part, be adopted in order, one at a time, to ensure comprehensive defenses against all viable threats enumerated herein.

## Level 1

### Threat Model

#### Adversary

Adversary is a low-skilled individual targeting many organizations. This implies the adversary is not highly focused on compromising a specific organization, and relies on less sophisticated strategies.

#### Attacks

* Using phishing to steal data from a random set of custodian end users

* Injecting malware into the system of a random set of custodian end users

### Requirements

* MUST require hardware anchored login for large withdrawals

* MUST require hardware anchored signature for large withdrawal requests

* MUST verify withdrawal requests according to a threshold based policy
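
The threshold based check in the last requirement can be sketched as follows. This is a minimal illustration, not the QKM implementation: `ThresholdPolicy`, `is_withdrawal_approved` and the field names are hypothetical, and the sketch assumes the hardware anchored signatures have already been verified elsewhere, so it only receives the IDs of keys whose signatures passed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdPolicy:
    # Hypothetical policy shape: withdrawals at or above `large_amount`
    # need `required_approvals` distinct authorized keys.
    large_amount: int
    required_approvals: int
    authorized_keys: frozenset


def is_withdrawal_approved(amount, verified_signers, policy):
    """Return True if the withdrawal satisfies the threshold policy.

    `verified_signers` holds the IDs of keys whose signatures over the
    request were already cryptographically verified (assumed, not shown).
    """
    if amount < policy.large_amount:
        # The requirements above only cover large withdrawals.
        return True
    # Count distinct signers that are actually authorized by the policy.
    approvals = set(verified_signers) & policy.authorized_keys
    return len(approvals) >= policy.required_approvals
```

Counting the intersection with `authorized_keys`, rather than the raw number of signatures, keeps unknown or duplicate keys from contributing to the threshold.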

### Reference Design

* Ensure all users withdrawing large sums over a short period of time are using FIDO2 or PGP capable smart cards for logging in and authorizing transactions:

    * Hardware based WebAuthn/Passkey/U2F

        * Android 7.0+, iOS 14+, macOS 10.15+, Windows 10 1809+, ChromeOS, YubiKey 5, Nitrokey, Ledger, Trezor

    * Consider software-based WebAuthn/Passkey/U2F as backup

* Ensure backend systems will only approve large withdrawals if signed by a known smart card

* Ensure all transaction approval keys are stored in a tamper-evident, append-only database

    * To achieve this, storage systems such as Amazon QLDB, git, Datomic etc. can be used

* Ensure all key additions are authenticated with a quorum of existing keys

* Consider allowing a quorum of support engineer keys to enroll a new key to handle lost keys

* Use the hash of the transaction signing request as the challenge to be signed by the smart card

* Issue the blockchain signature only after verifying that a given request is signed by the authorized user smart card(s)
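
Two of the ideas in this list, the request hash used as a challenge and the tamper-evident append-only key store, can be sketched with the standard library alone. This is an illustrative sketch under stated assumptions: the function names, the canonical JSON encoding and the hash-chain layout are all hypothetical choices, not the format used by QKM or by any of the storage systems named above.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _canonical(obj: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte encoding.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def request_challenge(request: dict) -> bytes:
    """Derive the challenge a smart card would sign from a signing request."""
    return hashlib.sha256(_canonical(request)).digest()


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    return hashlib.sha256(prev_hash.encode("ascii") + _canonical(payload)).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append a key addition/removal record, chaining it to the previous one."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify_log(log: list) -> bool:
    """Recompute the chain; any edited or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

A hash chain on its own only detects modification of existing entries. Detecting truncation or wholesale replacement additionally requires the latest hash to be anchored somewhere the attacker cannot rewrite, which this sketch does not cover.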

## Level 2

### Threat Model

#### Adversary

Adversary is a skilled and resourceful individual targeting one organization. This type of attacker uses a combination of widely used cyber weapons, OSINT, social engineering (spear phishing), vulnerability exploitation and MitM attacks.

#### Attacks

* Compromise one team member with privileged access

* Inject code into any OSS library

* Exploit any vulnerability within 24h of public knowledge

### Requirements

* All production access:

  * MUST NOT be possible by any single engineer

    * Consider a bastion that can enforce m-of-n access over ssh

    * Consider hardened deployment pipeline which requires m-of-n cryptographic signatures to perform action

  * MUST be via dedicated tamper evident workstation

    * Consider: https://github.com/hashbang/book/blob/master/content/docs/security/Production_Engineering.md

  * MUST be anchored to keys in dedicated HSMs held by each administrator

    * Consider OpenPGP or PKCS#11 smart cards that support touch-approval for ssh

* Any code in the transaction signing trust supply chain:

  * MUST build deterministically

  * MUST have extensive and frequent review

  * MUST be signed in version control systems by well known author keys

  * MUST be signed by separate subject matter expert after security review

    * MUST hash-pin third party code at known reviewed versions

  * MUST be at version with all known related security patches

  * SHOULD be at the latest version if security disclosures lag behind releases, otherwise N-2

  * MUST be built and signed (and hashes compared) by multiple parties with no management overlap

    * Example: One build by IT, another by Infrastructure team managed CI/CD

  * MUST be signed by well known keys signed by a common CA

    * Example: OpenPGP smart cards signed under OpenPGP-CA.

  * All private keys involved:

    * MUST NOT ever come in contact with network accessible memory

  * All execution environments MUST be able to attest what binary they run

    * Examples:

      * Custom Secure Boot verifies minimum signatures against CA

      * Cloud enclave that can remotely attest it uses a multi-signed image

        * TPM2, AWS Nitro Enclave, Google Shielded VMs etc.

      * Phone app stores already anchor to developer held signing keys
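
The hash-pinning and multi-party build comparison requirements above reduce to simple digest checks. The sketch below is hypothetical: `sha256_hex`, `matches_pin`, `builds_agree`, the `pins` mapping and the builder names are illustrative assumptions, and it assumes each party's signature over its reported digest has been verified separately.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact or dependency archive."""
    return hashlib.sha256(data).hexdigest()


def matches_pin(name: str, data: bytes, pins: dict) -> bool:
    """Accept third party code only if it matches its reviewed, pinned hash."""
    expected = pins.get(name)
    # Unknown dependencies are rejected rather than trusted by default.
    return expected is not None and hmac.compare_digest(expected, sha256_hex(data))


def builds_agree(digests_by_builder: dict, minimum_builders: int = 2) -> bool:
    """True only if enough independent builders reported the same digest."""
    if len(digests_by_builder) < minimum_builders:
        return False
    return len(set(digests_by_builder.values())) == 1
```

For example, `builds_agree({"it": d1, "ci": d2})` would only pass when the IT build and the CI/CD build produced byte-identical artifacts, which is what deterministic builds make possible.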

### Reference Design

* Create offline CA key(s)

* Consider an OpenPGP key generated on an air-gapped machine using keyfork, backed up, and with copies transferred to smart cards such as a YubiKey

* CA key smart cards are stored in dual-access tamper evident locations

#### User Key Management System

* An immutable enclave is created with no ingress internet access

* Enclave has random ephemeral key

* Remotely attested on boot-up against multi-signed and known deterministically built system image

  * Possible on many PCR based measured boot solutions based on TPM2 and Heads, AWS Nitro Enclaves, or GCP Shielded VMs

* Ephemeral enclave key is signed with offline CA key(s) on verification.

* Enclave has ability to validate append only database of keys

* Enclave will sign new key additions/removals with ephemeral key if:

  * User has no prior keys

  * Key was signed with an existing key

  * Key was signed with 2+ known support engineer keys
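
The three conditions under which the enclave signs a key change can be expressed as a small decision function. This is a sketch only: `may_enroll_key` and its parameters are hypothetical names, and the sets are assumed to contain key IDs whose signatures over the new key were already verified inside the enclave.

```python
def may_enroll_key(user_existing_keys, verified_signers, support_engineer_keys,
                   support_quorum=2):
    """Decide whether the enclave should sign a key addition/removal.

    user_existing_keys:    key IDs already enrolled for this user
    verified_signers:      key IDs with a verified signature over the change
    support_engineer_keys: key IDs of known support engineers
    """
    existing = set(user_existing_keys)
    signers = set(verified_signers)
    if not existing:
        # First enrollment: the user has no prior keys.
        return True
    if signers & existing:
        # The change was signed with one of the user's existing keys.
        return True
    # Recovery path: 2+ known support engineer keys signed the change.
    return len(signers & set(support_engineer_keys)) >= support_quorum
```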

#### Signing Key Generation

* M-of-N key holder quorum is selected

  * SHOULD be on different teams

  * SHOULD live in different geographical zones to mitigate natural disaster and war-related risks

  * SHOULD have their own OpenPGP smart card with PIN and keys only they control

* Shard keys

  * SHOULD be an additional OpenPGP smart card separate from holder's personal key

  * SHOULD have random PIN, encrypted to a backup shard holder

  * SHOULD be stored in a neutral location only the primary and backup shard holder can access

* Done in person on air-gapped laptop that has been in [dual witnessed custody](hardware-procurement-and-chain-of-custody.md) since procurement

  * Has a hardware anchor that can make all parties confident that the OS image it is running is the expected one (Heads, etc.)

  * Has two hardware sources of entropy

    * There are devices that can provide an additional source of entropy such as:

      * Computer with another architecture such as RISC-V

      * HSM which can export entropy

      * Quantis QRNG USB

      * TrueRNG

  * Runs known deterministic and immutable OS image compiled by multiple parties

* Key is generated and stored

  * Split to m-of-n Shamir's Secret Sharing shards

    * Each shard is encrypted to dedicated shard OpenPGP smart card

    * Shard smart card PIN is generated randomly

    * Shard smart card PIN is encrypted to personal smart cards of primary and backup holders
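
The m-of-n split itself can be illustrated with a textbook Shamir's Secret Sharing construction over a prime field. This is a teaching sketch, not the keyfork implementation and not suitable for production use: the choice of prime, the function names and the integer encoding of the secret are assumptions, and the encryption of each shard to its smart card is not shown. It requires Python 3.8+ for the modular inverse via `pow`.

```python
import secrets

# Assumed field: the Mersenne prime 2**521 - 1, comfortably larger than a
# 256-bit secret.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With `split_secret(key, 3, 5)`, any three of the five shards passed to `recover_secret` reproduce `key`, while fewer than three reveal nothing about it.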

#### Signing System

* Uses an enclave which is immutable with no ingress internet access

* Has enclave bound ephemeral key

* Remotely attested on boot-up against multi-signed and known deterministically built system image

* Will accept Shamir's Secret Sharing shards encrypted to enclave bound ephemeral key

* Will restore signing key to memory when sufficient shards are submitted

* Will only sign transactions if accompanied by a signed request from authorized users, according to a quorum specified by a policy

  * Is able to validate the signing request via the signature of the CA key authorized user key management enclave

* Will only sign transactions that meet predefined size and rate limits set by company policy and insurance levels
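
The size and rate limits in the last item might be enforced along these lines. The sketch is hypothetical: `LimitPolicy`, its fields and the sliding-window approach are illustrative assumptions, and timestamps are passed in explicitly so the logic stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class LimitPolicy:
    max_tx_amount: int       # largest single transaction allowed
    max_window_amount: int   # total allowed within one window
    window_seconds: int      # length of the sliding window
    history: list = field(default_factory=list)  # (timestamp, amount) pairs

    def allows(self, amount: int, now: float) -> bool:
        """Check the size limit, then the rate limit over the sliding window."""
        if amount > self.max_tx_amount:
            return False
        recent = sum(a for t, a in self.history if now - t < self.window_seconds)
        return recent + amount <= self.max_window_amount

    def record(self, amount: int, now: float) -> None:
        """Record a transaction that was actually signed."""
        self.history.append((now, amount))
```

Keeping `allows` separate from `record` means a rejected request never consumes any of the window's budget.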

## Level 3

### Threat Model

#### Adversary

Adversary is an organized group with significant funding. These groups consist of individuals with different skill sets, and their resources drastically expand their attack capabilities.

#### Attacks

* Compromise one data center engineer into tampering with a target system

* Use a sophisticated 0day vulnerability to compromise any one internet-connected system

### Requirements

* MUST sign all transactions of significant value by multiple keys in separate geographical locations

  * Consider well vetted open source multi signature, MPC or on-chain threshold signing software

  * MUST use locations separated by hours of travel

  * MUST have independent staff for separate locations

  * Signing locations MUST NOT trust other locations

    * Each location MUST do their own reproducible build validation

    * Each location MUST do their own verifications on all large transactions

## Level 4

### Threat Model

#### Adversary

Adversary is a state actor. State actors are the best funded and most sophisticated attackers. They are the highest known threat and have the ability to execute all known attacks. Their well funded operations allow them to pursue goals over long periods of time, relying on subversion, false flags, insider threats via planted moles, compromise of hardware and software supply chains, the use of advanced non-commercially available cyber-warfare tools, and the combination of many 0day vulnerabilities into highly effective exploit chains. This level of adversary demands the highest known standards of security, which are typically upheld only by the most sophisticated companies and the military.

#### Attacks

* Tamper with the supply chain of any single hardware/firmware component

* Quickly and covertly relocate any device to a lab environment, complete attacks within a short time period, and return the device to its original location

* Use sophisticated [side channel attacks](side-channel-attacks.md) for exfiltrating data, cryptographic material being a high risk target

* Non-deterministic encryption/signatures/data

* Differential Fault Analysis (DFA)

* Data remanence

### Requirements

* All signing systems:

  * MUST have dual implementations of all policy enforcement and signing logic

  * MUST use two or more unrelated hardware supply chains for generating cryptographic material

    * Example: Rust on RISC-V Linux on an FPGA vs C on PPC Gemalto enclave

  * MUST return deterministic results

    * Results are only exported for chain broadcast if identical

  * MUST be stored in near-zero-emissions vaults that a single user cannot open

    * See: NSA TEMPEST

  * MUST ensure that individuals are scanned for devices before entering the vault

  * MUST only communicate with outside world via fiber optic serial terminal

    * TODO: do we even want this in the facility?

  * MUST be housed in Class III bank vault or better

  * MUST have constant environment deviation monitoring

    * Thermal, Acoustic, Air quality, Optical

  * MUST destroy key material on significant environment deviations

    * TODO: methods for doing this

  * MUST be physically accessible only through cooperative access by more than one individual

    * MAY use FF-L-2740B or better locks with dual pin enforcement

    * MAY use dual biometric enforcement to get near area and disarm security
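
The rule that results are only exported for chain broadcast if identical can be sketched as a comparison gate between the two independent implementations. This is an assumption-laden illustration: `export_if_identical` is a hypothetical name, and the two inputs stand in for the deterministic outputs of the dual implementations running on unrelated hardware.

```python
import hmac


def export_if_identical(result_a: bytes, result_b: bytes) -> bytes:
    """Release a result only when both implementations agree byte for byte."""
    # compare_digest avoids leaking where the two values differ via timing.
    if not hmac.compare_digest(result_a, result_b):
        raise ValueError("implementations disagree; refusing to export")
    return result_a
```

This only works because the requirements above demand deterministic results: with randomized signatures, two honest implementations would disagree on every request.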

## Additional Threat Model Notes

### Smart Cards

The Operator Smart Card uses the default PIN because it is meant to be something
a user "has", rather than "knows". On the other hand, the Location Smart Card
is protected by a complex PIN, which can only be decrypted using the PGP keys
stored on the Operator Smart Card. This is done in order to prevent anyone
except the Operator from accessing the Location key, but also to allow for
adding controls which require more than one individual to access a Location
Smart Card. In this way, there is an additional "quorum" which needs to be
achieved to access the Location key - more on this in the
[Location](locations.md) section.

Smart cards are used because they are HSMs (Hardware Security Modules) which
provide excellent protection for the cryptographic material stored on them, and
because they are portable, which makes them suitable for creating systems where
the cards are kept in separate physical locations and need to be brought
together in order to re-assemble secret material.