docs/disaster-recovery.md


2023-08-04 21:59:58 +00:00
# Disaster Recovery
## Overview
This document outlines the creation of a "Disaster Recovery" (DR) system which
functions as a one-way box: we can encrypt secrets to it at any time, but can
only recover them with the cooperation of a quorum of people with access to
multiple offline HSM devices stored in a diversity of physical locations.

In short, it should be trivial to back up data but very expensive to recover
it. Recovery is meant to be difficult for an attacker, and for us it is accepted
to be time-consuming, logistically complicated, and financially costly.

Data is backed up by encrypting plaintext to a [Disaster Recovery Key](#dr-key).
The resulting ciphertext is then stored in the
[Disaster Recovery Warehouse](#dr-warehouse). In the case of a disaster,
ciphertexts can be gathered from the DR Warehouse and then decrypted using the
DR Key to regain access to the plaintext.
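
As a rough illustration of the backup path, the sketch below builds a `gpg`
command line that encrypts a file to the DR Key. Only the public key is needed
for this step. The function name, the fingerprint placeholder, and the file
paths are assumptions made for the example and are not part of the DR tooling.

```python
def gpg_encrypt_command(recipient: str, plaintext_path: str,
                        ciphertext_path: str) -> list[str]:
    """Build (but do not run) a gpg invocation that encrypts to the DR Key.

    `recipient` is assumed to be the fingerprint of the DR public key,
    which must already be imported into the local keyring.
    """
    return [
        "gpg",
        "--batch",                    # never prompt; suitable for automation
        "--encrypt",
        "--recipient", recipient,     # encrypt to the DR public key only
        "--output", ciphertext_path,  # ciphertext destined for the DR Warehouse
        plaintext_path,
    ]


# Hypothetical usage; pass the list to subprocess.run(cmd, check=True) to run it.
cmd = gpg_encrypt_command("<DR-KEY-FINGERPRINT>", "secret.txt", "secret.txt.gpg")
```

Decryption has no equivalent shortcut: it requires the DR private key, which
only exists after a reconstitution ceremony.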
## Threat Model
Any one of the events below can be tolerated on its own, provided the disaster
recovery goes to plan. If more than one of them occurs, however, the DR Key
could be either lost or stolen.

- N-M operators lose control of their Operator Key Smartcards
  - Where the M-of-N threshold is 3-of-5, a loss of two is tolerated
- An entire cloud provider or account becomes unavailable
- An offline backup location is fully destroyed
- An adversary gains control of any single DR participant
- An adversary gains control of a single offline backup location
- An adversary has malware that can control any internet-connected computer
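
The M-of-N arithmetic behind the first bullet can be checked with a small
sketch. The function names are illustrative assumptions, not part of any DR
tooling.

```python
def tolerated_losses(m: int, n: int) -> int:
    """How many of the N Operator Key Smartcards can be lost while
    M-of-N recovery is still possible."""
    if not 1 <= m <= n:
        raise ValueError("threshold must satisfy 1 <= m <= n")
    return n - m


def recovery_possible(m: int, n: int, lost: int) -> bool:
    """True if at least M of the N participants remain after `lost` losses."""
    return n - lost >= m


def adversary_can_recover(m: int, compromised: int) -> bool:
    """An adversary needs at least M participants to reach the threshold."""
    return compromised >= m


# With a 3-of-5 threshold, two losses are tolerated and a third is not.
print(tolerated_losses(3, 5))
print(recovery_possible(3, 5, lost=2), recovery_possible(3, 5, lost=3))
```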
## Components
```mermaid
flowchart TD
shard1-enc(Shard 1 <br>Encrypted)
secrets-enc(Encrypted Secrets)
loc1-prvkey-sc(Location 1<br>Key Smartcard)
loc1-prvkey-enc-pass(Location 1<br>Key Passphrase<br> Encrypted)
op1-sc(Operator 1<br> Key Smartcard)
seckey(DR Private Key)
secrets-dec --> pubkey
pubkey --> secrets-enc
pubkey(DR Public Key)
ssss(Shamir's Secret<br> Sharing System)
ssss --> seckey
secrets-enc --> seckey
seckey --> secrets-dec(Plaintext Secrets)
op1-sc --> loc1-prvkey-enc-pass
loc1-prvkey-enc-pass --> loc1-prvkey-sc
shard1-enc --> loc1-prvkey-sc
loc1-prvkey-sc --> shard1
shard1(Shard 1) --> ssss
shard2(Shard 2) -.-> ssss
shard3(Shard 3) -.-> ssss
style shard2 stroke:#222,stroke-width:1px,color:#555,stroke-dasharray: 5 5
style shard3 stroke:#222,stroke-width:1px,color:#555,stroke-dasharray: 5 5
```
Note: we assume all PGP encryption subkeys use Curve25519 + AES-256.
### DR Key
- PGP asymmetric key pair to which all secrets are directly encrypted
- We chose the PGP standard because:
  - It is widely supported, with a plurality of implementations and tooling
  - The PGP standard and tooling are assumed to outlive any custom-made tools
  - It should be more reliable than any crypto implementation we maintain
- More than one DR Key could possibly exist in the future for specialized uses
### DR Key Shards
- A Shamir's Secret Sharing share of the private portion of the DR Key
- Encrypted to a respective Location Key
- Stored solely in geographically separate Locations
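
To make the M-of-N property of shards concrete, here is a minimal, educational
sketch of Shamir's Secret Sharing over a prime field. It is not the tool used
in a ceremony (the DR Ceremony OS relies on an existing tool such as `ssss`),
and the choice of prime and the function names are assumptions for the example.

```python
import secrets

# Assumption: a Mersenne prime larger than any secret we split in this sketch.
PRIME = 2**521 - 1


def split_secret(secret: int, m: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares, any m of which reconstruct it."""
    if not 1 <= m <= n:
        raise ValueError("threshold must satisfy 1 <= m <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree m-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# 3-of-5 example: any three shares recover the secret.
secret = int.from_bytes(b"dr key material", "big")
shares = split_secret(secret, m=3, n=5)
assert combine_shares([shares[0], shares[2], shares[4]]) == secret
```

Fewer than M shares interpolate a different polynomial, so they reveal nothing
useful about the secret.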
### Location
- DR Key Shards and Location Keys are distributed to separate Locations
- The Locations are geographically separated
- The Locations have a fixed human access list
  - Those with access can, however, cooperate to transfer access to others
- Each Location has staff and physical controls that strictly enforce access
### Location Keys
- A PGP Keypair whose private key is stored only in a given Location
- Shards are encrypted to a Location public key
- The Location Private Key is used to decrypt shards
- We are confident only one copy of each Location private key bundle exists
  - Keys are generated on an airgapped system with witnesses
  - Airgapped system is proven to run code all parties agree to
- Each Location private key is replicated on three mediums:
  1. Yubikey 5 (primary)
  2. Encrypted on paper (backup)
  3. Encrypted on an SD card (backup)
- All mediums are decrypted/unlocked with the same 256-bit entropy password
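
The generation script below calls for a random 43-character password, which
lines up with the 256-bit figure above if the password is base64-encoded:
32 random bytes encode to 43 characters once padding is dropped. The encoding
is an assumption on our part; this document does not specify one. A minimal
sketch using Python's `secrets` module:

```python
import math
import secrets


def generate_location_password() -> str:
    """Return a password carrying 256 bits of entropy.

    32 random bytes are URL-safe base64 encoded without padding,
    which always yields 43 characters.
    """
    return secrets.token_urlsafe(32)


password = generate_location_password()
# Each base64 character carries 6 bits, so 256 bits need ceil(256 / 6) chars.
assert len(password) == math.ceil(256 / 6) == 43
```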
### DR Courier
- A human who is on the access list for a Location
- Capable of retrieving a Shard and its Location Key
- We expect that a Shard and its Location Key are accessible by only one DR
  Courier
- May be distinct from the Operator, but this is not strictly necessary
- Must be highly trusted, but does not have to be technically skilled
### Operator
- A human who is capable of decrypting data with a Location Key
- They use their Operator Key to decrypt the password of a Location Key
### Operator Key
- Each Operator has their own Operator Key on a smartcard
- Operator Key smartcards have a simple, known authorization PIN
- Security relies on geographic separation of the Operator Keys and Shards
- Can decrypt the PIN or backup password for a specific DR Location Key
### DR Warehouse
- Online storage for encrypted data replicated across multiple providers
- All data in DR Warehouse can only be decrypted by the DR Key
- Tolerate loss of any single provider by duplicating data to all of them
- Storage backends can be any combination of the following:
  - S3 Compatible object stores:
    - AWS, Google Cloud, DigitalOcean, Azure, etc.
  - Version control systems:
    - GitHub, AWS CodeCommit, Gitea
- We tolerate a loss of all but one DR storage backend
- A minimum of three storage backends should be maintained
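
The replication rule can be sketched as follows. `InMemoryBackend` is a
hypothetical stand-in for a real S3 bucket or git remote, and naming objects by
their SHA-256 digest is an assumption made so that a fetched copy can be
checked for integrity; neither is prescribed by this document.

```python
import hashlib

MIN_BACKENDS = 3  # "A minimum of three storage backends should be maintained"


class InMemoryBackend:
    """Hypothetical storage backend used only for illustration."""

    def __init__(self, name: str):
        self.name = name
        self.objects: dict[str, bytes] = {}
        self.available = True

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self.objects[key]


def store_everywhere(backends: list, ciphertext: bytes) -> str:
    """Write the ciphertext to every backend, keyed by its SHA-256 digest."""
    if len(backends) < MIN_BACKENDS:
        raise ValueError("at least three storage backends are required")
    key = hashlib.sha256(ciphertext).hexdigest()
    for backend in backends:
        backend.put(key, ciphertext)
    return key


def fetch_from_any(backends: list, key: str) -> bytes:
    """Return the first copy whose digest matches; skip failed backends."""
    for backend in backends:
        try:
            data = backend.get(key)
        except (ConnectionError, KeyError):
            continue
        if hashlib.sha256(data).hexdigest() == key:
            return data
    raise LookupError("no backend holds an intact copy")
```

Because every backend holds a full copy, a fetch succeeds as long as one
backend is still reachable.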
### DR Ceremony System
- Laptop trusted by all ceremony participants to be untampered with
- Could be a brand new laptop, or one tightly held under multi-party custody
- Has firmware capable of attesting the hash of an OS ISO we wish to use
- Allows all parties to trust that no single party tampered with the OS
### DR Ceremony OS
- Linux operating system
- Basic shell tools
  - bash, sed, grep, awk, etc.
- Cryptography tools
  - gpg, openssl, etc.
- A Shamir's Secret Sharing tool
  - ssss, horcrux, or similar
- Built by multiple people who confirmed identical hashes
- AirgapOS is suggested:
  - <https://github.com/distrust-foundation/airgapos>
### DR Generation Script
**Routine Inputs:**
- Desired m-of-n threshold for Shamir Secret Sharing split of the DR Key
- N \* 2 unprovisioned Yubikeys
- N \* 2 + 1 SD cards
**Subroutines:**
- DR Key generation:
  1. Generate a PGP key with an encryption subkey and signing subkey.
- DR Operator Key generation:
  1. For each Operator
     1. Set a simple, well-documented PIN for a Yubikey
     2. Provision a PGP key with an encryption subkey directly onto the Yubikey
     3. Sign the public key cert of the generated key with the DR Key
- Location Key generation:
  1. Generate a PGP key with an encryption subkey using multiple entropy sources
  2. Generate a random, 43-character password
  3. Encrypt PGP secret key to the generated password
  4. Encrypt password to the Operator Keys associated with this Location
  5. Export the encrypted PGP secret key to paper and an SD Card
  6. Provision a Yubikey with the generated password as the PIN
  7. Upload the Location Key onto the Yubikey
  8. Sign Location Public Key with DR Key
- Shards Generation:
  1. Split DR Key secret with Shamir Secret Sharing
     - M is the reconstruction threshold
     - N is the total number of shards
  2. Encrypt each shard to its assigned Location Key
- Generate DR Key Verification Challenge
  1. Encrypt a random string to the DR Key
  2. Decrypt random string ciphertext with DR Key and verify
**Routine Outputs:**
- N Operator Key Yubikeys
- N SD Cards ("Operator Cards") each containing a plaintext Operator Key
- N pieces of paper each containing a plaintext Operator Key
- N Location Key Yubikeys
- N SD Cards ("Shard Cards") containing:
  - Location Key backup (encrypted to password)
  - Encrypted password (for Location Key)
  - Shard encrypted to Location Key
- N pieces of paper with a Location Key backup (encrypted to password)
- SD card ("Public Card") containing:
  - DR Key public key
  - All DR Operator Key public keys and DR Key signatures over them
  - All DR Location Key public keys and DR Key signatures over them
  - DR Key verification challenge
### DR Reconstitution Script
**Routine Inputs:**
- All Shards
- m-of-n encrypted Location Key PINs
- m-of-n Location Key Yubikeys
- m-of-n Operator Yubikeys
**Routine:**
1. For m operators:
   1. Prompt for Operator to insert Operator Key Yubikey
   2. Prompt for Operator to insert Location Key Yubikey
   3. Operator Key is used to decrypt password
   4. Decrypted password is used to authorize Location Key Yubikey
   5. Location Key Yubikey is used to decrypt Shard
   6. Decrypted shard is persisted in memory with other decrypted shards
2. Decrypted shards are assembled with Shamir Secret Sharing tool, outputting
   DR Key
**Routine Output:**
- Plaintext DR Key
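
The control flow of this routine can be sketched as below. The smartcard and
Shamir operations are passed in as callables because the real ones depend on
hardware and on the ceremony tooling; every name here is hypothetical and the
sketch only shows the order of operations.

```python
from typing import Callable, Sequence


def reconstitute(
    participants: Sequence[dict],
    m: int,
    decrypt_password: Callable[[dict], str],
    decrypt_shard: Callable[[dict, str], bytes],
    combine: Callable[[list[bytes]], bytes],
) -> bytes:
    """Collect m decrypted shards, then assemble the DR Key from them."""
    if len(participants) < m:
        raise ValueError("not enough Operators present to reach the threshold")
    shards: list[bytes] = []
    for participant in participants[:m]:
        # The Operator Key decrypts the Location Key password.
        password = decrypt_password(participant)
        # The password authorizes the Location Key, which decrypts the shard.
        shard = decrypt_shard(participant, password)
        # Decrypted shards are only ever held in memory.
        shards.append(shard)
    # Only once m shards are available is the DR Key assembled.
    return combine(shards)
```

Keeping the combine step outside the loop means no partial key material is
produced until the full threshold of shards has been decrypted.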
## DR Generation Ceremony
```mermaid
flowchart LR
subgraph storage[Online Storage]
loc-pubkey(Location<br> Public Keys)
op-pubkey(Operator<br> Public keys)
pubkey(DR Public Key)
shard-enc(Encrypted<br> Shards 1-N)
end
subgraph generation[Generation]
seckey(DR Private Key)
ssss[Shamir's Secret<br> Sharing System]
seckey --> ssss
ssss --> shard(Shards 1-N)
end
subgraph location[Location 1-N]
loc-prvkey-sc(Location Key<br> Smartcard)
loc-prvkey-enc-paper(Location Key<br> Encrypted<br> Paper Backup)
loc-prvkey-enc-pass(Location Key<br> Encrypted<br> Passphrase)
end
subgraph operator[Operator 1-N]
op-sc(Operator Key <br> Smartcard)
op-sd(Operator Key<br> Plaintext SD)
op-ppr(Operator Key<br> Plaintext Paper)
end
generation --> operator
generation --> storage
generation --> location
```
**Requirements:**
- N Operators
- 1 Ceremony Laptop with Coreboot/Heads firmware
- 1 flash drive containing:
  - Multi-party signed ISO of AirgapOS
  - DR Generation Script
  - Shamir's Secret Sharing binary
- N new Operator Yubikeys
- N new Location Key Yubikeys
- N\*2+1 new SD Cards
- N\*3 safety deposit bags
- 1 bottle of glitter nail polish
- 1 phone/camera of any kind
  - Prefer offline device that can save images to SD card
- 1 Disposable inkjet printer with ink and paper
  - Should have no wifi, or have wifi physically removed
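
The per-N quantities above follow from the routine outputs and the sealing
steps: N Operator Cards plus N Shard Cards plus one Public Card gives 2N+1 SD
cards, and one bag per Location plus an inner and an outer bag per Operator
gives 3N bags. A small sketch for checking a shopping list (the function and
key names are assumptions for the example):

```python
def ceremony_materials(n: int) -> dict[str, int]:
    """Hardware counts for a generation ceremony with N Operators/Locations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "operator_yubikeys": n,
        "location_key_yubikeys": n,
        "sd_cards": 2 * n + 1,         # N Operator + N Shard + 1 Public Card
        "safety_deposit_bags": 3 * n,  # N Location + 2N Operator (inner, outer)
    }


# For a 3-of-5 setup: 5 + 5 Yubikeys, 11 SD cards, 15 bags.
print(ceremony_materials(5))
```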
**Steps:**
1. Boot Ceremony Laptop to Heads firmware menu
2. Drop to a debug shell
3. Insert and mount AirgapOS flash drive
4. Generate hashes of the AirgapOS flash drive contents
5. Multiple parties verify hashes are expected
6. Boot ISO file
7. Run _DR Generation Script_
8. Reboot system
9. Run _DR Reconstitution Ceremony_
10. Decrypt _DR Key Verification Challenge_
11. Shut down system
12. Print contents of _Shard Cards_ from generation script
13. Print contents of _Operator Cards_ from generation script
14. Seal Location Key artifacts
    * Open new safety deposit box bag
    * Insert printed backups of _Shard Cards_
    * Insert respective Location Key Smartcard
    * Cover seals of closed safety deposit bags in glitter nail polish
    * Take and backup closeup pictures of all safety deposit bag seals
15. Seal _Operator Card_ Artifacts into "Inner bag"
    * Open new safety deposit box bag
    * Insert printed backup of _Operator Card_
    * Insert _Operator Card_
    * Cover seals of closed safety deposit bags in glitter nail polish
16. Seal _Operator Smartcard_
    * Open new safety deposit box bag
    * Insert _Inner bag_
    * Insert _Operator Smartcard_
    * Cover seals of closed safety deposit bags in glitter nail polish
17. Take closeup pictures of all safety deposit bag seals
18. Commit relevant artifacts to GitHub
    - From _Public Card_:
      - Public DR Key
      - Public DR Shard Encryption Keys + signatures from the DR Key
      - Public DR Operator Keys + signatures from the DR Key
    - Images of the sealed bags
    - Signatures over all of this content by multiple witness Personal PGP keys
19. Hand off sealed bags with shard material to DR Couriers.
20. Hand off sealed bags with Operator Key material to respective Operators.
## DR Reconstitution Ceremony
```mermaid
flowchart LR
subgraph storage[Online Storage]
shard-enc(Encrypted<br> Shards 1-N)
secrets-enc(Encrypted Secrets)
end
storage --> recovery
subgraph location1[Location 1]
loc1-prvkey-sc(Location 1<br>Key<br> Smartcard)
loc1-prvkey-enc-pass(Location 1<br>Key<br> Encrypted<br> Passphrase)
end
subgraph location2[Location 2]
loc2-prvkey-sc(Location 2<br>Key<br> Smartcard)
loc2-prvkey-enc-pass(Location 2<br>Key<br> Encrypted<br> Passphrase)
end
subgraph locationN[Location N]
locN-prvkey-sc(Location N<br>Key<br> Smartcard)
locN-prvkey-enc-pass(Location N<br>Key<br> Encrypted<br> Passphrase)
end
subgraph recovery[Recovery]
seckey(DR Private Key)
ssss[Shamir's Secret<br> Sharing System]
ssss --> seckey
seckey --> secrets-dec(Decrypted Secrets)
shard1(Shard 1) --> ssss
shard2(Shard 2) --> ssss
shardN(Shard N) --> ssss
end
location1 --> recovery
location2 --> recovery
locationN --> recovery
op1-sc(Operator 1<br> Smartcard) --> recovery
op2-sc(Operator 2<br> Smartcard) --> recovery
opN-sc(Operator N<br> Smartcard) --> recovery
```
**Requirements:**
- DR Key Public Key
- M-of-N Operators
- M-of-N Operator Key Yubikeys
- M-of-N Location Keys
- M-of-N Encrypted Location Key PINs
- M-of-N Shards
**Steps:**
1. Boot Ceremony Laptop to Heads firmware menu
2. Drop to a debug shell
3. Insert and mount AirgapOS flash drive
4. Generate hash of AirgapOS ISO file and DR Reconstitution Script
5. Multiple parties verify hashes are expected
6. Boot ISO file
7. Run DR Reconstitution Script
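
The hashing and multi-party verification steps can be sketched as follows. In
the ceremony itself these run from the Heads debug shell, so this Python
version is only an illustration; SHA-256 and the function names are
assumptions, since this document does not name a hash algorithm.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large ISO does not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def all_parties_agree(actual: str, expected_by_party: dict[str, str]) -> bool:
    """True only if every party's independently obtained hash matches.

    An empty set of parties never counts as agreement.
    """
    return bool(expected_by_party) and all(
        expected.strip().lower() == actual.lower()
        for expected in expected_by_party.values()
    )
```

Each party should bring the expected hash from their own independent build or
source, so that no single party can vouch for a tampered image.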