Add blog posts no.4, minor FAQ update

This commit is contained in:
Christian Reitter 2024-01-04 15:04:08 +01:00
parent e1d7b23fd5
commit 947d22a5a3
3 changed files with 130 additions and 1 deletions

View File

@ -30,5 +30,5 @@ layout: default
</div>
{%- endif -%}
<br/>
</div>

View File

@ -0,0 +1,127 @@
---
layout: post
title: "Update #4 - Affected Software: bip3x Library"
author: ["Christian Reitter"]
date: 2024-01-04 13:00:00 +0000
---
We take a deepdive into the `bip3x` library's use of pseudo random number generators (PRNG) and related problems.
<div id="toc-container" markdown="1">
<h2 class="no_toc">Table of Contents</h2>
* placeholder
{:toc}
</div>
<br/>
## Introduction
When we discovered the "original" Milk Sad vulnerability pattern in `libbitcoin`, we naturally asked ourselves: which other wallet-related software has similar flaws in the random number generation path? Specifically, is there other software that consumes the MT19937-32 Mersenne Twister algorithm output, which could overlap with `libbitcoin` or Trust Wallet ranges of weak wallets? In this post, we'll look at one of those that came up after the original disclosure.
## bip3x Weaknesses
In September 2023, [Itamar Carvalho](https://github.com/itamarcps) reached out to us to make us aware of [publicly reported issues](https://github.com/edwardstock/bip3x/issues/6) in a BIP39 library called [bip3x](https://github.com/edwardstock/bip3x) that are similar with the Milk Sad vulnerability. Thank you!
Since `bip3x` is not a particularly well-known library, here are some basics:
* C++ library with Java bindings
* Public versions available since 2019
* Targets multiple architectures, including Android and Windows (cross-compiled)
### Mersenne Twister Usage
As outlined in [a public ticket](https://github.com/edwardstock/bip3x/issues/6), `bip3x` uses the very insecure MT19937 Mersenne Twister seeded with the local system time when compiling for the Windows target. This is a serious problem in case of any "production" usage of these builds beyond testing, as shown by our other research.
[Code](https://github.com/edwardstock/bip3x/blob/57e7c5c2854c58048c249389b8d469385156181f/src/bip3x/bip3x_mnemonic.cpp#L54-L55):
```c
#ifdef __MINGW32__
auto duration = std::chrono::high_resolution_clock::now().time_since_epoch();
static std::mt19937 rand(std::chrono::duration_cast<std::chrono::microseconds>(duration).count()
);
#else
// [...]
```
This functionality was introduced with version `2.1.1` in [February 2021](https://github.com/edwardstock/bip3x/commit/6454224ce7988d1c989cf20cec0e77cd4894ae0b).
Starting with version `2.2.0`, the library author acknowledged that this is an issue, and
* Added a [compile-time warning](https://github.com/edwardstock/bip3x/commit/588df274e6ea8581b32f1abd55853a14587049f8#diff-1e7de1ae2d059d21e1dd75d5812d5a34b0222cef273b7c3a2af62eb747f9d20aR34)
* Added a warning to the [README.MD](https://github.com/edwardstock/bip3x/commit/588df274e6ea8581b32f1abd55853a14587049f8#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R81-R82)
* Added an opt-in build option to use the safe OpenSSL random number generator (`-DUSE_OPENSSL_RANDOM=On`)
While these steps certainly improve the situation, we think that defaulting to insecure algorithms is still a bad design. The existing warnings can be overlooked or not be visible to developers and end users, depending on their usage of the library, which will lead to the loss of funds via _very practical_ on-chain attacks. We therefore recommend changing the code to use safe defaults.
### PCG PRNG
When compiling the `bip3x` library for non-Windows-targets, it uses a PRNG from the [PCG family](https://www.pcg-random.org/) of RNG algorithms by default. The OpenSSL random functions are available as an opt-in alternative, as outlined previously.
As far as we know, none of the currently available PCG algorithm variants are designed to be a Cryptographically Secure Pseudo Random Number Generator (CSPRNG), which should disqualify them from any usage to generate long-lived cryptographic key material such as BIP39 seeds.
The existing `bip3x` documentation can be understood to outline this:
> IMPORTANT: using c++ (mt19937, and PCGRand not works properly too) random generator is not secure. Use -Dbip3x_USE_OPENSSL_RANDOM=On while configuring to use OpenSSL random generator.
Note the `[...] and PCGRand not works properly too [...]` part of this sentence. However, the phrasing and placement of this documentation warning under a Windows-specific compilation variant makes it hard to understand. At the very least, it's not clear if the library authors are expressing general issues with the PCG-based RNG that apply for all compile targets.
Given this context, we were curious: how vulnerable are PCG-generated seeds in `bip3x` against practical attacks? Assessing this is complicated, but here is where we got so far:
#### Background
* The PCG implementation in `bip3x` [PCGRand.hpp](https://github.com/edwardstock/bip3x/blob/57e7c5c2854c58048c249389b8d469385156181f/include/bip3x/details/PCGRand.hpp) appears to be a custom re-implementation of PCG based on [this](https://gist.github.com/Leandros/6dc334c22db135b033b57e9ee0311553) GitHub gist. The corresponding [blog post](https://arvid.io/2018/07/02/better-cxx-prng/) by Arvid Gerstmann has more context on the origin of the code.
* Based on the used constants, variables and code, we think the used PCG algorithm is the `32-bit Output, 64-bit State: PCG-XSH-RR` variant described in the [PCG paper](https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf) under section `6.3.1`.
* Relevant functions in the original code: [pcg_setseq_64_srandom_r()](https://github.com/imneme/pcg-c/blob/83252d9c23df9c82ecb42210afed61a7b42402d7/include/pcg_variants.h#L790C13-L798), [pcg_output_xsh_rr_64_32()](https://github.com/imneme/pcg-c/blob/83252d9c23df9c82ecb42210afed61a7b42402d7/include/pcg_variants.h#L156-L159), [pcg_setseq_64_step_r()](https://github.com/imneme/pcg-c/blob/83252d9c23df9c82ecb42210afed61a7b42402d7/include/pcg_variants.h#L588-L591).
* A [public comment](https://gist.github.com/Leandros/6dc334c22db135b033b57e9ee0311553?permalink_comment_id=3726134#gistcomment-3726134) suggests that this implementation differs from the official PCG algorithms in the essential step `m_state = oldstate * 6364136223846793005ULL + m_inc;` by not setting the lowest bit in `m_inc` to one. However, given that the `bip3x` calls the function in question only after the `seed()` seeding function which permanently applies the `| 1` bitwise operation on the `m_inc` variable, we think the practical behavior is functionally identical at this step.
#### Behavior
At first glance, the `32-bit Output, 64-bit State: PCG-XSH-RR` variant has the huge problem of an internal state that is limited to only 64-bit, which would give an upper limit of **64-bit entropy** in terms of starting positions. Attacking this state would be 2^32-times harder than attacking Mersenne Twister MT19937-32 with its **32-bit** of seeding state, but still be in the realm of brute-forcing, at least in the near future or for attackers with significant resources. For context, brute-forcing 40-bit of BIP39 key entropy in a similar situation was possible [within 30 hours in 2020](https://medium.com/@johncantrell97/how-i-checked-over-1-trillion-mnemonics-in-30-hours-to-win-a-bitcoin-635fe051a752) for less than 425$ in costs.
Looking closer, the used PCG variant also has a second 63-bit state-like parameter which selects a special sub-stream that extends upon the main state.`bip3x` uses and seeds this sub-stream index with random values as well. We're unclear if this effectively increases the upper theoretical bound to 127 bits of "seeding state" in the general case, but consider it a significant enough obstacle to prevent practical brute-force attacks of the type we've shown for MT19937-32.
In the [actual implementation](https://github.com/edwardstock/bip3x/blob/57e7c5c2854c58048c249389b8d469385156181f/include/bip3x/details/PCGRand.hpp#L36-L44), the `seed()` function consumes 4x `unsigned int` from `std::random_device`, and then combines two of them each to form `uint64_t m_state` and `uint64_t m_inc`:
```c
void seed(std::random_device &rd)
{
uint64_t s0 = uint64_t(rd()) << 31 | uint64_t(rd());
uint64_t s1 = uint64_t(rd()) << 31 | uint64_t(rd());
m_state = 0;
m_inc = (s1 << 1) | 1;
(void)operator()(); // calculate next PRNG state
m_state += s0;
(void)operator()(); // calculate next PRNG state
```
_(Comments are from us)_
For some reason, the code author decided to left-shift only by 31 bits (`<< 31`) instead of 32 bits.
This leads to two problems:
* The upper bit of `s0` stays unset -> only 63 bits of entropy are used for `m_state`
* The "bitwise OR" operation overlaps at bit index 31 of `s0` and `s1`, which biases their values in one position
To show this visually:
```
0xffffffff
0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111 1111 1111 1111 1111
0xffffffff << 31
0111 1111 1111 1111 1111 1111 1111 1111 1000 0000 0000 0000 0000 0000 0000 0000
^ unnused ^ used
bitwise OR of both:
0111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111
^ always 0 ^ increased chance of 1
```
We consider this random number handling to be an implementation flaw. However, assuming the rest of the entropy source handling is good, these flaws should not be serious enough to allow practical attacks. But which entropy source is relevant here?
In the _best_ case, `std::random_device` provides perfect 32 bit of actual hardware-backed non-guessable entropy per call. Taking into account the implementation details, this could lead to something in the order of roughly 2^125 different initial configurations of the PCG PRNG. If the PRNG state and sub-streams really are as independent as intended, which we didn't investigate, this complexity is impossible to brute-force blindly.
Unfortunately, given the loose guarantees of C++ with regards to the `unsigned int` bitsize and `std::random_device` randomness, it's also possible that far less non-guessable bits make it into the PCG start configuration, for example if `std::random_device` uses an insecure PRNG on its own or if `unsigned int` has only the minimally required 16-bit width. Therefore assumptions about the entropy input may not hold on all operating systems and compilers, which makes this construction susceptible to silently breaking in problematic ways.
To summarize, we see the use of this algorithm as unsafe and unsuited _for key generation_ and recommend against its use in `bip3x`. It _may_ have enough internal complexity to withstand remote brute-force attacks on _most_ modern systems, but there are better alternatives that are designed for cryptography. Especially when generating 18-word (192-bit) and 24-word (256-bit) BIP39 secrets, this PRNG will significantly decrease overall security margins for no good reason.
Please be aware that this is just a brief analysis we did to judge potential directions for our research. It is not a formal review, and you should not rely upon it for your own security.
## Summary & Outlook
In this post, we took a look at a software which may have contributed to some of the discovered weak wallets based on Mersenne Twister outputs. In its standard configuration, it uses another PRNG algorithm that is risky but likely not weak enough for wide-scale remote attacks.
In the upcoming posts, we'll show more wallet analysis of previously vulnerable funds and discuss other affected software.
<br/>

2
faq.md
View File

@ -59,6 +59,8 @@ The identified problem is limited to the entropy generation functionality. We're
### Is the vulnerability currently fixed in `libbitcoin-explorer`?
We are not aware of a fix. At the time of disclosure, our understanding is that the Libbitcoin team considers this not to be a vulnerability. See [this section](disclosure.html#libbitcoin-vendor-response) in our disclosure.
**Update**: `libbitcoin-explorer` [3.8.0](https://github.com/libbitcoin/libbitcoin-explorer/releases/tag/v3.8.0) fixed the issue by removing the problematic entropy generation command.
---
### Would a security patch fix this problem?