Add blog posts no.2 and no.3, customize Jekyll.

Thanks to Heiko Schaefer for proofreading and edit suggestions.

Technical changes:
Extend the image handling via jekyll-responsive-image and corresponding configuration & templates.
This requires ImageMagick dependencies in the Dockerfile for the rmagick plugin.
Add custom Jekyll Liquid filters for semi-automatic Bitcoin address and transaction formatting/linking.
Add CSS details for the figure handling.
Christian Reitter 2023-12-06 12:34:58 +01:00
parent 062152cd76
commit e1d7b23fd5
14 changed files with 364 additions and 4 deletions

@ -1,6 +1,6 @@
FROM ruby:3.2-alpine AS builder
LABEL stage=distrust-co-builder
RUN apk update && apk add g++ make
RUN apk update && apk add g++ make imagemagick imagemagick-dev imagemagick-libs
RUN mkdir -p /home
COPY Gemfile /home
COPY Gemfile.lock /home

@ -2,4 +2,5 @@ source "https://rubygems.org"
# gem "jekyll-theme-console", path: "./_vendor/jekyll-theme-console"
gem "jekyll"
gem "jekyll-feed"
gem "jekyll-feed"
gem "jekyll-responsive-image"

@ -33,6 +33,9 @@ GEM
webrick (~> 1.7)
jekyll-feed (0.17.0)
jekyll (>= 3.7, < 5.0)
jekyll-responsive-image (1.6.0)
jekyll (>= 2.0, < 5.0)
rmagick (>= 2.0, < 5.0)
jekyll-sass-converter (3.0.0)
sass-embedded (~> 1.54)
jekyll-watch (2.2.1)
@ -54,6 +57,7 @@ GEM
rb-inotify (0.10.1)
ffi (~> 1.0)
rexml (3.2.5)
rmagick (4.3.0)
rouge (4.1.2)
safe_yaml (1.0.5)
sass-embedded (1.63.6)
@ -73,6 +77,7 @@ PLATFORMS
DEPENDENCIES
jekyll
jekyll-feed
jekyll-responsive-image
BUNDLED WITH
2.4.17

@ -39,6 +39,7 @@ footer: '2023'
plugins:
- jekyll-feed
- jekyll-responsive-image
# Build settings
@ -64,3 +65,68 @@ exclude:
- "*.conf"
- "README.md"
- "LICENSE"
responsive_image:
  # Path to the image template.
  template: _includes/responsive-image.html
  # [Optional, Default: 85]
  # Quality to use when resizing images.
  default_quality: 90
  # [Optional, Default: []]
  # An array of resize configuration objects. Each object must contain at least
  # a `width` value.
  # Keep in sync with minima.scss width levels, in pixels
  sizes:
    - width: 600 # [Required] How wide the resized image will be.
      quality: 85 # [Optional] Overrides default_quality for this size.
    - width: 1150
  # [Optional, Default: false]
  # Rotate resized images depending on their EXIF rotation attribute. Useful for
  # working with JPGs directly from digital cameras and smartphones
  auto_rotate: false
  # [Optional, Default: false]
  # Strip EXIF and other JPEG profiles.
  strip: true
  # [Optional, Default: assets]
  # The base directory where assets are stored. This is used to determine the
  # `dirname` value in `output_path_format` below.
  base_path: assets/images
  # [Optional, Default: assets/resized/%{filename}-%{width}x%{height}.%{extension}]
  # The template used when generating filenames for resized images. Must be a
  # relative path.
  #
  # Parameters available are:
  # %{dirname} Directory of the file relative to `base_path` (assets/sub/dir/some-file.jpg => sub/dir)
  # %{basename} Basename of the file (assets/some-file.jpg => some-file.jpg)
  # %{filename} Basename without the extension (assets/some-file.jpg => some-file)
  # %{extension} Extension of the file (assets/some-file.jpg => jpg)
  # %{width} Width of the resized image
  # %{height} Height of the resized image
  #
  output_path_format: assets/images/resized/%{width}/%{basename}
  # [Optional, Default: true]
  # Whether or not to save the generated assets into the source folder.
  save_to_source: false
  # [Optional, Default: false]
  # Cache the result of {% responsive_image %} and {% responsive_image_block %}
  # tags. See the "Caching" section of the README for more information.
  cache: false
  # [Optional, Default: []]
  # By default, only images referenced by the responsive_image and responsive_image_block
  # tags are resized. Here you can set a list of paths or path globs to resize other
  # images. This is useful for resizing images which will be referenced from stylesheets.
  # extra_images:
  #   - assets/foo/bar.png
  #   - assets/bgs/*.png
  #   - assets/avatars/*.{jpeg,jpg}

@ -0,0 +1,21 @@
{% capture srcset %}
{% for i in resized %}
/{{ i.path }} {{ i.width }}w,
{% endfor %}
{% endcapture %}
{% assign smallest = resized | sort: 'width' | first %}
{% if figure -%}
<figure role="group">
{% if target_width -%}
<img style="max-width: {{ target_width }};" src="/{{ smallest.path }}" alt="{{ alt }}" srcset="{{ srcset | strip_newlines }} /{{ path }} {{ original.width }}w" sizes="(min-width: 1150px) 1150px, (min-width: 600px) 600px" class="{{ class }}" />
{% else -%}
<img src="/{{ smallest.path }}" alt="{{ alt }}" srcset="{{ srcset | strip_newlines }} /{{ path }} {{ original.width }}w" sizes="(min-width: 1150px) 1150px, (min-width: 600px) 600px" class="{{ class }}" />
{% endif -%}
<figcaption>{% if caption %}{{ caption }}{% else %}{{ alt }}{% endif %}</figcaption>
</figure>
{% else -%}
<img src="/{{ smallest.path }}" alt="{{ alt }}" srcset="{{ srcset | strip_newlines }} /{{ path }} {{ original.width }}w" sizes="(min-width: 1150px) 1150px, (min-width: 600px) 600px" class="{{ class }}" />
{% endif -%}

@ -11,8 +11,8 @@ layout: default
{%- if page.last_modified_at -%}
{%- assign mdate = page.last_modified_at | date_to_xmlschema -%}
<time class="dt-modified" datetime="{{ mdate }}" itemprop="dateModified">Updated: {{ mdate | date: date_format }}</time>
</br>
{%- endif -%}
<br/>
{%- if page.author -%}
{% for author in page.author -%}{%- assign author_count = author_count | plus: 1 -%}{% endfor %}
{%- if author_count==1 -%}

_plugins/custom.rb

@ -0,0 +1,10 @@
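# Custom Liquid filters that turn Bitcoin transaction IDs and addresses
# into Markdown links to mempool.space.
module LinkBitcoin
  # Link a transaction ID, displaying only its first and last 8 characters.
  def BtcLinkTxUrlSliced(input)
    "[#{input.slice(0,8)}..#{input.slice(-8,8)}](https://mempool.space/tx/#{input})"
  end

  # Link a Bitcoin address, displaying it in full.
  def BtcLinkAddressUrlFull(input)
    "[#{input}](https://mempool.space/address/#{input})"
  end
end

# Make the filters available in all Liquid templates.
Liquid::Template.register_filter(LinkBitcoin)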
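
To sanity-check what these filters emit, the module can be exercised in plain Ruby without Jekyll or Liquid. The sketch below repeats the two methods from `_plugins/custom.rb` so that it is self-contained; the `helper` object is a hypothetical stand-in for the Liquid rendering context and is not part of the plugin.

```ruby
# Stand-alone copy of the filter module, without the Liquid registration,
# so the example runs with a plain Ruby interpreter.
module LinkBitcoin
  def BtcLinkTxUrlSliced(input)
    "[#{input.slice(0,8)}..#{input.slice(-8,8)}](https://mempool.space/tx/#{input})"
  end

  def BtcLinkAddressUrlFull(input)
    "[#{input}](https://mempool.space/address/#{input})"
  end
end

# Hypothetical stand-in for the Liquid context that normally hosts the filters.
helper = Object.new.extend(LinkBitcoin)

# Transaction ID and address taken from the blog posts in this commit.
txid = "3931b570e562a58608f9b7f7291d12a40dbe617112c721077e441d264f501910"
tx_link = helper.BtcLinkTxUrlSliced(txid)
puts tx_link
# → [3931b570..4f501910](https://mempool.space/tx/3931b570e562a58608f9b7f7291d12a40dbe617112c721077e441d264f501910)

address_link = helper.BtcLinkAddressUrlFull("13KqxkrmsPKy8gyYwochCQTuPHC7Lp8bFU")
puts address_link
```

The filters return Markdown link syntax rather than HTML, so they only render as links when used inside Markdown content that is processed after Liquid.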

@ -1,6 +1,6 @@
---
layout: post
title: "Research Update No. 1 - New bx Data, ETH, Service Changes"
title: "Update #1 - New bx Data, ETH, Service Changes"
author: ["Christian Reitter"]
date: 2023-11-23 00:00:00 +0000
last_modified_at: 2023-12-02 16:00:00 +0000
@ -41,6 +41,7 @@ Additional notes:
* We scanned the 15 word (160 bit) and 21 word (224 bit) BIP39 seed ranges, but found them empty, at least on the common paths. While `bx mnemonic-new` can create those variants of the BIP39 standard, they are not so common. This result is therefore plausible to us.
* In the [original writeup]({% link disclosure.md %}#searching-for-wallets---implementation), we outlined that many of the `3`-prefix BIP49 wallets in the 256 bit range likely belong to the same entity. One of the earliest transactions for this seems to be [3931b570..4f501910](https://mempool.space/tx/3931b570e562a58608f9b7f7291d12a40dbe617112c721077e441d264f501910) on 2018-10-03, and many similar transfers to weak wallets in this range happen within the next few days, some with identical transaction amounts.
<a id="bx_ec_new_keys"/>
## Breaking Weak bx 3.x ec-new Private Keys
During the initial research sprint towards understanding and reproducing the `bx` PRNG weakness, we focused on code paths and usage variants that run weak `bx seed` entropy through `bx mnemonic-new` to generate BIP39 mnemonics. BIP39 has been the de facto standard for interoperability with other hardware and software wallets for many years now, and was the mechanism used by the affected wallet owners who kicked off our research, which is why we saw this as the most relevant and important variant.

@ -0,0 +1,173 @@
---
layout: post
title: "Update #2 - Trust Wallet Ranges, Uncompressed Pubkeys"
author: ["Christian Reitter"]
date: 2023-12-06 11:00:00 +0000
---
While researching the weak entropy generated by `bx` using the Mersenne Twister algorithm, we learned fairly quickly that the generation algorithm is only a minor code change away from re-creating the weak wallets of the `Trust Wallet` software. Naturally, we spent some time in the last months to see which weak wallets we could summon from the cryptographic realms 🔮🪄.
There is a lot to tell about new discoveries that resulted from this, so we'll start by presenting some initial statistics and descriptions about the over 2700 weak wallet private keys in these new areas.
<div id="toc-container" markdown="1">
<h2 class="no_toc">Table of Contents</h2>
* placeholder
{:toc}
</div>
## New Research: Trust Wallet-like BIP39 Range
In our original technical writeup, we mentioned the Trust Wallet vulnerability and the [surprising similarities]({% link disclosure.md %}#not-even-the-second-hack-mersenne-twister-use-in-trust-wallet) between the `bx seed`-generated weak keys and the Trust Wallet-generated weak keys. There is just one minor algorithmic change in the Pseudo Random Number Generator (PRNG) steps to get weak entropy for one or the other.
So we started crunching away on the numbers to find the weak Trust Wallet accounts.
Here is a basic overview of discovered wallet usage on Bitcoin:
| BIP39 entropy bit length <br/>_mnemonic length_ | 128 bit<br/>_12 words_ | 192 bit <br/>_18 words_| 256 bit <br/>_24 words_|
| -- | -- | -- | -- |
| `m/44'/0'/0'/0/0` path, compressed pubkey, P2PKH | 215 | 1 | 22 |
| `m/44'/0'/0'/0/0` path, uncompressed pubkey, P2PKH | 0 | 0 | 0 |
| `m/49'/0'/0'/0/0` path, P2SH-P2WPKH | 1969 | 0 | 12 |
| `m/84'/0'/0'/0/0` path, P2WPKH | 412 + 1 | 1 | 2 |
| -- | -- | -- | -- |
| -- | -- | -- | -- |
| sum of unique wallet private keys | 2580 | 2 | 36 |
<details markdown=1>
<summary><b>Data details</b> (click to unfold)</summary>
* Wallet generation: "Trust Wallet" style MT19937-32 PRNG, PRNG -> BIP39 -> BIP32.
* 15 wallets with 128 bit mnemonics used more than one known derivation path.
* The first deposit into this range happened on 2018-04-13.
</details><br/>
Overall, we found **2618** wallets in this range so far.
As far as we know, the weak PRNG implementation that Trust Wallet temporarily used for new wallets only generates _128 bit (12 word) BIP39 mnemonics_, and many of the discovered wallets predate the introduction of the flaw into the Trust Wallet software. Therefore, it seems another wallet generation software uses exactly the same flawed PRNG method!
We expected this, since Ledger Donjon remarked on this as well in [their writeup](https://blog.ledger.com/Funds-of-every-wallet-created-with-the-Trust-Wallet-browser-extension-could-have-been-stolen/) from April 2023:
> During our investigations, we also noticed that a few addresses were vulnerable while they had been generated a long time before the Trust Wallet release. That probably means this vulnerability exists in some other wallet implementations which is concerning…
If we look at just the wallets based on 12 word mnemonics between the time of the Trust Wallet vulnerability disclosure and now, the picture is as follows. An (unclear) portion of the "outgoing" funds represents the thefts:
{% responsive_image_block %}
figure: true
path: assets/images/graphs/trustwallet_style_bip39_128bit_only_monthly_volume_btc_2022_2023_graph1.png
alt: "Historic aggregated usage of known 128 bit Trust Wallet-style Bitcoin wallets, focused on 6/2022 to 9/2023"
target_width: 950px
{% endresponsive_image_block %}
That's a decent amount of volume overall: somewhere on the order of 90 BTC which plausibly belong to Trust Wallet users were moved in and out of those wallets. The movements correspond to about 2400 individual transactions (not shown).
However, if we zoom out, things get even more interesting:
{% responsive_image_block %}
figure: true
path: assets/images/graphs/trustwallet_style_bip39_128bit_only_monthly_volume_btc_2018_2023_graph1.png
alt: "Historic aggregated usage of known 128 bit Trust Wallet-style Bitcoin wallets, 2018 to 2023"
target_width: 950px
{% endresponsive_image_block %}
From late 2018 to late 2021, this wallet range was very actively used, with on the order of **975 BTC** flowing in and out, distributed over >21300 individual transactions. The Trust Wallet-related funds on the right show the scale of this prior usage, although the Bitcoin price also changed drastically over this whole time period, which affected the amounts as well.
From what we know so far, the most plausible explanation for us is that some other, unknown, wallet software was flawed, and millions of dollars in Bitcoins were stored insecurely. However, no one noticed in time to steal them before they were spent normally 😵‍💫.
Based on our current data, we suspect the most costly Bitcoin part of the Trust Wallet hack happened on 2023-01-11, when about 200 consecutive transfers moved out about **50 BTC** within a 30 minute window. Please consider this an early estimate, especially considering that the corresponding dollar value would be multiple times higher than the `approximately $170000 USD` figure [given](https://community.trustwallet.com/t/browser-extension-wasm-vulnerability-postmortem/750787) by the Trust Wallet team in April 2023 as part of their post-mortem.
<details markdown=1>
<summary><b>List of suspicious withdrawal transactions</b> (click to unfold)</summary>
Selection of ten most significant Bitcoin transactions in the suspicious time frame:
| Transaction | Volume | approx. USD @ tx time | Date |
| - | - | - | - |
| {{ "a2a028d97fe533a6a8ef098e3b70630a1ae97434d48b5218c81b07a26d469fb4" | BtcLinkTxUrlSliced }} | -16.060 BTC | -$280,036 | 2023-01-11 20:23:58 |
| {{ "185aa60cce80e85513560c847413e4fa0bfa29e61ff88ab2a386bffd53d3e739" | BtcLinkTxUrlSliced }} | -8.574 BTC | -$149,502 | 2023-01-11 20:23:58 |
| {{ "23a52b1d1c05238f7e4024016e3e26e100280295afef5ea6489a0c612717279a" | BtcLinkTxUrlSliced }} | -3.439 BTC | -$59,959 | 2023-01-11 20:23:58 |
| {{ "f774e14a59e8e74bc5cf4b654952370ae3306488f9a1c536421e0178f73b6efa" | BtcLinkTxUrlSliced }} | -2.133 BTC | -$37,195 | 2023-01-11 20:23:58 |
| {{ "d9781bc91fed709959f198a09f019883fd5626420cf10b9af07c54ddf6366c41" | BtcLinkTxUrlSliced }} | -1.579 BTC | -$27,532 | 2023-01-11 20:23:58 |
| {{ "caabba1b67a0c18ac0e483d6dc5376b25f46a4018a45cca9f8af69152c90d1c8" | BtcLinkTxUrlSliced }} | -1.346 BTC | -$23,475 | 2023-01-11 20:23:58 |
| {{ "8941e9c3f840a41da7e992d981cc4c2861a8b061656a00cd3677b413d6d0b1ce" | BtcLinkTxUrlSliced }} | -1.002 BTC | -$17,473 | 2023-01-11 20:23:58 |
| {{ "104a98c192de7dae1ed43a3d88bcb00d339ea0e7fac5aaf7e5d75ac231a9f14e" | BtcLinkTxUrlSliced }} | -0.897 BTC | -$15,643 | 2023-01-11 20:05:26 |
| {{ "66cfaa0c16748b4e4e266b99bda83a44c0e4ebd201a13b888c43a9c5789b99e8" | BtcLinkTxUrlSliced }} | -0.570 BTC | -$9,947 | 2023-01-11 20:23:58 |
| {{ "464fd34bc5ceb77fa9376e0537eb00cee07a66d37390fd7fdf6b90cbb53f3cc3" | BtcLinkTxUrlSliced }} | -0.524 BTC | -$9,138 | 2023-01-11 20:23:58 |
</details>
<br/>
There's more to tell here, and we'll talk more about this range of weak wallets in a future blog post.
## New Research: Trust Wallet-like ec-new Range
After finding clear evidence of other wallet software using the same flawed "Trust Wallet"-style of consuming Mersenne Twister MT19937 entropy, we extended our search to the special BIP39-less wallet generation mode that we saw used with `bx` [previously]({% link _posts/2023-11-22-research-update-1.md %}#bx_ec_new_keys). To describe this range, we're referring to it as _Trust Wallet_-like, in terms of Mersenne Twister output usage, and _`ec-new`_-like in the basic key generation pattern. However, note that the discovered wallets may not relate to either software, and mostly pre-date Trust Wallet's temporary use of the weak PRNG mechanism for wallet generation.
Without knowing what other wallet software was involved, this is the next best naming scheme we picked.
Discovered Bitcoin wallets:
| entropy bit length | 128 bit | 192 bit | 256 bit | 2048 bit |
| -- | -- | -- | -- | -- |
| number of wallets<br/> `m/` path, compressed pubkey, P2PKH | 1 | 84 | 1 | 2 |
| number of wallets<br/> `m/` path, uncompressed pubkey, P2PKH | 0 | 8 | 0 | 0 |
In summary, we found **96** of these wallets so far.
<details markdown=1>
<summary><b>Data details</b> (click to unfold)</summary>
* Wallet generation: "Trust Wallet" style MT19937-32 PRNG, PRNG -> BIP32.
* As with the `bx` `ec-new` range, we spotted an unusual outlier that went beyond the normal 256 bits of key size. In the `ec-new` BIP32 usage mode, there is no clear standard on how large or small private keys are allowed to be, but a 2048 bit secp256k1 key is still unusual. If the wallet owners had hoped for some additional security protection, they were clearly disappointed: this wallet is based on the same weak 32 bits of PRNG seeding as the others, unfortunately.
* We did not find any wallets in the following bit length range variations: 160 bit, 224 bit, 384 bit, 512 bit, 1024 bit, 4096 bit while searching for compressed public key P2PKH wallets on the base path.
</details><br/>
Looking at the on-chain facts for these wallets (discovered so far), we can see that:
* The first deposit into this range happened on 2017-04-21.
* The highest aggregated balance was held from mid-2019 to late 2019, at ca. **101.25** BTC.
* Overall, an estimated **110.85 BTC** total moved through the weak wallets of this range across their history (this number may miss or double-count some funds).
<br/><br/>
Movement of funds on discovered Bitcoin wallets:
{% responsive_image_block %}
figure: true
path: assets/images/graphs/trustwallet_style_ec_new_monthly_volume_graph1.png
alt: "Historic volume for the known 'Trust Wallet ec-new'-type Bitcoin wallets (aggregated)"
target_width: 800px
{% endresponsive_image_block %}
There is a series of 10 transactions around 2020-11-22 which abruptly move out nearly **93 BTC**, close to all of the remaining assets in this range at the time. We're unsure whether this is a legitimate withdrawal or an early theft based on re-calculated weak private keys, but it looks very sudden and comprehensive.
It's possible that multiple large wallets belonged to the same person and they moved the funds out during Bitcoin's historic price increase that year. A deliberate theft doesn't really fit the overall timeline of the other weak wallet ranges - this is two years "too early" for the known weak wallet thefts, and there is no corresponding large movement of funds on other weak ranges we found during our initial checks.
Additionally, patterns in some of the stored amounts (2.75 - 3.0 BTC each) suggest that a significant percentage of them were controlled by a single source.
<details markdown=1>
<summary><b>List of fast withdrawal transactions</b> (click to unfold)</summary>
| Transaction | Volume | Date |
| - | - | - |
|{{ "2cca73bc90cb64c28775ab8c59e9b3e69afe2d7772a659f5a0cc5d38901033a2" | BtcLinkTxUrlSliced }} | -6.817 BTC | 2020-11-22 20:04 |
|{{ "d834b727cb821803126eb107928bab7c0bb73a7cc92076685b840a19cce083b6" | BtcLinkTxUrlSliced }} | -64.307 BTC | 2020-11-22 20:04 |
|{{ "8660e093dd7ad2ae2597c448a099117e9c516b73cd871f9a06cdf49b076bbc4c" | BtcLinkTxUrlSliced }} | -3.5 BTC | 2020-11-22 20:04 |
|{{ "c52d0e905cdfd4b89f6e29bcec44ff6b61d80c0e72bbe1d36762519207e1720e" | BtcLinkTxUrlSliced }} | -8.0 BTC | 2020-11-22 20:04 |
|{{ "08ed7b0f5b5588ba96f17215580a49a919ecc00e1393ea546f747a45870d9b2f" | BtcLinkTxUrlSliced }} | -6.035 BTC | 2020-11-22 20:04 |
|{{ "a9b32f33ce85b5b154f85fd2b454b7930b1579c89640a1930d07d02b06c7bd77" | BtcLinkTxUrlSliced }} | -0.023 BTC | 2020-11-22 20:07 |
|{{ "4bb53fda37654b5c49a2c545faab3da5d2334e72bcd51daeebc312a8d9c76edd" | BtcLinkTxUrlSliced }} | -0.017 BTC | 2020-11-22 20:07 |
|{{ "6b898c39865a5794744648c819ba244405d4b45a77176d1d9e58742c219e7a1c" | BtcLinkTxUrlSliced }} | -2.0 BTC | 2020-11-22 20:07 |
|{{ "93f9a63e9ddb7c840061a106271f8ddbe239402b43998aba0ed84faed802377c" | BtcLinkTxUrlSliced }} | -0.64 BTC | 2020-11-22 20:07 |
|{{ "1e4065feda31adcf952ee7c7ba789fe9c99fd962fdbc590d6187d4f760d460ac" | BtcLinkTxUrlSliced }} | -1.54 BTC | 2020-11-22 20:07 |
</details>
<br/>
## Uncompressed Public Keys on P2PKH
During recent work, we noticed that our wallet searches for Pay-To-Public-Key-Hash (P2PKH) addresses had assumed the public key to be in _compressed_ form (33 byte length). This is the modern and common way to generate P2PKH addresses from derived public keys. However, it is not the only way - there is a second canonical form which calculates the hash over the public key in _uncompressed_ form (65 byte length), which results in a different hash and therefore different address.
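
The difference between the two forms is easiest to see in the serialized public key bytes. The following Ruby sketch uses the secp256k1 generator point as a sample public key (an arbitrary choice for illustration, not one of the discovered wallets) and builds both encodings by hand; the variable names are hypothetical.

```ruby
# secp256k1 generator point coordinates, used here only as a well-known sample key.
x_hex = "79be667ef9dcbbac55a06295ce870b07029bfcdb2dce28d959f2815b16f81798"
y_hex = "483ada7726a3c4655da4fbfc0e1108a8fd17b448a68554199c47d08ffb10d4b8"

# Uncompressed encoding: 0x04 prefix, then the full x and y coordinates.
uncompressed_hex = "04" + x_hex + y_hex

# Compressed encoding: 0x02 if y is even, 0x03 if y is odd, then only x.
prefix = y_hex.to_i(16).even? ? "02" : "03"
compressed_hex = prefix + x_hex

puts "compressed:   #{compressed_hex.length / 2} bytes"
puts "uncompressed: #{uncompressed_hex.length / 2} bytes"
```

Since a P2PKH address encodes RIPEMD160(SHA256(serialized public key)), the 33 byte and 65 byte inputs lead to different hashes, and therefore to different addresses for the same private key.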
We've done some new searches to cover these variants and found a few previously missed wallets, mostly in `ec-new`-style ranges. The [previous blog post]({% link _posts/2023-11-22-research-update-1.md %}) has also been updated with the new data.
For some context on how those could be generated originally, the `bx` [ec-to-public](https://github.com/libbitcoin/libbitcoin-explorer/wiki/bx-ec-to-public) command in the special `bx ec-to-public --uncompressed` mode could have been involved, at least on the [bx ec key range]({% link _posts/2023-11-22-research-update-1.md %}#bx_ec_new_keys). Other wallet software may have similar legacy address encoding settings.
## Summary & Outlook
In this post, we provided some impact details relating to the publicly known Trust Wallet vulnerability, as well as new data on an older, not widely reported or researched wallet software vulnerability in the same BIP39 128 bit range. Similarly, we have shown some statistics and details of wallets in the related ec-new range, which most likely also come from a yet unknown wallet software vulnerability.
Finally, we described a less common but relevant Bitcoin Pay-To-Public-Key-Hash address variant that is useful to know for wallet search operations of this type.
We're working on new topics around the Trust Wallet-related weak wallet ranges. The next blog post will focus more on the development and data source side of our work.
Check out our [RSS]({% link feed.xml %}) feed if you want to get notified by your favorite reader application.
<br/>

@ -0,0 +1,65 @@
---
layout: post
title: "Update #3 - Bloom Filter, Dataset, Canaries"
author: ["Christian Reitter"]
date: 2023-12-06 11:10:00 +0000
---
This research update has some information on the Bloom filter mechanism and public blockchain address data we used to find weak Bitcoin wallets. Using this technique, we were able to check several billion potential wallets for actual usage on the blockchain without running a Bitcoin full node, or flooding other Bitcoin servers and APIs with excessive network requests.
We also describe some artificially created wallets that we've placed to track the real-world theft behavior in one of the weak ranges.
<div id="toc-container" markdown="1">
<h2 class="no_toc">Table of Contents</h2>
* placeholder
{:toc}
</div>
## Bloom Filter Explanation and Address Data Source
When searching through billions of algorithm-generated data chunks that could reveal a few interesting private keys, efficient filtering becomes very important. In our original publication, [we briefly described]({% link disclosure.md %}#searching-for-wallets---implementation) this as follows:
> We used a publicly available list of all Bitcoin addresses historically seen by the Bitcoin network and constructed a bloom filter with a very low false positive rate on the data set. Using this filter, we were able to do quick address lookups to query and discard many unused wallet candidates, for which the relevant derived accounts were never seen by the network, without doing costly lookups to a Bitcoin full node.
A [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is a special data structure that provides quick lookup checks against previously added elements. Unlike a [hash table](https://en.wikipedia.org/wiki/Hash_table) or other common lossless read-access-optimized list structures, the Bloom filter deliberately trades off some lookup accuracy for space-efficiency. This makes lookups in RAM possible for datasets that would otherwise be too large. Depending on the settings used when creating the filter structure and inserting items, the lookup will falsely detect an item as being in the original set - a false positive - for a certain percentage of queries. In return for this negative effect, only a fraction of the original data footprint has to be kept in memory. This was very attractive for us for optimization reasons.
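
To illustrate the idea, here is a minimal Bloom filter sketch in Ruby. It is not the Rust code used for the research: the class name `TinyBloomFilter`, the tiny bit array and the SHA-256 based double hashing are assumptions chosen for brevity.

```ruby
require "digest"

# Minimal Bloom filter sketch (illustrative only, not the research tooling).
class TinyBloomFilter
  def initialize(bit_count, hash_count)
    @bit_count = bit_count
    @hash_count = hash_count
    @bits = Array.new(bit_count, false)
  end

  # Set every bit position derived from the item.
  def add(item)
    positions(item).each { |pos| @bits[pos] = true }
  end

  # false means "definitely never added"; true means "probably added".
  def include?(item)
    positions(item).all? { |pos| @bits[pos] }
  end

  private

  # Derive hash_count bit positions from two 64-bit halves of one SHA-256
  # digest (double hashing); this scheme is an assumption chosen for brevity.
  def positions(item)
    h1, h2 = Digest::SHA256.digest(item).unpack("Q>Q>")
    (0...@hash_count).map { |i| (h1 + i * h2) % @bit_count }
  end
end

filter = TinyBloomFilter.new(10_000, 7)
seen = ["13KqxkrmsPKy8gyYwochCQTuPHC7Lp8bFU", "1NxkqwmsQMTqv4SrggPv4vGHDzJKR52S2f"]
seen.each { |address| filter.add(address) }

# Added items are always reported as present.
puts filter.include?("13KqxkrmsPKy8gyYwochCQTuPHC7Lp8bFU")
# An address that was never added is reported as absent, except for rare false positives.
puts filter.include?("1HQR3nKaDahAFrPHMoDVdWiMNFGFb7cHA5")
```

A `false` result is always correct, which is what allows unused wallet candidates to be discarded cheaply; only `true` results need a second, exact check.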
In the first days of our research, we experimented with a Python Proof-of-Concept to test out this data structure for our tasks. After converging on Rust as the main language for our tooling, the [bloomfilter](https://github.com/jedisct1/rust-bloom-filter) crate became our tool of choice. This library is very fast, but fairly minimal, and doesn't have a built-in mechanism to export pre-generated Bloom filters to disk and import them again. For this reason, we wrote some serialization code to do this for us, as seen in the [published code](https://git.distrust.co/milksad/lookup/src/branch/main/bloom-filter-generator) for the `bloom-filter-generator` and its [use](https://git.distrust.co/milksad/lookup/src/branch/main/mnemonic-hash-checker/src/bloom.rs) in the lookup server process. For the research code, we're using the [Rayon](https://github.com/rayon-rs/rayon) library to parallelize our worker threads. The threads share a single Bloom filter object to avoid memory duplication, which is important when dealing with dozens of threads.
To check if the wallet addresses we derived from the generated weak private keys were previously used, we need a collection of addresses that were used on-chain, ideally covering every address ever seen publicly. For a blockchain like Bitcoin which has a long history and frequent changes of receive/change addresses, this is a lot of data. We considered using only addresses seen after a certain date (such as the first `bx` code commit with the vulnerable mechanism). But we had some resources to spare, and decided against this additional restriction, to ensure we wouldn't miss other wallet keys that were older than expected or from other generation sources.
The most comprehensive and up-to-date public collection of Bitcoin Mainnet addresses that we could find to build our filter is from `blockchair.com`, via [https://blockchair.com/dumps](https://blockchair.com/dumps). Due to download speed limits and the split nature of the data, we did not use this download source directly. Instead, we went with a derivative of this data.
User `LoyceV` from the `bitcointalk.org` forum distributes regularly updated data sets assembled from the individual `blockchair.com` data dump snippets via [http://alladdresses.loyce.club/](http://alladdresses.loyce.club/), as far as we've understood from [public forum posts](https://bitcointalk.org/index.php?topic=5254914.0). This was just what we needed for Bitcoin, and a valuable resource to kickstart our research, so we're thankful it's publicly hosted without any barriers 👍.
Our `all_Bitcoin_addresses_ever_used_sorted.txt.gz` list snapshot from ca. 2023-08-01, which we used for our initial searches, comes in at ca. 42 Gigabytes in uncompressed form and has ca. 1.19 billion individual Bitcoin addresses. The corresponding Bloom filter that we built from it reduced this to ca. 7.3 Gigabytes in size (with a 0.00000000001 false positive factor for searches), which is far less data to keep in RAM. These numbers should explain why we are interested in a fast lookup mechanism with reduced memory footprint compared to the original data. Since false positives are still annoying to deal with in later processing stages, we've further reduced the false positive factor in our later research by `100x`, which has worked out quite well.
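
As a rough plausibility check of these numbers, the standard sizing formulas for an optimally configured Bloom filter can be evaluated directly. The sketch below assumes optimal sizing and takes the item count and false positive factor from the paragraph above; the parameters actually used by the tooling may differ, and the variable names are hypothetical.

```ruby
# Standard sizing formulas for an optimally configured Bloom filter:
#   bits per item  = -ln(p) / (ln 2)^2
#   hash functions = bits per item * ln 2
item_count = 1_190_000_000          # ca. 1.19 billion addresses (from the post)
false_positive_rate = 0.00000000001 # false positive factor quoted in the post

bits_per_item = -Math.log(false_positive_rate) / (Math.log(2)**2)
total_bytes = item_count * bits_per_item / 8
hash_count = (bits_per_item * Math.log(2)).round

puts format("bits per item:  %.1f", bits_per_item)
puts format("filter size:    %.2f GiB", total_bytes / 2**30)
puts "hash functions: #{hash_count}"
```

Under these assumptions, the result is about 52.7 bits per address and roughly 7.3 GiB in total, which is in line with the filter size quoted above.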
Going forward, we would like to extend our search to some other selected coins, but are still looking for recently updated, comprehensive data collections that are publicly available.
If you're aware of public and well-maintained address/pubkey/pubkey-hash collections for Ethereum and other popular coins, we would love to hear from you [directly]({% link index.md %}#contact)!
## Canary Wallet Observations
Very early into the `bx` vulnerability discovery, one of our team members deliberately moved small amounts of Bitcoin onto known vulnerable `bx seed -b 256 | bx mnemonic-new` generated wallet private keys. At this point, we already understood the main weakness and could deliberately generate specific weak keys, but did not yet have custom tooling to search through the vulnerable range. Setting up some "canary" wallets with a few dollars in Bitcoin each was therefore a cheap and simple way to gather data on the behavior of attackers.
One of our questions was: are attackers now actively watching the vulnerable range for _new deposits_, and quickly acting upon them?
At least for the `bx` BIP39 range with 24 mnemonic words and our used paths, this was not the case initially. By the time of publication of this new blogpost, all of the four sub-wallets have been emptied, though:
| PRNG ID | derivation path | address | original deposit | theft transaction | theft date |
| -- | -- | -- | -- | -- | -- |
| `0x000001f4` | `m/44'/0'/0'/0/0` | {{ "13KqxkrmsPKy8gyYwochCQTuPHC7Lp8bFU" | BtcLinkAddressUrlFull }} | $5 | {{ "ff8c6822846d835e5a476bf268ab4ddba396d476f0f1b5301eea62c6acfa9c3a" | BtcLinkTxUrlSliced }} | 2023-08-23 01:23 |
| `0x000001f4` | `m/0/0` | {{ "1NxkqwmsQMTqv4SrggPv4vGHDzJKR52S2f" | BtcLinkAddressUrlFull }} | $5 | {{ "256b6b987af466b4239048272534167a0e7d197f0c3fa716c1ba24fee3f3a851" | BtcLinkTxUrlSliced }} | 2023-08-27 12:20 |
| `0xffffffff` | `m/44'/0'/0'/0/0` | {{ "1HQR3nKaDahAFrPHMoDVdWiMNFGFb7cHA5" | BtcLinkAddressUrlFull }} | $5 | {{ "48354a8bee5cb71eccb725b501f43e6351823a1d4d6dcdd1033214335b18a3d5" | BtcLinkTxUrlSliced }} | 2023-09-30 09:07 |
| `0xffffffff` | `m/0/0` | {{ "16pQhPkBa5puwEzudZVyKtsrugLtA87cy" | BtcLinkAddressUrlFull }} | $1 | {{ "8d09a736a442f87f7f31c691c068a8e526f67093250720de83b028c4ed1f03cd" | BtcLinkTxUrlSliced }} | 2023-10-01 22:16 |
Considering the date of deposit after the main 2023-07-12 theft, low per-wallet funds and theft dates, the thieves sweeping the funds are likely not related to the main attacker. It's still interesting to see that even a weak wallet with as little as $1 in BTC gets emptied sooner or later. The sharks are clearly in the water now 🦈.
Note that the `m/0/0` derivation path we used is an older pattern, and rare - we haven't found other `bx`-generated Bitcoin wallets in this range. Attackers may have looked into some of these unusual paths more exhaustively just for these particular wallet PRNG IDs after discovering some usage via the more common M44 P2PKH standard path pattern.
## Summary & Outlook
In this post, we introduced a combination of data structure and data set that we successfully used to look up large numbers of addresses.
Additionally, we listed some previously internal information about deliberately created weak wallets and related theft patterns.
We still have a long backlog of research topics to present here. We'll try to get the next post ready before the holidays 🎁
Check out our [RSS]({% link feed.xml %}) feed if you want to get notified by your favorite reader application.
<br/>

@ -204,3 +204,21 @@ blockquote {
background: rgba(0, 0, 0, 0.3490196078);
padding: 14px 25px 1px 25px;
}
figure > img {
display: block;
margin-left: auto;
margin-right: auto;
}
figcaption {
text-align: center;
margin-top: 0.5em;
margin-bottom: 0.5em;
}
figure {
margin-bottom: 1em;
margin-left: 0;
margin-right: 0;
}

Binary file not shown (29 KiB).

Binary file not shown (30 KiB).

Binary file not shown (24 KiB).