rust-bitcoin-unsafe-fast/hashes
Andrew Poelstra 1b3a9d3580
Merge rust-bitcoin/rust-bitcoin#1990: Introduce the `small-hash` feature for `bitcoin_hashes`
f2c5f19557 Introduce the `small-hash` feature for `bitcoin_hashes` (Alekos Filini)

Pull request description:

  When enabled this feature swaps the hash implementation of sha512, sha256 and ripemd160 for a smaller (but also slower) one.

  On embedded processors (Cortex-M4) it can lead to up to a 52% size reduction, from around 37KiB for just the `process_block` methods of the three hash functions to 17.8KiB.

  The following numbers were collected on `aarch64-unknown-linux-gnu` with `cargo 1.72.0-nightly`.

  ## Original

  ```
  RUSTFLAGS='--cfg=bench -C opt-level=z' cargo bench
  ```

  ```
  test hash160::benches::hash160_10                 ... bench:          33 ns/iter (+/- 1) = 303 MB/s
  test hash160::benches::hash160_1k                 ... bench:       2,953 ns/iter (+/- 187) = 346 MB/s
  test hash160::benches::hash160_64k                ... bench:     188,480 ns/iter (+/- 11,595) = 347 MB/s
  test hmac::benches::hmac_sha256_10                ... bench:          33 ns/iter (+/- 2) = 303 MB/s
  test hmac::benches::hmac_sha256_1k                ... bench:       2,957 ns/iter (+/- 104) = 346 MB/s
  test hmac::benches::hmac_sha256_64k               ... bench:     192,022 ns/iter (+/- 6,407) = 341 MB/s
  test ripemd160::benches::ripemd160_10             ... bench:          25 ns/iter (+/- 1) = 400 MB/s
  test ripemd160::benches::ripemd160_1k             ... bench:       2,288 ns/iter (+/- 93) = 447 MB/s
  test ripemd160::benches::ripemd160_64k            ... bench:     146,823 ns/iter (+/- 1,102) = 446 MB/s
  test sha1::benches::sha1_10                       ... bench:          41 ns/iter (+/- 0) = 243 MB/s
  test sha1::benches::sha1_1k                       ... bench:       3,844 ns/iter (+/- 70) = 266 MB/s
  test sha1::benches::sha1_64k                      ... bench:     245,854 ns/iter (+/- 10,158) = 266 MB/s
  test sha256::benches::sha256_10                   ... bench:          35 ns/iter (+/- 0) = 285 MB/s
  test sha256::benches::sha256_1k                   ... bench:       3,063 ns/iter (+/- 15) = 334 MB/s
  test sha256::benches::sha256_64k                  ... bench:     195,729 ns/iter (+/- 2,880) = 334 MB/s
  test sha256d::benches::sha256d_10                 ... bench:          34 ns/iter (+/- 1) = 294 MB/s
  test sha256d::benches::sha256d_1k                 ... bench:       3,071 ns/iter (+/- 107) = 333 MB/s
  test sha256d::benches::sha256d_64k                ... bench:     188,614 ns/iter (+/- 8,101) = 347 MB/s
  test sha512::benches::sha512_10                   ... bench:          21 ns/iter (+/- 0) = 476 MB/s
  test sha512::benches::sha512_1k                   ... bench:       1,714 ns/iter (+/- 36) = 597 MB/s
  test sha512::benches::sha512_64k                  ... bench:     110,084 ns/iter (+/- 3,637) = 595 MB/s
  test sha512_256::benches::sha512_256_10           ... bench:          22 ns/iter (+/- 1) = 454 MB/s
  test sha512_256::benches::sha512_256_1k           ... bench:       1,822 ns/iter (+/- 70) = 562 MB/s
  test sha512_256::benches::sha512_256_64k          ... bench:     116,231 ns/iter (+/- 4,745) = 563 MB/s
  test siphash24::benches::siphash24_1ki            ... bench:       1,072 ns/iter (+/- 41) = 955 MB/s
  test siphash24::benches::siphash24_1ki_hash       ... bench:       1,102 ns/iter (+/- 42) = 929 MB/s
  test siphash24::benches::siphash24_1ki_hash_u64   ... bench:       1,064 ns/iter (+/- 41) = 962 MB/s
  test siphash24::benches::siphash24_64ki           ... bench:      69,957 ns/iter (+/- 2,712) = 936 MB/
  ```

  ```
  0000000000005872 t _ZN84_$LT$bitcoin_hashes..ripemd160..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17hc4800746a9da7ff4E
  0000000000007956 t _ZN81_$LT$bitcoin_hashes..sha256..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17hf49345f65130ce9bE
  0000000000008024 t _ZN14bitcoin_hashes6sha2568Midstate10const_hash17h57317bc8012004b4E.llvm.441255102889972912
  0000000000010528 t _ZN81_$LT$bitcoin_hashes..sha512..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17h9bc868d4392bd9acE
  ```

  Total size: 32380 bytes

  ## With `small-hash` enabled

  ```
  RUSTFLAGS='--cfg=bench -C opt-level=z' cargo bench --features small-hash
  ```

  ```
  test hash160::benches::hash160_10                 ... bench:          52 ns/iter (+/- 3) = 192 MB/s
  test hash160::benches::hash160_1k                 ... bench:       4,817 ns/iter (+/- 286) = 212 MB/s
  test hash160::benches::hash160_64k                ... bench:     319,572 ns/iter (+/- 11,031) = 205 MB/s
  test hmac::benches::hmac_sha256_10                ... bench:          54 ns/iter (+/- 2) = 185 MB/s
  test hmac::benches::hmac_sha256_1k                ... bench:       4,846 ns/iter (+/- 204) = 211 MB/s
  test hmac::benches::hmac_sha256_64k               ... bench:     319,114 ns/iter (+/- 4,451) = 205 MB/s
  test ripemd160::benches::ripemd160_10             ... bench:          27 ns/iter (+/- 0) = 370 MB/s
  test ripemd160::benches::ripemd160_1k             ... bench:       2,358 ns/iter (+/- 150) = 434 MB/s
  test ripemd160::benches::ripemd160_64k            ... bench:     154,573 ns/iter (+/- 3,954) = 423 MB/s
  test sha1::benches::sha1_10                       ... bench:          41 ns/iter (+/- 1) = 243 MB/s
  test sha1::benches::sha1_1k                       ... bench:       3,700 ns/iter (+/- 243) = 276 MB/s
  test sha1::benches::sha1_64k                      ... bench:     231,039 ns/iter (+/- 13,989) = 283 MB/s
  test sha256::benches::sha256_10                   ... bench:          51 ns/iter (+/- 3) = 196 MB/s
  test sha256::benches::sha256_1k                   ... bench:       4,823 ns/iter (+/- 182) = 212 MB/s
  test sha256::benches::sha256_64k                  ... bench:     299,960 ns/iter (+/- 17,545) = 218 MB/s
  test sha256d::benches::sha256d_10                 ... bench:          52 ns/iter (+/- 2) = 192 MB/s
  test sha256d::benches::sha256d_1k                 ... bench:       4,827 ns/iter (+/- 323) = 212 MB/s
  test sha256d::benches::sha256d_64k                ... bench:     302,844 ns/iter (+/- 15,796) = 216 MB/s
  test sha512::benches::sha512_10                   ... bench:          34 ns/iter (+/- 1) = 294 MB/s
  test sha512::benches::sha512_1k                   ... bench:       3,002 ns/iter (+/- 123) = 341 MB/s
  test sha512::benches::sha512_64k                  ... bench:     189,767 ns/iter (+/- 10,396) = 345 MB/s
  test sha512_256::benches::sha512_256_10           ... bench:          34 ns/iter (+/- 1) = 294 MB/s
  test sha512_256::benches::sha512_256_1k           ... bench:       2,996 ns/iter (+/- 198) = 341 MB/s
  test sha512_256::benches::sha512_256_64k          ... bench:     192,024 ns/iter (+/- 8,181) = 341 MB/s
  test siphash24::benches::siphash24_1ki            ... bench:       1,081 ns/iter (+/- 65) = 947 MB/s
  test siphash24::benches::siphash24_1ki_hash       ... bench:       1,083 ns/iter (+/- 63) = 945 MB/s
  test siphash24::benches::siphash24_1ki_hash_u64   ... bench:       1,084 ns/iter (+/- 63) = 944 MB/s
  test siphash24::benches::siphash24_64ki           ... bench:      67,237 ns/iter (+/- 4,185) = 974 MB/s
  ```

  ```
  0000000000005384 t _ZN81_$LT$bitcoin_hashes..sha256..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17hae341658cf9b880bE
  0000000000005608 t _ZN14bitcoin_hashes9ripemd16010HashEngine13process_block17h3276b13f1e9feef8E.llvm.13618235596061801146
  0000000000005616 t _ZN14bitcoin_hashes6sha2568Midstate10const_hash17h3e6fbef64c15ee00E.llvm.7326223909590351031
  0000000000005944 t _ZN81_$LT$bitcoin_hashes..sha512..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17h321a237bfbe5c0bbE
  ```

  Total size: 22552 bytes

  ## Conclusion

  On `aarch64` there's overall a ~30% improvement in size, although ripemd160 doesn't really shrink that much (and its performance also aren't impacted much with only a 6% slowdown). sha512 and sha256 instead are almost 40% slower with `small-hash` enabled.

  I don't have performance numbers for other architectures, but in terms of size there was an even larger improvements on `thumbv7em-none-eabihf`, with a 52% size reduction overall:

  ```
     Size          Crate Name
  25.3KiB bitcoin_hashes <bitcoin_hashes[fe467ef2aa3a1470]::sha512::HashEngine as bitcoin_hashes[fe467ef2aa3a1470]::HashEngine>::input
   6.9KiB bitcoin_hashes <bitcoin_hashes[fe467ef2aa3a1470]::sha256::HashEngine as bitcoin_hashes[fe467ef2aa3a1470]::HashEngine>::input
   4.8KiB bitcoin_hashes <bitcoin_hashes[fe467ef2aa3a1470]::ripemd160::HashEngine as bitcoin_hashes[fe467ef2aa3a1470]::HashEngine>::input
  ```

  vs

  ```
    Size          Crate Name
  9.5KiB bitcoin_hashes <bitcoin_hashes[974bb476ef905797]::sha512::HashEngine as bitcoin_hashes[974bb476ef905797]::HashEngine>::input
  4.5KiB bitcoin_hashes <bitcoin_hashes[974bb476ef905797]::ripemd160::HashEngine>::process_block
  3.8KiB bitcoin_hashes <bitcoin_hashes[974bb476ef905797]::sha256::HashEngine as bitcoin_hashes[974bb476ef905797]::HashEngine>::input
  ```

  I'm assuming this is because on more limited architectures the compiler needs to use more instructions to move data in and out of registers (especially for sha512 which ideally would benefit from 64-bit registers), so reusing the code by moving it into functions saves a lot of those instructions.

  Also note that the `const_hash` method on `sha256` causes the compiler to emit two independent implementations. I haven't looked into the code yet, maybe there's a way to merge them so that the non-const `process_block` calls into the const fn.

  -----

  Note: commits are unverified right now because I don't have the keys available, I will sign them after addressing the review comments.

ACKs for top commit:
  apoelstra:
    ACK f2c5f19557
  tcharding:
    ACK f2c5f19557

Tree-SHA512: 1d5eb56324c458660e2571e8cf59895dc31dae9c5427c7ed36f8a0e81ca2e9a0f39026f56b6803df03635cc8b66aee3bf5182d51ab8972d169d56bcfec33771c
2023-08-17 16:34:10 +00:00
..
contrib CI: Pin serde_json for MSRV build 2023-07-12 15:50:18 +10:00
embedded hashes/embedded: Add script dir and README 2023-07-18 10:27:48 +10:00
extended_tests/schemars schemars: Add pinning docs 2023-07-18 10:27:48 +10:00
src Merge rust-bitcoin/rust-bitcoin#1990: Introduce the `small-hash` feature for `bitcoin_hashes` 2023-08-17 16:34:10 +00:00
CHANGELOG.md Improve hashes::Error 2023-05-25 13:25:13 +10:00
Cargo.toml Introduce the `small-hash` feature for `bitcoin_hashes` 2023-08-16 14:19:17 +02:00
README.md hashes: Remove stale status badge 2023-05-25 14:34:28 +10:00

README.md

Bitcoin Hashes Library

This is a simple, no-dependency library which implements the hash functions needed by Bitcoin. These are SHA1, SHA256, SHA256d, SHA512, and RIPEMD160. As an ancilliary thing, it exposes hexadecimal serialization and deserialization, since these are needed to display hashes anway.

Documentation

Minimum Supported Rust Version (MSRV)

This library should always compile with any combination of features on Rust 1.48.0.

To build with the MSRV you will need to pin serde (if you have either the serde or the schemars feature enabled)

# serde 1.0.157 uses syn 2.0 which requires edition 2021
cargo update -p serde --precise 1.0.156

before building. (And if your code is a library, your downstream users will need to run these commands, and so on.)

Contributions

Contributions are welcome, including additional hash function implementations.

Githooks

To assist devs in catching errors before running CI we provide some githooks. If you do not already have locally configured githooks you can use the ones in this repository by running, in the root directory of the repository:

git config --local core.hooksPath githooks/

Alternatively add symlinks in your .git/hooks directory to any of the githooks we provide.

Running Benchmarks

We use a custom Rust compiler configuration conditional to guard the bench mark code. To run the bench marks use: RUSTFLAGS='--cfg=bench' cargo +nightly bench.