We are trying to get rid of the `serde_derive` dependency from our
dependency graph.
Stop using default features for the `schemars` dependency which includes
`schemars_derive` which depends on `serder_derive`.
Manually implement `schemars::JsonSchema` instead of deriving it.
f2c5f19557 Introduce the `small-hash` feature for `bitcoin_hashes` (Alekos Filini)
Pull request description:
When enabled this feature swaps the hash implementation of sha512, sha256 and ripemd160 for a smaller (but also slower) one.
On embedded processors (Cortex-M4) it can lead to up to a 52% size reduction, from around 37KiB for just the `process_block` methods of the three hash functions to 17.8KiB.
The following numbers were collected on `aarch64-unknown-linux-gnu` with `cargo 1.72.0-nightly`.
## Original
```
RUSTFLAGS='--cfg=bench -C opt-level=z' cargo bench
```
```
test hash160::benches::hash160_10 ... bench: 33 ns/iter (+/- 1) = 303 MB/s
test hash160::benches::hash160_1k ... bench: 2,953 ns/iter (+/- 187) = 346 MB/s
test hash160::benches::hash160_64k ... bench: 188,480 ns/iter (+/- 11,595) = 347 MB/s
test hmac::benches::hmac_sha256_10 ... bench: 33 ns/iter (+/- 2) = 303 MB/s
test hmac::benches::hmac_sha256_1k ... bench: 2,957 ns/iter (+/- 104) = 346 MB/s
test hmac::benches::hmac_sha256_64k ... bench: 192,022 ns/iter (+/- 6,407) = 341 MB/s
test ripemd160::benches::ripemd160_10 ... bench: 25 ns/iter (+/- 1) = 400 MB/s
test ripemd160::benches::ripemd160_1k ... bench: 2,288 ns/iter (+/- 93) = 447 MB/s
test ripemd160::benches::ripemd160_64k ... bench: 146,823 ns/iter (+/- 1,102) = 446 MB/s
test sha1::benches::sha1_10 ... bench: 41 ns/iter (+/- 0) = 243 MB/s
test sha1::benches::sha1_1k ... bench: 3,844 ns/iter (+/- 70) = 266 MB/s
test sha1::benches::sha1_64k ... bench: 245,854 ns/iter (+/- 10,158) = 266 MB/s
test sha256::benches::sha256_10 ... bench: 35 ns/iter (+/- 0) = 285 MB/s
test sha256::benches::sha256_1k ... bench: 3,063 ns/iter (+/- 15) = 334 MB/s
test sha256::benches::sha256_64k ... bench: 195,729 ns/iter (+/- 2,880) = 334 MB/s
test sha256d::benches::sha256d_10 ... bench: 34 ns/iter (+/- 1) = 294 MB/s
test sha256d::benches::sha256d_1k ... bench: 3,071 ns/iter (+/- 107) = 333 MB/s
test sha256d::benches::sha256d_64k ... bench: 188,614 ns/iter (+/- 8,101) = 347 MB/s
test sha512::benches::sha512_10 ... bench: 21 ns/iter (+/- 0) = 476 MB/s
test sha512::benches::sha512_1k ... bench: 1,714 ns/iter (+/- 36) = 597 MB/s
test sha512::benches::sha512_64k ... bench: 110,084 ns/iter (+/- 3,637) = 595 MB/s
test sha512_256::benches::sha512_256_10 ... bench: 22 ns/iter (+/- 1) = 454 MB/s
test sha512_256::benches::sha512_256_1k ... bench: 1,822 ns/iter (+/- 70) = 562 MB/s
test sha512_256::benches::sha512_256_64k ... bench: 116,231 ns/iter (+/- 4,745) = 563 MB/s
test siphash24::benches::siphash24_1ki ... bench: 1,072 ns/iter (+/- 41) = 955 MB/s
test siphash24::benches::siphash24_1ki_hash ... bench: 1,102 ns/iter (+/- 42) = 929 MB/s
test siphash24::benches::siphash24_1ki_hash_u64 ... bench: 1,064 ns/iter (+/- 41) = 962 MB/s
test siphash24::benches::siphash24_64ki ... bench: 69,957 ns/iter (+/- 2,712) = 936 MB/
```
```
0000000000005872 t _ZN84_$LT$bitcoin_hashes..ripemd160..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17hc4800746a9da7ff4E
0000000000007956 t _ZN81_$LT$bitcoin_hashes..sha256..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17hf49345f65130ce9bE
0000000000008024 t _ZN14bitcoin_hashes6sha2568Midstate10const_hash17h57317bc8012004b4E.llvm.441255102889972912
0000000000010528 t _ZN81_$LT$bitcoin_hashes..sha512..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17h9bc868d4392bd9acE
```
Total size: 32380 bytes
## With `small-hash` enabled
```
RUSTFLAGS='--cfg=bench -C opt-level=z' cargo bench --features small-hash
```
```
test hash160::benches::hash160_10 ... bench: 52 ns/iter (+/- 3) = 192 MB/s
test hash160::benches::hash160_1k ... bench: 4,817 ns/iter (+/- 286) = 212 MB/s
test hash160::benches::hash160_64k ... bench: 319,572 ns/iter (+/- 11,031) = 205 MB/s
test hmac::benches::hmac_sha256_10 ... bench: 54 ns/iter (+/- 2) = 185 MB/s
test hmac::benches::hmac_sha256_1k ... bench: 4,846 ns/iter (+/- 204) = 211 MB/s
test hmac::benches::hmac_sha256_64k ... bench: 319,114 ns/iter (+/- 4,451) = 205 MB/s
test ripemd160::benches::ripemd160_10 ... bench: 27 ns/iter (+/- 0) = 370 MB/s
test ripemd160::benches::ripemd160_1k ... bench: 2,358 ns/iter (+/- 150) = 434 MB/s
test ripemd160::benches::ripemd160_64k ... bench: 154,573 ns/iter (+/- 3,954) = 423 MB/s
test sha1::benches::sha1_10 ... bench: 41 ns/iter (+/- 1) = 243 MB/s
test sha1::benches::sha1_1k ... bench: 3,700 ns/iter (+/- 243) = 276 MB/s
test sha1::benches::sha1_64k ... bench: 231,039 ns/iter (+/- 13,989) = 283 MB/s
test sha256::benches::sha256_10 ... bench: 51 ns/iter (+/- 3) = 196 MB/s
test sha256::benches::sha256_1k ... bench: 4,823 ns/iter (+/- 182) = 212 MB/s
test sha256::benches::sha256_64k ... bench: 299,960 ns/iter (+/- 17,545) = 218 MB/s
test sha256d::benches::sha256d_10 ... bench: 52 ns/iter (+/- 2) = 192 MB/s
test sha256d::benches::sha256d_1k ... bench: 4,827 ns/iter (+/- 323) = 212 MB/s
test sha256d::benches::sha256d_64k ... bench: 302,844 ns/iter (+/- 15,796) = 216 MB/s
test sha512::benches::sha512_10 ... bench: 34 ns/iter (+/- 1) = 294 MB/s
test sha512::benches::sha512_1k ... bench: 3,002 ns/iter (+/- 123) = 341 MB/s
test sha512::benches::sha512_64k ... bench: 189,767 ns/iter (+/- 10,396) = 345 MB/s
test sha512_256::benches::sha512_256_10 ... bench: 34 ns/iter (+/- 1) = 294 MB/s
test sha512_256::benches::sha512_256_1k ... bench: 2,996 ns/iter (+/- 198) = 341 MB/s
test sha512_256::benches::sha512_256_64k ... bench: 192,024 ns/iter (+/- 8,181) = 341 MB/s
test siphash24::benches::siphash24_1ki ... bench: 1,081 ns/iter (+/- 65) = 947 MB/s
test siphash24::benches::siphash24_1ki_hash ... bench: 1,083 ns/iter (+/- 63) = 945 MB/s
test siphash24::benches::siphash24_1ki_hash_u64 ... bench: 1,084 ns/iter (+/- 63) = 944 MB/s
test siphash24::benches::siphash24_64ki ... bench: 67,237 ns/iter (+/- 4,185) = 974 MB/s
```
```
0000000000005384 t _ZN81_$LT$bitcoin_hashes..sha256..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17hae341658cf9b880bE
0000000000005608 t _ZN14bitcoin_hashes9ripemd16010HashEngine13process_block17h3276b13f1e9feef8E.llvm.13618235596061801146
0000000000005616 t _ZN14bitcoin_hashes6sha2568Midstate10const_hash17h3e6fbef64c15ee00E.llvm.7326223909590351031
0000000000005944 t _ZN81_$LT$bitcoin_hashes..sha512..HashEngine$u20$as$u20$bitcoin_hashes..HashEngine$GT$5input17h321a237bfbe5c0bbE
```
Total size: 22552 bytes
## Conclusion
On `aarch64` there's overall a ~30% improvement in size, although ripemd160 doesn't really shrink that much (and its performance also aren't impacted much with only a 6% slowdown). sha512 and sha256 instead are almost 40% slower with `small-hash` enabled.
I don't have performance numbers for other architectures, but in terms of size there was an even larger improvements on `thumbv7em-none-eabihf`, with a 52% size reduction overall:
```
Size Crate Name
25.3KiB bitcoin_hashes <bitcoin_hashes[fe467ef2aa3a1470]::sha512::HashEngine as bitcoin_hashes[fe467ef2aa3a1470]::HashEngine>::input
6.9KiB bitcoin_hashes <bitcoin_hashes[fe467ef2aa3a1470]::sha256::HashEngine as bitcoin_hashes[fe467ef2aa3a1470]::HashEngine>::input
4.8KiB bitcoin_hashes <bitcoin_hashes[fe467ef2aa3a1470]::ripemd160::HashEngine as bitcoin_hashes[fe467ef2aa3a1470]::HashEngine>::input
```
vs
```
Size Crate Name
9.5KiB bitcoin_hashes <bitcoin_hashes[974bb476ef905797]::sha512::HashEngine as bitcoin_hashes[974bb476ef905797]::HashEngine>::input
4.5KiB bitcoin_hashes <bitcoin_hashes[974bb476ef905797]::ripemd160::HashEngine>::process_block
3.8KiB bitcoin_hashes <bitcoin_hashes[974bb476ef905797]::sha256::HashEngine as bitcoin_hashes[974bb476ef905797]::HashEngine>::input
```
I'm assuming this is because on more limited architectures the compiler needs to use more instructions to move data in and out of registers (especially for sha512 which ideally would benefit from 64-bit registers), so reusing the code by moving it into functions saves a lot of those instructions.
Also note that the `const_hash` method on `sha256` causes the compiler to emit two independent implementations. I haven't looked into the code yet, maybe there's a way to merge them so that the non-const `process_block` calls into the const fn.
-----
Note: commits are unverified right now because I don't have the keys available, I will sign them after addressing the review comments.
ACKs for top commit:
apoelstra:
ACK f2c5f19557
tcharding:
ACK f2c5f19557
Tree-SHA512: 1d5eb56324c458660e2571e8cf59895dc31dae9c5427c7ed36f8a0e81ca2e9a0f39026f56b6803df03635cc8b66aee3bf5182d51ab8972d169d56bcfec33771c
546c0122d7 Add simd sha256 intrinsics for x86 machines (sanket1729)
Pull request description:
This is my first time dabbling into architecture specific code and simd. The algorithm is a word to word translation of the C code from 4899efc81d/sha256-x86.c .
Some benchmarks:
With simd
```
test sha256::benches::sha256_10 ... bench: 11 ns/iter (+/- 0) = 909 MB/s
test sha256::benches::sha256_1k ... bench: 712 ns/iter (+/- 2) = 1438 MB/s
test sha256::benches::sha256_64k ... bench: 45,597 ns/iter (+/- 189) = 1437 MB/s
```
Without simd
```
test sha256::benches::sha256_10 ... bench: 47 ns/iter (+/- 0) = 212 MB/s
test sha256::benches::sha256_1k ... bench: 4,243 ns/iter (+/- 17) = 241 MB/s
test sha256::benches::sha256_64k ... bench: 271,263 ns/iter (+/- 1,610) = 241 MB/s
```
ACKs for top commit:
apoelstra:
ACK 546c0122d7
tcharding:
ACK 546c0122d7
Tree-SHA512: 7167c900b77e63cf38135a3960cf9ac2615f73b2ef7020a12b5cc3f4c047910063ba9045217b9ecfa70f7de1eb0f02f2674f291bd023a853bad2b9162fae831e
When enabled this feature swaps the hash implementation of sha512,
sha256 and ripemd160 for a smaller (but also slower) one.
On embedded processors (Cortex-M4) it can lead to up to a 52% size
reduction, from around 37KiB for just the `process_block` methods of the
three hash functions to 17.8KiB.
We have just released the `hex-conservative` crate, we can now use it.
Do the following:
- Depend on `hex-conservative` in `bitcoin` and `hashes`
- Re-export `hex-conservative` as `hex` from both crate roots.
- Remove all the old hex code from `hashes`
- Fix all the import statements (makes up the bulk of the lines changed
in this patch)
We are trying to make error types stable on the way to v1.0
The current `hashes::Error` is a "general" enum error type with a single
variant, better to use a struct and make the error usecase specific.
Improve the `hashes::Error` by doing:
- Make it a struct
- Rename to `FromSliceError`
- Move it to the crate root (remove `error` module)
Includes usage in `bitcoin`.
Whether or not every file needs an explicit license comment is out of
scope for this patch; in the `bitcoin` crate we use SPDX identifiers
because they are a single line with no loss of "benefit" over any longer
form.
Use SPDX identifiers in `hashes`. Drop the mention of re-licensing code
from Apache to CC0-1 (because the original code was written by Andrew
as well as the copied code then if the argument ever comes up it can be
easily countered).
The Rust API guidelines state that macros should be evocative of the
output, which is a sensible recommendation. We already had this for
`hash_newtype!` macro but didn't for sha256t version.
This changes the macro to have this syntax:
```rust
sha256t_hash_newtype! {
// Order of these structs is fixed.
/// Optional documentation details here. Summary is auto-generated.
/*pub*/ struct Tag = raw(MIDSTATE_BYTES, LEN);
/// Documentation here
#[hash_newtype(forward)] // optional, default is backward
/*pub*/ struct HashType(/* attributes allowed here */ _);
}
```
Closes#1427
Computing hashes in const fn is useful for esily creating tags for
`sha256t`. This adds `const fn` implementation for `sha256::Hash` and
the algorithm for computing midstate of tagged hash in `const` context.
Part of #1427
The `hashes` module contains a bunch of arrays, mostly formatted with 8
hex bytes on a line; add `rustfmt::skip` to keep the formatting of
arrays as is.
Remove the exclude for the `hashes` crate. Do not run the formatter,
that will be done as a separate patch to aid review.
Currently we have an associated type on hash types `Inner` with
accompanying methods `into_inner`, `from_inner`, `as_inner`. Also, we
provide a way to create new wrapped hash types. The use of 'inner'
becomes ambiguous with the addition of wrapped types because the inner
could be the inner hash type or the `Inner` byte array of the inner
wrapped hash type.
In an effort to make the API more clear and uniform do the following:
- Rename `Inner` -> `Bytes`
- Rename `*_inner` -> `*_byte_array`
- Rename the inner hash to/from methods to `*_raw_hash`
Correct method prefix `into_` -> `to_` because theses methods convert
owned `Copy` types.
Add the trait Bound `Copy` to the `Bytes` type because we rely on this
trait bound for the conversion methods to be correctly named according
to convention.
Because of the dependency hole created by `secp256k1` this patch changes
the secp dependency to a git tag dependency that includes changes to the
hashes calls required so that we can get green lights on CI in this
repo.
Currently we have a few things mixed up in the feature gating of
`hashes`.
Observe that:
- `io::Write` is not related to allocation.
- "std" should be able to enable "alloc".
Improve feature gating by doing:
- Enable "alloc" from "std" and simplify `cfg` codebase wide.
- Enable "internals/alloc" from "alloc".
- Fix feature gating to use the minimal requirement i.e., "alloc".
Remove `FromHex` from hash and script types
- Remove the `FromHex` implementation from hash types and `ScriptBuf`
- Remove the `FromStr` implementation from `ScriptBuf` because it does not
roundtrip with `Display`.
- Implement a method `from_hex` on `ScriptBuf`.
- Implement `FromStr` on hash types using a fixed size array.
This leaves `FromHex` implementations only on `Vec` and fixed size arrays.
The `ToHex` trait was replaced by either simple `Display`/`LowerHex`
where appropriate or `DisplayHex` from `bitcoin_internals` which is
faster.
This change replaces the usages and removes the trait.
`bitcoin-internals` contains a more performant implementation of hex
encoding than what `bitcoin_hashes` uses internally. This switches the
implementations for formatting trait implementations as a step towards
moving over completely.
The public macros are also changed to delegate to inner type which is
technically a breaking change but we will break the API anyway and the
consuers should only call the macro on the actual hash newtypes where
the inner types already have the appropriate implementations.
Apart from removing reliance on internal hex from public API this
reduces duplicated code generated and compiled. E.g. if you created 10
hash newtypes of SHA256 the formatting implementation would be
instantiated 11 times despite being the same.
To do all this some other changes were required to the hex
infrastructure. Mainly modifying `put_bytes` to accept iterator (so that
`iter().rev()` can be used) and adding a new `DisplayArray` type. The
iterator idea was invented by Tobin C. Harding, this commit just adds a
bound check and generalizes over `u8` and `&u8` returning iterators.
While it may seem that `DisplayByteSlice` would suffice it'd create and
initialize a large array even for small arrays wasting performance.
Knowing the exact length `DisplayArray` fixes this.
Another part of refactoring is changing from returning `impl Display` to
return `impl LowerHex + UpperHex`. This makes selecting casing less
annoying since the consumer no longer needs to import `Case` without
cluttering the API with convenience methods.
We would like to bring the `bitcoin_hashes` crate into the
`rust-bitcoin` repository.
Import `bitcoin_hashes` into `rust-bitocin/hashes`, doing so looses all
the commit history from the original crate but if we archive the
original repository then the history will be preserved. We maintain the
same version number obviously and in the changelog we note the change of
repository.
Commit hash that was tip of `bitcoin_hashes` at time of import:
commit 54c16249e06cc6b7870c7fc07d90f489d82647c7
Includes making `embedded` and `fuzzing` per-crate i.e., move them into
`bitcoin` as hashes includes these also.
NOTE: Does _not_ enable fuzzing for `hashes` in CI.
Notes on CI:
Attempts to merge in the github actions from the hashes crate however reduces
coverage by not running hashes tests for beta toolchain. Some additional
work could be done to improve the CI to increase efficiency without
reducing coverage. Leaving for another day.