1ca55ac77d chacha20_poly1305: inline simd functions (Nick Johnson)
30920c4d84 chacha20_poly1305: drop mutable requirement (Nick Johnson)
415945cd2b chacha20_poly1305: avoid duplicate block work (Nick Johnson)
33dc1b95fa chacha20_poly1305: swap tuple for array (Nick Johnson)
dadd1d7224 chacha20_poly1305: remove alignment (Nick Johnson)
36d45bf360 chacha20_poly1305: remove mod operator (Nick Johnson)
Pull request description:
Inspired by JeremiahR's new benchmark on the chacha20 module, I dug in and found a few tweaks (and a larger refactor) which help a lot with performance.
The tweaks are broken down by commit. One of the relatively fruitful ones (5% increase) was refactoring the U32x4's rotation methods to use hardcoded matches instead of the mod operator. Others saw more modest performance gains.
The big change though was to *not* calculate two blocks for each keystream call (oops). I refactored the logic to handle the offset state (which is technically not required for BIP324, but as seen here, can be confusing to not have) and now we always calculate just the required amount of a keystream. This about doubles the performance.
For the curious, the SIMD performance isn't very impressive at this point, just a modest increase. But I have some hope that the experimental core library U32x4 will bring a nice bump (it is implemented with some unsafe hacking) and be an easy refactor.
ACKs for top commit:
apoelstra:
ACK 1ca55ac77db698f3816d8b7ed4051ddb5a579a29; successfully ran local tests
tcharding:
ACK 1ca55ac77d
Tree-SHA512: 8db1c6144d172775164859ffd0d97021f90d9123d06b5cccde21800aec4dcea75a5753d547933288ae233caf82e92849f894fb7019537d188baf10fbf7019684
There is a new lint error on nightly-2025-04-25 "variables can be used
directly in the `format!` string".
Exclude the lint to allow the existing syntax in `format!` strings.
* Inline all the block operations to give the compiler the best chance
to optimize the SIMD instruction usage. This squeezed out another
percent or two on the benchmarks when comparing target-cpu=native
builds to standard.
* The get_keystream method exposes the keystream for a block for
special case scenarios. Generally the cipher state should only be
updated with teh apply_keystream method.
* The keystream function is creating two state block on every call,
but just to handle a corner case. Break up the function
into separate methods so that the corner case is handled by itself,
avoiding unnecessary work most of the time.
* Handle offset state internally. While not strictly necessary due to
the cipher's use in BIP324, it makes the library much easier to work
with (the bug above would probably have been avoided) if the cipher
handles the offset state.
Recent clippy nightly update introduces warnings about precedence. While
ours are, IMO, clear the lint docs have some cases that are not so I
don't think we should ignore this lint. Specifically I could easily miss
this one
1 << 2 + 3 equals 32, while (1 << 2) + 3 equals 7
ref: https://rust-lang.github.io/rust-clippy/master/#/precede
Add parenthesis to explicitly show precedence. Refactor only no logic
changes.
There is a loose convention in Rust to not use `test_` prefix. The
reason being that `cargo test` outputs 'test <test name>' using the
prefix makes the output stutter.
This patch smells a bit like code-churn but having the prefix in some
places and not others is confusing to new contributors and is leading me
to explain this many times now. Lets just fix it.
Remove the prefix unless doing so breaks the code.
There is a range of different wordings used in the docs of constructor
type functions.
Change all to start with `Constructs a new` or `Constructs an empty`.
In functions that act like constructors there is a mixture of the usage
of `creates` and `constructs`.
Replace all occurrences of `creates` with `constructs` in the first line
of docs of constructor like functions.