1fea098dfb Support unsized `R` and `W` in consensus encode/decode (Dawid Ciężarkiewicz)
a24a3b0194 Forward `consensus_decode` to `consensus_decode_from_finite_reader` (Dawid Ciężarkiewicz)
9c754ca4de Take Writer/Reader by `&mut` in consensus en/decoding (Dawid Ciężarkiewicz)
Pull request description:
Fix#1020 (see more relevant discussion there)
This definitely makes the amount of generics compiler
has to generate by avoding generating the same functions
for `R`, `&mut R`, `&mut &mut R` and so on.
old:
```
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9947832 Jun 2 22:42 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4463024 Jun 2 22:46 target/release/deps/bitcoin-07a9dabf1f3e0266
```
new:
```
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9866800 Jun 2 22:44 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4393392 Jun 2 22:45 target/release/deps/bitcoin-07a9dabf1f3e0266
```
In the unit-test binary itself, it saves ~100KB of data.
I did not expect much performance gains, but turn out I was wrong(*):
old:
```
test blockdata::block::benches::bench_block_deserialize ... bench: 1,072,710 ns/iter (+/- 21,871)
test blockdata::block::benches::bench_block_serialize ... bench: 191,223 ns/iter (+/- 5,833)
test blockdata::block::benches::bench_block_serialize_logic ... bench: 37,543 ns/iter (+/- 732)
test blockdata::block::benches::bench_stream_reader ... bench: 1,872,455 ns/iter (+/- 149,519)
test blockdata::transaction::benches::bench_transaction_deserialize ... bench: 136 ns/iter (+/- 3)
test blockdata::transaction::benches::bench_transaction_serialize ... bench: 51 ns/iter (+/- 8)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench: 5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size ... bench: 3 ns/iter (+/- 0)
```
new:
```
test blockdata::block::benches::bench_block_deserialize ... bench: 1,028,574 ns/iter (+/- 10,910)
test blockdata::block::benches::bench_block_serialize ... bench: 162,143 ns/iter (+/- 3,363)
test blockdata::block::benches::bench_block_serialize_logic ... bench: 30,725 ns/iter (+/- 695)
test blockdata::block::benches::bench_stream_reader ... bench: 1,437,071 ns/iter (+/- 53,694)
test blockdata::transaction::benches::bench_transaction_deserialize ... bench: 92 ns/iter (+/- 2)
test blockdata::transaction::benches::bench_transaction_serialize ... bench: 17 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench: 5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size ... bench: 4 ns/iter (+/- 0)
```
(*) - I'm benchmarking on a noisy laptop. Take this with a grain of salt. But I think
at least it doesn't make anything slower.
While doing all this manual labor that will probably generate conflicts,
I took a liberty of changing generic type names and variable names to
`r` and `R` (reader) and `w` and `W` for writer.
ACKs for top commit:
RCasatta:
ACK 1fea098dfb tested in downstream lib, space saving in compiled code confirmed
apoelstra:
ACK 1fea098dfb
Tree-SHA512: bc11994791dc97cc468dc9d411b9abf52ad475f23adf5c43d563f323bae0da180c8f57f2f17c1bb7b9bdcf523584b0943763742b81362880206779872ad7489f
We allocated a new vector when serializing a `Witness`. That was
inefficient and unnecessary. Use `serialize_seq` to feed the witness
elements directly into the serializer.
Optimize `Witness` serialization by removing the allocation.
Fix#1020 (see more relevant discussion there)
This definitely makes the amount of generics compiler
has to generate by avoding generating the same functions
for `R`, &mut R`, `&mut &mut R` and so on.
old:
```
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9947832 Jun 2 22:42 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4463024 Jun 2 22:46 target/release/deps/bitcoin-07a9dabf1f3e0266
```
new:
```
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 9866800 Jun 2 22:44 target/release/deps/bitcoin-07a9dabf1f3e0266
> strip target/release/deps/bitcoin-07a9dabf1f3e0266
> ls -al target/release/deps/bitcoin-07a9dabf1f3e0266
-rwxrwxr-x 1 dpc dpc 4393392 Jun 2 22:45 target/release/deps/bitcoin-07a9dabf1f3e0266
```
In the unit-test binary itself, it saves ~100KB of data.
I did not expect much performance gains, but turn out I was wrong(*):
old:
```
test blockdata::block::benches::bench_block_deserialize ... bench: 1,072,710 ns/iter (+/- 21,871)
test blockdata::block::benches::bench_block_serialize ... bench: 191,223 ns/iter (+/- 5,833)
test blockdata::block::benches::bench_block_serialize_logic ... bench: 37,543 ns/iter (+/- 732)
test blockdata::block::benches::bench_stream_reader ... bench: 1,872,455 ns/iter (+/- 149,519)
test blockdata::transaction::benches::bench_transaction_deserialize ... bench: 136 ns/iter (+/- 3)
test blockdata::transaction::benches::bench_transaction_serialize ... bench: 51 ns/iter (+/- 8)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench: 5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size ... bench: 3 ns/iter (+/- 0)
```
new:
```
test blockdata::block::benches::bench_block_deserialize ... bench: 1,028,574 ns/iter (+/- 10,910)
test blockdata::block::benches::bench_block_serialize ... bench: 162,143 ns/iter (+/- 3,363)
test blockdata::block::benches::bench_block_serialize_logic ... bench: 30,725 ns/iter (+/- 695)
test blockdata::block::benches::bench_stream_reader ... bench: 1,437,071 ns/iter (+/- 53,694)
test blockdata::transaction::benches::bench_transaction_deserialize ... bench: 92 ns/iter (+/- 2)
test blockdata::transaction::benches::bench_transaction_serialize ... bench: 17 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_serialize_logic ... bench: 5 ns/iter (+/- 0)
test blockdata::transaction::benches::bench_transaction_size ... bench: 4 ns/iter (+/- 0)
```
(*) - I'm benchmarking on a noisy laptop. Take this with a grain of salt. But I think
at least it doesn't make anything slower.
While doing all this manual labor that will probably generate conflicts,
I took a liberty of changing generic type names and variable names to
`r` and `R` (reader) and `w` and `W` for writer.
Clippy emits:
warning: this expression creates a reference which is immediately
dereferenced by the compiler
As suggested, remove the additional reference.
Use cargo to upgrade from edition 2015 to edition 2018.
cargo fix --edition
No manual changes made. The result of the command above is just to fix
all the use statements (add `crate::`) and fix the fully qualified path
formats i.e., `::Foo` -> `crate::Foo`.
We do this all over the place in rust-lightning, and its probably
the most common thing to do with a `Witness` so I figured I'd
upstream the util method to do this. It also avoids an allocation
compared to the naive approach of `SerializedSignature.to_vec()`
with two pushes, which is nice.
In this library we specifically do not use rustfmt and tend to favour
terse statements that do not use extra lines unnecessarily. In order to
help new devs understand the style modify code that seems to use an
unnecessary number of lines.
None of these changes should reduce the readability of the code.
I think it is more natural to write Wintess::new() followed by Witness::push()
then Witness::default(). In any case, there is no harm in having additional constructors.
Witness struct is in place of the Vec<Vec<u8>> we have before this commit.
from_vec() and to_vec() methods are provided to switch between this type and Vec<Vec<u8>>
Moreover, implementation of Default, Iterator and others allows to have similar behaviour but
using a single Vec prevent many allocations during deserialization which in turns results in
better performance, even 20% better perfomance on recent block.
last() and second_to_last() allows to access respective element without going through costly Vec
transformation