diff --git a/early_research_code/python-bloom-filter-util/README.md b/early_research_code/python-bloom-filter-util/README.md
index 03274ef..35387e8 100644
--- a/early_research_code/python-bloom-filter-util/README.md
+++ b/early_research_code/python-bloom-filter-util/README.md
@@ -13,7 +13,7 @@ thirdaddress
 
 The resulting bloom filter can be "checked against" with an address, and will respond whether that address exists in the bloom filter set or not.
 
-It's important to keep in mind that bloom filters are probabilistic data structures and as such result in false positives usually at a rate of ~1%, which can be adjusted for by increasing the data set size, but at typical parameters which result from an optimized bloom filter, balancing false positives and size, 1% is the usual rate we encounter.
+It's important to keep in mind that bloom filters are probabilistic data structures and as such produce false positives at a certain rate, which can be reduced by increasing the size of the filter. Tune this rate to your workload: if you check millions or billions of addresses against a filter and cannot tolerate more than a few false positives, we recommend setting an appropriately small false positive factor.
 
 ## Generate bloom filter
 `python bloom-util.py create --filter_file filter.pkl --addresses_file addresses.txt`
@@ -31,6 +31,9 @@ $ Address fourthaddress is not in the filter
 
 This is experimental, unmaintained code. Use only as research inspiration.
 
+Specifically, we make no security guarantees.
+For example, deserializing a malicious filter may be unsafe: loading untrusted pickle (`.pkl`) data with Python's `pickle` module can execute arbitrary code.
+
 ## License
 Licensed under either of `Apache License, Version 2.0` or `MIT` license at your option.
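The relationship between filter size, false positive rate, and workload described in the README change can be sketched with the standard textbook Bloom filter approximations: `m = -n ln(p) / (ln 2)^2` bits and `k = (m / n) ln 2` hash functions for `n` items at target rate `p`. This is a minimal sketch under those assumptions only; the function names below are hypothetical and are not part of `bloom-util.py`, whose internals are not shown in this diff.

```python
import math


def bloom_parameters(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Hypothetical helper: return (num_bits, num_hashes) for a classic Bloom filter.

    Uses the textbook approximations m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    if not 0.0 < fp_rate < 1.0:
        raise ValueError("fp_rate must be in (0, 1)")
    num_bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    num_hashes = max(1, round(num_bits / n_items * math.log(2)))
    return num_bits, num_hashes


def false_positive_rate(num_bits: int, n_items: int, num_hashes: int) -> float:
    """Approximate false positive rate (1 - e^(-k n / m))^k for given parameters."""
    return (1.0 - math.exp(-num_hashes * n_items / num_bits)) ** num_hashes


def expected_false_positives(num_checks: int, fp_rate: float) -> float:
    """Expected number of false positives when checking addresses NOT in the set."""
    return num_checks * fp_rate


if __name__ == "__main__":
    # A 1% filter over one million addresses needs roughly 9.6 bits per address.
    bits, hashes = bloom_parameters(1_000_000, 0.01)
    print(bits, hashes, false_positive_rate(bits, 1_000_000, hashes))
    # At 1%, checking a billion absent addresses yields about ten million false hits.
    print(expected_false_positives(10**9, 0.01))
```

The last line illustrates why a "typical" 1% rate can be far too high at scale: the expected number of false positives grows linearly with the number of checks, so workloads in the billions need a rate several orders of magnitude smaller, paid for with more bits per stored address.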