add sbbf to benches #126

Closed
wants to merge 4 commits into from

ozgrakkurt

Hey!

I implemented a split block bloom filter and wanted to add it to benchmarks here to see how it performs.

I would be happy to add a safe wrapper around it and add it to the filters here and also add false positive rate benchmarks as well.

@crepererum
Member

I think we could eventually merge sbbf, but I have a couple of asks before we do so:

  1. move the safe impl from the benchmarks to an actual module
  2. make the dependency optional (and hence only build the module if the feature is enabled)
  3. I would be interested in the outcome of the benchmark run on your machine (mostly for my own interest)
  4. some documentation would be nice. Ideally you would write a short "how it works" (see the other filters) but I see how this quickly gets really complicated, so if you don't have the time, at least write a few sentences and cite the paper / references that were used during the upstream implementation so that people know where to look
  5. could you release your crate on crates.io? This way we don't have to depend on a Git dependency.

Side note: I would also be willing to accept the upstream code within this very repo here as some kind of donation, but I'm also OK if you wanna keep it freestanding. Not sure what your plans for your repo / crate are :)

Thanks for the interesting stuff btw 💪

@ozgrakkurt
Author

Thanks for the interest. I'll do points 1, 2, 4 and 5.

On my PC (Ryzen 5900X), it is 10x as fast as the current partitioned bloom filter implementation in this repo for both inserts and queries. Compared to HashSet, it is 2.5x as fast for queries and 8.5x as fast for inserts.

On a MacBook M1, it is about 10-12x as fast as the bloom filter in this repo. Compared to HashSet, it is 20x as fast on inserts and 4-5x as fast on queries.

I'm not sure whether it has exactly the same false positive properties as that one; I just picked a number that is supposed to produce something close. I'm also not sure about the RAM usage.
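One way to check whether two filters have comparable false positive properties is to measure the rate empirically. The following is a minimal sketch, not code from this PR: the function name `false_positive_rate` and its closure-based interface are hypothetical, chosen so the same harness can be pointed at any membership structure. It inserts the keys `0..n_inserted` and then probes a disjoint range, so every hit among the probes is a false positive.

```rust
/// Estimates the false positive rate of any membership structure.
///
/// Hypothetical helper: `insert` and `contains` are closures so that the
/// same harness works for different filter types.
pub fn false_positive_rate<F, I, C>(
    filter: &mut F,
    mut insert: I,
    contains: C,
    n_inserted: u64,
    n_probes: u64,
) -> f64
where
    I: FnMut(&mut F, u64),
    C: Fn(&F, u64) -> bool,
{
    assert!(n_probes > 0, "need at least one probe");

    // Insert the keys 0..n_inserted.
    for key in 0..n_inserted {
        insert(filter, key);
    }

    // Probe keys that were never inserted; each hit is a false positive.
    let filter: &F = filter;
    let false_positives = (n_inserted..n_inserted + n_probes)
        .filter(|&key| contains(filter, key))
        .count();

    false_positives as f64 / n_probes as f64
}
```

Running the harness against an exact set such as `std::collections::HashSet<u64>` should report a rate of zero, which makes a useful sanity check before comparing two approximate filters at the same memory budget.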

I pretty much did it for hobby purposes. I plan to use it in other projects and maybe integrate it into Parquet implementations, since it has exactly the properties described in the Parquet spec.
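For readers unfamiliar with the structure, here is a rough sketch of a split block bloom filter along the lines of the Parquet bloom filter specification. It is not the code from this PR. The type name `SplitBlockBloom` and its methods are hypothetical, and the standard library's `DefaultHasher` is an assumption used only to keep the example dependency-free (the Parquet spec uses xxHash64). The idea: the upper 32 bits of the hash select one 256-bit block, and the lower 32 bits are multiplied by eight salt constants to pick one bit in each of the block's eight 32-bit words.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Salt constants from the Parquet bloom filter specification.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// One 256-bit block: eight 32-bit words, one bit set per word per key.
type Block = [u32; 8];

/// Hypothetical minimal filter type for illustration.
pub struct SplitBlockBloom {
    blocks: Vec<Block>,
}

impl SplitBlockBloom {
    pub fn new(num_blocks: usize) -> Self {
        assert!(num_blocks > 0, "need at least one block");
        assert!(num_blocks <= u32::MAX as usize, "too many blocks");
        Self { blocks: vec![[0u32; 8]; num_blocks] }
    }

    /// Stand-in hash (assumption); the Parquet spec uses xxHash64.
    fn hash64<T: Hash>(item: &T) -> u64 {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        hasher.finish()
    }

    /// Maps the upper 32 bits of the hash onto 0..num_blocks without a modulo.
    fn block_index(&self, hash: u64) -> usize {
        (((hash >> 32) * self.blocks.len() as u64) >> 32) as usize
    }

    /// Builds a mask with exactly one bit set in each of the eight words.
    fn mask(key: u32) -> Block {
        let mut mask = [0u32; 8];
        for (word, salt) in mask.iter_mut().zip(SALT.iter()) {
            // The top 5 bits of the product pick a bit position in 0..32.
            *word = 1u32 << (key.wrapping_mul(*salt) >> 27);
        }
        mask
    }

    pub fn insert<T: Hash>(&mut self, item: &T) {
        let hash = Self::hash64(item);
        let index = self.block_index(hash);
        let mask = Self::mask(hash as u32);
        for (word, bit) in self.blocks[index].iter_mut().zip(mask.iter()) {
            *word |= *bit;
        }
    }

    pub fn contains<T: Hash>(&self, item: &T) -> bool {
        let hash = Self::hash64(item);
        let index = self.block_index(hash);
        let mask = Self::mask(hash as u32);
        self.blocks[index]
            .iter()
            .zip(mask.iter())
            .all(|(word, bit)| word & bit != 0)
    }
}
```

Because every lookup touches a single 32-byte block, each query costs at most one cache line access, and the eight word operations map naturally onto SIMD instructions. This scalar sketch does not use SIMD, so it says nothing about the speedups reported above.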

@ozgrakkurt ozgrakkurt closed this by deleting the head repository Jul 4, 2023