Can't obtain results using Rust implementation #13

Open
paulbricman opened this issue Jul 8, 2021 · 2 comments

Comments

paulbricman commented Jul 8, 2021

I'm roughly using the following code:

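let query_emb: Vec<f32>;
let doc_emb: Vec<Vec<f32>>; // contains 3 document embeddings

...

// 10 projections per hash, 30 hash tables, 512-dimensional vectors;
// .srp() selects sign random projections (cosine similarity).
let mut lsh = LshMem::new(10, 30, 512).srp().unwrap();
// Index the document embeddings (the returned value is ignored here).
let _x = lsh.store_vecs(&doc_emb[..]);
// Fetch every stored vector that shares a bucket with the query.
let result = lsh.query_bucket(&query_emb).unwrap();
println!("lsh-rs: {:?}", result);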

Unfortunately, the result is empty. When I test the same query and documents with ngt-rs, I do get results (I'm looking for an alternative to ngt-rs that runs on Windows). Is this a matter of choosing better parameters?

Author

paulbricman commented Jul 8, 2021

It seems so: adjusting n_projections and n_hash_tables sometimes makes it return results. Do you know of effective heuristics for choosing these two values? I plan on working with 100-10000 candidate vectors of dimension 512, but was only testing with 3 of them.

Owner

ritchie46 commented Jul 11, 2021

Here is a presentation I have on the subject:
LSH.pdf

And a notebook with some theory: notebook

The most important thing is to understand gap amplification; see the last plot in the notebook. By choosing K and L, you tune the collision probability for a given similarity value.
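To make the effect of K and L concrete, here is a minimal sketch of the textbook collision model for sign random projections. It assumes that K corresponds to n_projections and L to n_hash_tables (the usual LSH notation, not confirmed in this thread), and that a single random hyperplane gives two vectors the same bit with probability 1 - θ/π, where θ is the angle between them. The function names are hypothetical and not part of lsh-rs.

```rust
use std::f64::consts::PI;

/// Probability that one random hyperplane assigns the same sign bit to two
/// vectors with cosine similarity `cos_sim` (textbook SRP model, an assumption
/// about how the crate behaves).
fn single_bit_collision(cos_sim: f64) -> f64 {
    let theta = cos_sim.clamp(-1.0, 1.0).acos();
    1.0 - theta / PI
}

/// Probability that two vectors share a bucket in at least one of `l` tables,
/// where each table is keyed by `k` concatenated sign bits.
fn candidate_probability(cos_sim: f64, k: u32, l: u32) -> f64 {
    let p = single_bit_collision(cos_sim);
    // All k bits must match within a table; one matching table is enough.
    1.0 - (1.0 - p.powi(k as i32)).powi(l as i32)
}

fn main() {
    // K = 10 and L = 30 mirror the values passed to LshMem::new in the question.
    for cos_sim in [0.5, 0.7, 0.8, 0.9, 0.95] {
        println!(
            "cos = {:.2} -> P(candidate) = {:.3}",
            cos_sim,
            candidate_probability(cos_sim, 10, 30)
        );
    }
}
```

Under this model, raising K makes each table more selective, and raising L gives similar pairs more chances to collide. With only a few stored vectors of moderate similarity to the query, an empty bucket is a plausible outcome, which would be consistent with the result reported above.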

P.S. You can experiment with the Python version of this crate in the notebook:

https://pypi.org/project/floky/
