Try reverse index for queries/linclust with bins #11
Linclust for sketchlib: find groups of queries which share a k-mer in a bin.
A roaring bitmap might be appropriate here to store the samples in which each hash is present.
Another way of storing the data would be to hold each sketch bin as a dictionary, with the 14 bits of the bin value (not transposed) as the key and the samples which had that bin value as the values. Then I think you could do a fast distance query for a new sample by finding the matching bins and adding the values from each match.
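A minimal sketch of that layout, to make the idea concrete. It assumes each sketch is a plain list of integer bin values of equal length; `build_index` and `query` are hypothetical names for illustration and are not part of sketchlib.

```python
from collections import defaultdict


def build_index(sketches):
    """Build one dictionary per bin: bin value -> list of sample ids.

    `sketches` is assumed to be a list of equal-length lists of integer
    bin values, one list per sample (an illustrative layout only).
    """
    n_bins = len(sketches[0])
    index = [defaultdict(list) for _ in range(n_bins)]
    for sample_id, sketch in enumerate(sketches):
        for bin_idx, value in enumerate(sketch):
            index[bin_idx][value].append(sample_id)
    return index


def query(index, sketch):
    """Count, for each indexed sample, how many bins match the query sketch."""
    counts = defaultdict(int)
    for bin_idx, value in enumerate(sketch):
        # .get() avoids inserting empty entries into the defaultdict
        for sample_id in index[bin_idx].get(value, ()):
            counts[sample_id] += 1
    return dict(counts)


sketches = [
    [1, 2, 3, 4],
    [1, 2, 9, 9],
    [7, 7, 7, 7],
]
index = build_index(sketches)
matches = query(index, [1, 2, 3, 0])
```

Samples with no matching bin never appear in `matches`, so the result is already sparse. Dividing each count by the number of bins would give a matching-bin fraction, if that is the quantity the distance is based on.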
I think the efficiency of 'adding the values from each match' would determine whether this is faster or slower than the default method here. Starting with sparse vectors of integers (i.e. covering just those samples where there is a match) probably makes sense.
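A rough illustration of the two accumulation strategies being weighed, assuming sketches are plain lists of integer bin values. `sparse_match_counts` and `dense_match_counts` are hypothetical helpers, and the dense version is only a brute-force stand-in for comparison, not the actual default method.

```python
from collections import Counter


def sparse_match_counts(postings_per_bin):
    """Sum matches over sparse postings lists (one list of sample ids per bin).

    Only samples that matched in at least one bin ever get an entry.
    """
    counts = Counter()
    for postings in postings_per_bin:
        counts.update(postings)
    return counts


def dense_match_counts(sketches, query_sketch):
    """Brute-force baseline: compare the query against every sample, bin by bin."""
    return [
        sum(a == b for a, b in zip(sketch, query_sketch))
        for sketch in sketches
    ]


sketches = [
    [1, 2, 3, 4],
    [1, 2, 9, 9],
    [7, 7, 7, 7],
]
query_sketch = [1, 2, 3, 0]

# Postings lists as an index lookup would return them for each query bin
# (written out by hand here to keep the example self-contained).
postings_per_bin = [[0, 1], [0, 1], [0], []]

sparse = sparse_match_counts(postings_per_bin)
dense = dense_match_counts(sketches, query_sketch)
```

The sparse version does work proportional to the total length of the postings lists, while the brute-force version always touches every bin of every sample. Which wins in practice would depend on how many samples share each bin value.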