Feature: Excluding vectors from search #40

iNDicat0r · 2017-03-14T21:23:28Z

Imagine a set of 1000 vectors, each linked to an entity (images in my case). Say I perform a kNN lookup with k = 5; I would like to dynamically exclude some vectors from the search based on some criteria.

Is this even possible?

iNDicat0r · 2017-03-14T21:31:45Z

An example would be:
12 cars: 4 red (2 auto, 2 manual), 4 blue (2 auto, 2 manual), 4 green (2 auto, 2 manual).

My query vector is a manual car, but I would like to exclude the red ones from the search.

mdouze · 2017-03-14T21:38:43Z

Hi!

First, I think it is misleading to use semantic analogies. Faiss is not a SQL database engine with symbolic representations like red / manual.

There is no easy way of excluding elements from the search. The reason is that Faiss mainly relies on scanning strings of codes and computing distances. During the scan, it accesses the corresponding ID only if the code corresponds to a potentially interesting item. Skipping over some of the codes would require an ID-dependent operation, which slows the processing down.

The most sensible thing to do is to query Faiss with a larger k than needed and filter out the irrelevant results post-hoc.
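The over-fetch-and-filter idea can be sketched as follows. This is a minimal illustration in plain Python, not Faiss code: `brute_force_search` is a hypothetical stand-in for an index search that returns the k nearest (distance, id) pairs, and `filtered_search`, `keep`, and `overfetch` are names invented for this example. The retry loop that doubles k' when too few results survive is an assumption added here, not something the thread prescribes.

```python
import heapq
import math


def brute_force_search(vectors, query, k):
    """Hypothetical stand-in for an index search.

    Returns the k nearest (distance, id) pairs, sorted by distance.
    """
    scored = ((math.dist(vec, query), vid) for vid, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


def filtered_search(vectors, query, k, keep, overfetch=4):
    """Query with k' > k, then drop the ids rejected by keep().

    If fewer than k results survive, k' is doubled and the search is
    repeated, until k results are found or the whole set was scanned.
    """
    n = len(vectors)
    k_prime = min(n, k * overfetch)
    while True:
        hits = [(d, vid)
                for d, vid in brute_force_search(vectors, query, k_prime)
                if keep(vid)]
        if len(hits) >= k or k_prime >= n:
            return hits[:k]
        k_prime = min(n, k_prime * 2)


# Usage: exclude the "red" entries, using metadata stored outside the index.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
colors = ["red", "red", "blue", "red", "green", "blue"]
result = filtered_search(vectors, (0.0,), k=2,
                         keep=lambda vid: colors[vid] != "red")
print(result)
```

The metadata (here `colors`) lives outside the index and is consulted only for the short list of returned ids, so the scan itself stays ID-independent.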

jegou · 2017-03-14T21:40:34Z

@iNDicat0r: this is currently not implemented, since this would probably require some specific callback function to test whether an entry satisfies the external constraints, which could be of any kind and therefore out of the scope of a core library.

We have already encountered this problem, and discussed two possible workaround strategies:

1. Request a larger k' (>> k), then filter the short-list down to k results. As Matthijs mentioned, this should not affect the speed too much if the requested k' << N, since our heap is quite efficient.
2. Create multiple indexes, one per condition. Obviously this can work only for specific conditions. A typical example for which this works is when you want to filter based on the date (you just need to create an index per day).
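The one-index-per-condition strategy might look like the sketch below. It uses plain Python lists as a stand-in for real indexes, and `build_partitions` and `search_partitions` are hypothetical helper names. Searching several partitions and merging their results is an assumption made here to cover the "exclude one category" case; the comment above only describes one index per condition.

```python
import heapq
import math
from collections import defaultdict


def build_partitions(vectors, labels):
    """Group vectors by label: one small 'index' (a list) per condition.

    Each entry keeps the original vector id so results can be mapped back.
    """
    parts = defaultdict(list)
    for vid, (vec, label) in enumerate(zip(vectors, labels)):
        parts[label].append((vid, vec))
    return dict(parts)


def search_partitions(parts, query, k, allowed):
    """Search only the partitions whose label is allowed, then merge.

    Returns the k nearest (distance, id) pairs across those partitions.
    """
    candidates = []
    for label in allowed:
        for vid, vec in parts.get(label, []):
            candidates.append((math.dist(vec, query), vid))
    return heapq.nsmallest(k, candidates)


# Usage: excluding "red" means searching every other partition.
vectors = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
colors = ["red", "red", "blue", "red", "green", "blue"]
parts = build_partitions(vectors, colors)
allowed = [label for label in parts if label != "red"]
print(search_partitions(parts, (0.0,), k=2, allowed=allowed))
```

The trade-off is that the partitioning must be known when the indexes are built, which is why this suits fixed conditions such as a date or a class label better than arbitrary query-time filters.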

iNDicat0r · 2017-03-14T21:54:22Z

Gentlemen, thanks for the quick responses. I completely understand that we are not talking about B-trees and SQL-like searches here; however, there are scenarios where accuracy might be more important than efficiency. In any case, the multiple-index approach sounds applicable in my case (doing a classification first and using the classification label to select an index?).

qinjian623 · 2021-02-01T03:33:14Z

Same scenario as @iNDicat0r.
Our data has several different attributes.
With pre-building multiple indexes, it may save more GPU memory.

Otherwise, we need to build [number of attributes * categories of each attribute] indexes.

For example,
Car with attributes:

brand = {Toyota, Ford ...}
color = {Blue, White ...}
type = {truck, sedan, suv ...}

Sometimes, we only want all Toyotas, or all blue ones.

mdouze closed this as completed Mar 14, 2017

mdouze added the wontfix label Mar 14, 2017

tiru1930 mentioned this issue Mar 7, 2018

installation error inlining failed in call to always_inline ‘__m128 _mm_hadd_ps(__m128, __m128)’: target specific option mismatch #360

Closed

1 task

mdouze mentioned this issue Jul 19, 2018

can faiss build index with someting like category id? #525

Closed
