feat: Add filter_policy to pgvector integration #820

Merged
merged 22 commits into from
Jul 5, 2024

@vblagoje (Member) commented Jun 19, 2024

Why:

Adds flexible filtering options to the pgvector integration. With the new filter policy option (replace or merge), developers can control how runtime filters are applied relative to the filters set at initialization time.

What:

  • Added filter_policy parameter with options replace and merge across multiple retrievers to control filter behavior dynamically.
  • Unit tests were updated or added to validate the new functionality.

How can it be used:

  • Dynamic Filter Behavior Adjustment: Users can decide whether to completely override the initial filters set during the retriever's initialization (replace) or merge them with runtime filters, with the latter taking precedence (merge).

  • Complex Search Scenarios:

    • In cases where the context of a query might dictate altering pre-set filters without discarding them, the merge option allows for an additive approach.
    • For strict query contexts that require ignoring initial filters, the replace option offers a clean slate for filters at runtime.
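
To make the two policies concrete, here is a minimal standalone sketch of the behavior described above. It is not the code from this PR: `FilterPolicy` and `resolve_filters` are hypothetical stand-ins defined locally, and the filters are simplified to flat dictionaries rather than the integration's real filter syntax.

```python
from enum import Enum
from typing import Any, Dict, Optional


class FilterPolicy(Enum):
    # Hypothetical local stand-in for the policy options named in this PR.
    REPLACE = "replace"
    MERGE = "merge"


def resolve_filters(
    policy: FilterPolicy,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Illustrative helper (an assumption, not the library function)."""
    if policy is FilterPolicy.MERGE and runtime_filters:
        # Merge: keep the init filters, let runtime keys win on conflict.
        return {**(init_filters or {}), **runtime_filters}
    # Replace: runtime filters, when given, fully override the init filters.
    return runtime_filters or init_filters


init_filters = {"language": "en", "year": 2023}
runtime_filters = {"year": 2024}

replaced = resolve_filters(FilterPolicy.REPLACE, init_filters, runtime_filters)
merged = resolve_filters(FilterPolicy.MERGE, init_filters, runtime_filters)
```

In this sketch, `replaced` keeps only the runtime `year` filter, while `merged` keeps `language` from initialization and takes `year` from the runtime filters.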

How did you test it:

  • Unit tests were enhanced or newly created to cover both replace and merge scenarios for the filter_policy parameter.
  • Tests ensure that filter logic is correctly applied based on the policy setting, whether it merges runtime filters with initial filters or replaces them entirely.
  • Additional test cases should be considered for complex filter merge scenarios to ensure priority and override mechanisms function as expected.

@vblagoje vblagoje requested a review from a team as a code owner June 19, 2024 12:42
@vblagoje vblagoje requested review from anakin87 and removed request for a team June 19, 2024 12:42
@github-actions github-actions bot added integration:pgvector type:documentation Improvements or additions to documentation labels Jun 19, 2024
@@ -139,7 +144,11 @@ def run(

        :returns: List of Documents similar to `query_embedding`.
        """
        filters = filters or self.filters
        if self.filter_policy == "merge" and filters:
@anakin87 (Member)

In the docstrings of the run method, I would mention that the way runtime filters are applied depends on the filter_policy chosen at document store initialization.

@vblagoje (Member, Author)

Good feedback, thanks @anakin87, will apply across the board

vblagoje and others added 3 commits June 19, 2024 16:26
@anakin87 (Member) left a comment

Could you also add a test as requested/done in #823 (comment)?

@vblagoje (Member, Author)

> Could you also add a test as requested/done in #823 (comment)?

Yes, already on it - for all PRs 😓

@anakin87 (Member)

> > Could you also add a test as requested/done in #823 (comment)?
>
> Yes, already on it - for all PRs 😓

I understand your pain.
Perhaps you could create a utility function to handle filters in the Haystack repository and test it only there once and for all.

@anakin87 anakin87 self-requested a review June 20, 2024 08:37
@vblagoje (Member, Author)

@anakin87 we need to use the already released https://github.com/deepset-ai/haystack/blob/main/haystack/document_stores/types/filter_policy.py instead of literal, across the board as well. I'll add that change here.
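
As a rough illustration of why an enum can be preferable to a string literal here, the sketch below validates the policy value once, at construction time. The `FilterPolicy` class is a locally defined stand-in written for this example, not a copy of the linked file, and the error message wording is an assumption.

```python
from enum import Enum


class FilterPolicy(Enum):
    # Local stand-in for illustration only.
    REPLACE = "replace"
    MERGE = "merge"

    @staticmethod
    def from_str(value: str) -> "FilterPolicy":
        """Convert a user-supplied string into a policy, failing early on typos."""
        try:
            return FilterPolicy(value.lower())
        except ValueError:
            allowed = ", ".join(p.value for p in FilterPolicy)
            raise ValueError(
                f"Unknown filter policy '{value}'. Allowed values: {allowed}"
            ) from None


policy = FilterPolicy.from_str("merge")
```

With a plain string, a typo such as "merg" would only surface when the comparison silently fails at query time; with the enum, it fails at initialization.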

@vblagoje (Member, Author)

Please delay review until a new patch release of 2.2.x is released. We need these two PRs to be included in a patch release first.

@vblagoje (Member, Author) commented Jul 4, 2024

Ready for review now @anakin87

  • Since you last looked at this PR, we introduced the apply_filter_policy function, which all retrievers in this PR now use. It was thoroughly tested in the PR that introduced it, so here we simply call it in the run method.
  • Please check that the FilterPolicy type is used consistently in both init and run, and that it is well documented.
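
For readers following along, the wiring described in these two points might look roughly like the sketch below. Everything in it is hypothetical: `SketchRetriever` and `apply_policy_sketch` are illustrative names, the filters are flat dictionaries, and no database is queried. It only shows the policy being fixed at init and applied in `run`, with the docstring stating that dependency.

```python
from enum import Enum
from typing import Any, Dict, Optional, Union


class FilterPolicy(Enum):
    # Local stand-in for illustration only.
    REPLACE = "replace"
    MERGE = "merge"


def apply_policy_sketch(
    policy: FilterPolicy,
    init_filters: Optional[Dict[str, Any]],
    runtime_filters: Optional[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    # Hypothetical helper: merge with runtime precedence, or replace outright.
    if policy is FilterPolicy.MERGE and runtime_filters:
        return {**(init_filters or {}), **runtime_filters}
    return runtime_filters or init_filters


class SketchRetriever:
    def __init__(
        self,
        filters: Optional[Dict[str, Any]] = None,
        filter_policy: Union[str, FilterPolicy] = FilterPolicy.REPLACE,
    ) -> None:
        self.filters = filters or {}
        # Accept either a string or the enum, and store the enum.
        self.filter_policy = (
            FilterPolicy(filter_policy)
            if isinstance(filter_policy, str)
            else filter_policy
        )

    def run(self, query: str, filters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        """
        :param filters: Runtime filters. How they are applied depends on the
            `filter_policy` chosen at initialization.
        """
        effective = apply_policy_sketch(self.filter_policy, self.filters, filters)
        # A real retriever would pass `effective` to the document store here.
        return {"query": query, "filters": effective}


retriever = SketchRetriever(filters={"language": "en"}, filter_policy="merge")
result = retriever.run("pgvector", filters={"year": 2024})
```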

@vblagoje (Member, Author) commented Jul 5, 2024

Stefano recommended one more small improvement. Please hold your review until a new commit with it is added here

@anakin87 (Member) left a comment

please incorporate the suggestions and merge 🙂

vblagoje and others added 5 commits July 5, 2024 11:20
…rievers/pgvector/keyword_retriever.py

Co-authored-by: Stefano Fiorucci <[email protected]>
…rievers/pgvector/embedding_retriever.py

Co-authored-by: Stefano Fiorucci <[email protected]>
…rievers/pgvector/keyword_retriever.py

Co-authored-by: Stefano Fiorucci <[email protected]>
@vblagoje (Member, Author) commented Jul 5, 2024

@anakin87 one last look 🙏

@anakin87 (Member) left a comment

looks good

@vblagoje vblagoje merged commit cf792d7 into main Jul 5, 2024
7 checks passed
@vblagoje vblagoje deleted the filter-policy-pgvector branch July 5, 2024 10:26
Labels
integration:pgvector type:documentation Improvements or additions to documentation
Development

Successfully merging this pull request may close these issues.

Add filter_policy init parameter to all retrievers
2 participants