feat: WhereDocument filter with $and, $or, $contains and $not_contains filters #96

Open
wants to merge 1 commit into
base: main
from


iwilltry42
Contributor

Top-level WhereDocument structs are ANDed.
A WhereDocument can be a single operator like $contains or $not_contains, each of which requires a Value to be set.
$or and $and require nested WhereDocuments.
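
To make the semantics above concrete, here is a minimal sketch of how such a filter could be modeled and evaluated. It is not the code from this PR: the `Operator` field, the `WhereDocumentOperator` type, the constant names, and the `Validate`, `Matches` and `MatchesAll` helpers are assumptions for illustration. Only `Value`, `WhereDocuments` and the four operator strings come from the discussion.

```go
package main

import (
	"fmt"
	"strings"
)

// WhereDocumentOperator names a filter operator (hypothetical type name).
type WhereDocumentOperator string

const (
	OpContains    WhereDocumentOperator = "$contains"
	OpNotContains WhereDocumentOperator = "$not_contains"
	OpAnd         WhereDocumentOperator = "$and"
	OpOr          WhereDocumentOperator = "$or"
)

// WhereDocument is a sketch of the filter struct described above.
// The Operator field name is an assumption.
type WhereDocument struct {
	Operator       WhereDocumentOperator
	Value          string          // required for $contains / $not_contains
	WhereDocuments []WhereDocument // required for $and / $or
}

// Validate checks that the fields required by the operator are set.
func (wd WhereDocument) Validate() error {
	switch wd.Operator {
	case OpContains, OpNotContains:
		if wd.Value == "" {
			return fmt.Errorf("%s requires a Value", wd.Operator)
		}
	case OpAnd, OpOr:
		if len(wd.WhereDocuments) == 0 {
			return fmt.Errorf("%s requires nested WhereDocuments", wd.Operator)
		}
		for _, sub := range wd.WhereDocuments {
			if err := sub.Validate(); err != nil {
				return err
			}
		}
	default:
		return fmt.Errorf("unknown operator %q", wd.Operator)
	}
	return nil
}

// Matches reports whether content satisfies the filter.
// It assumes Validate has already passed.
func (wd WhereDocument) Matches(content string) bool {
	switch wd.Operator {
	case OpContains:
		return strings.Contains(content, wd.Value)
	case OpNotContains:
		return !strings.Contains(content, wd.Value)
	case OpAnd:
		for _, sub := range wd.WhereDocuments {
			if !sub.Matches(content) {
				return false
			}
		}
		return true
	case OpOr:
		for _, sub := range wd.WhereDocuments {
			if sub.Matches(content) {
				return true
			}
		}
		return false
	}
	return false
}

// MatchesAll ANDs the top-level filters, as described above.
func MatchesAll(content string, filters ...WhereDocument) bool {
	for _, f := range filters {
		if !f.Matches(content) {
			return false
		}
	}
	return true
}

func main() {
	fooOrBar := WhereDocument{Operator: OpOr, WhereDocuments: []WhereDocument{
		{Operator: OpContains, Value: "foo"},
		{Operator: OpContains, Value: "bar"},
	}}
	notBaz := WhereDocument{Operator: OpNotContains, Value: "baz"}
	fmt.Println(MatchesAll("some bar text", fooOrBar, notBaz)) // true
	fmt.Println(MatchesAll("bar and baz", fooOrBar, notBaz))   // false
}
```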

I'm sure this could be done without a struct, using some map[string]any and reflect logic, but this feels cleaner, even though it's a breaking change.

Let me know what you think!

Owner

@philippgille philippgille left a comment

Whoa nice, I didn't expect this! 👍

Adding support for more ways to filter has been on my TODO list since the very beginning, but I haven't needed it yet, so I never looked into it more closely.

From a first look the approach seems good to me.

Regarding potential other approaches: What if the Query method accepted just a string formatted as JSON, while internally we keep the structs exactly as you implemented them, with JSON struct tags, so that there is one deserialize step before the validating and matching? That would make the filter a bit more Chroma-like (which is not a must, but nice to have). The downside is that you could get the query completely wrong without finding out at compile time.
And I guess if we keep the deserializing easy, the JSON wouldn't be very concise. For example, it could be {"op":"$and","docs":[{"op":"$contains", "val":"foo"}, {"op":"$not_contains", "val":"bar"}]} to query for docs that contain "foo" but not "bar".
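
As a rough sketch of that fixed-field variant, plain struct tags and `encoding/json` are enough. The JSON keys `op`, `val` and `docs` come from the example above. The `Operator` field name and the `ParseWhereDocument` helper are assumptions for illustration, not existing API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WhereDocument with fixed JSON field names, so the default decoder works.
// The Operator field name is an assumption.
type WhereDocument struct {
	Operator       string          `json:"op"`
	Value          string          `json:"val,omitempty"`
	WhereDocuments []WhereDocument `json:"docs,omitempty"`
}

// ParseWhereDocument (hypothetical helper) is the single deserialize step
// that would run before validating and matching.
func ParseWhereDocument(query string) (WhereDocument, error) {
	var wd WhereDocument
	if err := json.Unmarshal([]byte(query), &wd); err != nil {
		return WhereDocument{}, fmt.Errorf("invalid filter JSON: %w", err)
	}
	return wd, nil
}

func main() {
	wd, err := ParseWhereDocument(
		`{"op":"$and","docs":[{"op":"$contains", "val":"foo"}, {"op":"$not_contains", "val":"bar"}]}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", wd)
}
```

A typo in an operator name or key is only caught at runtime here, which is the compile-time downside mentioned above.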
To get rid of the fixed fields, and write only {"$and":[{"$contains":"foo"},{"$not_contains":"bar"}]} (nicely concise, very Chroma-like), it would require custom unmarshaling. But no reflection. And I think most/all of the logic that you implemented would stay exactly the same.

(Just an idea to discuss, not to request a change yet! I think using proper types also has its advantages)

@iwilltry42
Contributor Author

Happy to surprise you 😅

I would actually vote against using a JSON string as a function parameter - I feel like accepting JSON strings belongs on the user-interface side, e.g. in a CLI or some other UI.
As a library/module I think it should use proper types.
Though we could provide a QueryJSON (I'm bad at naming) func that implements your suggestion.

But that's all up to you :)

FWIW I'm not sure about the Value and WhereDocuments fields - we could replace them with a single field of type any that accepts either.
That would be more abstract.
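
For comparison, a rough sketch of the single-field variant. The `Operand` field name, the `Operator` field and the `Matches` signature are made up for illustration. The point is that the operator/operand pairing is then checked with a type switch at runtime instead of by the struct layout.

```go
package main

import (
	"fmt"
	"strings"
)

// WhereDocument with one abstract operand field (hypothetical layout).
type WhereDocument struct {
	Operator string
	// Operand holds a string for $contains / $not_contains,
	// or a []WhereDocument for $and / $or.
	Operand any
}

// Matches evaluates the filter and returns an error when the operand type
// does not fit the operator.
func (wd WhereDocument) Matches(content string) (bool, error) {
	switch operand := wd.Operand.(type) {
	case string:
		switch wd.Operator {
		case "$contains":
			return strings.Contains(content, operand), nil
		case "$not_contains":
			return !strings.Contains(content, operand), nil
		}
	case []WhereDocument:
		switch wd.Operator {
		case "$and", "$or":
			isAnd := wd.Operator == "$and"
			for _, sub := range operand {
				ok, err := sub.Matches(content)
				if err != nil {
					return false, err
				}
				// Short-circuit: $and stops at the first false,
				// $or stops at the first true.
				if ok != isAnd {
					return ok, nil
				}
			}
			return isAnd, nil
		}
	}
	return false, fmt.Errorf("operator %q does not accept operand of type %T",
		wd.Operator, wd.Operand)
}

func main() {
	filter := WhereDocument{Operator: "$and", Operand: []WhereDocument{
		{Operator: "$contains", Operand: "foo"},
		{Operator: "$not_contains", Operand: "bar"},
	}}
	ok, err := filter.Matches("foo baz")
	fmt.Println(ok, err)
}
```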

@philippgille
Owner

philippgille commented Sep 1, 2024

Sorry for the late response! The breaking change was holding me back from merging because all other changes since the last release were without breaking changes. And for releasing a version before merging, the decision regarding the Vertex options passing held me back. 😬

Both are done now, so I'm open to some breaking changes (not only from this PR but also other changes) in the next release.


Having a separate method for passing the query as JSON is a good idea 👍. I'll look into adding that as soon as this one is merged.
Regarding merging Value and WhereDocuments into a single any field, I'll think about it a bit.
