Search #18

Open
gryphonmyers opened this issue May 20, 2020 · 7 comments
@gryphonmyers

On the topic of Complex Queries, as described in the wiki, I think this is a fairly important feature to include in any library API, as people tend to need to search their data. With no form of partial matching available, it may be difficult to provide a good user experience, as a client would be limited to predefined options or to rigid search behavior that can only return exact matches. I do also agree with the decision to keep the query syntax simple and rooted in basic HTTP syntax, though.

How would you feel about supporting basic globbing syntax (or even just the Kleene star, as that would be the most useful) in field values?

Pros:

  • Easily understood
  • Easy to implement

Cons:

  • Rather limited functionality
  • Character encoding complexity
    • Support for globbing characters would need to be accompanied by a percent encoding recommendation / requirement.
    • How to distinguish between wildcards and literal characters?
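
To make the wildcard-versus-literal question concrete, here is a minimal Python sketch of Kleene-star matching. The backslash escaping rule and the names `glob_to_regex` and `glob_match` are assumptions for illustration only; nothing in the spec defines them.

```python
import re


def glob_to_regex(pattern):
    """Compile a pattern in which '*' matches any run of characters.

    Assumption (hypothetical rule): a backslash makes the next character
    literal, so r"\*" matches a real asterisk.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: treat it literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
        elif ch == "*":
            # Kleene star: any run of characters, including none.
            parts.append(".*")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def glob_match(pattern, value):
    """Return True if the whole value matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(value) is not None
```

With this rule, a client would still have to escape any `*` or backslash that a user typed, which is the encoding burden listed under the cons.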
@sampsyo
Member

sampsyo commented May 20, 2020

Yes, this is a great point. It does seem like we need some way to do basic search—it will be hard to build useful interfaces on top of the API otherwise.

To think about some other options here:

  • Something like the beets query syntax. It's flexible and extensible—you can add new "kinds" of queries to the same syntax easily, and it naturally extends to full Boolean logic. However, it was designed to be convenient for humans to write and is probably not a good choice for an API.
  • Just substring queries. That is, you specify a list of fields and strings that the fields must contain. Basically the same as globbing where every search looks like *this*. The advantage would be that there's no "in-band signaling": clients would not, for example, need to put the wildcards in themselves or figure out how to escape wildcards in the string that a user typed. The disadvantage, of course, is that it's less flexible; for example, you can't match at the beginning or end of a string.
  • Something a bit more extensible. For example, we could provide arbitrary query "types" that could be extended in the future—and only specify basic case-insensitive substring queries for now. Then, if globs seem useful to do on top of this, then we could standardize new types.

I'm starting to think that starting simple (just substrings) and building in a path for extensibility in the future might be the wise way to go. Does that make sense?
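
One way to picture "start with substrings, leave room for more" is a small registry of query types, sketched here in Python. The names `QUERY_TYPES`, `substring`, and `matches` are hypothetical; only the case-insensitive substring behavior comes from the discussion above.

```python
def substring(needle, haystack):
    """Case-insensitive substring test."""
    return needle.casefold() in haystack.casefold()


# Registry of query types. Only "substring" is defined for now; a later
# revision could register more (for example a glob matcher) without
# changing how existing types behave.
QUERY_TYPES = {"substring": substring}


def matches(query_type, needle, haystack):
    """Dispatch to the matcher registered for query_type."""
    try:
        matcher = QUERY_TYPES[query_type]
    except KeyError:
        raise ValueError("unsupported query type: " + query_type) from None
    return matcher(needle, haystack)
```

A server that does not know a query type can reject it explicitly rather than silently falling back to some other behavior.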

@gryphonmyers
Author

gryphonmyers commented May 20, 2020

This makes a lot of sense. Do you have specific ideas about how to go about supporting new query types? I suppose it could just be a matter of using different param keys, e.g. ?filter[artist]=Blue performs an exact match, ?search[artist]=Blue does a substring search (*Blue*), then perhaps something like ?beetsquery[artist]=Blue exposes the full beets query syntax. To me, this sounds like a clean solution as the scope of each param is clearly defined and separated, and allowed to establish its own input restrictions. Also less potential for breaking changes down the road, because the feature set for each param should remain fixed even if new query behaviors are added to the spec.
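
A rough Python sketch of how a server might separate these param keys. The function name `parse_query` is made up, and only `filter` and `search` are handled here; `beetsquery` is left out.

```python
import re
from urllib.parse import parse_qsl

# Matches keys such as "filter[artist]" or "search[title]".
PARAM_RE = re.compile(r"^(filter|search)\[([^\]]+)\]$")


def parse_query(query_string):
    """Return (kind, field, value) triples for recognized parameters."""
    criteria = []
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        m = PARAM_RE.match(key)
        if m:
            criteria.append((m.group(1), m.group(2), value))
    return criteria
```

Parameters that do not match the pattern (a sort key, say) are simply skipped, so each kind of param keeps its own clearly separated scope.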

@sampsyo
Member

sampsyo commented May 21, 2020

Sure; that seems cool! We could even consider keeping the filter namespace constant and just adding "qualifiers" to the field names to get different behavior, i.e., ?filter[artist]=Blue for exact matches and ?filter[search:artist]=Blue or similar for substring queries. This would perhaps simplify the client and server logic a bit—if you want to know all the criteria to use for filtering, just gather up all the filter keys (as distinct, for example, from the sort keys).
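
As a sketch of the qualifier idea in Python: gather every `filter[...]` key, split off an optional qualifier, and default to an exact match. The names `parse_filters` and `apply_filters` are hypothetical, and the case-insensitive reading of `search:` is an assumption.

```python
import re
from urllib.parse import parse_qsl

# "filter[artist]" -> (None, "artist"); "filter[search:artist]" -> ("search", "artist")
FILTER_RE = re.compile(r"^filter\[(?:([a-z]+):)?([^\]:]+)\]$")


def parse_filters(query_string):
    """Return (mode, field, value) triples; mode defaults to 'exact'."""
    filters = []
    for key, value in parse_qsl(query_string):
        m = FILTER_RE.match(key)
        if m:
            filters.append((m.group(1) or "exact", m.group(2), value))
    return filters


def apply_filters(items, filters):
    """Keep the items (dicts) that satisfy every filter."""
    def ok(item, mode, field, value):
        actual = str(item.get(field, ""))
        if mode == "exact":
            return actual == value
        if mode == "search":
            # Assumed semantics: case-insensitive substring.
            return value.casefold() in actual.casefold()
        raise ValueError("unknown qualifier: " + mode)

    return [it for it in items if all(ok(it, *f) for f in filters)]
```

Everything that is not a `filter[...]` key, such as a sort key, is ignored by `parse_filters`, which is the simplification described above.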

@govynnus
Member

govynnus commented Sep 2, 2020

I've been thinking a bit about AURA client UIs recently and realised that most clients will probably want a single main search box. This makes me think that we could have something like ?search=foo that looks at all fields and matches substrings, like @gryphonmyers' suggestion but without specifying a field. To get results for all of tracks, albums and artists would still involve the client making 3 separate requests, but that probably isn't too bad.
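
For illustration, a minimal Python sketch of what `?search=foo` could do on the server for one collection. The field list in `SEARCHABLE_FIELDS` is a made-up subset, not the real set of searchable fields.

```python
# Hypothetical subset of text fields to search.
SEARCHABLE_FIELDS = ("title", "artist", "album")


def search_items(items, term, fields=SEARCHABLE_FIELDS):
    """Return items where any listed field contains term, ignoring case."""
    needle = term.casefold()
    return [
        item
        for item in items
        if any(needle in str(item.get(f, "")).casefold() for f in fields)
    ]
```

A client would call the equivalent endpoint once each for tracks, albums and artists to fill a single search box.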

I quite like the idea of filtering and searching being a bit separate, rather than search being a qualifier of filter. You could have a client that works just on filtering a whole collection (tracks, albums or artists) until the user gets what they want, or a client that gets a list of search results and then allows the user to filter (kind of like shopping websites).

I'm also wondering about the possibility of allowing optional regular expressions for filters/searches, either for people who want more control over queries or to let clients decide where wildcards should go. However, the latter would raise the problem of escaping user input, as @sampsyo mentioned earlier. Also, I feel like regex might be quite complicated in terms of having a standard syntax.

@gryphonmyers
Author

gryphonmyers commented Sep 2, 2020 via email

@govynnus
Member

govynnus commented Sep 2, 2020

I think in a URL parameter you could pass it through encodeURIComponent(), but I haven't actually tried it. Like you say the big issue would be consistent implementation, which makes me think it's not such a good idea.
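
For comparison, a rough Python analogue of that encoding step, using `urllib.parse.quote` with `safe=""`. The helper name `encode_value` is made up. One difference worth knowing: JavaScript's `encodeURIComponent()` leaves `*` unescaped, whereas this call percent-encodes it.

```python
from urllib.parse import parse_qsl, quote, unquote


def encode_value(value):
    """Percent-encode a query value, leaving only unreserved characters."""
    return quote(value, safe="")


pattern = "^Blue.*$"
encoded = encode_value(pattern)

# The server side recovers the original pattern after decoding.
decoded = unquote(encoded)
pairs = parse_qsl("q=" + encoded)
```

Encoding only gets the pattern through the URL intact; it does not settle which regex dialect the server should apply to it.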

I agree that at least basic substring and case-insensitive matching is needed on filters. If the client only cared about certain fields then they could do something like ?filter[substring:title]=sub&filter[substring:artist]=sub&filter[....., but doing that for all fields seems a bit unwieldy. It is a good point that some fields don't need to be searched (like integer fields, MusicBrainz IDs and MIME types), which would leave 7 'searchable' fields for tracks, 3 for albums and 1 for artists.

Of course matching 7 fields rather than, say, 3 is going to be more expensive but it's very easy for the server to extract the required information from the URL. For filters the server needs to look through each parameter, see if it matches the filter[...] pattern, and possibly figure out if it's substring, case-insensitive, or something else. Also probably a lot of back-ends will have some kind of in-built ability to match multiple fields at once, but I don't know how much of a difference that makes.

Looking forward to your ideas.

@sampsyo
Member

sampsyo commented Sep 2, 2020

One approach might be to have two separate interfaces: a standard query interface using filter[title]=..., etc., that essentially encodes SQL "WHERE" clauses, and a separate search that is much fuzzier. The search could match all fields and use case-insensitive substrings, but it could also attempt an implementation-defined "smart" search that guesses what the user was really after. The former would have a clearly defined meaning; the semantics of the latter would be undefined and left up to the server, to allow variability in how fuzzy searching works. Would that make sense?
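
To make the split concrete, here is a hedged Python sketch against an in-memory SQLite table. The `tracks` schema, the `ALLOWED_FIELDS` set, and all function names are assumptions for the example; the search shown is just one possible server-defined behavior (a substring LIKE across all fields).

```python
import sqlite3

ALLOWED_FIELDS = ("album", "artist", "title")  # hypothetical column whitelist


def build_filter_sql(filters):
    """Exact-match filters become an AND-ed WHERE clause."""
    clauses, params = [], []
    for field, value in filters.items():
        if field not in ALLOWED_FIELDS:
            raise ValueError("unknown field: " + field)
        clauses.append(field + " = ?")
        params.append(value)
    where = " AND ".join(clauses) or "1"
    return "SELECT title FROM tracks WHERE " + where + " ORDER BY title", params


def build_search_sql(term):
    """One possible fuzzy search: substring LIKE over every allowed field."""
    # Escape LIKE metacharacters so user input is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    clauses = " OR ".join(f + " LIKE ? ESCAPE '\\'" for f in ALLOWED_FIELDS)
    params = ["%" + escaped + "%"] * len(ALLOWED_FIELDS)
    return "SELECT title FROM tracks WHERE " + clauses + " ORDER BY title", params


def make_demo_db():
    """Create a small in-memory table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracks (title TEXT, artist TEXT, album TEXT)")
    conn.executemany(
        "INSERT INTO tracks VALUES (?, ?, ?)",
        [
            ("Blue Monday", "New Order", "Substance"),
            ("So What", "Miles Davis", "Kind of Blue"),
            ("Blue", "Joni Mitchell", "Blue"),
        ],
    )
    return conn


def run(conn, sql_and_params):
    """Execute a (sql, params) pair and return the titles found."""
    sql, params = sql_and_params
    return [row[0] for row in conn.execute(sql, params)]
```

The filter path has one well-defined result for a given query, while a server would be free to swap the body of `build_search_sql` for something smarter without breaking clients.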
