-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
What to do with searchsorted* functions #24883
Labels
domain:search & find
The find* family of functions
Comments
This was referenced Dec 3, 2017
#25133 moves these functions to the |
This was referenced Dec 31, 2017
Looks like the general opinion is that we should keep these functions as they are. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As part of #10593, I have tried to find a way to unify
searchsorted
,searchsortedfirst
andsearchsortedlast
with the rest offind*
functions. In the Search & Find Julep, I suggested replacing them withfind*
methods dispatching on aSorted
/SortedVector
wrapper which would indicate that the input vector is sorted (see also JuliaCollections/DataStructures.jl#290 for a similar but more powerful type).For
searchsorted
itself, this plan would work quite well. It could be replaced withfind(equalto(needle), Sorted(haystack))
, which would just call a specialization offind
which would return a range instead of aVector
, while still following the genericAbstractArray
API.The situation is more complex for
searchsortedfirst
andsearchsortedlast
. I'll detail the problems for the former, knowing that they are equivalent for the latter.searchsortedfirst
returns "the first value in a greater than or equal to x, according to the specified order". With the introduction of theSorted
wrapper, the specified order is the one passed when constructing the wrapper, which indicates how the underlying vector is sorted. So far, so good. But what kind of predicate can we use to reflect this order?searchsortedfirst(needle, haystack)
withfindfirst(greaterthan(needle), Sorted(haystack))
, but thengreaterthan
cannot have any meaning on its own: it would depend on the order used byhaystack
, and "greater" could even mean "lower" ifrev=true
.findfirst(sortedafter(needle), Sorted(haystack))
, which would better reflect that the predicate has no intrinsic meaning. But then we lose the generality of the syntax: it cannot be used on an unsorted vector to find the first value greater thanx
, which defeats the idea of unifying functions.Then there are other subtleties, but these are less serious:
greaterthan
should actually begreaterthanorequalto
andsortedafter
besortedatorafter
to accurately reflect the behavior of the function.searchsortedfirst
has an internal variant which allow passing a start and end index indicating the range to be searched, butfindfirst
only accepts a start index. This feature is used by the sparse matrix code. It could be replaced with array views, or we could just have a method which is more flexible than the otherfindfirst
methods.searchsortedfirst(x)
returnslength(x)+1
when no match is found, whilefindfirst
currently returns0
, and may well returnnothing
in the future (Find Julep: issue with sentinel values Juleps#47). This doesn't sound like an issue, as the caller can easily replace0
ornothing
withlength(x)+1
without a significant performance cost if needed.I would appreciate any help, especially from people who use these functions (I don't). If fixing this in time for 0.7 is too complex, we could unexport these functions (which are needed for sparse matrices), and move them e.g. to SortingAlgorithms.
The text was updated successfully, but these errors were encountered: