Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Remove find type assertion to allow other iterables #16110

Merged
merged 3 commits into from
May 4, 2016
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions base/array.jl
Original file line number Diff line number Diff line change
Expand Up @@ -777,7 +777,7 @@ function findprev(testf::Function, A, start::Integer)
end
findlast(testf::Function, A) = findprev(testf, A, length(A))

function find(testf::Function, A::AbstractArray)
function find(testf::Function, A)
# use a dynamic-length array to store the indexes, then copy to a non-padded
# array for the return
tmpI = Array(Int, 0)
Expand All @@ -791,9 +791,10 @@ function find(testf::Function, A::AbstractArray)
I
end

function find(A::AbstractArray)
nnzA = countnz(A)
I = similar(A, Int, nnzA)
function find(A)
cA = typeof(A) <: AbstractArray ? A : collect(A)
Copy link
Member

@nalimilan nalimilan Apr 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think calling collect is a good idea. For example, find(1:10000000) is going to kill your computer, while the current code works perfectly fine. People will have to call collect manually if they want to work on iterables on which one can only pass over once.

So just remove the ::AbstractArray assertion and replace similar(Int, nnzA) with Vector{Int}(nnzA), and you should be fine.

(BTW: it's always better to avoid branches on types inside functions, and create a separate method which calls the more specific one after calling collect.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had conditioned on the type to avoid unnecessarily collecting. (Side note: in your example, UnitRange <: AbstractArray, so it wouldn't get collected. But I get what you mean.) Great idea regarding Vector, I'll do that. Thanks so much for the feedback! 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Side note: in your example, UnitRange <: AbstractArray, so it wouldn't get collected. But I get what you mean.)

Good catch. Anyway, you get the idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to get in my daily dose of pedantry somehow 😄

nnzA = countnz(cA)
I = similar(cA, Int, nnzA)
count = 1
for (i,a) in enumerate(A)
if a != 0
Expand Down
8 changes: 8 additions & 0 deletions test/arrayops.jl
Original file line number Diff line number Diff line change
Expand Up @@ -362,6 +362,14 @@ a = [0,1,2,3,0,1,2,3]
@test findprev(isodd, [2,4,5,3,9,2,0], 7) == 5
@test findprev(isodd, [2,4,5,3,9,2,0], 2) == 0

# find with general iterables
s = "julia"
@test find(s) == [1,2,3,4,5]
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test conflicts directly with #16024. I also think that this is bizarre behavior:

julia> find("foo\0bar")
6-element Array{Int64,1}:
 1
 2
 3
 5
 6
 7

Sure, that's what falls out of the definition, but how is this a useful or meaningful operation?

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential solutions:

  • change a != 0 to !isequal(a, 0); won't find -0.0
  • change a != 0 to !(isequal(a, 0) | isequal(a, -0.0)); weird and maybe inefficient
  • change a != 0 to a != convert(eltype(A), 0)

The last retains the bizarre behavior for strings of finding the indices of all characters but '\0'; the former two would at least lose that weird behavior by definition, which is nice, I guess.

Copy link
Member Author

@ararslan ararslan May 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah crap, I didn't realize this was getting the non-null indices; I thought it was getting all indices. (Though in retrospect it should have been obvious that it would do this. 😕) I definitely should have added a test for that. This behavior isn't what I had intended.

Would it make sense to add a separate method for strings that just does something like collect(1:endof(s))? For things like Base.UTF8proc.GraphemeIterator it would still do the string-to-int comparison and return all indices. That would only become an issue if the NotComparableError thing happens.

What do you think about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for either keeping the current behaviour, or adding a method just to raise an error. Do we have a use case for this (it didn't work until now...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have a use case in mind for find("string"); I was primarily interested in opening up find(testf::Function, A) for strings. Returning all indices for a string makes sense to me though, at least more so than returning indices of non-null bytes. I suppose that isn't so bad, just kind of weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but returning all valid indices isn't the definition of find, so I don't think it's a good idea to use it for that. If needed, we could imagine using the new function from #16260 for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, find finds the indices of all numerically nonzero values; the only reason why it currently gets non-null indices for strings is that 0 == '\0' is true. If/when integer-character comparison goes away, it will fail for strings anyway. I'd be fine making find("string") a MethodError. What do you think, @StefanKarpinski?

@test find(c -> c == 'l', s) == [3]
g = graphemes("日本語")
@test find(g) == [1,2,3]
@test find(isascii, g) == Int[]

## findn ##

b = findn(ones(2,2,2,2))
Expand Down