Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add intersect(::AbstractRange, ::AbstractVector) #41769

Merged
merged 1 commit into from
Aug 29, 2021

Conversation

barucden
Copy link
Contributor

@barucden barucden commented Aug 3, 2021

Closes #41759

@barucden
Copy link
Contributor Author

barucden commented Aug 3, 2021

@IanButterworth Cannot refer to AbstractVector from range.jl. Should I just move the definitions to array.jl (originally I thought it is more fitting in range.jl)?

@IanButterworth
Copy link
Member

Moving to array.jl sounds good. Perhaps stylistically you could flip the args so the first arg makes most sense for the new location, but that's very minor

function intersect(vec::AbstractVector, r::AbstractRange)
    common = Iterators.filter(x -> x  r, vec)
    seen = Set{eltype(vec)}(common)
    return vectorfilter(_shrink_filter!(seen), common)
end
intersect(r::AbstractRange, vec::AbstractVector) = intersect(vec, r)

@barucden
Copy link
Contributor Author

barucden commented Aug 3, 2021

Done. Thanks!

@IanButterworth
Copy link
Member

Just copying the benchmarks over from #41759


julia> @btime intersect(1:typemax(Int), [1,3])
  302.226 ns (8 allocations: 688 bytes)
2-element Vector{Int64}:
 1
 3

julia> @btime Base.intersect([1,2,3,4,5], [1,3])
  654.689 ns (15 allocations: 1.11 KiB)
2-element Vector{Int64}:
 1
 3

julia> @btime Base.intersect(1:typemax(Int), 1:2:3)
  0.044 ns (0 allocations: 0 bytes)
1:2:3

@simeonschaub
Copy link
Member

We don't currently define an explicit method for intersect(::AbstractRange, ::AbstractRange), so you will need to define that as well to avoid ambiguities.

@oscardssmith
Copy link
Member

Something interesting is that this could be about 2x faster if we used a Set implimentation closer to the one in Dictionaries.jl's. specifically of you use the version of a set that uses insertion order, this could just return the elements of the set as a list without an extra allocation.

@barucden
Copy link
Contributor Author

barucden commented Aug 3, 2021

We don't currently define an explicit method for intersect(::AbstractRange, ::AbstractRange), so you will need to define that as well to avoid ambiguities.

@simeonschaub I don’t immediately see what would be the approach. Should it be just a fall-back method with a naive implementation? Like looping through the shorter range and checking if each element is in the longer range? Also, what ambiguities can occur now that could not occur before this method?

Something interesting is that this could be about 2x faster if we used a Set implimentation closer to the one in Dictionaries.jl's. specifically of you use the version of a set that uses insertion order, this could just return the elements of the set as a list without an extra allocation.

@oscardssmith Thanks for the suggestion. Is there a set implementation which guarantees the insertion order in Julia? I could not find it. Or do you propose to write it first (maybe based on the mentioned Dictionaries.jl package)?

@simeonschaub
Copy link
Member

Should it be just a fall-back method with a naive implementation? Like looping through the shorter range and checking if each element is in the longer range?

Yes, seems reasonable.

Is there a set implementation which guarantees the insertion order in Julia?

No, not currently. That would be a much bigger project, so I wouldn't bother for now.

@barucden
Copy link
Contributor Author

barucden commented Aug 4, 2021

Rebased and added intersect(::AbstractRange, ::AbstractRange). @simeonschaub Would you check if this test is okay, please?

base/range.jl Outdated Show resolved Hide resolved
@barucden
Copy link
Contributor Author

barucden commented Aug 4, 2021

There's still one more error but I am not sure it's related.

base/range.jl Outdated Show resolved Hide resolved
@IanButterworth
Copy link
Member

@simeonschaub @oscardssmith is this RTM?

@oscardssmith
Copy link
Member

No objections here.

@IanButterworth
Copy link
Member

IanButterworth commented Aug 10, 2021

Some return type comparison and benchmarks vs. master

using BenchmarkTools
for a in [1:4, 1:1.0:4, 10:4:30, collect(1:4)]
    for b in [2:4, 2:1.0:4, 14:4:30, collect(2:6)]
        r = Base.intersect(a, b)
        print(a, "\t", b, "\t", typeof(r),"\t", r, "\t")
        @btime Base.intersect($a, $b)
    end
end
a b Master return PR Return Master PR
1:4 2:4 2:4 2:4 0.010 ns (0 allocations: 0 bytes) 0.010 ns (0 allocations: 0 bytes)
1:4 2:1.0:4 [2, 3, 4] [2.0, 3.0, 4.0] 376.512 ns (16 allocations: 1.12 KiB) 66.507 ns (2 allocations: 144 bytes)
1:4 14:4:30 14:4:13 14:4:13 0.010 ns (0 allocations: 0 bytes) 0.010 ns (0 allocations: 0 bytes)
1:4 [2,3,4,5,6] [2, 3, 4] [2, 3, 4] 364.560 ns (14 allocations: 1.03 KiB) 156.247 ns (7 allocations: 608 bytes)
1:1.0:4 2:4 [2.0, 3.0, 4.0] [2.0, 3.0, 4.0] 422.495 ns (16 allocations: 1.16 KiB) 72.938 ns (2 allocations: 144 bytes)
1:1.0:4 2:1.0:4 [2.0, 3.0, 4.0] [2.0, 3.0, 4.0] 416.785 ns (15 allocations: 1.12 KiB) 87.419 ns (2 allocations: 144 bytes)
1:1.0:4 14:4:30 Float64[] Float64[] 406.565 ns (15 allocations: 1.08 KiB) 44.098 ns (1 allocation: 64 bytes)
1:1.0:4 [2,3,4,5,6] [2.0, 3.0, 4.0] [2, 3, 4] 421.245 ns (14 allocations: 1.06 KiB) 234.930 ns (7 allocations: 608 bytes)
10:4:30 2:4 10:4:9 10:4:9 0.010 ns (0 allocations: 0 bytes) 0.010 ns (0 allocations: 0 bytes)
10:4:30 2:1.0:4 Int64[] Float64[] 401.403 ns (16 allocations: 1.12 KiB) 41.474 ns (1 allocation: 64 bytes)
10:4:30 14:4:30 14:4:30 14:4:30 23.391 ns (0 allocations: 0 bytes) 21.577 ns (0 allocations: 0 bytes)
10:4:30 [2,3,4,5,6] Int64[] Int64[] 402.547 ns (14 allocations: 1.03 KiB) 88.290 ns (5 allocations: 464 bytes)
[1,2,3,4] 2:4 [2, 3, 4] [2, 3, 4] 335.186 ns (13 allocations: 976 bytes) 157.491 ns (7 allocations: 608 bytes)
[1,2,3,4] 2:1.0:4 [2, 3, 4] [2, 3, 4] 345.344 ns (13 allocations: 1008 bytes) 228.194 ns (7 allocations: 608 bytes)
[1,2,3,4] 14:4:30 Int64[] Int64[] 332.333 ns (13 allocations: 976 bytes) 84.778 ns (5 allocations: 464 bytes)
[1,2,3,4] [2,3,4,5,6] [2, 3, 4] [2, 3, 4] 333.701 ns (12 allocations: 944 bytes) 341.855 ns (12 allocations: 944 bytes)

I didn't expect this PR to speed this up so much!

# master
julia> @btime intersect(1:1.0:4, 2:4)
  407.815 ns (16 allocations: 1.16 KiB)
3-element Vector{Float64}:
 2.0
 3.0
 4.0

# PR
julia> @btime intersect(1:1.0:4, 2:4)
  67.816 ns (2 allocations: 144 bytes)
3-element Vector{Float64}:
 2.0
 3.0
 4.0

@IanButterworth
Copy link
Member

Is it acceptable that the return type has changed in some of the examples above?

@oscardssmith
Copy link
Member

Imo, yes, but I'll put a triage label on it for good measure.

@oscardssmith oscardssmith added the triage This should be discussed on a triage call label Aug 11, 2021
@JuliaLang JuliaLang deleted a comment from IanButterworth Aug 11, 2021
@rfourquet
Copy link
Member

Is it acceptable that the return type has changed in some of the examples above?

Yes would be worth triaging, currently returning something of the same eltype as the first argument doesn't seem documented but might be a useful property? But if needed, it seems simple enough to update this PR to preserve this.

@IanButterworth
Copy link
Member

IMO I think it's ok as it's the type promotion of the eltype of the two args, which makes sense, and that could just be marked with a !!! compat

@barucden
Copy link
Contributor Author

Just a note: the return type has changed for the AbstractRange-AbstractVector version too, and there's no promotion -- the vector's eltype determines the result eltype. From Ian's table:

a b Master return PR Return
1:1.0:4 [2,3,4,5,6] [2.0, 3.0, 4.0] [2, 3, 4]

Someone might dislike this inconsistency (promotion & no promotion) between the AbstractRange-AbstractRange and AbstractRange-AbstractVector version.

base/array.jl Outdated Show resolved Hide resolved
base/array.jl Outdated Show resolved Hide resolved
@JeffBezanson
Copy link
Member

From triage: 👍 this is good. We think all the methods (of intersect and union) should be consistent with return type, so they should all promote.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Aug 12, 2021
@IanButterworth
Copy link
Member

Ping for review & merge @oscardssmith @simeonschaub @rfourquet

@vchuravy
Copy link
Member

Needs a rebase and a squash of all commits :)

@barucden barucden force-pushed the intersect-range-vector branch 3 times, most recently from b95c68e to 59a7817 Compare August 28, 2021 13:25
Also adds:
 - `intersect(::AbstractRange, ::AbstractRange)`
 - `intersect(::AbstractRange, ::AbstractRange)`

Closes JuliaLang#41759

Co-authored-by: Ian Butterworth <[email protected]>
Co-authored-by: Jeff Bezanson <[email protected]>
@IanButterworth
Copy link
Member

The FreeBSD failure appears unrelated

@IanButterworth IanButterworth merged commit 897c8e6 into JuliaLang:master Aug 29, 2021
@barucden
Copy link
Contributor Author

Thank you all for your help.

@barucden barucden deleted the intersect-range-vector branch August 29, 2021 19:13
@IanButterworth
Copy link
Member

No, thank you @barucden! 👍🙂

@vtjnash
Copy link
Member

vtjnash commented Sep 3, 2021

Seems like this might have destroyed performance of intersect on a BitSet: https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_date/2021-09/03/report.md. I think because we used to know that the intersection of a BitSet and anything else must fit into a BitSet, so we would prefer the narrower type (since we know the type-intersection is always sufficient), while now we copy it into a Set (i.e. always preferring the wider type).

Would you be willing to look into that?

@barucden
Copy link
Contributor Author

barucden commented Sep 9, 2021

If my understanding is correct, you propose to make a special method intersect(s1::BitSet, s2) and make it return BitSet instead of Set{promote_type(eltype(s1), eltype(s2))}. I think this is related to what @rfourquet mentioned.

Wouldn't it, however, violate the decision

We think all the methods (of intersect and union) should be consistent with return type, so they should all promote.

from triage?

@IanButterworth
Copy link
Member

Seems like this needs some more triage given #41769 (comment)

@IanButterworth IanButterworth added the triage This should be discussed on a triage call label Sep 16, 2021
intersect(s::AbstractSet, itr, itrs...) = intersect!(intersect(s, itr), itrs...)
function intersect(s::AbstractSet, itr, itrs...)
T = promote_eltype(s, itr, itrs...)
return intersect!(Set{T}(s), itr, itrs...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug --- it means we give a different container type for 3 arguments than 2 (and a different type than 1.7).
On 1.7:

julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2

julia> intersect(BitSet([1,2]), [1,2])
BitSet with 2 elements:
  1
  2

on master:

julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2

julia> intersect(BitSet([1,2]), [1,2])
BitSet with 2 elements:
  1
  2

barucden added a commit to barucden/julia that referenced this pull request Sep 27, 2021
A bug was introduced for 3 arguments version of `intersect` in JuliaLang#41769.
The container type always changed to `Set`:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2
```

This is an attempt to return to the original behavior:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2
```
barucden added a commit to barucden/julia that referenced this pull request Oct 2, 2021
A bug was introduced for 3 arguments version of `intersect` in JuliaLang#41769.
The container type always changed to `Set`:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2
```

This is an attempt to return to the original behavior:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2
```
barucden added a commit to barucden/julia that referenced this pull request Oct 2, 2021
A bug was introduced for 3 arguments version of `intersect` in JuliaLang#41769.
The container type always changed to `Set`:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2
```

This is an attempt to return to the original behavior:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2
```
fredrikekre pushed a commit that referenced this pull request Oct 10, 2021
A bug was introduced for 3 arguments version of `intersect` in #41769.
The container type always changed to `Set`:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2
```

This is an attempt to return to the original behavior:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2
```
@oscardssmith oscardssmith removed the triage This should be discussed on a triage call label Feb 3, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
A bug was introduced for 3 arguments version of `intersect` in JuliaLang#41769.
The container type always changed to `Set`:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2
```

This is an attempt to return to the original behavior:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2
```
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
A bug was introduced for 3 arguments version of `intersect` in JuliaLang#41769.
The container type always changed to `Set`:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
Set{Int64} with 1 element:
  2
```

This is an attempt to return to the original behavior:

```
julia> intersect(BitSet([1,2]), [1,2], [2])
BitSet with 1 element:
  2
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

intersect could be more efficient for mixtures of UnitRange and Vector
9 participants