Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make setdiff not call setdiff! ? #34675

Closed
wants to merge 2 commits into from
Closed

Conversation

oxinabox
Copy link
Contributor

@oxinabox oxinabox commented Feb 5, 2020

Calling setdiff! on copy(x) seems like its not desirable.

@rfourquet
Copy link
Member

But the call to symdiff! maintains the semantics:

julia> symdiff([1, 2, 1])
[2]

So symdiff(s) is not equivalent to a copy.

base/abstractset.jl Outdated Show resolved Hide resolved
@oxinabox oxinabox changed the title setdiff(x) = copy(x) make setdiff not call setdiff! ? Feb 5, 2020
@rfourquet
Copy link
Member

symdiff(s) is currently also not equal to union(s)...

@oxinabox
Copy link
Contributor Author

oxinabox commented Feb 5, 2020

symdiff(s) is currently also not equal to union(s)...

Ok, I have no idea what setdiff with a single argument is supposed to do.

For that matter, i have no idea what union is suposed to do.
Both of these are binary operations.

@KristofferC
Copy link
Sponsor Member

Chesterton's fence says hello :P

@oxinabox
Copy link
Contributor Author

oxinabox commented Feb 5, 2020

Chesterton's fence says hello :P

😂
Chesterton and Cunningham would not have gotten on.

Though perhaps it is poor form of me to open PRs to get answers

@rfourquet
Copy link
Member

I have no idea what setdiff with a single argument is supposed to do.

The docstring of setdiff makes this quite clearly defined, but I realize we have conflated setdiff and symdiff!

For symdiff, there is also a logical reason for the current behavior, but it's far from explicit in the docstring (hint: it's a consequence of the "the multiplicity of elements matters"). Basically, as a set is the symmetric difference of all its element (taken individually as sets), the symmetric difference of multiple sets is equal to the symmetric difference of all the individual elements of all the sets. The end result is that you can put all the elements of all the sets in a multiset, and reject those which have even multiplicity, and set multiplicity 1 to the remaining ones. The behavior for non-set collections (in Julia) is just the generalisation of this same algorithms. There was not much discussions about the semantics we wanted for symdiff on non-set collections (which can have repeated elements), but at least it has an internal consistency and is not totally arbitrary.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Feb 5, 2020

Oh man, that is most definitely not what I would expect symdiff on a set with multiple copies of the same value to do. Is there any other function set operation where the number of copies of a value in any one set matters? It violates the standard definition/identity for symdiff(A, B) which is

symdiff(A, B) == setdiff(A  B, A  B)

@StefanKarpinski
Copy link
Sponsor Member

By that same logic, shouldn't setdiff([1, 2, 2, 3], [2, 3]) produce [1, 2] instead of [1]?

@rfourquet
Copy link
Member

My impression is that the current choices are made for efficiency. If I'm not mistaken, the general rule for these set operations is that they return a collection without duplicates. To have a second rule "semantically uniquifies all the arguments before processing" (i.e. doesn't consider inputs as "multisets") could make sense; this is what happens currently for setdiff but not to symdiff. And it doesn't matter for intersect and union.

For setdiff!(s::Set, itr), the definition is straightforward, and so is the implementation: just remove from s elements from itr. (All other setdiff[!] methods probably rely on this primary one).

But for symdiff!(s::Set, itr), the straightforward implementation is to add or remove x from s depending on whether x is already in s for all x in itr. To semantically consider only distinct elements from itr would require first to convert itr to a Set.

I vaguely remember searching for precedent in other languages on this, without success. So I went for the faster implementation, which also has its own logic explained in my previous post, based on the associativity of the operation. This was also what symdiff!(::BitSet, itr) was doing before my change in #23528.

@oxinabox
Copy link
Contributor Author

oxinabox commented Feb 5, 2020

I realize we have conflated setdiff and symdiff

Oops, clearly I really messed up this PR. Since I thought I was changing setdiff.

Its been educational but anything coming Friday m it needs to go elsewhere

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants