[WIP] allow sorting Dict/Set values in show #33744

Open
wants to merge 1 commit into
base: master

rfourquet
Member

It can clearly be convenient to have a dict or set printed in sorted order, but doing so by default would give the wrong idea about the order of iteration. Cf. e.g. #7153 for some discussion. But wouldn't it be great to have an opt-in way? With this PR together with #29249, you can!

julia> Base.active_repl.options.iocontext[:sorted] = true;

julia> Dict(i => i^2 for i=1:10)
Dict{Int64,Int64} with 10 entries:
  1  => 1
  2  => 4
  3  => 9
  4  => 16
  5  => 25
  6  => 36
  7  => 49
  8  => 64
  9  => 81
  10 => 100

This still needs to be updated for other show methods, and for Set, but I wanted to check first whether this has a chance before putting in more work.
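For readers who want to see the idea without the REPL plumbing, here is a minimal sketch of how a display routine could consult an opt-in `:sorted` property on the IO context. The helper `show_entries` is hypothetical and is not the code in this PR; the `:sorted` key is only meaningful to this sketch (and to the PR), not to current Base `show` methods.

```julia
# Hypothetical helper, not part of Base: prints the entries of `d`, sorted by
# key when the IO context carries `:sorted => true`.
function show_entries(io::IO, d::AbstractDict)
    entries = collect(d)
    if get(io, :sorted, false)
        try
            # Non-mutating sort, so a failure leaves `entries` untouched.
            entries = sort(entries; by = first)
        catch
            # Keys that cannot be compared (e.g. a mix of Symbol and Int)
            # are printed in iteration order instead.
        end
    end
    for (k, v) in entries
        println(io, "  ", k, " => ", v)
    end
end

d = Dict(i => i^2 for i in 1:10)
show_entries(IOContext(stdout, :sorted => true), d)
```

Without the `:sorted => true` property, the same call prints the entries in whatever order the dict iterates them.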

@rfourquet added the needs tests, needs docs, needs news, domain:display and printing, and domain:collections labels Nov 1, 2019
@clarkevans
Member

clarkevans commented Jul 7, 2020

This may be helpful for regression tests that check display output, as we do for NarrativeTest.jl. It'd be nice to also have a flag to sort them in headless operation (e.g. test harness output).


if sorted
try # sorting fails when elements are not comparable, collect can fail too
t = sort!(collect(t))
Member

You can have a lazy dict type that maps all typemin(Int):typemax(Int) to something. Materializing this as a vector is quite dangerous.

Member Author

Good point. Maybe the :sorted attribute should get an Int instead of a Bool, serving as a threshold for the maximum size of the dict below which it's sorted.
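A rough sketch of that threshold idea, under stated assumptions: `entries_for_display` is a hypothetical name, `limit` stands in for an Int-valued `:sorted` attribute, and `length(d)` is assumed to be cheap and meaningful for the dict type at hand (which a lazy dict may not guarantee).

```julia
# Hypothetical helper: returns the entries sorted by key, or `nothing` when
# the dict is larger than `limit` or its keys cannot be compared. A caller
# would fall back to plain iteration order on `nothing`.
function entries_for_display(d::AbstractDict, limit::Integer)
    # Assumption: `length` is cheap here; a lazy dict type may need its own check.
    length(d) <= limit || return nothing
    try
        return sort!(collect(d); by = first)
    catch
        return nothing
    end
end

entries_for_display(Dict(2 => "b", 1 => "a"), 100)  # sorted vector of pairs
entries_for_display(Dict(2 => "b", 1 => "a"), 1)    # nothing: over the threshold
```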

Member

Another possibility is a dictionary variant of https://github.com/JuliaArrays/MappedArrays.jl where the mapping function is super slow. It is perhaps better to use a variant of Iterators.take that takes a timeout.
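One way such a time-limited take might look, purely as a sketch: `collect_with_timeout` is a hypothetical helper, not an existing `Iterators` function. It only checks the clock between elements, so a single very slow element can still block past the deadline.

```julia
# Hypothetical helper: collect items from `itr` until `timeout` seconds have
# elapsed. Returns the collected items and a flag that is `true` only when
# the iterator was exhausted before the deadline.
function collect_with_timeout(itr, timeout::Real)
    deadline = time() + timeout
    items = Vector{eltype(itr)}()
    next = iterate(itr)
    while next !== nothing
        if time() > deadline
            return (items, false)
        end
        item, state = next
        push!(items, item)
        next = iterate(itr, state)
    end
    return (items, true)
end

items, finished = collect_with_timeout(Dict(1 => 2), 5.0)
```

A display routine could then sort only when `finished` is `true` and fall back to unsorted output otherwise.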

Member Author

Again, good point! I'm starting to wonder whether this is really feasible in the general case...

@rfourquet added the needs decision label Jul 12, 2020
@StefanKarpinski
Sponsor Member

I think what we really ought to do here is switch to an ordered Dict implementation, which would decouple the output from the vagaries of the internal hashing.

@rfourquet
Member Author

switch to an ordered Dict implementation

This would solve some use-cases but not the one where one wishes to see the keys in sorted order (as opposed to in insertion order).
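To make the distinction concrete, here is a tiny illustration that uses a vector of pairs as a stand-in for what an insertion-ordered dict would iterate (no ordered Dict type is assumed to exist in Base):

```julia
# Stand-in for an insertion-ordered dict: entries in the order they were added.
inserted = ["pear" => 3, "apple" => 1, "fig" => 2]

# What the sorted display proposed in this PR is after: entries ordered by key.
by_key = sort(inserted; by = first)
```

Here `inserted` starts with "pear", while `by_key` starts with "apple"; the two orders only coincide when keys happen to be inserted in sorted order.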

@clarkevans
Member

clarkevans commented Jul 17, 2020

I think what we really ought to do here is switch to an ordered Dict implementation, which would decouple the output from the vagaries of the internal hashing.

This is wise; could it happen before the next LTS? It would make regression testing and working with JSON data more predictable. Order-preserving dictionaries have become the norm in the last few years; since Python version 3.6, standard dict objects preserve insertion order in the reference implementation. Moreover, many JSON implementations assume the key order of the incoming data, and some silently fail if the key order of serialized objects differs from the expected input.

@tkf
Member

tkf commented Jul 17, 2020

@clarkevans Maybe you can try Dictionaries.jl? It has a Python-inspired hash dict implementation (andyferris/Dictionaries.jl#13). It works with 1.0 already.

Using this in the wild is important if we are going to have it in Base since I think this is the strongest (or more like the only) candidate implementation (although the interface itself probably has to be changed to fit with AbstractDict if we want it within 1.x).

@PallHaraldsson
Contributor

PallHaraldsson commented Apr 26, 2022

Even though @StefanKarpinski states:

I think what we really ought to do here is switch to an ordered Dict implementation

and:

Order-preserving dictionaries have become the norm in the last few years; since Python version 3.6

it seems it will not happen (with the unordered Swiss Dict/table merged). I see that Go's map is unordered too, so ordered is not strictly the norm, although Go's iteration order is randomized.

So this PR is still relevant if ordered is not going to happen, as I proposed, but I'm now a bit conflicted, given that Swiss tables are that much faster:

https://brianlovin.com/hn/29848295

Choosing OrderedDict was a no brainer for a language like Python.

You can't beat say Swiss Tables (a modern C++ unordered map with excellent performance) with the approach taken in OrderedDict [..]

Swiss Tables is SIMD. In a single CPU instruction it can examine an entire cacheline of RAM [..]
To make it fly though, you can't preserve order, the information needed isn't stored anywhere and storing it would make it slower and bigger.

https://www.hyrumslaw.com/

To compare with Go:
https://stackoverflow.com/questions/18342784/how-to-iterate-through-a-map-in-golang-in-order

https://medium.com/swlh/an-ordered-map-in-go-436634692381
