BTreeMap/BTreeSet drain & retain #66747

ssomers · 2019-11-25T15:13:56Z

Attempt at providing and testing drain and retain members on BTreeMap and BTreeSet as requested in rust-lang/rfcs#1338.

drain is relatively easy and therefore reliable.

I'm not quite sure that retain is trustworthy, given how noob I am, how much unsafe code there is all around, and that I don't quite understand why #58431 happened, the code comments there, or how it can be tested, although in theory this code respects the order introduced there.

According to this new benchmark, retain is worth the trouble even if you end up retaining nothing:

test btree::set::build_and_clear                         ... bench:       3,949 ns/iter (+/- 68)
test btree::set::build_and_drain                         ... bench:       3,945 ns/iter (+/- 68)
test btree::set::build_and_drop                          ... bench:       3,980 ns/iter (+/- 34)
test btree::set::build_and_into_iter                     ... bench:       4,045 ns/iter (+/- 75)
test btree::set::build_and_pop_all                       ... bench:       5,090 ns/iter (+/- 64)
test btree::set::build_and_remove_all                    ... bench:       6,835 ns/iter (+/- 50)
test btree::set::build_and_retain_nothing                ... bench:       4,741 ns/iter (+/- 140)

Most of the time in these is spent building the set (I don't know how to separate that off) but _retain_nothing is faster than _pop_all (which is also unstable), and way faster than _remove_all (a stable way to naively implement retain). That's because it's optimized to remove from leaf nodes as long as they remain sufficiently full. It's probably easy to extend that to leaf nodes that are the root node and are therefore allowed to underflow, and it's probably possible when stealing, but I tried to minimize changes in existing code.

rust-highfive · 2019-11-25T15:13:59Z

r? @rkruppe

(rust_highfive has picked a reviewer for you, use r? to override)

src/liballoc/collections/btree/map.rs

hanna-kruppe · 2019-11-25T18:49:18Z

I don't know the BTree code at all and the stuff added here seems somewhat subtle, so I don't feel confident reviewing this PR. Perhaps @Gankra can review or suggest another reviewer?

ssomers · 2019-12-01T20:27:14Z

This 2nd commit doesn't change drain or retain, but increases code reuse and (just a wee bit) iterator performance.

Instead of only retain sharing the guts of RangeMut::next_unchecked, all 7 iterators share the same 2 functions defined by 1 macro. This changes a detail in the iterators over immutable maps (i.e. Range::next_unchecked and Range::next_back_unchecked): they use ptr::read to copy the source handle, like all the rest does, instead of an implicit copy. This is some 5% faster (PS probably the 5% is due to the order swapping or coincidence, because I now think ptr::read is exactly the same as copy). There's no change in unsafe tags because these two functions were already (erroneously? prophetically?) tagged unsafe.

Secondly, this swaps the order in which the two immutable and owned iterators perform into_kv versus the tree descend, matching the correction in the mutable iterators by #58431.

Comparing all benchmarks that compiled on the original code, with the same nightly build:

cargo benchcmp --threshold 4 old2.txt new2.txt
 name                                           old2.txt ns/iter  new2.txt ns/iter  diff ns/iter   diff %  speedup
 btree::map::find_rand_100                      17                18                           1    5.88%   x 0.94
 btree::map::find_seq_10_000                    42                38                          -4   -9.52%   x 1.11
 btree::map::first_and_last_10k                 50                52                           2    4.00%   x 0.96
 btree::map::insert_seq_100                     44                46                           2    4.55%   x 0.96
 btree::map::iter_1000                          2,885             2,659                     -226   -7.83%   x 1.08
 btree::map::iter_100000                        359,550           343,650                -15,900   -4.42%   x 1.05
 btree::map::iter_20                            39                43                           4   10.26%   x 0.91
 btree::map::iter_mut_20                        40                42                           2    5.00%   x 0.95
 btree::set::build_and_clear                    4,676             3,318                   -1,358  -29.04%   x 1.41
 btree::set::build_and_drop                     4,714             3,306                   -1,408  -29.87%   x 1.43
 btree::set::build_and_into_iter                4,731             3,306                   -1,425  -30.12%   x 1.43
 btree::set::build_and_pop_all                  5,832             5,045                     -787  -13.49%   x 1.16
 btree::set::build_and_remove_all               7,404             6,084                   -1,320  -17.83%   x 1.22
 btree::set::difference_random_10k_vs_100       54,058            57,564                   3,506    6.49%   x 0.94
 btree::set::intersection_100_neg_vs_100_pos    23                24                           1    4.35%   x 0.96
 btree::set::intersection_random_100_vs_100     604               642                         38    6.29%   x 0.94
 btree::set::intersection_random_10k_vs_100     2,360             2,256                     -104   -4.41%   x 1.05
 btree::set::intersection_staggered_100_vs_100  621               560                        -61   -9.82%   x 1.11
 btree::set::intersection_staggered_10k_vs_10k  59,895            54,085                  -5,810   -9.70%   x 1.11
 btree::set::is_subset_100_vs_100               414               436                         22    5.31%   x 0.95
 btree::set::is_subset_100_vs_10k               1,259             1,353                       94    7.47%   x 0.93
 btree::set::is_subset_10k_vs_10k               41,193            43,501                   2,308    5.60%   x 0.95
WARNING: benchmarks in new but not in old: btree::set::build_and_drain, btree::set::build_and_retain_nothing

As you see, there's a serious and reproducible boost to the build_and_* tests, for which I have no explanation at all.

ssomers · 2019-12-01T20:42:00Z

By the way, the original, more modest commit is at https://github.com/ssomers/rust/tree/%2342849_v1

Gankra · 2019-12-02T16:15:48Z

Do we need to do anything special to check this against miri? BTreeMap has historically been very good at running afoul of it.

Gankra · 2019-12-03T16:17:45Z

ok yeah sorry i don't have the spare bandwidth for this review

Centril · 2019-12-03T17:59:14Z

r? @scottmcm

cuviper

drain is relatively easy and therefore reliable.

I'm not quite sure that retain is trustworthy, given how noob I am, how much unsafe code there is all around, and that I don't quite understand why #58431 happened, the code comments there, or how it can be tested, although in theory this code respects the order introduced there.

For this reason I would suggest splitting up this PR -- the easy parts can be easily merged, while the hard parts can get a more focused review.

cuviper · 2019-12-03T18:14:02Z

src/liballoc/collections/btree/map.rs

+ /// assert!(a.is_empty());
+ /// ```
+ #[unstable(feature = "btree_drain_retain", issue = "42849")]
+ pub fn drain(&mut self) -> IntoIter<K, V> {


This should return a new Drain type, just like HashMap::drain does, even if that's still just a newtype wrapping IntoIter.

I would expect BTreeSet::drain to remove a range of keys, much like Vec::drain removes a range of indices. There are ways to implement this that are more efficient than repeated calls to remove, but they are pretty complex. Nevertheless, I think it would be a mistake to commit to this signature for drain, since range deletion is quite useful.

I had the same expectation about drain and asked in #42849. I guess the signature should be discussed first in an RFC for drain specifically?

cuviper · 2019-12-03T18:14:45Z

src/liballoc/collections/btree/set.rs

+ /// assert!(a.is_empty());
+ /// ```
+ #[unstable(feature = "btree_drain_retain", issue = "42849")]
+ pub fn drain(&mut self) -> IntoIter<T> {


Also needs a new Drain type.

cuviper · 2019-12-03T18:20:54Z

src/liballoc/collections/btree/map.rs

+ let root2 = unsafe { ptr::read(&root).into_ref() };
+ let front = first_leaf_edge(root1);
+ let back = last_leaf_edge(root2);
+ IntoIter { front, back, length }


Even simpler: mem::replace(self, BTreeMap::new()).into_iter()

All right for the BTreeMap::new(), but isn't the mem::forget(self); in into_iter going to wreck this?

No, into_iter is intentionally deconstructing the map, just as it always would. It will only affect the one you pull out of self, and the new() one will remain untouched.

However, the fact that this is so simple reduces its value, as anyone could easily write that replace(...).into_iter() already. A drain(range) or drain(start, end) would be more compelling, and not so trivial either.

drain(start, end) seems an odd alternative to me - even Vec and VecDeque have a RangeBounds argument, while they don't have a range members.

I do have my own odd alternative though: as a member of RangeMut, but that simply cannot adjust length. So rather, a member of IterMut, but that can't be initialized with bounds. Oh boy, this is hard...

Centril · 2019-12-03T18:35:09Z

Musical chairs 🥁 r? @cuviper

ssomers · 2019-12-03T21:36:14Z

For this reason I would suggest splitting up this PR -- the easy parts can be easily merged, while the hard parts can get a more focused review.

Turns out that (probably) none of the 3 parts are easy. In anticipation of a choice over the signature of drain, I could do one PR for retain only, or instead:

sharing/opening up the existing iterator code in preparation of retain
adding retain on top of that

or instead:

adding retain without touching any existing code
sharing (some of) the then 7 instances of iterator code

I have no idea what's best.

cuviper · 2019-12-03T21:56:39Z

I have no idea what's best.

It's not an easy judgement call, but the goal is to reduce the review footprint, especially when it involves so much unsafe code.

If "sharing/opening up" existing code would show any benefit in its own, especially if it yields an overall simplification, then that sounds like a good first step.

ssomers · 2019-12-05T10:41:24Z

Here's my plan:

renamed the branch to #42849_v2
extract the "sharing/opening up" of tree navigation refactoring as a new PR Bundle and document 6 BTreeMap navigation algorithms #67073; but such that it can be used for drain_filter, which seems to mean perceived performance will drop.
when merged, a drain_filter method as a new PR
one day, if I manage to implement it at all, reopen this PR with an efficient range-based drain (since all comments here are in fact about drain)

rust-highfive assigned hanna-kruppe Nov 25, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 25, 2019

jonas-schievink added A-collections Area: std::collections. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 25, 2019

ssomers commented Nov 25, 2019

View reviewed changes

src/liballoc/collections/btree/map.rs Outdated Show resolved Hide resolved

BTreeMap/BTreeSet drain & retain and share the now 7 similar algorithms

acda9d1

ssomers force-pushed the #42849 branch from 7849be2 to acda9d1 Compare December 1, 2019 20:14

rust-highfive assigned scottmcm and unassigned hanna-kruppe Dec 3, 2019

cuviper reviewed Dec 3, 2019

View reviewed changes

rust-highfive assigned cuviper and unassigned scottmcm Dec 3, 2019

ssomers closed this Dec 5, 2019

ssomers deleted the #42849 branch December 5, 2019 07:50

ssomers mentioned this pull request Mar 29, 2020

Tracking Issue for {BTreeMap,BTreeSet}::extract_if #70530

Open

5 tasks

mbrubeck mentioned this pull request Sep 8, 2020

Map::retain rust-lang/rfcs#1338

Closed

2 tasks

ssomers mentioned this pull request Jan 16, 2021

Tracking Issue for {BTreeMap,BTreeSet}::drain #81074

Open

3 tasks

ssomers mentioned this pull request Feb 10, 2022

drain() doc of vector (and others) is still unclear #92765

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BTreeMap/BTreeSet drain & retain #66747

BTreeMap/BTreeSet drain & retain #66747

ssomers commented Nov 25, 2019 •

edited by mbrubeck

Loading

rust-highfive commented Nov 25, 2019

hanna-kruppe commented Nov 25, 2019

ssomers commented Dec 1, 2019 •

edited

Loading

ssomers commented Dec 1, 2019

Gankra commented Dec 2, 2019

Gankra commented Dec 3, 2019

Centril commented Dec 3, 2019

cuviper left a comment

cuviper Dec 3, 2019

ecstatic-morse Dec 3, 2019 •

edited

Loading

ssomers Dec 3, 2019

cuviper Dec 3, 2019

cuviper Dec 3, 2019

ssomers Dec 3, 2019

cuviper Dec 3, 2019

ssomers Dec 5, 2019

Centril commented Dec 3, 2019

ssomers commented Dec 3, 2019

cuviper commented Dec 3, 2019

ssomers commented Dec 5, 2019 •

edited

Loading

BTreeMap/BTreeSet drain & retain #66747

BTreeMap/BTreeSet drain & retain #66747

Conversation

ssomers commented Nov 25, 2019 • edited by mbrubeck Loading

rust-highfive commented Nov 25, 2019

hanna-kruppe commented Nov 25, 2019

ssomers commented Dec 1, 2019 • edited Loading

ssomers commented Dec 1, 2019

Gankra commented Dec 2, 2019

Gankra commented Dec 3, 2019

Centril commented Dec 3, 2019

cuviper left a comment

Choose a reason for hiding this comment

cuviper Dec 3, 2019

Choose a reason for hiding this comment

ecstatic-morse Dec 3, 2019 • edited Loading

Choose a reason for hiding this comment

ssomers Dec 3, 2019

Choose a reason for hiding this comment

cuviper Dec 3, 2019

Choose a reason for hiding this comment

cuviper Dec 3, 2019

Choose a reason for hiding this comment

ssomers Dec 3, 2019

Choose a reason for hiding this comment

cuviper Dec 3, 2019

Choose a reason for hiding this comment

ssomers Dec 5, 2019

Choose a reason for hiding this comment

Centril commented Dec 3, 2019

ssomers commented Dec 3, 2019

cuviper commented Dec 3, 2019

ssomers commented Dec 5, 2019 • edited Loading

ssomers commented Nov 25, 2019 •

edited by mbrubeck

Loading

ssomers commented Dec 1, 2019 •

edited

Loading

ecstatic-morse Dec 3, 2019 •

edited

Loading

ssomers commented Dec 5, 2019 •

edited

Loading