Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite Merge to take two HighlightEvent Iterators #3695

Conversation

the-mikedavis
Copy link
Member

@the-mikedavis the-mikedavis commented Sep 4, 2022

Currently, the HighlightEvent merge function overlays spans (Vec<(usize, std::ops::Range<usize>)> on an Iterator of HighlightEvent. The current merge doesn't handle overlapping spans well though, specifically with LSP diagnostics:

  • diagnostic ranges may be discarded if they overlap
  • ghost text may appear when a language server sends overlapping diagnostics at the end of a document

This merge function takes two HighlightEvent Iterators which may themselves have overlapping highlights and produces a new Iterator with highlights from right overlaid on left. Also included is a new Iterator that properly subslices overlapping ranges in a Vec<(usize, std::ops::Range<usize>) which can be passed to the merge function and satisfies all of merge's invariants and assumptions.

The new merge is more complicated and it looks like we get dinged for the extra VecDeques but otherwise the performance looks comparable. I couldn't find any noticeable changes in the flamegraph except for two new tiny slivers for VecDeque::extend.

Fixes #2248
I rebased #2857 on this and it works well and actually the types work out much nicer than that branch's merge function.

helix-core/src/syntax.rs Outdated Show resolved Hide resolved
@the-mikedavis the-mikedavis force-pushed the md-highlight-event-stream-merge branch 4 times, most recently from 36e4955 to 8584189 Compare September 5, 2022 22:36
@the-mikedavis the-mikedavis marked this pull request as draft September 6, 2022 15:13
@kirawi kirawi added A-tree-sitter Area: Tree-sitter A-core Area: Helix core improvements S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Sep 13, 2022
@the-mikedavis the-mikedavis removed the A-tree-sitter Area: Tree-sitter label Sep 17, 2022
* Refactor Merge to take two HighlightEvent Iterators
* Add an Iterator for subslicing overlapping spans
In these cases, right or left has finished, respectively. We drain
right or left's queue and emit those events but previously we forgot
to then move the iterator to the next, discarding the Source being
emitted in these branches.

The result was duplicated emitted text after the last diagnostic
in a file, or after the last selection.
The ranges for highlight spans are inclusive, so the span in right
may be discarded if its end is equal to left's start.

For a graphical example:

        |----------------| L
    |---| R

Here R ends as L starts. In this case we discard R.

Without this change, we subsliced R and made it into a Source
which starts at L and ends at L which fails the min_width_1
invariant.

Also included is a change that eagerly discards the following
HighlightEnd events in the iterator and queued HighlightStart
events. This case is technically covered by the outer
`loop { match { .. } }` but it's better to do it eagerly: we don't
need to branch on all of the possibilities, just blindly empty
the right_queue and drop all HighlightEnds.
This change adds two counters which are incremented for every queued
HighlightStart and decremented with every dequeued HighlightEnd. This is
used to track the invariant that HighlightStarts and HighlightEnds are
balanced in the left iterator. In the right iterator, the same checks
are made except that the counter is also used to balance the number
of HighlightStarts and HighlightEnds when the right iterator is being
stopped before having finished.

Without this change, the first test case from this commit would be
missing a HighlightEnd.

Also, the second test case would provoke a panic attempting to match on
`(None, Some(HighlightEnd))`. This has been fixed by the change which
allows a `(None, Some(event))` pattern. Output-wise, that change allows
a `HighlightStart, HighlightEnd` sequence from `right` to end up in the
emitted events. These are slightly wasteful but are otherwise benign.
The given test case reflects a panic from running on the rainbow
brackets branch.
Rather than transforming the input `Vec<Span>` into an iterator,
this version of SpanIter keeps the Vec as-is and iterates through
it manually with an index. This is necessary because in cases where
many spans overlap one another, the SpanIter implementation needs to
look arbitrarily far ahead in the Vec to find all ranges which start
on the same point.

The basic structure of this version is the same as the prior version.
The following are the differences:

* The position in the Vec is tracked manually with an index rather
  than automatically with an Iterator.
* Subslicing applies to all spans with the same starting point.
* All spans with the same start point are emitted simultaneously.

The new version covers a real breakage noticed from diagnostics
produced by rust-analyzer.
Selection highlights are already known to be satisfy all of the
invariants in syntax::merge so syntax::span_iter is pure overhead.

This change introduces another iterator with a naive flat_map
implementation that splits Spans into HighlightEvents without checking
if any spans are overlapping or sorted.

We wrap the iterator in a new type to avoid any performance penalty
of a `Box<dyn Iterator<Item = HighlightEvent>>` (which uses dynamic
dispatch).
This change covers an edge-case that can arise in the SpanIter from
subslicing spans. Spans which are subsliced may start after some
other span later in the span Vec (see the new test case).

To handle this, we re-sort the span Vec after subslicing any spans.

Since the subslice only affects the starts of some spans and otherwise
leaves the ordering unchanged, we accomplish the sort by finding any
spans which should come before the subsliced spans and rotate them
to come first. This preserves the ordering in the subsliced spans and
in the rotated spans but ensures that the global ordering of the Vec
remains correct after the subslice.
The span iterators and tests took up a significant portion of the
syntax module but were fairly self-contained.

In addition, the Span type was an informal

    (usize, std::ops::Range<usize>)

which was a bit tricky to deal with:

* the type was long to write out for all producers/consumers
* std::ops::Range<usize> does not implement Copy

Plus we don't end up using any Range methods except its Ord/PartialOrd
implementations.

Given its first-class usage for diagnostics and selections, it seems
appropriate to separate it into its own struct type and module.
@the-mikedavis
Copy link
Member Author

I think I've debugged this pretty thoroughly but it's still a somewhat risky change: bugs in SpanIter (which should all be squashed now) manifest as duplicated "ghost" text. I'll be driving this branch for the foreseeable future to see if I can find any rendering bugs. But this branch could probably use some time to "simmer".

It's also probably going to be pretty impossible to review 😅. My desk is completely covered in number-line drawings of every possible scenario for overlapping ranges. I think I have test cases for most scenarios though and the ascii art is very pretty 🙂

@the-mikedavis the-mikedavis added S-waiting-on-review Status: Awaiting review from a maintainer. E-testing-wanted Call for participation: Experimental features suitable for testing and removed S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Sep 22, 2022
@the-mikedavis the-mikedavis marked this pull request as ready for review September 22, 2022 16:08
Copy link
Member

@pascalkuthe pascalkuthe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the SpanIter locally and found/fixed some bugs and implemented some performance optimizations.
I did the changes locally because it would be really annoying to do trough the GH webui.
I opened at PR against your fork (the-mikedavis#3) that I will refer to during my review comments.

The algorithm was a bit unapproachable at first.
However it looked like the perfect candidate for property testing, because you clearly defined some properties that the resulting HighlightEvents must uphold.
I used proptest to implement this in 924e92c.
I used proptest instead of quickeck, because it makes writing reduces much nicer which is relevant because the input data also needs to uphold some variants.I converted the one existing use of quickcheck to use proptest instead.

The resulting testt almost instantly reduced some crashes. It took me quite a while to fix this because there were actually multiple bugs (see review comments).

With the suggested changes from my branch I let the proptest run ~50 million different random range inputs and found no more test failures. To find the source of the bugs (and fix them) I also had to really get into the details of the algorithm.
Therefore both from a qualitative as well as quantitative point of view I am now very confident that the updated version works as intended.

I haven't looked at the merge implementation yet, I think that will also be a good fit for proptesting.

fn cmp(&self, other: &Self) -> std::cmp::Ordering {
// Sort by range: ascending by start and then ascending by end for ties.
if self.start == other.start {
self.end.cmp(&other.end)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be sorting the end in descending order (its specified like that in the documentation below and that also makes more sense because produces less events). Fixed in 447270a

// `i` starts on the index after the last subsliced span.
loop {
match self.spans.get(i) {
Some(span) if span.start < intersect => {
Copy link
Member

@pascalkuthe pascalkuthe Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to compare span.end aswell to maintain the descending order of span ends (or even simpler just compare the entire span), in case a new span starts exactly at the intersect. Fixed in 276e58b

// If this span needs to be subsliced, consume the
// left part of the subslice and leave the right.
self.range_ends.push(intersect);
span.start = intersect;
Copy link
Member

@pascalkuthe pascalkuthe Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intersect may be larger than the spans end if multiple spans start at the same point.
This will lead to spans that have their end moved past their start.
Lead to an assertion failure in debug builds or messed up highlighting in release builds.
This is the main bug I found. Its fixed in f73a14e.
I added a pretty detailed description of the bug to the commit message (including an example) so look there for details. I also added unit tests to cover these situations.

Note: If the ranges are sorted in ascending order (as was the case originally) this bug is already fixed by 276e58b. However sorting in descending order will emit less events so I think its preferable. Furthermore with this change the entire span portioning can be split into a separate function (done in c64cf12) which I also see as a plus

Edit: Discard the Note above note this is not true. Ascending order would still need a (different) fix for this edgecase. Descending order looks at the longest span first and moves the start of the span past the end which leads to easy to detect assertions failures. In ascending order the algorithm instead does not detect the interception at all and treats all spans at the same position as tough they all end before any interception would occur. This produces incorrect Highlight Events as a result. However this is currently not caught by property testing because all invariants still hold. Last time I look at ascending order I was only using property testing and hence taught it was fixed. But the unit tests I have added for this edge-case since should fail for ascending order too (I think).

// Ensure range-ends are sorted ascending. Ranges which start at the
// same point may be in descending order because of the assumed
// sort-order of input ranges.
self.range_ends.sort_unstable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting range_ends in reverse (meaning that the smallest values are at the end) allows some pretty nice performance optimizations by replacing the retain above with a simple pop off the end.
Done in c64cf12


/// Converts a Vec of spans into an [Iterator] over [HighlightEvent]s
///
/// This implementation does not resolve overlapping spans. Zero-width spans are
Copy link
Member

@pascalkuthe pascalkuthe Sep 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero-width spans were actually not eliminated but it was trivial to implement. (See 36d44f5)

@the-mikedavis the-mikedavis marked this pull request as draft September 25, 2022 13:29
@the-mikedavis the-mikedavis added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from a maintainer. E-testing-wanted Call for participation: Experimental features suitable for testing labels Sep 25, 2022
@the-mikedavis
Copy link
Member Author

This approach adds a lot of complexity which ends up not being necessary: diagnostics can be handled as a special case and there isn't currently another producer of Vec<(usize, std::ops::Range<usize>)> that has overlapping spans or needs multiple highlights per range. (Selections are non-overlapping by nature and rainbow highlights end up not overlapping and only ever need one color each.)

Plus this branch has some rendering bugs when there are overlaps.

We can revisit the Merge change if it ends up being necessary later down the line but for now it doesn't look necessary to take two iterators

@the-mikedavis the-mikedavis deleted the md-highlight-event-stream-merge branch October 5, 2022 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-core Area: Helix core improvements S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Characters rendering multiple times
4 participants