
Large files test and fixes #4002

Closed

Conversation

@keyboardDrummer (Member) commented May 13, 2023

Problem

When many edits were made in close succession, parsing was done for each individual edit, even though newer edits had already arrived that made those parses obsolete.

Changes

  • Move all work that was done in OpenDocument and UpdateDocument to separate threads. These two methods are marked as [Serial], which means they never execute in parallel. By not letting them do any work themselves, new calls can come in that cancel ongoing work and prevent queued work from starting, thereby reducing wasted effort.

For example, if requests A, B, and C are sent, previously work would be done for A, then for B once it arrived, and then for C. Now A comes in and work starts; while it is running, B and C come in, triggering cancellation of the work on A and allowing B and C to be combined into a single BC change that only has to be processed once.

Previously, much of the work done for a request, such as parsing and resolution, was already performed asynchronously, but it turned out that OpenDocument and UpdateDocument still did a significant amount of synchronous work, such as updating the text buffer and migrating positions in the IDE state.

  • Do not process incoming document changes immediately, but first wait 20 milliseconds for new changes to come in; this timer resets whenever new changes arrive. Once particular work such as parsing has started, it cannot be cancelled for some time, so it can be better to wait briefly to see whether more changes are incoming and start the work only once they stop. A sketch of this debounce-and-cancel pattern follows this list.
  • Improve the performance of migration code by reusing a particular position-related cache more often. In particular, only a single ChangeProcessor is now created per processed TextDocumentContentChangeEvent.
  • Cache the lookup of the project file for five seconds (see the cache sketch after this list).
  • Handle cancellation correctly while parsing includes, to prevent the diagnostic [internal error] Parser exception: The operation was canceled. from being emitted.
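
To make the flow above concrete, here is a minimal sketch of the debounce-and-cancel pattern, assuming a handler marked [Serial] that returns immediately. The class and member names are hypothetical and only the 20 ms delay comes from this PR; the actual DocumentManager additionally merges superseded changes (the BC example above) rather than dropping them.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class DebouncedChangeQueue {
  private static readonly TimeSpan DebounceTime = TimeSpan.FromMilliseconds(20);
  private CancellationTokenSource cancellationSource = new CancellationTokenSource();

  // Called from the [Serial] notification handler, so calls never overlap and the
  // field swap below needs no locking. The method returns immediately, which lets
  // the next notification come in and supersede this one.
  public void QueueChange(Action<CancellationToken> processChanges) {
    cancellationSource.Cancel(); // Cancel work triggered by the previous change.
    cancellationSource = new CancellationTokenSource();
    var token = cancellationSource.Token;

    _ = Task.Run(async () => {
      try {
        // Wait briefly so rapid-fire changes can supersede this one before
        // expensive work such as parsing starts.
        await Task.Delay(DebounceTime, token);
        processChanges(token);
      } catch (OperationCanceledException) {
        // A newer change arrived; this one has been superseded.
      }
    }, token);
  }
}
```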
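
Likewise, a minimal sketch of the time-based cache used for the project file lookup, simplified to a single cached value; the class name is hypothetical and only the five-second lifetime comes from the description above.

```csharp
using System;

// Hypothetical single-value cache with a fixed lifetime, illustrating the
// five-second project file cache described above.
class TimedCache<T> {
  private static readonly TimeSpan Lifetime = TimeSpan.FromSeconds(5);
  private readonly Func<T> lookup;
  private T cachedValue = default!;
  private DateTime cachedAt = DateTime.MinValue;

  public TimedCache(Func<T> lookup) {
    this.lookup = lookup;
  }

  public T Get() {
    if (DateTime.UtcNow - cachedAt > Lifetime) {
      cachedValue = lookup();
      cachedAt = DateTime.UtcNow;
    }
    return cachedValue;
  }
}
```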

Caveats

  • Significant complexity is introduced in DocumentManager through concurrent code, increasing the chance of bugs and reducing maintainability.
  • I noticed that even when UpdateDocument returns quickly, it can still take time before it is called again, even though more client updates have already been sent. I couldn't figure out what causes this. The LSP library code seems to use Tasks and the default ThreadPool to schedule UpdateDocument calls. Maybe we are running a particular Task for too long without marking it as LongRunning, so no ThreadPool threads are available, or we're triggering slow garbage collection. Neither option seems very plausible, so it would be good to investigate further (see the scheduling sketch after this list).
  • TextBuffer.ApplyTextChange still has O(N) complexity, while it could be made O(1).
  • Our parser creates a string containing the entire source file, which can use a lot of memory and may trigger excessive garbage collection; it should instead work solely through streams.
  • Our parser should support cancellation, so we can cancel mid-parse and don't have to wait before starting to process newly incoming changes.
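
Regarding the ThreadPool hypothesis above, the following is a generic .NET illustration (not code from this PR) of the difference between scheduling long-running work on the default ThreadPool and marking it as LongRunning so it gets a dedicated thread:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SchedulingExample {
  static void Schedule(Action longRunningWork) {
    // Occupies a ThreadPool worker until the work finishes; enough of these can
    // starve the pool and delay short work items such as the next UpdateDocument call.
    Task.Run(longRunningWork);

    // Hints the scheduler to run the work on a dedicated thread instead,
    // leaving the ThreadPool free for short-lived work.
    Task.Factory.StartNew(longRunningWork, CancellationToken.None,
        TaskCreationOptions.LongRunning, TaskScheduler.Default);
  }
}
```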

Testing

  • Add two stress tests on very large files that themselves include very large files: one performs many fast edits and the other a few slow edits. The slow-edit test was able to reproduce the [internal error] Parser exception: The operation was canceled. diagnostic, while the fast-edit test would time out without the changes in this PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@keyboardDrummer marked this pull request as ready for review May 16, 2023 12:23
@alex-chew (Contributor) commented:

I was pretty upset about the size of this PR until I realized that 80,001 of the 81,099 added lines are intentionally enormous test files. :)

(Regardless, it will take me some time to review this PR since it touches so many parts of the codebase)

@alex-chew (Contributor) left a comment

I started reviewing this PR before realizing that it also includes the changes from #3970. I'm going to pause my review until that one's merged to avoid re-reviewing changes.

Review threads on Source/DafnyCore/DafnyFile.cs and Source/DafnyCore/DafnyMain.cs (outdated, resolved).