Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve concurrency bug in TestNotificationReceiver, improving test stability #5310

Conversation

keyboardDrummer
Copy link
Member

@keyboardDrummer keyboardDrummer commented Apr 9, 2024

Description

How has this been tested?

  • No idea how to test this

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

@MikaelMayer
Copy link
Member

What is the queue being cleared? I don't understand how the only modified consuming code would not loop forever. It looks like even if another thread added an element, before GetLast would return it, the consuming code would get it at the next iteration of the loop.

@keyboardDrummer
Copy link
Member Author

keyboardDrummer commented Apr 10, 2024

What is the queue being cleared?

I'm not sure I understand the question. If the queue has elements, then a call to TestNotificationReceiver.AwaitNextNotificationAsync() will return the first of those, but I'm guessing you understood that based on the code.

The method TestNotificationReceiver.GetLast returns the last received element, and clears the queue so any subsequent interaction with that TestNotificationReceiver through AwaitNextNotificationAsync will return elements newer than that last received element.

I don't understand how the only modified consuming code would not loop forever. It looks like even if another thread added an element, before GetLast would return it, the consuming code would get it at the next iteration of the loop.

I have trouble parsing that. The first sentence is saying you think the consuming code will loop forever, and the second seems to say you think it will not loop. Seems like you have a not too many in the first one.

In any case, there is consuming code that was not modified, that is affected by this change:

var fileVerificationStatus = verificationStatusReceiver.GetLast(v => v.Uri == documentId.Uri);
// During the call to GetLast, a notification came in after computing the result, but before clearing the queue
// So fileVerificationStatus is now the second-to-last notification
// And verificationStatusReceiver.AwaitNextNotificationAsync() 
// will not not complete unless another notification comes in
if (fileVerificationStatus != null && fileVerificationStatus.Version == documentId.Version) {
  while (fileVerificationStatus.Uri != documentId.Uri || !fileVerificationStatus.NamedVerifiables.All(FinishedStatus)) {
    fileVerificationStatus = await verificationStatusReceiver.AwaitNextNotificationAsync(cancellationToken.Value);
  }
}

Consuming code 2:

var result = diagnosticsReceiver.History.GetLast(d => d.Uri == documentItem.Uri);
// A notification came in during the call to GetLast, but it was not considered so GetLast returned null, 
// and then the queue was cleared
// the while is entered because result is null
while (result == null) {
  // The following await never returns because the queue was emptied
  var diagnostics = await diagnosticsReceiver.AwaitNextNotificationAsync(CancellationToken);
  if (diagnostics.Uri == documentItem.Uri) {
    result = diagnostics;
  }
  logger.LogInformation(
    $"GetLastDiagnosticsParams didn't find the right diagnostics after getting status {status}. Waited to get these diagnostics: {diagnostics.Stringify()}");
}

@keyboardDrummer keyboardDrummer enabled auto-merge (squash) April 10, 2024 09:11
@keyboardDrummer keyboardDrummer changed the title Resolve potential concurrency bug in TestNotificationReceiver, improving test stability Resolve concurrency bug in TestNotificationReceiver, improving test stability Apr 10, 2024
Copy link
Member

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think more about this change. Let's first consider two points.

while (notifications.Size > 0) {
_ = notifications.Dequeue(CancellationToken.None);
}
return History.LastOrDefault(predicate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm completely perplexed by the new logic here, but I was before by the old logic. If Notifications is a concurrent queue, how do you ensure that GetLast actually returns the last one ever, if that's the intended meaning? And what's the relationship between notifications and History?

Also, how do you prevent concurrent threads from adding something to the notification queue between the end of the while to the beginning of the return statement?

Copy link
Member Author

@keyboardDrummer keyboardDrummer Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm completely perplexed by the new logic here, but I was before by the old logic.

🤣

I think the confusion comes from GetLast having a difficult to explain contract. I considered inlining it, since it does not have many usages, but that would lead to a little code duplication.

GetLast does not return the last ever notification. GetLatest would be a better name.

History is all notifications that have been received. TestNotificationReceiver implicitly remembers an index into that history, starting at index 0. Notifications is a list of notifications starting at that cursor and extending into the infinite future. You can call notifications.Dequeue to return the notification located at the cursor, and bump the cursor up by one.

Also, how do you prevent concurrent threads from adding something to the notification queue between the end of the while to the beginning of the return statement?

I don't prevent that, but the entire while is a performance optimization so the clients don't iterate through 'old' notifications. If what you describe happens, then the clients may inspect an old notification, after which they'll call AwaitNextNotificationAsync again to get a newer one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I found some client code that does require the queue to be empty after calling GetLast. For example this code:

var diagnostics1 = diagnosticsReceiver.GetLast(documentItem);
Assert.Equal(4, diagnostics1.Length);
ApplyChange(ref documentItem, ((7, 25), (10, 17)), "");
var diagnostics2 = await GetNextDiagnostics(documentItem);

The GetNextDiagnostics call returns the wrong result if the queue was not at the position of diagnostics1 after calling diagnosticsReceiver.GetLast. I've updated the code again to account for that.

@@ -212,14 +212,15 @@ protected ClientBasedLanguageServerTest(ITestOutputHelper output, LogLevel dafny

public async Task<PublishDiagnosticsParams> GetLastDiagnosticsParams(TextDocumentItem documentItem, CancellationToken cancellationToken, bool allowStale = false) {
var status = await WaitUntilAllStatusAreCompleted(documentItem, cancellationToken, allowStale);
var result = diagnosticsReceiver.History.LastOrDefault(d => d.Uri == documentItem.Uri);
var result = diagnosticsReceiver.GetLast(d => d.Uri == documentItem.Uri);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two meanings of Last, it would be good to disambiguate them.

  1. Last = the last available
  2. Last = the one after which there will be none.

You could perhaps rename the first one as LastAvailable, and the second one as Final

@@ -212,14 +212,15 @@ protected ClientBasedLanguageServerTest(ITestOutputHelper output, LogLevel dafny

public async Task<PublishDiagnosticsParams> GetLastDiagnosticsParams(TextDocumentItem documentItem, CancellationToken cancellationToken, bool allowStale = false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public async Task<PublishDiagnosticsParams> GetLastDiagnosticsParams(TextDocumentItem documentItem, CancellationToken cancellationToken, bool allowStale = false) {
public async Task<PublishDiagnosticsParams> GetLastestDiagnosticsParams(TextDocumentItem documentItem, CancellationToken cancellationToken, bool allowStale = false) {

Comment on lines +32 to +33
notificationHistory.Add(request);
notifications.Enqueue(request);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the lock, I see how if notifications is a suffix of history, and then the postcondition keeps that property. We did not have this before the lock.

Could you have written it like this:

Suggested change
notificationHistory.Add(request);
notifications.Enqueue(request);
History.Add(request);
notifications.Enqueue(request);

so that it's consistent with GetLatestAndClearQueue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

History is read only, so I can not call Add on it.

return result;
}
public TNotification? GetLatestAndClearQueue(Func<TNotification, bool> predicate) {
lock (this) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, great, that way the history and the notification queue are in sync. Yes, that is definitely something necessary.
Note that, if you lock this, you can now reuse clearQueue() as before, it should not matter and would be more readable.

The implicit precondition of this method is that the notifications queue is a suffix of the history.

Copy link
Member Author

@keyboardDrummer keyboardDrummer Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that, if you lock this, you can now reuse clearQueue() as before

AsyncQueue does not have a Clear method, so I have to do

while (notifications.Size > 0) {
  _ = notifications.Dequeue(CancellationToken.None);
}

to clear it.

The implicit precondition of this method is that the notifications queue is a suffix of the history.

Once you enter the lock, yes.

MikaelMayer
MikaelMayer previously approved these changes Apr 10, 2024
@keyboardDrummer keyboardDrummer merged commit 62a00c7 into dafny-lang:master Apr 11, 2024
20 checks passed
@keyboardDrummer keyboardDrummer deleted the testNotificationReceiverConcurrencyBug branch April 11, 2024 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception in TestNotificationReceiver due to race condition Unstable test EditWhenUsingStandardLibrary
2 participants