Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce fast and scalable channels #3103

Merged
merged 26 commits into from
Feb 13, 2023
Merged

Introduce fast and scalable channels #3103

merged 26 commits into from
Feb 13, 2023

Conversation

ndkoval
Copy link
Member

@ndkoval ndkoval commented Dec 20, 2021

No description provided.

@ndkoval ndkoval force-pushed the fast-channels branch 2 times, most recently from 3bc29ab to 39a6f64 Compare January 16, 2022 14:38
@ndkoval ndkoval force-pushed the fast-channels branch 2 times, most recently from 04ada5b to 80335a9 Compare January 28, 2022 14:50
@ndkoval ndkoval force-pushed the new-channels-select branch 3 times, most recently from f0c3f5a to 851cb9b Compare August 2, 2022 22:09
@ndkoval ndkoval force-pushed the fast-channels branch 3 times, most recently from 0713a6f to 6185191 Compare August 3, 2022 14:06
@qwwdfsad qwwdfsad self-requested a review August 4, 2022 09:55
@ndkoval ndkoval changed the base branch from new-channels-select to develop September 13, 2022 12:34
@qwwdfsad
Copy link
Collaborator

Two critical issues:

  • No TCO in send due to a bug in the inliner. Ilmir is figuring it out. If I inline sendImpl in send manually, TCO kicks in and everything becomes better
  • No TCO in iterator's hasNext. If I inline receiveImpl manually, TCO does not kick in and it is a serious problem that has to be investigated.

Please take a look

Signed-off-by: Nikita Koval <[email protected]>
Signed-off-by: Nikita Koval <[email protected]>
Signed-off-by: Nikita Koval <[email protected]>
Signed-off-by: Nikita Koval <[email protected]>
@ndkoval ndkoval requested a review from qwwdfsad January 16, 2023 10:43
Signed-off-by: Nikita Koval <[email protected]>
Signed-off-by: Nikita Koval <[email protected]>
Signed-off-by: Nikita Koval <[email protected]>
// and the `INTERRUPTED_SEND` token has been installed as a waiter,
// this request finishes with the `onClosed()` action.
if (closed) {
segment.onSlotCleaned()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch isn't reachable, isn't it?
It the channel is closed, updateCellSend won't ever return RESULT_SUSPEND

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be reachable. Consider a situation when a channel is marked as closed, but the cell is still empty. In this case, the operation installs INTERRUPTED_SEND in the cell. I missed this part of the algorithm when documenting the code -- will fix that.

ndkoval and others added 2 commits January 17, 2023 21:58
Signed-off-by: Nikita Koval <[email protected]>
Marked as "fixes 3330" to automatically close the issue yet it's the original PR that fixed it

Fixes #3330
@qwwdfsad qwwdfsad self-requested a review January 25, 2023 10:24
qwwdfsad and others added 3 commits January 25, 2023 15:12
…rom 'onUndeliveredElement' to global exception handler and wrap them into dedicated exception
…ancellationDoesNotBreakMutex`

Signed-off-by: Nikita Koval <[email protected]>
// physically to avoid memory leaks.
if (closed) {
return onClosed()
} else continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both me and Dmitry prefer having {} around continue if it's not a one-liner

@qwwdfsad qwwdfsad self-requested a review February 8, 2023 10:26
// already been cancelled or the cell is poisoned.
// Restart from the beginning in this case.
// To avoid memory leaks, we also need to reset
// the `prev` pointer of the working segment.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate on the scenario when memory leak occurs if prev pointer is not cleared?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do Lincheck tests fail if you comment this out?

Copy link
Collaborator

@qwwdfsad qwwdfsad Feb 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, none of ./gradlew :kotlinx-coroutines-core:jvmLincheckTest :kotlinx-coroutines-core:jvmLincheckTestAdditional

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jvmLincheckTestAdditional fails with isStressTest = true

}
}
// The cell stores a buffered element.
state === BUFFERED -> if (segment.casState(index, state, DONE_RCV)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CAS should always succeed, shouldn't it?
Probably then it's better to either assert it or to replace it with the regular write

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BUFFERED state can be replaced with CLOSED. I would not recommend applying such optimizations, sticking to the more straightforward code patterns. According to my experience developing this algorithm, such optimizations are often wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks.

I was semi-automatically using diagram from the paper, and CLOSED is the only state omitted :)

return segment.retrieveElement(index)
}
}
return updateCellReceiveSlow(segment, index, r, waiter)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please drop a comment (here and in send as well) before slow-path that explains when we are falling into it?

E.g. // Slow-path: CAS on rendezvous failed and it's time to poison or ...another scenario...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment will be hard to maintain and useless. The slow path covers all transitions; that's it. The right question is not when we go to the slow path but what the fast path does.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've meant not to document all scenarios covered by slowpath with an inline comment, but rather a comment that explains when we go to the slow-path ("// There is no rendezvous or fast-path suspend failed")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the comment will be like "we go to the slow-path code when the fast path fails"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it's better to go to the only call of the slow-path function and see when it is invoked. At least, this approach is easy-to-maintain if the fast-path logic changes.

}
}
return updateCellReceiveSlow(segment, index, r, waiter)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And on a separate note, could you please elaborate why slowpath-fastpath separation is needed at all?

My intuition is the following:

Fast path is "check few states and try CAS once, otherwise go to slow path" , while slow-path is "check all states and wrap it in a while loop in case CASes fail".
If that's true, a single "slow-path" should be enough, all we win by separation is a loop avoidance (which probably is virtually zero), but we get much less code and less duplicated state management in return

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is critical for better inlining. You can comment out the fast path and check that the performance on our benchmarks degrades.

@qwwdfsad
Copy link
Collaborator

qwwdfsad commented Feb 8, 2023

It seems like we only have a question about cleanPrev.
As soon as it's addressed (and JVM IR transformer is disabled probably), the PR is good to go

Signed-off-by: Nikita Koval <[email protected]>
@qwwdfsad qwwdfsad merged commit 22f8626 into develop Feb 13, 2023
qwwdfsad pushed a commit that referenced this pull request Feb 13, 2023
See #3621 for detailed description and rationale

Fixes #3621
@LouisCAD
Copy link
Contributor

Looks like that was a tough one!

Finally merged a bit more than one year in, good job, seems like state of the art! 👏👏👏👏👏

ndkoval added a commit that referenced this pull request Feb 13, 2023
See #3621 for detailed description and rationale

Fixes #3621
@ndkoval ndkoval self-assigned this Feb 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants