Optimize CancellableContinuationImpl.invokeOnCancellation(..) for Segments #3084

Merged · 3 commits merged into develop on Feb 28, 2023

Conversation

@ndkoval (Member) commented on Dec 14, 2021

The current semaphore implementation uses Segment-s for storing waiting continuations. Moreover, the upcoming new channel and mutex algorithms also use segments to store waiters. When suspending, a cancellation handler should be provided via cont.invokeOnCancellation { ... }; it cleans up the corresponding slot in the segment and physically removes the segment from the linked list once it becomes full of cancelled cells. However, this cancellation handler requires an allocation every time.
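
For illustration, here is a minimal Kotlin sketch of the allocating pattern described above; WaiterSegment, storeWaiter, cleanSlot, and waitInSegment are hypothetical names for this sketch, not the actual kotlinx.coroutines internals.

```kotlin
import kotlinx.coroutines.suspendCancellableCoroutine

// Hypothetical segment of waiter slots; the real Segment class also links
// to its neighbours and unlinks itself once all cells are cancelled.
class WaiterSegment {
    private val slots = arrayOfNulls<Any>(SEGMENT_SIZE)

    fun storeWaiter(index: Int, waiter: Any) {
        slots[index] = waiter
    }

    // Cleans the slot on cancellation; physical removal of a fully
    // cancelled segment from the linked list is omitted here.
    fun cleanSlot(index: Int) {
        slots[index] = null
    }

    companion object {
        const val SEGMENT_SIZE = 16
    }
}

// The slow path of a suspending operation: store the continuation in the
// segment and register a cleanup handler. The lambda passed to
// invokeOnCancellation is a fresh object on every suspension; this is
// the allocation the PR eliminates.
suspend fun waitInSegment(segment: WaiterSegment, index: Int) =
    suspendCancellableCoroutine<Unit> { cont ->
        segment.storeWaiter(index, cont)
        cont.invokeOnCancellation { segment.cleanSlot(index) }
        // A releasing party later resumes the stored continuation (omitted).
    }
```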

To reduce the memory pressure, we can store the segment along with the slot index directly in CancellableContinuationImpl, as a cancellation handler instruction, thereby eliminating the allocations for the corresponding cancellation handlers. For this purpose, we:

  1. Allow storing a Segment in the state field, similarly to CancelHandler. On cancellation, the Segment.invokeOnCancellation(index, cause) function is called.
  2. Store the slot index in the existing decision integer field, extending its purpose correspondingly (see the sketch after this list).
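
As an illustration of this scheme, here is a hedged Kotlin sketch; the names CancellableContinuationSketch, decisionAndIndex, and NO_INDEX are simplified stand-ins for the actual CancellableContinuationImpl internals, not the real implementation.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.atomic.AtomicReference

// The segment now receives cancellation callbacks directly,
// replacing the per-waiter lambda.
abstract class Segment {
    abstract fun invokeOnCancellation(index: Int, cause: Throwable?)
}

class CancellableContinuationSketch {
    // The state field may now hold a Segment, similarly to CancelHandler.
    private val state = AtomicReference<Any?>(null)

    // The existing integer "decision" field additionally carries the slot
    // index; the real code packs both values into a single int.
    private val decisionAndIndex = AtomicInteger(NO_INDEX)

    // Registers the segment + index pair instead of allocating a handler.
    fun invokeOnCancellation(segment: Segment, index: Int) {
        decisionAndIndex.set(index)
        state.set(segment)
    }

    // On cancellation, the stored segment cleans the corresponding slot.
    fun cancel(cause: Throwable?) {
        val s = state.get()
        if (s is Segment) s.invokeOnCancellation(decisionAndIndex.get(), cause)
    }

    private companion object {
        const val NO_INDEX = -1
    }
}
```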

The benchmark below (see the comment) shows a significant allocation rate reduction.

@ndkoval force-pushed the optimize-invoke-on-cancellation branch from aeeea94 to e903287 on December 14, 2021 at 17:57
@ndkoval (Member, Author) commented on Dec 14, 2021

As the semaphore leverages this optimization, I added a simple sequential benchmark to show the impact (its shape is sketched below). The results follow.
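
For context, a hedged JMH sketch of what such a sequential semaphore-as-mutex benchmark might look like; the class name, iteration count, and yield-based interleaving are assumptions for illustration, not the exact benchmark added in this PR.

```kotlin
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.yield
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
open class SequentialSemaphoreAsMutexBenchmarkSketch {
    private val iterations = 1_000_000 // assumed iteration count

    @Benchmark
    fun benchmark() = runBlocking {
        val s = Semaphore(permits = 1) // one permit => the semaphore acts as a mutex
        val second = launch {
            repeat(iterations) { s.acquire(); yield(); s.release() }
        }
        repeat(iterations) {
            // The yields interleave the two coroutines so that acquire()
            // regularly suspends and registers a cancellation handler --
            // the allocation this PR optimizes away.
            s.acquire(); yield(); s.release()
        }
        second.join()
    }
}
```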

WITHOUT optimization:

Benchmark                                                                   Mode  Cnt         Score           Error   Units
SequentialSemaphoreAsMutexBenchmark.benchmark                               avgt   10         0.123 ±         0.007    s/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.alloc.rate                avgt   10       340.849 ±        20.797  MB/sec
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.alloc.rate.norm           avgt   10  64500711.033 ±         4.138    B/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.churn.G1_Eden_Space       avgt   10       389.221 ±       620.649  MB/sec
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.churn.G1_Eden_Space.norm  avgt   10  74868326.400 ± 119718113.259    B/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.count                     avgt   10         5.000                  counts
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.time                      avgt   10        57.000                      ms

WITH optimization:

Benchmark                                                                   Mode  Cnt         Score           Error   Units
SequentialSemaphoreAsMutexBenchmark.benchmark                               avgt   10         0.123 ±        0.004    s/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.alloc.rate                avgt   10       213.820 ±       10.688  MB/sec
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.alloc.rate.norm           avgt   10  40500711.033 ±        4.138    B/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.churn.G1_Eden_Space       avgt   10       157.394 ±      501.946  MB/sec
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.churn.G1_Eden_Space.norm  avgt   10  30303846.400 ± 96795349.649    B/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.churn.G1_Old_Gen          avgt   10        ≈ 10⁻⁴                 MB/sec
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.churn.G1_Old_Gen.norm     avgt   10        25.778 ±      123.241    B/op
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.count                     avgt   10         2.000                 counts
SequentialSemaphoreAsMutexBenchmark.benchmark:·gc.time                      avgt   10        22.000                     ms

@ndkoval marked this pull request as ready for review on December 15, 2021 at 10:54
@ndkoval force-pushed the optimize-invoke-on-cancellation branch from 001e766 to 1092fca on August 3, 2022 at 19:13
@qwwdfsad (Collaborator) left a comment

I'm quite okay with the general idea. Please hold off on merging it, though; I'll evaluate it once the channels are properly reviewed.

@ndkoval (Member, Author) commented on Nov 28, 2022

Will be delivered along with #3103

@ndkoval closed this on Nov 28, 2022
@ndkoval reopened this on Feb 6, 2023
@ndkoval force-pushed the optimize-invoke-on-cancellation branch from 1092fca to 638760f on February 10, 2023 at 20:10
@ndkoval changed the base branch from develop to fast-channels on February 10, 2023 at 20:14
@ndkoval force-pushed the optimize-invoke-on-cancellation branch 2 times, most recently from c7042e2 to f8af950 on February 13, 2023 at 14:48
@ndkoval changed the base branch from fast-channels to develop on February 13, 2023 at 14:48
@ndkoval (Member, Author) commented on Feb 13, 2023

Let's keep the separation into two commits: the first fixes/adds benchmarks, and the second optimizes the cancellation-handling mechanism.

@qwwdfsad (Collaborator)

Could you please show before/after on ChannelSinkBenchmark?

@ndkoval force-pushed the optimize-invoke-on-cancellation branch 3 times, most recently from 9ed3b1b to 7317fe5 on February 15, 2023 at 12:57
@ndkoval (Member, Author) commented on Feb 15, 2023

See the results below, measured on my laptop (MacBook Pro 16-inch, 2021, Apple M1 Max, 64 GB; OpenJDK 64-Bit Server VM, Zulu 19.32+13-CA).

WITHOUT the optimization:

Benchmark                                                                             Mode  Cnt        Score         Error   Units
ChannelSinkBenchmark.channelPipeline                                                  avgt    5        1.375 ±       0.023   ms/op
ChannelSinkBenchmark.channelPipeline:·gc.alloc.rate.norm                              avgt    5   668370.560 ±     256.516    B/op
ChannelSinkBenchmark.channelPipelineOneThreadLocal                                    avgt    5        1.756 ±       0.012   ms/op
ChannelSinkBenchmark.channelPipelineOneThreadLocal:·gc.alloc.rate.norm                avgt    5   668468.144 ±     296.998    B/op
ChannelSinkBenchmark.channelPipelineTwoThreadLocals                                   avgt    5        2.501 ±       0.123   ms/op
ChannelSinkBenchmark.channelPipelineTwoThreadLocals:·gc.alloc.rate.norm               avgt    5  1668726.477 ±     115.551    B/op
ChannelSinkNoAllocationsBenchmark.channelPipeline                                     avgt    5        6.081 ±       0.140   ms/op
ChannelSinkNoAllocationsBenchmark.channelPipeline:·gc.alloc.rate.norm                 avgt    5  3426068.483 ±     354.374    B/op

WITH the optimization:

Benchmark                                                                              Mode  Cnt        Score         Error   Units
ChannelSinkBenchmark.channelPipeline                                                   avgt    5        1.248 ±       0.013   ms/op
ChannelSinkBenchmark.channelPipeline:·gc.alloc.rate.norm                               avgt    5   488344.851 ±     129.550    B/op
ChannelSinkBenchmark.channelPipelineOneThreadLocal                                     avgt    5        1.681 ±       0.031   ms/op
ChannelSinkBenchmark.channelPipelineOneThreadLocal:·gc.alloc.rate.norm                 avgt    5   488460.940 ±     258.884    B/op
ChannelSinkBenchmark.channelPipelineTwoThreadLocals                                    avgt    5        2.518 ±       0.027   ms/op
ChannelSinkBenchmark.channelPipelineTwoThreadLocals:·gc.alloc.rate.norm                avgt    5  1493167.804 ±     116.414    B/op
ChannelSinkNoAllocationsBenchmark.channelPipeline                                      avgt    5        5.971 ±       0.760   ms/op
ChannelSinkNoAllocationsBenchmark.channelPipeline:·gc.alloc.rate.norm                  avgt    5  1025957.479 ±     274.255    B/op    

@ndkoval (Member, Author) commented on Feb 15, 2023

These benchmarks do not show a performance improvement, but they clearly show reduced allocations (more than 3x fewer on ChannelSinkNoAllocationsBenchmark).

@qwwdfsad (Collaborator)

Nice! I'm looking into that

Commit (message truncated): …annelSinkBenchmark` that supports buffered channels and pre-allocates elements.

Signed-off-by: Nikita Koval <[email protected]>
@ndkoval force-pushed the optimize-invoke-on-cancellation branch from 7317fe5 to 8202abf on February 23, 2023 at 13:09
@qwwdfsad (Collaborator) left a comment

LGTM. I'll wait for the tests to run, then merge.

@qwwdfsad merged commit 2da6817 into develop on Feb 28, 2023
@qwwdfsad deleted the optimize-invoke-on-cancellation branch on February 28, 2023 at 14:05