Add delegate for collecting eventloop tick metrics #2608

hamzahrmalik · 2023-12-13T09:54:37Z

Add delegate for collecting eventloop tick metrics

Motivation:

Users need a way to monitor how event loops are running, so they can detect problems such as an event loop being blocked.

Modifications:

Add a delegate to SelectableEventLoop and MultiThreadedEventLoopGroup which is called on every tick. The delegate is given information about the tick.

weissi · 2023-12-15T13:05:46Z

Sources/NIOPosix/SelectableEventLoop.swift

+}
+
+/// Implement this delegate to receive information about the EventLoop, such as each tick
+public protocol EventLoopDelegate {


needs a NIO prefix to satisfy the public API guarantees NIO has

I'd call this NIOEventLoopMetricsDelegate or something

weissi · 2023-12-15T13:05:59Z

Sources/NIOPosix/MultiThreadedEventLoopGroup.swift

@@ -76,6 +76,7 @@ public final class MultiThreadedEventLoopGroup: EventLoopGroup {
 canEventLoopBeShutdownIndividually: Bool,
 selectorFactory: @escaping () throws -> NIOPosix.Selector<NIORegistration>,
 initializer: @escaping ThreadInitializer,
+ delegate: EventLoopDelegate?,


this should be metricsDelegate or similar

What's also odd is that we seem to have one delegate which gets metrics from multiple EventLoops in that EventLoopGroup.

To make this useful to users we should aggregate the metrics from all loops in a group, no? If we don't give the user the stats from all loops collectively, then the user would have to perform the aggregation themselves -- likely with locks across all loops -- which isn't ideal.

weissi · 2023-12-15T13:11:00Z

Sources/NIOPosix/SelectableEventLoop.swift

@@ -30,6 +30,27 @@ internal func withAutoReleasePool<T>(_ execute: () throws -> T) rethrows -> T {
 #endif
 }

+/// Information about an EventLoop tick
+public struct EventLoopTickInfo {


needs NIO prefix

lacks identifier for EventLoop

as said above, I think we should probably periodically give the user one value which contains information from all loops in a group together. Something like

    public struct NIOEventLoopMetrics {
        public struct LoopMetric {
            var loopID: ...
            var numberOfTasks: Int
            var startTime: NIODeadline
        }
        public var loopMetrics: [LoopMetric]
    }

I know this raises a lot of questions but I think we should think through how a user would use this and then settle on what NIO should provide. Just calling the delegate on each loop for every tick is easy but requires a lot of heavy-lifting on the user's part which seems bound to go wrong, no?

Lukasa · 2023-12-19

IMO the trade-off falls the other way: we should offer the lowest level thing, which lets expert users get what they need, first. We can then do the engineering to produce something better. But frankly we've spent years not exposing these metrics because we wanted to do it right, and I'm a bit sick of stumbling around in the dark. So if we can get something good enough cheaply, we should.

FranzBusch · 2024-01-05

I agree with Cory here. Let's do the simplest approach and just expose a delegate that gets called every tick, leaving the aggregation to users. Over time we will see what kind of aggregation users are doing and can provide pre-baked delegates that give them an even simpler interface.
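As a rough illustration of the user-side aggregation discussed above, a single delegate shared by every loop in a group could keep per-loop totals behind a lock. This is only a sketch under assumptions: `TickInfo` and `MetricsDelegate` are hypothetical stand-ins for the types in this PR, and the `processedTick(info:)` requirement name is assumed rather than taken from the diff.

```swift
import Foundation

// Hypothetical stand-in for the PR's tick info type; field names follow the diff.
struct TickInfo: Sendable, Hashable {
    var eventLoopID: ObjectIdentifier
    var numberOfTasks: Int
}

// Hypothetical stand-in for the delegate protocol; the method name is an assumption.
protocol MetricsDelegate: Sendable {
    func processedTick(info: TickInfo)
}

// Aggregates task counts per loop. One instance may be called from many
// event loop threads, so all mutable state is guarded by a lock.
final class AggregatingDelegate: MetricsDelegate, @unchecked Sendable {
    private let lock = NSLock()
    private var tasksPerLoop: [ObjectIdentifier: Int] = [:]

    func processedTick(info: TickInfo) {
        lock.lock()
        defer { lock.unlock() }
        tasksPerLoop[info.eventLoopID, default: 0] += info.numberOfTasks
    }

    // Returns a copy of the current totals, keyed by loop identifier.
    func snapshot() -> [ObjectIdentifier: Int] {
        lock.lock()
        defer { lock.unlock() }
        return tasksPerLoop
    }
}
```

The lock is taken on every tick of every loop, which is exactly the cross-loop contention weissi raises; a real delegate would want to keep the critical section this small or avoid shared state altogether.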

Lukasa

I'm generally happy with this approach @hamzahrmalik. I've left a few notes in the diff but I think it's a good start. Now we just need some tests!

Lukasa · 2024-02-29T14:22:19Z

Sources/NIOPosix/MultiThreadedEventLoopGroup.swift

 selectorFactory: @escaping () throws -> NIOPosix.Selector<NIORegistration>) {
 precondition(numberOfThreads > 0, "numberOfThreads must be positive")
 let initializers: [ThreadInitializer] = Array(repeating: { _ in }, count: numberOfThreads)
 self.init(threadInitializers: initializers,
 canBeShutDown: canBeShutDown,
 threadNamePrefix: threadNamePrefix,
+ metricsDelegate: metricsDelegate,


Nit: indentation.

Lukasa · 2024-02-29T14:23:27Z

Sources/NIOPosix/SelectableEventLoop.swift

@@ -30,6 +30,30 @@ internal func withAutoReleasePool<T>(_ execute: () throws -> T) rethrows -> T {
 #endif
 }

+/// Information about an EventLoop tick
+public struct NIOEventLoopTickInfo {


Let's make this Sendable at the very least. Maybe Hashable too.
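For a plain value type like this, both conformances can usually be synthesized by the compiler as long as every stored property is itself Sendable and Hashable. A minimal sketch, using a hypothetical stand-in type (`SketchTickInfo`, with a raw `UInt64` in place of the real `NIODeadline` start time):

```swift
// Hypothetical stand-in for NIOEventLoopTickInfo. All stored properties are
// Sendable and Hashable, so the compiler synthesizes both conformances.
public struct SketchTickInfo: Sendable, Hashable {
    public var eventLoopID: ObjectIdentifier
    public var numberOfTasks: Int
    public var startTimeNanoseconds: UInt64  // assumption: stands in for NIODeadline

    public init(eventLoopID: ObjectIdentifier, numberOfTasks: Int, startTimeNanoseconds: UInt64) {
        self.eventLoopID = eventLoopID
        self.numberOfTasks = numberOfTasks
        self.startTimeNanoseconds = startTimeNanoseconds
    }
}
```

With Hashable in place, values can be stored in a `Set` or used as dictionary keys, which is convenient for deduplicating or indexing ticks in tests.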

Lukasa · 2024-02-29T14:23:59Z

Sources/NIOPosix/SelectableEventLoop.swift

+}
+
+/// Implement this delegate to receive information about the EventLoop, such as each tick
+public protocol NIOEventLoopMetricsDelegate {


This type needs to be Sendable.

Lukasa · 2024-02-29T14:25:36Z

Sources/NIOPosix/SelectableEventLoop.swift

@@ -526,6 +557,7 @@ Further information:
 }

 // Execute all the tasks that were submitted
+ tasksProcessedInTick += self.tasksCopy.count


We should be a little careful here. It's not really practically possible, but conceptually this addition can trap. We probably want to replace this with an addingReportingOverflow that, if we overflow, saturates instead.
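One way to express the saturating addition suggested here is a small helper around `addingReportingOverflow`. The helper name `saturatingAdd` is hypothetical, and the sketch assumes both operands are non-negative, as task counts are.

```swift
// Adds two non-negative counts, clamping to Int.max instead of trapping
// when the sum would overflow.
func saturatingAdd(_ lhs: Int, _ rhs: Int) -> Int {
    let (sum, overflowed) = lhs.addingReportingOverflow(rhs)
    return overflowed ? Int.max : sum
}

// Usage: accumulate the per-tick task count without risking a trap.
var tasksProcessedInTick = 0
tasksProcessedInTick = saturatingAdd(tasksProcessedInTick, 128)
```

Clamping to `Int.max` is only correct for non-negative operands; a general-purpose version would also need to clamp to `Int.min` on negative overflow.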

Lukasa · 2024-02-29T14:28:13Z

Sources/NIOPosix/SelectableEventLoop.swift

@@ -547,6 +579,8 @@ Further information:
 // Drop everything (but keep the capacity) so we can fill it again on the next iteration.
 self.tasksCopy.removeAll(keepingCapacity: true)
 }
+ let tickInfo = NIOEventLoopTickInfo(eventLoopID: ObjectIdentifier(self), numberOfTasks: tasksProcessedInTick, startTime: tickStartTime)


We can optimize this and create the loop ID only once. The optimizer might do this for us, but it's nicer not to rely on it.
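A sketch of what creating the identifier once could look like: build it before entering the tick loop and reuse the same value for every tick. `SketchLoop` and its `run(ticks:)` method are hypothetical and only stand in for the real event loop's run loop.

```swift
// Hypothetical stand-in for an event loop that reports its identity each tick.
final class SketchLoop {
    private(set) var reportedIDs: [ObjectIdentifier] = []

    func run(ticks: Int) {
        // Build the identifier once, outside the tick loop, rather than
        // relying on the optimizer to hoist it.
        let loopID = ObjectIdentifier(self)
        for _ in 0..<ticks {
            // ... run this tick's tasks, then report using the cached ID ...
            reportedIDs.append(loopID)
        }
    }
}
```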

Lukasa

Basically good, just a doc note.

Lukasa · 2024-03-26T09:43:05Z

Sources/NIOPosix/SelectableEventLoop.swift

+/// Implement this delegate to receive information about the EventLoop, such as each tick
+public protocol NIOEventLoopMetricsDelegate: Sendable {
+ /// Called after a tick has run
+ /// This function is called after every tick - avoid long-running tasks here


Let's be really clear in this API, add a big warning block that says something like: "This function is called after every event loop tick and on the event loop thread. Any non-trivial work in this function will block the event loop and cause latency increases and performance degradation."
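To make the warning concrete, here is a sketch of a delegate that does almost nothing on the event loop thread and hands the real work to a separate serial queue. `TickInfo`, `MetricsDelegate`, and the `processedTick(info:)` name are hypothetical stand-ins, not the API from this PR, and the queue label is arbitrary.

```swift
import Dispatch

// Hypothetical stand-ins for the PR's types; the method name is an assumption.
struct TickInfo: Sendable {
    var numberOfTasks: Int
}

protocol MetricsDelegate: Sendable {
    func processedTick(info: TickInfo)
}

final class OffloadingDelegate: MetricsDelegate, @unchecked Sendable {
    private let queue = DispatchQueue(label: "metrics.sketch")
    private var total = 0  // only read or written on `queue`

    func processedTick(info: TickInfo) {
        // Called on the event loop thread: only enqueue, never process here.
        queue.async { self.total += info.numberOfTasks }
    }

    // Reads the running total from a non-event-loop thread.
    func currentTotal() -> Int {
        queue.sync { total }
    }
}
```

Even an asynchronous dispatch per tick has a cost, so under heavy load it may be preferable to batch ticks locally and hand them off less often.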

Lukasa

Looks like you have a flaky test here:

14:00:14 /swift-nio/Tests/NIOPosixTests/EventLoopMetricsDelegateTests.swift:55: error: EventLoopMetricsDelegateTests.testMetricsDelegateTickInfo : XCTAssertEqual failed: ("2") is not equal to ("1") -
14:00:14 /swift-nio/Tests/NIOPosixTests/EventLoopMetricsDelegateTests.swift:57: error: EventLoopMetricsDelegateTests.testMetricsDelegateTickInfo : XCTAssertEqual failed: ("Optional(1)") is not equal to ("Optional(2)") -

hamzahrmalik force-pushed the el_metrics branch from 2bb268d to e8301e1 on December 13, 2023 10:01

weissi reviewed Dec 15, 2023

hamzahrmalik force-pushed the el_metrics branch 3 times, most recently from 0786fc6 to 1c86a03 on January 9, 2024 13:58

hamzahrmalik marked this pull request as ready for review January 9, 2024 14:02

Lukasa reviewed Feb 29, 2024

hamzahrmalik requested a review from Lukasa March 4, 2024 17:11

hamzahrmalik force-pushed the el_metrics branch from 6645150 to e2a4a9a on March 4, 2024 18:36

Add delegate for collecting eventloop tick metrics

599aeae

hamzahrmalik force-pushed the el_metrics branch from e2a4a9a to 599aeae on March 19, 2024 12:09

Lukasa reviewed Mar 26, 2024

hamzahrmalik added 2 commits March 26, 2024 10:11

add warning to docc

12b0474

add missing headers

395cc2c

Lukasa approved these changes Mar 26, 2024

Lukasa added 2 commits March 26, 2024 13:40

Merge branch 'main' into el_metrics

752bc8e

Merge branch 'main' into el_metrics

d1084d4

Lukasa enabled auto-merge (squash) March 26, 2024 13:55

Lukasa requested changes Mar 26, 2024

fix test

3c0984a

Lukasa added the 🔼 needs-minor-version-bump label (for PRs that, when merged, cause a bump of the minor version, i.e. 1.x.0 -> 1.(x+1).0) Mar 27, 2024

Lukasa approved these changes Mar 27, 2024

Lukasa merged commit 082ac21 into apple:main Mar 27, 2024
8 of 9 checks passed
