Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add delegate for collecting eventloop tick metrics #2608

Merged
merged 6 commits into from
Mar 27, 2024

Conversation

hamzahrmalik
Copy link
Contributor

Add delegate for collecting eventloop tick metrics

Motivation:

Users need a way to monitor how eventloops are running, to detect problems such as the eventloop being blocked

Modifications:

Add a delegate to SelectableEventLoop and multithreadedEventLoop which is called on every tick. The delegate is given information about the tick

}

/// Implement this delegate to receive information about the EventLoop, such as each tick
public protocol EventLoopDelegate {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • needs a NIO prefix to satisfy the public API guarantees NIO has
  • I'd call this NIOEventLoopMetricsDelegate or something

@@ -76,6 +76,7 @@ public final class MultiThreadedEventLoopGroup: EventLoopGroup {
canEventLoopBeShutdownIndividually: Bool,
selectorFactory: @escaping () throws -> NIOPosix.Selector<NIORegistration>,
initializer: @escaping ThreadInitializer,
delegate: EventLoopDelegate?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this this should be metricsDelegate or similar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's also odd is that we seem to have one delegate which gets metrics from multiple EventLoops in that EventLoopGroup.

To make this useful to users we should aggregate the metrics from all loops in a group, no? If we don't give the user to stats from all loops collectively, then the user would have to perform the aggregation themselves -- likely with locks across all loops -- which isn't ideal.

@@ -30,6 +30,27 @@ internal func withAutoReleasePool<T>(_ execute: () throws -> T) rethrows -> T {
#endif
}

/// Information about an EventLoop tick
public struct EventLoopTickInfo {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • needs NIO prefix
  • lacks identifier for EventLoop
  • as said above, I think we should probably periodically give the user one value which contains information from all loops in a group together. Something like
public struct NIOEventLoopMetrics {
    public struct LoopMetric {
        var loopID: ...
        var numberOfTasks: Int
        var startTime: NIODeadline
    }
    
    public var loopMetrics: [LoopMetric]
}

I know this raises a lot of questions but I think we should think through how a user would use this and then settle on what NIO should provide. Just calling the delegate on each loop for every tick is easy but requires a lot of heavy-lifting on the user's part which seems bound to go wrong, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the trade-off falls the other way: we should offer the lowest level thing, which lets expert users get what they need, first. We can then do the engineering to produce something better. But frankly we've spent years not exposing these metrics because we wanted to do it right, and I'm a bit sick of stumbling around in the dark. So if we can get something good enough cheaply, we should.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Cory here. Let's do the simplest approach and just expose a delegate that gets called every tick and users have to do the aggregation. Over time we will see what kind of aggregation users are doing and provide pre-backed delegates that give them an even simpler interface

@hamzahrmalik hamzahrmalik force-pushed the el_metrics branch 3 times, most recently from 0786fc6 to 1c86a03 Compare January 9, 2024 13:58
@hamzahrmalik hamzahrmalik marked this pull request as ready for review January 9, 2024 14:02
Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm generally happy with this approach @hamzahrmalik. I've left a few notes in the diff but I think it's a good start. Now we just need some tests!

selectorFactory: @escaping () throws -> NIOPosix.Selector<NIORegistration>) {
precondition(numberOfThreads > 0, "numberOfThreads must be positive")
let initializers: [ThreadInitializer] = Array(repeating: { _ in }, count: numberOfThreads)
self.init(threadInitializers: initializers,
canBeShutDown: canBeShutDown,
threadNamePrefix: threadNamePrefix,
metricsDelegate: metricsDelegate,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: indentation.

@@ -30,6 +30,30 @@ internal func withAutoReleasePool<T>(_ execute: () throws -> T) rethrows -> T {
#endif
}

/// Information about an EventLoop tick
public struct NIOEventLoopTickInfo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this Sendable at the very least. Maybe Hashable too.

}

/// Implement this delegate to receive information about the EventLoop, such as each tick
public protocol NIOEventLoopMetricsDelegate {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type needs to be Sendable.

@@ -526,6 +557,7 @@ Further information:
}

// Execute all the tasks that were submitted
tasksProcessedInTick += self.tasksCopy.count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be a little careful here. It's not really practically possible, but conceptually this addition can trap. We probably want to replace this with an addingReportingOverflow that, if we overflow, saturates instead.

@@ -547,6 +579,8 @@ Further information:
// Drop everything (but keep the capacity) so we can fill it again on the next iteration.
self.tasksCopy.removeAll(keepingCapacity: true)
}
let tickInfo = NIOEventLoopTickInfo(eventLoopID: ObjectIdentifier(self), numberOfTasks: tasksProcessedInTick, startTime: tickStartTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optimize this and create the loop ID only once. The optimizer might do this for us, but it's nicer not to rely on it.

Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically good, just a doc note.

/// Implement this delegate to receive information about the EventLoop, such as each tick
public protocol NIOEventLoopMetricsDelegate: Sendable {
/// Called after a tick has run
/// This function is called after every tick - avoid long-running tasks here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be really clear in this API, add a big warning block that says something like: "This function is called after every event loop tick and on the event loop thread. Any non-trivial work in this function will block the event loop and cause latency increases and performance degradation."

@Lukasa Lukasa enabled auto-merge (squash) March 26, 2024 13:55
Copy link
Contributor

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you have a flaky test here:

14:00:14 /swift-nio/Tests/NIOPosixTests/EventLoopMetricsDelegateTests.swift:55: error: EventLoopMetricsDelegateTests.testMetricsDelegateTickInfo : XCTAssertEqual failed: ("2") is not equal to ("1") -
14:00:14 /swift-nio/Tests/NIOPosixTests/EventLoopMetricsDelegateTests.swift:57: error: EventLoopMetricsDelegateTests.testMetricsDelegateTickInfo : XCTAssertEqual failed: ("Optional(1)") is not equal to ("Optional(2)") -

@Lukasa Lukasa added the 🔼 needs-minor-version-bump For PRs that when merged cause a bump of the minor version, ie. 1.x.0 -> 1.(x+1).0 label Mar 27, 2024
@Lukasa Lukasa merged commit 082ac21 into apple:main Mar 27, 2024
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔼 needs-minor-version-bump For PRs that when merged cause a bump of the minor version, ie. 1.x.0 -> 1.(x+1).0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants