Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(timers): Refactor timers to use one async op per timer #12862

Merged
merged 16 commits into from
Dec 7, 2021

Conversation

andreubotella
Copy link
Contributor

@andreubotella andreubotella commented Nov 22, 2021

This change also makes the timers implementation closer to the spec, and sets up the stage to implement AbortSignal.timeout() (whatwg/dom#1032).

Fixes #8965.
Fixes #10974.
Fixes #11398.

@andreubotella andreubotella marked this pull request as draft November 22, 2021 22:11
ext/timers/lib.rs Outdated Show resolved Hide resolved
@andreubotella
Copy link
Contributor Author

#12908 and #12913 must be merged before this PR, since they fix the CI failures.

This change also makes the timers implementation closer to the spec, and
sets up the stage to implement `AbortSignal.timeout()`
(whatwg/dom#1032).

Fixes denoland#8965.
Fixes denoland#10974.
Fixes denoland#11398.
@andreubotella andreubotella changed the title [WIP] Rewrite the timers implementation refactor(timers): Refactor timers to use one async op per timer Nov 28, 2021
@andreubotella andreubotella marked this pull request as ready for review November 28, 2021 15:59
@andreubotella
Copy link
Contributor Author

It seems like the httpConnAutoCloseDelayedOnUpgrade and httpServerDeleteRequestHasBody tests are flaky with this implementation of timers – they occasionally fail with pending ops. Since it used to be that timers were clamped to a minimum of 4ms everywhere, and now they're clamped to 0ms when the timer nesting level is lower than 5, it might be that we need to set the op sanitizer delay to 4ms.

@bartlomieju
Copy link
Member

@lucacasonato please advise

@lucacasonato
Copy link
Member

lucacasonato commented Nov 30, 2021

it might be that we need to set the op sanitizer delay to 4ms.

I don't think so. That sounds like a hack. The tests are probably not correct if they fail if the timer is actually 0 sec (1 evt loop turn).

@andreubotella
Copy link
Contributor Author

andreubotella commented Dec 2, 2021

I've added back the code in runAfterTimeout that waits to run the callback until all previous timers have resolved. Rather than an array from which entries are spliced out, I made it a linked list, but this might make the code less readable.

This should unblock #12963 and perhaps #12953. cc @kt3k

Copy link
Member

@bartlomieju bartlomieju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreubotella thank you for the PR. It fixes a lot of long outstanding bugs which is great. At the same time it's touching a crucial piece of infrastructure so special care must be taken when reviewing and landing it.

The implementation mostly looks good to me, I have a few question for bits that are not immediately obvious to me.

Before landing I'd like to get more eyes on it.

@bnoordhuis @lucacasonato please review

cli/tests/unit/timers_test.ts Outdated Show resolved Hide resolved
ext/timers/lib.rs Show resolved Hide resolved
ext/timers/01_timers.js Show resolved Hide resolved
ext/timers/01_timers.js Show resolved Hide resolved
ext/timers/01_timers.js Outdated Show resolved Hide resolved
Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, Andreu. I can't really intuit how this affects performance. Do we have timer benchmarks?

cli/tests/unit/timers_test.ts Outdated Show resolved Hide resolved
// 4. If timeout is less than 0, then set timeout to 0.
// 5. If nesting level is greater than 5, and timeout is less than 4, then
// set timeout to 4.
timeout = MathMax(timeout, (timerNestingLevel > 5) ? 4 : 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  1. 5 is a magic constant. Why not 4, 6 or 42? The comment doesn't explain it.

  2. I find it harder to read than the equivalent if statement (which is shorter to boot):

if (timerNestingLevel > 5 && timeout < 4) timeout = 4;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note step 4 and the ternary operator. The equivalent would be:

if (timeout < 0) timeout = 0;
if (timerNestingLevel > 5 && timeout < 4) timeout = 4;

Copy link
Contributor Author

@andreubotella andreubotella Dec 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 is a magic constant. Why not 4, 6 or 42? The comment doesn't explain it.

It's spec-mandated. I suspect due to compat concerns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (timeout < 0) timeout = 0;
if (timerNestingLevel > 5 && timeout < 4) timeout = 4;

This one is a bit easier to read, so maybe let's go with it?

It's spec-mandated. I suspect due to compat concerns.

Right, can you add a comment or reference to the spec here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, can you add a comment or reference to the spec here?

This comment is the spec text:

https://github.com/andreubotella/deno/blob/timers/ext/timers/01_timers.js#L131-L136

ext/timers/01_timers.js Show resolved Hide resolved
ext/timers/01_timers.js Show resolved Hide resolved
ext/timers/01_timers.js Outdated Show resolved Hide resolved
/**
* A doubly linked list of timers.
* @type { { head: ScheduledTimer | null, tail: ScheduledTimer | null } } */
const scheduledTimers = { head: null, tail: null };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of all the null checks if you turn this into a list head:

// ideally give the list head and timer objects the same object shape
// to keep property reads and writes monomorphic
const scheduledTimers = { head: null, tail: null };
scheduledTimers.head = scheduledTimers;
scheduledTimers.tail = scheduledTimers;

function isEmpty() {
  return scheduledTimers.tail === scheduledTimers;
}

function append(timer) {
  timer.prev = scheduledTimers.tail;
  timer.next = scheduledTimers;
  scheduledTimers.tail.next = timer;
  scheduledTimers.tail = timer;
}

function remove(timer) {
  timer.prev.next = timer.next;
  timer.next.prev = timer.prev;
  // timer.prev = timer.next = null;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreubotella do you want to address this bit before landing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have now and I'd definitely want to see it addressed eventually but I'm okay with landing this as-is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems harder to reason through, especially if we go all the way and make scheduledTimers monomorphic, since scheduledTimers.prev would be the tail and scheduledTimers.next would be the head. And if we keep the property names head and tail, we'd still need checks to append and remove.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's good enough for nginx, libuv, linux and the BSDs, then it's certainly good enough for deno. :-)

(Also node - but I'm reasonably certain I was the one who introduced that pattern there.)

}
// 1.
PromisePrototypeThen(
core.opAsync("op_sleep", millis, cancelRid),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means there's an op call per timer? To keep overhead down JS timers would ideally all hang off a single tokio::time::sleep().

Copy link
Contributor Author

@andreubotella andreubotella Dec 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trade-off here is either:

a) Having at most one pending timer op at a time, but having to call into Rust to start another op whenever a timer expires.
b) Only calling into Rust when starting or canceling timers, but having multiple pending timer ops.

I haven't measured the difference between these, and due to the very nature of timers, measuring performance isn't easy. Also, if we compare with the current timers implementation, there's the additional confounding factor of #10974. So I'm not sure how to benchmark this.

But note that these async ops do very little work, and I expect that the work of checking the time would be done by tokio's global timer, not when polling the Sleep futures.

And in any case, unrefing timers would be much easier with b) – see #12953.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should go with this approach, especially in light of ref/unref for timers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node has regular JS timers hanging off a single C++ timer (N->1, common case) and unref'd JS timers backed by individual C++ timers (N->N, uncommon case.)

For Node, that's a worthwhile optimization because programs often have thousands of active timers, usually one or more per TCP connection.

For Deno, I'm not sure. TBH, I don't know how (or if) we deal with read/write timeouts or slowloris style attacks.

At any rate, it's an area worth investigating but it doesn't have to hold up this PR.

setTimeout(() => array.push(6));
}, 0);

await delay(100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks still a little bit flaky from my observation. ref: https://github.com/denoland/deno/runs/4418361252?check_suite_focus=true

How about creating a promise for each setTimeout call and waiting for all of them instead of delay(N)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a genuine bug. A race would be missing [4,5,6] but it's only missing 6. Merits deeper investigation.

Copy link
Contributor Author

@andreubotella andreubotella Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the spec, a timer mustn't go off before other timers that were both scheduled before it and have a lesser or equal timeout. So the array must not be out of order, and the delay must expire after 3 is pushed, but the delay doesn't necessarily have to go off after any of the 4/5/6 timers do. And that's what I find when I test this – either [1,2,3], [1,2,3,4], [1,2,3,4,5] or [1,2,3,4,5,6], but not [1,2] or anything out of order.

Note that the spec doesn't say that a timeout of 0 must queue the action in the timer task queue immediately – step 5.3 in "run steps after a timeout" allows for an arbitrary additional wait. And it seems like an op_sleep with a timeout of 0 doesn't necessarily resolve at the next event loop turn. Should we queue the task immediately in that case?

In any case, I'll be taking @kt3k's suggestion.

Copy link
Contributor Author

@andreubotella andreubotella Dec 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we queue the task immediately in that case?

Come to think about it, no. This would let timers starve other parts of the event loop. Whereas if timers are always backed by ops, the earliest that a task can be pushed to the timers queue is on the next event loop turn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow your line of reasoning.

We have timers A, B and C that expire at time T+0, who arm three more timers D, E, and F that expire at time T+1 (or T+4, doesn't matter), and a delay G that expires at time T+100.

Under normal conditions, they expire in order A to G because 0 < 1 < 100.

Under abnormal conditions, say an event loop stall or an OS scheduler fluke that pauses the program until T+200, then either A-G or A-C, G, D-F are reasonable behavior.

But A-E, G, F? No, never.


A timer's expiry time is set to now() + timeout rather than event_loop_tick_start_time() + timeout. Put another way: timers have no shared time base.

Under normal conditions that doesn't matter because computers are fast: now() doesn't change meaningfully in the time it takes to execute an event loop "tick".

But if there is a stall of duration N between B and C, then there is an equivalent gap between E and F: E expires at time T+1 but F at time T+N.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how tokio's timers are implemented –I have tried to follow the code and can't really make sense of it–, or how they interact with FuturesUnordered –which is probably more relevant for how they show up here. The spec requires that timers don't resolve before now() + timeout –which I'm assuming tokio does correctly–, and it also requires that a timer doesn't resolve before any timers that were scheduled before it and that have a lower or equal timeout. The code around scheduledTimers guarantees that that condition holds, but it doesn't guarantee anything else. And, purely in terms of the spec requirements, A-E, G, F is valid, because D-F are not required to be added to the timer task queue the instant setTimeout() is called.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For posterity: Bartek and I talked this through OOB.

IMO, it's a quality-of-implementation issue - expiry order should follow the Principle of Least Surprise, regardless of how relaxed the spec is - but it's pre-existing, not a regression, and therefore doesn't need to hold up this PR.

Copy link
Contributor

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the test failure that @kt3k pointed out can be resolved.

}
// 1.
PromisePrototypeThen(
core.opAsync("op_sleep", millis, cancelRid),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node has regular JS timers hanging off a single C++ timer (N->1, common case) and unref'd JS timers backed by individual C++ timers (N->N, uncommon case.)

For Node, that's a worthwhile optimization because programs often have thousands of active timers, usually one or more per TCP connection.

For Deno, I'm not sure. TBH, I don't know how (or if) we deal with read/write timeouts or slowloris style attacks.

At any rate, it's an area worth investigating but it doesn't have to hold up this PR.

/**
* A doubly linked list of timers.
* @type { { head: ScheduledTimer | null, tail: ScheduledTimer | null } } */
const scheduledTimers = { head: null, tail: null };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have now and I'd definitely want to see it addressed eventually but I'm okay with landing this as-is.

@bnoordhuis bnoordhuis merged commit 33da15a into denoland:main Dec 7, 2021
@bnoordhuis
Copy link
Contributor

Thanks for the PR, Andreu!

@andreubotella andreubotella deleted the timers branch December 9, 2021 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants