`IceAgent`'s `poll_timeout`and `handle_timeout` disagree by `TIMING_ADVANCE` #514

thomaseizinger · 2024-05-20T06:41:14Z

str0m currently has a hard-coded value of 50ms that will apply to several return paths of poll_timeout. This means, there is a discrepancy between poll_timeout and handle_timeout.

If I add candidates to the IceAgent it will form pairs and it will send STUN requests to those after calling handle_timeout. However, poll_timeout adds an additional 50ms to the last known now value, meaning 50ms first have to pass before handle_timeout will actually start sending STUN requests.

Based on the docs, it appears that these 50ms actually come from the ICE spec but it also says that we MAY use a different value. Would there be a problem in removing these 50ms?

This problem came up because I am currently writing property-based tests for our entire state machine (https://proptest-rs.github.io/proptest/proptest/state-machine.html) and there, I am advancing time using a separate state transition from the packets that are being sent. In this test, I have a loop that exhausts the current state machine which in this case doesn't work: We get stuck half-way through establishing a connection because we'd need to advance time by an additional 50ms before str0m would start sending STUN requests.

The text was updated successfully, but these errors were encountered:

algesten · 2024-05-20T07:01:09Z

Hi @thomaseizinger (welcome back! 😉)

So far, the 50ms seems to work well for our real-world use cases. If I understand this correctly, the problem arises in test cases, rather than real world scenarios?

I'm happy to take a PR that makes the timing advance configurable. I don't know what happens if you set that to 0, but if you make it configurable, you can of course experiment without affecting anything else.

thomaseizinger · 2024-05-20T07:13:48Z

Hi @thomaseizinger (welcome back! 😉)

Thank you! :)

So far, the 50ms seems to work well for our real-world use cases. If I understand this correctly, the problem arises in test cases, rather than real world scenarios?

It manifests in the tests yes. In reality, 50ms are so short that it probably doesn't make a difference.

What I don't really understand is what it achieves? My expectation would be that, if handle_timeout has "work" to do, then poll_timeout should return last_now and not last_now + 50ms. What do we gain from deferring things we could be doing immediately by another 50ms?

I'm happy to take a PR that makes the timing advance configurable. I don't know what happens if you set that to 0, but if you make it configurable, you can of course experiment without affecting anything else.

That sounds good, I'll send one!

algesten · 2024-05-20T15:35:55Z

What I don't really understand is what it achieves?

I guess the Ice Agent was among the first things I implemented in str0m. With time being driven from outside, I remember having a thought that I needed time to always go forward. This has been abandoned in the rest of the code base however.

Be interesting to know what happens if you set it to 0.

thomaseizinger · 2024-05-20T21:10:19Z

What I don't really understand is what it achieves?

I guess the Ice Agent was among the first things I implemented in str0m. With time being driven from outside, I remember having a thought that I needed time to always go forward. This has been abandoned in the rest of the code base however.

Be interesting to know what happens if you set it to 0.

I ran all the tests with 0 and they pass.

From my own experience, it is easy to create bugs with the SANS-IO model where poll_timeout keeps returning the same instant even after handle_timeout is called. That then leads to a busy-loop of the eventloop and everything gets stuck. Such a bug gets avoided / masked if we always increment time in at least 50ms steps.

I don't think we need it though. I don't have anything like that in our system and the amount of sans-io code there keeps growing :)

I'll set it to 0 using the linked patch and report back. If it doesn't cause any issues, we can perhaps delete it?

algesten · 2024-05-21T06:19:36Z

I'll set it to 0 using the linked patch and report back. If it doesn't cause any issues, we can perhaps delete it?

Yeah, maybe. I'd need to test this in str0m (rtc) to ensure we don't rely on this being the main "driver of time forward". That responsibility could of course move up the stack (probably into Rtc).

thomaseizinger mentioned this issue May 20, 2024

feat: make timing advance (Ta) configurable #515

Merged

algesten closed this as completed in #515 May 20, 2024

thomaseizinger mentioned this issue May 22, 2024

test(connlib): simulate IO in state machine tests firezone/firezone#4728

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`IceAgent`'s `poll_timeout`and `handle_timeout` disagree by `TIMING_ADVANCE` #514

`IceAgent`'s `poll_timeout`and `handle_timeout` disagree by `TIMING_ADVANCE` #514

thomaseizinger commented May 20, 2024

algesten commented May 20, 2024

thomaseizinger commented May 20, 2024

algesten commented May 20, 2024

thomaseizinger commented May 20, 2024

algesten commented May 21, 2024

IceAgent's poll_timeoutand handle_timeout disagree by TIMING_ADVANCE #514

IceAgent's poll_timeoutand handle_timeout disagree by TIMING_ADVANCE #514

Comments

thomaseizinger commented May 20, 2024

algesten commented May 20, 2024

thomaseizinger commented May 20, 2024

algesten commented May 20, 2024

thomaseizinger commented May 20, 2024

algesten commented May 21, 2024

`IceAgent`'s `poll_timeout`and `handle_timeout` disagree by `TIMING_ADVANCE` #514

`IceAgent`'s `poll_timeout`and `handle_timeout` disagree by `TIMING_ADVANCE` #514