Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IceAgent's poll_timeoutand handle_timeout disagree by TIMING_ADVANCE #514

Closed
thomaseizinger opened this issue May 20, 2024 · 5 comments · Fixed by #515
Closed

IceAgent's poll_timeoutand handle_timeout disagree by TIMING_ADVANCE #514

thomaseizinger opened this issue May 20, 2024 · 5 comments · Fixed by #515

Comments

@thomaseizinger
Copy link
Collaborator

str0m currently has a hard-coded value of 50ms that will apply to several return paths of poll_timeout. This means, there is a discrepancy between poll_timeout and handle_timeout.

If I add candidates to the IceAgent it will form pairs and it will send STUN requests to those after calling handle_timeout. However, poll_timeout adds an additional 50ms to the last known now value, meaning 50ms first have to pass before handle_timeout will actually start sending STUN requests.

Based on the docs, it appears that these 50ms actually come from the ICE spec but it also says that we MAY use a different value. Would there be a problem in removing these 50ms?

This problem came up because I am currently writing property-based tests for our entire state machine (https://proptest-rs.github.io/proptest/proptest/state-machine.html) and there, I am advancing time using a separate state transition from the packets that are being sent. In this test, I have a loop that exhausts the current state machine which in this case doesn't work: We get stuck half-way through establishing a connection because we'd need to advance time by an additional 50ms before str0m would start sending STUN requests.

@algesten
Copy link
Owner

Hi @thomaseizinger (welcome back! 😉)

So far, the 50ms seems to work well for our real-world use cases. If I understand this correctly, the problem arises in test cases, rather than real world scenarios?

I'm happy to take a PR that makes the timing advance configurable. I don't know what happens if you set that to 0, but if you make it configurable, you can of course experiment without affecting anything else.

@thomaseizinger
Copy link
Collaborator Author

Hi @thomaseizinger (welcome back! 😉)

Thank you! :)

So far, the 50ms seems to work well for our real-world use cases. If I understand this correctly, the problem arises in test cases, rather than real world scenarios?

It manifests in the tests yes. In reality, 50ms are so short that it probably doesn't make a difference.

What I don't really understand is what it achieves? My expectation would be that, if handle_timeout has "work" to do, then poll_timeout should return last_now and not last_now + 50ms. What do we gain from deferring things we could be doing immediately by another 50ms?

I'm happy to take a PR that makes the timing advance configurable. I don't know what happens if you set that to 0, but if you make it configurable, you can of course experiment without affecting anything else.

That sounds good, I'll send one!

@algesten
Copy link
Owner

What I don't really understand is what it achieves?

I guess the Ice Agent was among the first things I implemented in str0m. With time being driven from outside, I remember having a thought that I needed time to always go forward. This has been abandoned in the rest of the code base however.

Be interesting to know what happens if you set it to 0.

@thomaseizinger
Copy link
Collaborator Author

What I don't really understand is what it achieves?

I guess the Ice Agent was among the first things I implemented in str0m. With time being driven from outside, I remember having a thought that I needed time to always go forward. This has been abandoned in the rest of the code base however.

Be interesting to know what happens if you set it to 0.

I ran all the tests with 0 and they pass.

From my own experience, it is easy to create bugs with the SANS-IO model where poll_timeout keeps returning the same instant even after handle_timeout is called. That then leads to a busy-loop of the eventloop and everything gets stuck. Such a bug gets avoided / masked if we always increment time in at least 50ms steps.

I don't think we need it though. I don't have anything like that in our system and the amount of sans-io code there keeps growing :)

I'll set it to 0 using the linked patch and report back. If it doesn't cause any issues, we can perhaps delete it?

@algesten
Copy link
Owner

I'll set it to 0 using the linked patch and report back. If it doesn't cause any issues, we can perhaps delete it?

Yeah, maybe. I'd need to test this in str0m (rtc) to ensure we don't rely on this being the main "driver of time forward". That responsibility could of course move up the stack (probably into Rtc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants