New 5.9.0-dev networking is not allowing legitimate players to connect under some rare circumstances #14765
Comments
The changes in question: #14217. What would help is a verbose log from the server (ideally the client too) that shows the problems being reproduced.
This is what we had gathered earlier: (The user's name and IP address have been obfuscated just in case she wanted that. I'm OK with sharing the IP address privately but I'm not sure if that would even help in this situation.)
After we applied the fix mentioned in the OP, we got this:
Immediately after, the client stalled at Item Definitions and no new information about the client was logged or sent to the server. After 10 minutes or so, finally the client joined, with no other interesting logs.
The entire log please, I can't work with fragments.
This is all I got - I could try running the server in verbose logging and have Jane Doe connect for more information. We ran it in info logging before - is that not good? I hadn't checked that, because I don't know of any other servers that are using 5.9.0-dev with the new networking code. That's a good idea though, we'll check that if we can.
I checked quickly and we are getting many more timeouts from users with the new networking code as well, around 400-500% more than previously.
Yes, verbose logging is needed. trace would be even better.
I got a verbose log: https://downloads-th2.1f616emo.xyz/Sgik-timeout.log.txt The player Sgik timed out because of "outgoing reliables channel=0". At first glance, there seem to be too many pending packets causing Minetest to declare a time-out. #14867 will help identify the "Ran out of sequence numbers" lines. Extract of last few relevant lines (i.e. removing all mapgen-related stuff etc):
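For anyone who wants to cut their own verbose log down in the same way, here is a minimal Python sketch of that kind of filtering. It is not the script used to produce the extract described above: `filter_log`, its parameters and the `"mapgen"` drop term are assumptions for illustration. Matching is plain case-insensitive substring search, so nothing is assumed about the exact log line format.

```python
def filter_log(lines, keep_any, drop_any=("mapgen",)):
    """Yield lines that contain at least one keep term and no drop term.

    Hypothetical helper: terms are matched as case-insensitive substrings.
    """
    keep = [term.lower() for term in keep_any]
    drop = [term.lower() for term in drop_any]
    for line in lines:
        lowered = line.lower()
        # Skip noisy lines first (e.g. anything mentioning mapgen).
        if any(term in lowered for term in drop):
            continue
        # Keep only lines mentioning the player or a known error string.
        if any(term in lowered for term in keep):
            yield line


# Example use (the file name is an assumption):
#
#   with open("server.log", encoding="utf-8", errors="replace") as f:
#       terms = ("Sgik", "Ran out of sequence numbers")
#       for line in filter_log(f, keep_any=terms):
#           print(line, end="")
```

Filtering on the player name alone may miss lines that only mention a peer ID or IP address, so it can be worth adding those as extra keep terms.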
This PR adds peer ID into "Ran out of sequence numbers" logs. This helps to debug minetest#14765. This PR is ready for review.
Here is another log with 416e323: https://downloads-th2.1f616emo.xyz/fahaad105-timeout.log.txt This one is harder to debug, because two clients, ABDalrhman and fahaad105, joined from the same IP. One left normally, and one timed out. Unlike Sgik, fahaad105 mainly suffers from timed-out RELIABLE, while Sgik suffers from running out of sequence numbers. d502b05 will help to debug ACKed packet errors and sendRequestedMedia requests.
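To make the two failure modes mentioned in this thread easier to picture (running out of sequence numbers versus a reliable packet timing out), here is a toy Python model of a single reliable channel. It is only a sketch and not Minetest's networking code: the class `ReliableSender`, the 16-bit wrap-around, the window size and the timeout value are all assumptions chosen for illustration.

```python
SEQNUM_MODULO = 1 << 16  # assumption: sequence numbers wrap at 16 bits


class ReliableSender:
    """Toy model of one reliable channel (hypothetical, for illustration)."""

    def __init__(self, window_size=4, timeout=30.0):
        self.window_size = window_size  # assumed max number of un-ACKed packets
        self.timeout = timeout          # assumed seconds before the peer is dropped
        self.next_seqnum = 0
        self.in_flight = {}             # seqnum -> send time, oldest first

    def send(self, now):
        """Return the assigned seqnum, or None if the window is exhausted."""
        if len(self.in_flight) >= self.window_size:
            # Too many pending packets: no sequence number can be handed out.
            return None
        seqnum = self.next_seqnum
        self.next_seqnum = (self.next_seqnum + 1) % SEQNUM_MODULO
        self.in_flight[seqnum] = now
        return seqnum

    def ack(self, seqnum):
        """Forget an ACKed packet; unknown seqnums are ignored."""
        self.in_flight.pop(seqnum, None)

    def timed_out(self, now):
        """True if the oldest un-ACKed packet has waited longer than timeout."""
        if not self.in_flight:
            return False
        oldest_sent = next(iter(self.in_flight.values()))
        return now - oldest_sent > self.timeout
```

With `window_size=2`, two calls to `send` succeed and a third returns `None` until one of the first two is ACKed. If the oldest packet is never ACKed, `timed_out` eventually becomes true, which is roughly the situation behind a "Connection timed out" message in this model.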
Minetest version
Irrlicht device
No response
Operating system and version
Ubuntu 20.04
CPU model
No response
GPU model
No response
Active renderer
No response
Summary
Since upgrading our server to Minetest 5.9.0-dev, we have now had two cases of users being unable to log in to our server while having no difficulty logging in to other servers that are on 5.8.0 or a later version.
The first case involves one of our users (we'll call her Jane Doe) who had no problems logging in before and then started having difficulty joining (whenever she joined, she would get a Connection timed out message). After investigating a bit, we applied this patch: https://gitlab.com/tunnelers-abyss/minetest/-/commit/f9ece9553a970dd550bf2f97b7999580ae60f502
This allowed Jane Doe to get past the initial connecting phase, but her client would then stall on Item Definitions. After waiting for 10 or 15 minutes there, her client finally logged in. Once she is in-game, it's mostly smooth sailing. Sometimes it is much faster to log in, sometimes it is slower, and when the server was on 5.8.0 she apparently did not have this problem. Her internet is also pretty good, so it's not that, and she doesn't have a firewall or anything similar that could be interfering here.
The second case involves a user who has trouble logging in (stuck at Item Definitions again), and when they finally log in, everything moves at a snail's pace around them - almost no packets are being sent/received for some reason - even though this user was fine before the upgrade. Additionally, this user has like 80mb up/down internet, which is more than enough...
I don't really have the time to do deep debugging with Wireshark or something similar, which is why tweaking the new checks in the networking code was my first thought. The vast majority of our users have had absolutely no problems, which makes this a bit more difficult to debug.
Steps to reproduce
I really have no idea. If you give me things to try I can test them with Jane Doe, who wants very much to get this issue fixed!