Split brain issue #218
Comments
I think the AutoConnect feature was immediately cancelling any attempts to repair split meshes once every node had three working connections. It might be fixed in commit de7d5a0.
Hi, I believe this problem took down my production infrastructure (all nodes are

Randomly restarting some nodes worked.

Before restart

After restart

Explanation

In this network,

However, tinc still seemed to count them towards the number of 3 working connections.

Questions

@gsliepen Can you confirm whether

Thanks!
Hmm, this doesn't seem to help; even after I specified an explicit each-to-each
@gsliepen I have now encountered another split-brain problem, even with commit de7d5a0 cherry-picked. In my network of 8 machines, 4 believe in one world view and the other 4 in another one:

View 1 (4 machines think this)

Other nodes with same view:

View 2 (4 other machines think this)

Other nodes with same view:

Restarting is a workaround

After restarting tinc on

@gsliepen Any other ideas to prevent this from happening?
Happened again to me today. I strongly suspect that the

I noticed that by observing the following hourly spike patterns in

There's no pattern or failures in the non-VPN pings:

This does not provide an explanation or fix for the underlying issue (tinc getting netsplit and not recovering), but does provide a method to work around it (setting

However, given that incorrect keys seem to be what confuses tinc here, the question remains whether externally sent, incorrect keys could also trigger the same problem.
Once, while the tinc network was running with no active intervention (no restarts, configuration changes, etc.), many nodes suddenly went offline.
When I tried to find out what was happening, I saw the following graph from "dump graph":
In fact, I found two working subgraphs: when I dump the graph from a node in the other subgraph, red becomes green and vice versa.
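To confirm a split like this without reading the picture, the graph dump can be checked for connected components. The following is only a rough sketch: it assumes the dump consists of Graphviz-style edge lines (`"a" -- "b"` or `"a" -> "b"`), and the node names in the sample are made up for illustration. It also only sees nodes that appear in at least one edge line.

```python
import re
from collections import defaultdict

# Hypothetical sample input; node names are invented for illustration.
SAMPLE_DOT = """
graph {
 "node_a" -- "node_b";
 "node_b" -- "node_c";
 "node_d" -- "node_e";
}
"""

# Matches undirected (--) and directed (->) edge lines, quoted or unquoted.
EDGE_RE = re.compile(r'"?(\w+)"?\s*(?:--|->)\s*"?(\w+)"?')


def connected_components(dot_text):
    """Return the sorted node lists of each connected subgraph."""
    adjacency = defaultdict(set)
    for left, right in EDGE_RE.findall(dot_text):
        # Treat every edge as bidirectional for reachability purposes.
        adjacency[left].add(right)
        adjacency[right].add(left)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    for index, nodes in enumerate(connected_components(SAMPLE_DOT), start=1):
        print(f"subgraph {index}: {', '.join(nodes)}")
```

More than one reported subgraph means the mesh is split from the point of view of the node that produced the dump.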
The network is a mix of versions 1.1pre17 and 1.1pre16 (both subgraphs contain both).
I tried to "reload" different nodes several times, and I tried restarting tinc on different nodes several times. Every time, the node reconnected to the same subgraph.
A typical configuration is the following:
The "solution" was to stop the tinc daemons on all nodes of one subgraph and start them one by one. After this, each restarted node connected to the other subgraph and joined the full network.
Unfortunately, I do not know how to reproduce this, but I currently suspect something in the AutoConnect feature.