There is a possibility that the leader is never elected even if the majority of the members are alive #439
Comments
@sile thank you for the detailed report. Please open a pull request with your patch. If your |
@lukebakken Thank you for your response. I will work on fixing this issue and submit a pull request. By the way, if there is a design document about the |
Just passing by, @sile, it is explained in §4.2.3 of the original paper. |
@illotum I wasn't aware of the paper. Thank you for the information! |
Describe the bug
Let me report an issue we encountered while operating our service that uses `ra`.
We were operating a 7-node cluster and stopped 3 of them for maintenance. After stopping the 3 nodes, our service became unavailable due to the absence of the Raft leader. It seemed that leader elections were executed periodically, but a new leader was never elected until we restarted member nodes.
I think that, as shown in the reproduction steps below, this is a subtle bug related to the `pre_vote` state (a state specific to `ra` that is not defined in the Raft paper), and how to fix it is not immediately obvious. Therefore, I think it would be better to leave the resolution of this issue to the `ra` dev team. However, since this is a critical issue for us, I am willing to create a PR if the `ra` team does not have enough resources to address it.
Reproduction steps
Simplified scenario where this issue could occur
I guess a scenario like the following occurred:
1. The `ra` cluster consists of 3 members named `a`, `b`, and `c`:
   - `c` is the leader with term `N` and log index `M` (where `N` and `M` are arbitrary integers)
   - `a` and `b` are in `follower` state
2. `a` transitions to `pre_vote` state:
   1. `a` broadcasts `#pre_vote_rpc{ term = N }`
   2. `b` replies `#pre_vote_result{ term = N, vote_granted = true }` to `a`
   3. `a` transitions to `candidate` state with term `N + 1`
   4. `a` broadcasts `#request_vote_rpc{ term = N + 1 }`
3. `c` processes a command:
   1. `c` increases its local log index to `M + 1`, and broadcasts `#append_entries_rpc{ term = N }`
   2. `b` increases its local log index to `M + 1`, and replies `#append_entries_reply{ term = N, success = true }` to `c`
   3. `a` rejects the RPC as `a` has a greater term than `c` (i.e., the local log index of `a` does not increase here)
4. `c` and `b` receive `#request_vote_rpc{ term = N + 1 }` from `a` (this message was sent during step 2-4):
   1. `c` transitions to `follower` state (as `c` has a smaller term)
   2. `c` replies `#request_vote_result{ vote_granted = false }` as the local log index of `c` is higher than that of `a`
   3. `b` replies `#request_vote_result{ vote_granted = false }` as the local log index of `b` is higher than that of `a`
   4. `a` initiates new elections but is never chosen as the next leader because `a` has a smaller log index
5. `b` is stopped
6. `c` transitions to `pre_vote` state:
   1. `c` broadcasts `#pre_vote_rpc{ term = N_ }`
      - `N_` is an integer larger than `N`
      - `N_` is incremented by `a` each time `a` initiates a new election
   2. `a` ignores `#pre_vote_rpc{ term = N_ }` as `a` is in `candidate` state and `a`'s term is always equal to or larger than `N_`
   3. `c` cannot transition to `candidate` state as it does not receive a majority of votes
7. `c` repeats step 6.
   - `a` remains in `candidate` state (with a shorter log than `c`)
   - `c` alternates between `follower` and `pre_vote` states (with a term equal to or smaller than `a`'s term)
Commands and a patch for reproduction
Please execute the following commands to reproduce the scenario described above.
(The reproduction rate is not 100%, but it is high in my environment.)
ra.patch
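Independently of the patch, the simplified scenario can be approximated with a small toy model. This is only a sketch in plain Python, not `ra` code: the `Member` class, the function names, and the starting values `N = 5` and `M = 10` are all made up for illustration, and the voting rules are reduced to the checks mentioned in the steps above (a vote needs a log at least as long as the voter's, and a member in `candidate` state ignores `pre_vote_rpc`). It starts from the situation after step 5, with `b` already stopped.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    state: str          # "follower", "pre_vote", "candidate", or "leader"
    term: int
    log_index: int
    alive: bool = True


def majority(cluster_size: int) -> int:
    return cluster_size // 2 + 1


def candidate_round(cand: Member, peers: list, cluster_size: int) -> None:
    """The candidate times out, bumps its term, and requests votes."""
    cand.term += 1
    votes = 1  # the candidate votes for itself
    for p in peers:
        if not p.alive:
            continue
        if cand.term > p.term:
            # A higher term pushes the receiver back to follower.
            p.term = cand.term
            p.state = "follower"
        # Simplified rule: grant only if the candidate's log is not shorter.
        if cand.log_index >= p.log_index:
            votes += 1
    if votes >= majority(cluster_size):
        cand.state = "leader"


def pre_vote_round(node: Member, peers: list, cluster_size: int) -> None:
    """The node times out as a follower and tries a pre-vote."""
    node.state = "pre_vote"
    pre_votes = 1  # the node counts itself
    for p in peers:
        if not p.alive:
            continue
        # Assumption taken from the report: candidates ignore pre_vote_rpc.
        if p.state == "candidate":
            continue
        if node.term >= p.term and node.log_index >= p.log_index:
            pre_votes += 1
    if pre_votes >= majority(cluster_size):
        node.term += 1
        node.state = "candidate"
    else:
        node.state = "follower"


def run_scenario(rounds: int, n: int = 5, m: int = 10):
    """Replay steps 6 and 7 for the given number of timeout rounds."""
    a = Member("a", "candidate", term=n + 1, log_index=m)
    b = Member("b", "follower", term=n + 1, log_index=m + 1, alive=False)
    c = Member("c", "follower", term=n + 1, log_index=m + 1)
    for _ in range(rounds):
        candidate_round(a, [b, c], cluster_size=3)
        pre_vote_round(c, [a, b], cluster_size=3)
    return a, b, c


if __name__ == "__main__":
    for member in run_scenario(rounds=10):
        print(member)
```

In this model, two of the three members are alive in every round, yet neither can collect two votes: `a` is refused because its log is shorter, and `c` never gets an answer to its pre-vote. The term keeps growing while no member reaches `leader` state.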
Expected behavior
A leader should eventually be elected if the majority of members are alive.
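For reference, the arithmetic behind this expectation can be written down as a short check. This is an illustrative sketch only; `majority` and `has_quorum` are hypothetical helper names and are not part of `ra`.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return cluster_size // 2 + 1


def has_quorum(cluster_size: int, stopped: int) -> bool:
    """True if the members still running can form a majority."""
    return cluster_size - stopped >= majority(cluster_size)


# Our production case: 7 members, 3 stopped for maintenance.
print(majority(7), has_quorum(7, 3))
# The simplified scenario: 3 members, 1 stopped.
print(majority(3), has_quorum(3, 1))
```

In both cases the running members still form a majority (4 of 7 and 2 of 3), so an election should be able to succeed.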
Additional context
No response