Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leadership expiry not respected as a limit for leadership expiry #522

Open
antoniofilipovic opened this issue Jul 23, 2024 · 2 comments
Open

Comments

@antoniofilipovic
Copy link

Hi,

I expected the leadership_expiry_ flag to be used as the amount of time needed to pass between now and the last time the follower responded to the leader, for the leader to yield its leadership.

However, looking at the code, leadership only expires after followers are deemed unhealthy. And followers are deemed unhealthy according to the following policy:

int expiry = params->heart_beat_interval_ *
                     raft_server::raft_limits_.response_limit_;

    // Check the number of not responding peers.
    for (auto& entry: peers_) {
        ptr<peer> p = entry.second;

        if (!is_regular_member(p)) continue;

        int32 resp_elapsed_ms = (int32)(p->get_resp_timer_us() / 1000);
        if ( resp_elapsed_ms > expiry ) {
            num_not_resp_nodes++;
        }
    }

I would like to open PR to fix this issue that the expiry used in this function is the same as the leadership expiry time, as otherwise this flag is not used to calculate when will leadership expire.

Let me know if this is actually not an issue, and I didn't understand something properly.

many thanks,
Antonio

@greensky00
Copy link
Contributor

greensky00 commented Jul 23, 2024

Thanks for pointing this out. Looks like this part of code was previously correct, and then the bug has been introduced since the below commit.
bb863ab#diff-60ca73a04b9bb2ab5fa615dc2a38a817a8260975d6c06d35ca90a02555731635R573

As get_not_responding_peers_count and get_not_responding_peers are currently used for other purposes as well, can we put an optional parameter to set custom expiry? For example,

std::list<ptr<peer>> get_not_responding_peers(int expiry = 0);
size_t get_not_responding_peers_count(int expiry = 0);
void apply_to_not_responding_peers(const std::function<void(const ptr<peer>&)>& callback, int expiry);

If given expiry is non-zero, honor the given expiry. If not, use heart_beat_interval_ * reponse_limit_.

@antoniofilipovic
Copy link
Author

Hi, @greensky00. Thanks for the fast response.

The change which you suggested makes sense. I will open the PR to fix it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants