Why doesn't Replicator need to set a timeout when sending AppendEntriesRequest? #331

Closed
Slontia opened this issue Oct 25, 2021 · 2 comments

Slontia commented Oct 25, 2021

int Replicator::start(const ReplicatorOptions& options, ReplicatorId *id) {
    if (options.log_manager == NULL || options.ballot_box == NULL
            || options.node == NULL) {
        LOG(ERROR) << "Invalid arguments, group " << options.group_id;
        return -1;
    }
    Replicator* r = new Replicator();
    brpc::ChannelOptions channel_opt;
    //channel_opt.connect_timeout_ms = *options.heartbeat_timeout_ms;
    channel_opt.timeout_ms = -1; // We don't need RPC timeout (Why?)
    if (r->_sending_channel.Init(options.peer_id.addr, &channel_opt) != 0) {
        LOG(ERROR) << "Fail to init sending channel"
                   << ", group " << options.group_id;
        delete r;
        return -1;
    }
    ...
}

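While reading the braft code, I noticed that channel_opt.timeout_ms is deliberately set to -1 here. As a result, if a follower receives an AppendEntriesRequest and never replies (for example, because of a network error), the leader never invokes the Replicator::_on_rpc_returned callback, and the follower's log falls further and further behind.

What was the reasoning behind disabling the RPC timeout outright, rather than using a retry mechanism to improve fault tolerance?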

chenzhangyi (Collaborator) commented

Retries can trigger an avalanche. When the network fails, the connection is broken, and the number of RPCs goes down.
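
The avalanche concern can be illustrated with a toy model. The sketch below is not braft or brpc code: the `Pending` struct, the `simulate_total_requests` function, and the tick-based queue are all hypothetical, chosen only to show how timeout-driven retries might add traffic to a receiver that is already behind. It assumes a receiver that serves a fixed number of requests per tick from a FIFO queue, and a sender that re-sends any request that has waited at least `timeout_ticks`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical toy model, NOT braft/brpc code.
struct Pending {
    int64_t enqueued_at;  // tick at which this copy was sent
    bool retried;         // whether the sender already re-sent it once
};

// Returns the total number of requests the receiver sees over `ticks` ticks.
int64_t simulate_total_requests(int ticks, int new_per_tick, int capacity,
                                int timeout_ticks, bool retry_on_timeout) {
    assert(ticks >= 0 && new_per_tick >= 0 && capacity >= 0 && timeout_ticks > 0);
    std::deque<Pending> queue;
    int64_t total = 0;
    for (int now = 0; now < ticks; ++now) {
        // The sender issues its regular requests for this tick.
        for (int i = 0; i < new_per_tick; ++i) {
            queue.push_back(Pending{now, false});
            ++total;
        }
        // With retries enabled, every request that has waited too long is
        // sent again; the stale copy stays queued because the receiver
        // cannot know the sender gave up on it.
        if (retry_on_timeout) {
            const std::size_t n = queue.size();
            for (std::size_t i = 0; i < n; ++i) {
                if (!queue[i].retried &&
                    now - queue[i].enqueued_at >= timeout_ticks) {
                    queue[i].retried = true;
                    queue.push_back(Pending{now, false});
                    ++total;
                }
            }
        }
        // The receiver serves at most `capacity` requests per tick.
        for (int i = 0; i < capacity && !queue.empty(); ++i) {
            queue.pop_front();
        }
    }
    return total;
}
```

In this model, a receiver that keeps up sees the same traffic with or without retries. Once the sender outpaces the receiver, retries only add requests to a queue that is already growing.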

Slontia commented Oct 29, 2021

Got it, thanks!

Slontia closed this as completed Oct 29, 2021