
Deadlock in braft + brpc under prolonged heavy load, looking for a solution #219

Open
kof02guy opened this issue Jun 14, 2020 · 30 comments

kof02guy commented Jun 14, 2020

We use braft + brpc as our strong-consistency solution. Under heavy, sustained load the process deadlocks. The stacks visible with a plain gdb `bt` fall into two types.
Type-1 threads are waiting for a lock held by type-2 threads. A type-2 thread took the lock that type 1 needs and, while trying to push a bthread into the run queue, fails because `_rq` is full, so it usleeps and retries. But the tasks popped from `_rq` are most likely LogManager operations too, which need that same lock. This forms a cycle: deadlock.
Could the lock inside LogManager be split logically, so that the lock taken when pushing from a bthread is different from the lock taken by the functions invoked after popping?
Any advice on how to solve this would be appreciated.

Type 1
```
#0 0x00007f9edd212334 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f9edd20d5d8 in _L_lock_854 () from /lib64/libpthread.so.0
#2 0x00007f9edd20d4a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x0000000000df637b in pthread_mutex_lock_impl (__mutex=0x7f89f75ff4e8) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:555
#4 pthread_mutex_lock (__mutex=0x7f89f75ff4e8) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:813
#5 0x0000000000d14ca0 in lock (this=0x7f89f75ff4b0, index=14541) at /usr/include/butil/synchronization/lock.h:69
#6 lock (this=0x7f89f75ff4b0, index=14541) at /usr/include/c++/4.9.2/mutex:474
#7 unique_lock (this=0x7f89f75ff4b0, index=14541) at /usr/include/c++/4.9.2/mutex:406
#8 braft::LogManager::get_term (this=0x7f89f75ff4b0, index=14541) at xxx/src/braft/log_manager.cpp:787
#9 0x0000000000d2ad2d in braft::Replicator::_fill_common_fields (this=0x7f86cc92c460, request=0x7f86a8f9d360, prev_log_index=14541, is_heartbeat=Unhandled dwarf expression opcode 0xf3
) at xxx/src/braft/replicator.cpp:496
#10 0x0000000000d3063c in braft::Replicator::_send_entries (this=0x7f86cc92c460) at xxx/src/braft/replicator.cpp:611
#11 0x0000000000d326bf in braft::Replicator::_on_rpc_returned (id=Unhandled dwarf expression opcode 0xf3
) at xxx/src/braft/replicator.cpp:489
#12 0x0000000000d33b36 in brpc::internal::FunctionClosure5<unsigned long, brpc::Controller*, braft::AppendEntriesRequest*, braft::AppendEntriesResponse*, long>::Run (this=0x7f86d16ad950) at /usr/include/brpc/callback.h:339
#13 0x0000000000e03ef8 in brpc::Controller::EndRPC (this=0x7f86d16ad5d0, info=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/controller.cpp:893
#14 0x0000000000e05bc4 in brpc::Controller::OnVersionedRPCReturned (this=0x7f86d16ad5d0, info=..., new_bthread=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/controller.cpp:676
#15 0x0000000000e6de9a in OnResponse (msg_base=0x7f897544b140) at xxx/deps/incubator-brpc/src/brpc/details/controller_private_accessor.h:48
#16 brpc::policy::ProcessRpcResponse (msg_base=0x7f897544b140) at xxx/deps/incubator-brpc/src/brpc/policy/baidu_rpc_protocol.cpp:618
#17 0x0000000000e5b9fa in brpc::ProcessInputMessage (void_arg=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/input_messenger.cpp:136
#18 0x0000000000decee1 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#19 0x0000000000e3d5b1 in bthread_make_fcontext ()
#20 0x0000000000000000 in ?? ()
```

Type 2
```
#0 0x00007f9edb9c1cbd in nanosleep () from /lib64/libc.so.6
#1 0x00007f9edb9f6f14 in usleep () from /lib64/libc.so.6
#2 0x0000000000deb891 in bthread::TaskGroup::ready_to_run_remote (this=0x7f8b6c0008c0, tid=142391050791690, nosignal=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:675
#3 0x0000000000dee52a in bthread::TaskGroup::start_background (this=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:448
#4 0x0000000000df4d6c in start_from_non_worker (tid=0x7f8aed7e9800, attr=0x7f89f7582988, fn=0xddcbd0 bthread::ExecutionQueueBase::_execute_tasks(void*), arg=0x7f8511942ef0)
at xxx/deps/incubator-brpc/src/bthread/bthread.cpp:146
#5 bthread_start_background (tid=0x7f8aed7e9800, attr=0x7f89f7582988, fn=0xddcbd0 bthread::ExecutionQueueBase::_execute_tasks(void*), arg=0x7f8511942ef0)
at xxx/deps/incubator-brpc/src/bthread/bthread.cpp:194
#6 0x0000000000dddb97 in bthread::ExecutionQueueBase::start_execute (this=0x7f89f75828d0, node=0x7f8511942ef0) at xxx/deps/incubator-brpc/src/bthread/execution_queue.cpp:115
#7 0x0000000000d1a898 in execute (id=, task=@0x7f8aed7e98d8, options=0x0, handle=0x0) at /usr/include/bthread/execution_queue_inl.h:318
#8 bthread::execution_queue_executebraft::LogManager::StableClosure* (id=, task=@0x7f8aed7e98d8, options=0x0, handle=0x0) at /usr/include/bthread/execution_queue_inl.h:363
#9 0x0000000000d16251 in execution_queue_executebraft::LogManager::StableClosure* (this=0x7f89f75ff4b0, entries=0x7f8aed7e9a70, done=0x7f84d06daa00) at /usr/include/bthread/execution_queue_inl.h:352
#10 execution_queue_executebraft::LogManager::StableClosure* (this=0x7f89f75ff4b0, entries=0x7f8aed7e9a70, done=0x7f84d06daa00) at /usr/include/bthread/execution_queue_inl.h:345
#11 braft::LogManager::append_entries (this=0x7f89f75ff4b0, entries=0x7f8aed7e9a70, done=0x7f84d06daa00) at xxx/src/braft/log_manager.cpp:485
#12 0x0000000000cf3821 in braft::NodeImpl::apply (this=0x7f89f75feb50, tasks=Unhandled dwarf expression opcode 0xf3
) at xxx/src/braft/node.cpp:1959
#13 0x0000000000cf3b1e in braft::NodeImpl::execute_applying_tasks (meta=0x7f89f75feb50, iter=...) at xxx/src/braft/node.cpp:724
#14 0x0000000000dda82d in bthread::ExecutionQueueBase::_execute (this=0x7f89f75827d0, head=0x7f8572168a90, high_priority=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/execution_queue.cpp:273
#15 0x0000000000dddc08 in bthread::ExecutionQueueBase::start_execute (this=0x7f89f75827d0, node=0x7f8572168a90) at xxx/deps/incubator-brpc/src/bthread/execution_queue.cpp:95
#16 0x0000000000c3ba4c in cellar::raft::BucketStateMachine::AsyncApply (this=0x3efba20, op_type=Unhandled dwarf expression opcode 0xf3
) at bucket_state_machine.cpp:87
...
...
```

The bthread stacks obtained with gdb_bthread_stack.py are almost all (99%) the following:
```
#0 0x0000000000dea8a8 in jump_stack (pg=0x0, next_meta=0x5a7f77d934dc6) at xxx/deps/incubator-brpc/src/bthread/stack_inl.h:133
#1 bthread::TaskGroup::sched_to (pg=0x0, next_meta=0x5a7f77d934dc6) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:605
#2 0x0000000000deadce in sched_to (pg=0x7f894ebeba08) at xxx/deps/incubator-brpc/src/bthread/task_group_inl.h:80
#3 bthread::TaskGroup::sched (pg=0x7f894ebeba08) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:563
#4 0x0000000000de1cc7 in bthread::butex_wait (arg=0x7f87bceb60c0, expected_value=1, abstime=0x0) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:660
#5 0x0000000000de93a0 in bthread::CountdownEvent::wait (this=0x7f894ebebb48) at xxx/deps/incubator-brpc/src/bthread/countdown_event.cpp:65
#6 0x0000000000d13991 in wait (this=Unhandled dwarf expression opcode 0xf3
) at xxx/src/braft/log_manager.cpp:164
#7 braft::LogManager::last_log_id (this=Unhandled dwarf expression opcode 0xf3
) at xxx/src/braft/log_manager.cpp:201
#8 0x0000000000cf1d7a in braft::NodeImpl::handle_pre_vote_request (this=0x7f89d42a3c30, request=0x7f87bc3c30a0, response=0x7f87bd29ef20) at xxx/src/braft/node.cpp:2027
#9 0x0000000000d7d892 in braft::RaftServiceImpl::pre_vote (this=Unhandled dwarf expression opcode 0xf3
) at xxx/src/braft/raft_service.cpp:62
#10 0x0000000000d67c6b in braft::RaftService::CallMethod (this=Unhandled dwarf expression opcode 0xf3
) at xxx/bld/braft/raft.pb.cc:5130
#11 0x0000000000e70cab in brpc::policy::ProcessRpcRequest (msg_base=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/policy/baidu_rpc_protocol.cpp:499
#12 0x0000000000e5b9fa in brpc::ProcessInputMessage (void_arg=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/input_messenger.cpp:136
#13 0x0000000000decee1 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#14 0x0000000000e3d5b1 in bthread_make_fcontext ()
#15 0x0000000000000000 in ?? ()
```

PFZheng (Collaborator) commented Jun 14, 2020

Judging from the stacks, there are not enough threads left to run bthreads, which makes the bthreads pile up. You need to find out why they pile up.

PFZheng (Collaborator) commented Jun 14, 2020

A pthread mutex carries an owner-thread-id field; you can look at what the thread holding the lock is doing.
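As a concrete sketch of that check (using the mutex address 0x7f89f75ff4e8 from frame #3 of the type-1 stack, and assuming glibc/NPTL's mutex layout, where `__data.__owner` holds the owner's LWP id):

```
(gdb) frame 3                                     # the pthread_mutex_lock_impl frame
(gdb) p ((pthread_mutex_t *) 0x7f89f75ff4e8)->__data.__owner
(gdb) info threads                                # match the printed owner against the LWP column
(gdb) thread <N>                                  # switch to the owning thread
(gdb) bt                                          # ...and see what it is doing
```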

PFZheng (Collaborator) commented Jun 14, 2020

Also, take a look at this issue: #139

kof02guy (Author) replied:

> A pthread mutex carries an owner-thread-id field; you can look at what the thread holding the lock is doing.

Here is the situation: there are quite a few stacks, falling into the two kinds. The owner recorded in the pthread mutex of the type-1 stacks is a thread showing a type-2 stack (I labeled the types in my question).
Our machine has 32 cores, and exactly 32 type-1 stacks are stuck, each waiting for locks held by several type-2 stacks.
My understanding is that the pile-up happens because the piled-up bthreads also need to take that same lock.

PFZheng (Collaborator) commented Jun 14, 2020

Essentially this is apache/brpc#473. You can try switching to bthread mutex first; see macros.h:

//#define USE_BTHREAD_MUTEX

With bthread mutex, the type-1 stacks queue up behind type 2, which leaves more threads free to run bthreads; as long as nothing piles up, this deadlock case cannot happen.

kof02guy (Author) commented Jun 15, 2020

> Essentially this is apache/incubator-brpc#473. You can try switching to bthread mutex first; see macros.h:
>
> //#define USE_BTHREAD_MUTEX
>
> With bthread mutex, the type-1 stacks queue up behind type 2, which leaves more threads free to run bthreads; as long as nothing piles up, this deadlock case cannot happen.

Thanks for the answer!
I tried removing that comment, but it looks like further changes are needed: log.h uses pieces of butil, and those expect butil::Mutex. The compile error:

```
In file included from xxx/src/braft/node.cpp:28:0:
/src/braft/log.h: In constructor ‘braft::SegmentLogStorage::Writer::Writer(braft::raft_mutex_t*)’:
xxx/src/braft/log.h:519:50: error: no matching function for call to ‘butil::ConditionVariable::ConditionVariable(braft::raft_mutex_t*&)’
     explicit Writer(raft_mutex_t* mu) : cv(mu) { }
xxx/src/braft/log.h:519:50: note: candidate is:
In file included from xxx/src/braft/log.h:27:0,
                 from xxx/src/braft/node.cpp:28:
/usr/include/butil/synchronization/condition_variable.h:87:12: note: butil::ConditionVariable::ConditionVariable(butil::Mutex*)
   explicit ConditionVariable(Mutex* user_lock);
/usr/include/butil/synchronization/condition_variable.h:87:12: note: no known conversion for argument 1 from ‘braft::raft_mutex_t* {aka bthread::Mutex*}’ to ‘butil::Mutex*’
```

So what other changes are needed here?

PFZheng (Collaborator) commented Jun 16, 2020

> I tried removing that comment, but it looks like further changes are needed ... So what other changes are needed here?

Let me take a look.

PFZheng (Collaborator) commented Jun 16, 2020

It compiles fine on my machine; did you change the code? There is a ConditionVariable under both butil and bthread; just use the one in the bthread namespace.

kof02guy (Author) replied:

> It compiles fine on my machine; did you change the code? There is a ConditionVariable under both butil and bthread; just use the one in the bthread namespace.

Thanks. We did make some code changes; it compiles now, and we re-ran the verification with bthread_mutex in place of pthread_mutex. The deadlock can still occur.
My understanding: when _rq is full, if the locks held by the bthreads swapped out into _rq happen to be exactly the locks the currently running bthread needs, a deadlock can still occur. Is that right? To fix this completely, we would have to guarantee that when _rq is full, no swapped-out bthread in it holds a lock the running bthread wants. Is there a good way to solve this at the root?
One more question: the mutex inside a bthread_mutex is an unsigned int carrying no thread information. Given that unsigned int, how can we find out who holds the mutex?

PFZheng (Collaborator) commented Jun 17, 2020

> ... Given that unsigned int, how can we find out who holds the mutex?

The key is still why requests pile up. Look at the bthread usage on brpc's bvar page. If it tracks the CPU usage, you are CPU-bound: the load exceeds what a single machine can handle and you should add rate limiting. If the usage is far below the CPU usage, the code has many blocking operations; find the slow spots (disk I/O or other blocking) through stack sampling and monitoring. Raising max_concurrency can relieve the symptoms, but you still have to find the root cause first.

PFZheng (Collaborator) commented Jun 17, 2020

Also, brpc has a built-in contention-profiling page that measures the time lost to lock conflicts. By the time "rq is full" fires, the scene can no longer be recovered; you have to look back at the period when things first started to deteriorate.

kof02guy (Author) replied:

> The key is still why requests pile up. Look at the bthread usage on brpc's bvar page. ... Raising max_concurrency can relieve the symptoms, but you still have to find the root cause first.

Thanks. I used brpc's profiler to compare the bthread worker usage and process usage you mentioned; the two are almost identical.
(screenshot attached)
So it really does not look like it is caused by excessive waiting. For rate limiting, I understand we need some feedback signal to throttle on, such as the size of _rq. Do you have any suggestions on the rate limiting?

The contention-profiling page fails for me with:

```
Fail to execute `perl ./rpc_data/profiling/pprof.pl --dot ./ts ./rpc_data/profiling/180f54b113b35acab63e2f8d1cd0d898/20200617.193911.contention 2>&1 ', No child processes
```

The file itself is generated (about 2 MB) and contains some data, but there is no direct way to view it.

PFZheng (Collaborator) commented Jun 18, 2020

> The contention-profiling page fails for me ... there is no direct way to view it.

If the CPU usage matches, then close to the moment the problem appears, use the CPU-profiling page or perf to see where the time is going.
You can also run `perl ./rpc_data/profiling/pprof.pl --dot ./ts ./rpc_data/profiling/180f54b113b35acab63e2f8d1cd0d898/20200617.193911.contention 2>&1` manually. ./rpc_data/profiling/pprof.pl is standard pprof (run it with --help for usage) and can generate PDF or SVG graphs.

PFZheng (Collaborator) commented Jun 18, 2020

For rate limiting you can rely on the number of pending tasks (count it yourself with a bvar in the code, or watch the processing statistic for RPC requests on the status page). In this case there should be a steadily growing backlog.

kof02guy (Author) replied:

> For rate limiting you can rely on the number of pending tasks (count it yourself with a bvar in the code, or watch the processing statistic for RPC requests on the status page). In this case there should be a steadily growing backlog.

Thanks @PFZheng. We are working on the rate limiting, but I have a small question. We are now on bthread mutex, yet it still deadlocks. The deadlocked thread stacks come in the following kinds:
1

```
Thread 113 (Thread 0x7f7117fff700 (LWP 30422)):
#0 0x00007f840ece6419 in syscall () from /lib64/libc.so.6
#1 0x0000000000de7dd2 in futex_wait_private (pw=..., ptimeout=0x0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::wait_pthread (pw=..., ptimeout=0x0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:142
#3 0x0000000000de8f54 in butex_wait_from_pthread (arg=0x7f6f3ae26520, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:589
#4 bthread::butex_wait (arg=0x7f6f3ae26520, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:622
#5 0x0000000000dfd327 in mutex_lock_contended (m=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/mutex.cpp:635
#6 bthread_mutex_lock (m=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/mutex.cpp:736
#7 0x0000000000d08532 in bthread::Mutex::lock (this=Unhandled dwarf expression opcode 0xf3
) at /usr/include/bthread/mutex.h:58
#8 0x0000000000cf4410 in lock (this=0x7f6f3ae31e50, request=0x7f66eb2a4550, response=0x7f63c56f9460) at /usr/include/c++/4.9.2/mutex:474
#9 unique_lock (this=0x7f6f3ae31e50, request=0x7f66eb2a4550, response=0x7f63c56f9460) at /usr/include/c++/4.9.2/mutex:406
#10 braft::NodeImpl::handle_pre_vote_request (this=0x7f6f3ae31e50, request=0x7f66eb2a4550, response=0x7f63c56f9460) at xxx/braft/node.cpp:1994
#11 0x0000000000d837c2 in braft::RaftServiceImpl::pre_vote (this=Unhandled dwarf expression opcode 0xf3
) at xxx/braft/raft_service.cpp:62
#12 0x0000000000d6da8b in braft::RaftService::CallMethod (this=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/bld/braft/raft.pb.cc:5130
#13 0x0000000000e77d0b in brpc::policy::ProcessRpcRequest (msg_base=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/brpc/policy/baidu_rpc_protocol.cpp:499
#14 0x0000000000e62a1a in brpc::ProcessInputMessage (void_arg=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/brpc/input_messenger.cpp:136
#15 0x0000000000e63910 in operator() (m=0x7f70d002c1f0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/brpc/input_messenger.cpp:142
#16 brpc::InputMessenger::OnNewMessages (m=0x7f70d002c1f0) at /usr/include/c++/4.9.2/bits/unique_ptr.h:236
#17 0x0000000000e4faad in brpc::Socket::ProcessEvent (arg=0x7f70d002c1f0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/brpc/socket.cpp:1020
#18 0x0000000000df3f01 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:297
#19 0x0000000000df45a2 in bthread::TaskGroup::run_main_task (this=0x7f71080008c0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:158
#20 0x0000000000df8c2e in bthread::TaskControl::worker_thread (arg=0x4b599e0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/task_control.cpp:77
#21 0x00007f84104f7aa1 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f840ece9bcd in clone () from /lib64/libc.so.6
```

2
```
Thread 110 (Thread 0x7f71161fc700 (LWP 30425)):
#0 0x00007f840ece6419 in syscall () from /lib64/libc.so.6
#1 0x0000000000de7dd2 in futex_wait_private (pw=..., ptimeout=0x0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::wait_pthread (pw=..., ptimeout=0x0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:142
#3 0x0000000000de8f54 in butex_wait_from_pthread (arg=0x7f6f7d19e3a0, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:589
#4 bthread::butex_wait (arg=0x7f6f7d19e3a0, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:622
#5 0x0000000000dfd327 in mutex_lock_contended (m=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/mutex.cpp:635
#6 bthread_mutex_lock (m=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/mutex.cpp:736
#7 0x0000000000d08532 in bthread::Mutex::lock (this=Unhandled dwarf expression opcode 0xf3
) at /usr/include/bthread/mutex.h:58
#8 0x0000000000d3cfd6 in lock_guard (this=0x7f6f7d45af70, group_id=12638, index=62666) at /usr/include/c++/4.9.2/mutex:377
#9 braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666) at xxx/braft/log.cpp:2247
#10 0x0000000000d3d706 in braft::SegmentLogStorage::get_term (this=Unhandled dwarf expression opcode 0xf3
) at xxx/braft/log.cpp:1910
#11 0x0000000000d17f39 in braft::LogManager::get_term (this=0x7f6f2c946840, index=62666) at xxx/braft/log_manager.cpp:813
#12 0x0000000000d2d90d in braft::Replicator::_fill_common_fields (this=0x7f707457f100, request=0x7f63246e4b90, prev_log_index=62666, is_heartbeat=Unhandled dwarf expression opcode 0xf3
) at xxx/braft/replicator.cpp:496
#13 0x0000000000d30ce3 in braft::Replicator::_send_empty_entries (this=0x7f707457f100, is_heartbeat=Unhandled dwarf expression opcode 0xf3
) at xxx/braft/replicator.cpp:527
#14 0x0000000000d32c82 in braft::Replicator::_send_heartbeat (arg=Unhandled dwarf expression opcode 0xf3
) at xxx/braft/replicator.cpp:902
#15 0x0000000000df3f01 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:297
#16 0x0000000000df45a2 in bthread::TaskGroup::run_main_task (this=0x7f71040008c0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/task_group.cpp:158
#17 0x0000000000df8c2e in bthread::TaskControl::worker_thread (arg=0x4b599e0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/task_control.cpp:77
#18 0x00007f84104f7aa1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f840ece9bcd in clone () from /lib64/libc.so.6
```

3

```
Thread 78 (Thread 0x7f708cdfa700 (LWP 30457)):
#0 0x00007f840ece6419 in syscall () from /lib64/libc.so.6
#1 0x0000000000de7dd2 in futex_wait_private (pw=..., ptimeout=0x0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::wait_pthread (pw=..., ptimeout=0x0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:142
#3 0x0000000000de8f54 in butex_wait_from_pthread (arg=0x7f6f392e3a30, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:589
#4 bthread::butex_wait (arg=0x7f6f392e3a30, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/butex.cpp:622
#5 0x0000000000dfd327 in mutex_lock_contended (m=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/mutex.cpp:635
#6 bthread_mutex_lock (m=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/mutex.cpp:736
#7 0x0000000000d08532 in bthread::Mutex::lock (this=Unhandled dwarf expression opcode 0xf3
) at /usr/include/bthread/mutex.h:58
#8 0x0000000000cf5abf in lock (this=0x7f6f392e8250, tasks=0x7f708cde8b00, size=1) at /usr/include/c++/4.9.2/mutex:474
#9 unique_lock (this=0x7f6f392e8250, tasks=0x7f708cde8b00, size=1) at /usr/include/c++/4.9.2/mutex:406
#10 braft::NodeImpl::apply (this=0x7f6f392e8250, tasks=0x7f708cde8b00, size=1) at xxx/braft/node.cpp:1913
#11 0x0000000000cf62de in braft::NodeImpl::execute_applying_tasks (meta=0x7f6f392e8250, iter=...) at xxx/braft/node.cpp:724
#12 0x0000000000de184d in bthread::ExecutionQueueBase::_execute (this=0x7f6f392bf1e0, head=0x7f700c0600a0, high_priority=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/execution_queue.cpp:273
#13 0x0000000000de4c28 in bthread::ExecutionQueueBase::start_execute (this=0x7f6f392bf1e0, node=0x7f700c0600a0) at /home/zhangbiao11/Code/mraft/mraft/deps/incubator-brpc/src/bthread/execution_queue.cpp:95
#14 0x0000000000c3c11c in xxx::raft::BucketStateMachine::AsyncApply (this=0x466c1d0, op_type=Unhandled dwarf expression opcode 0xf3
) at bucket_state_machine.cpp:87
#15 0x0000000000827da6 in xxx::DataManager::AsyncApply (this=Unhandled dwarf expression opcode 0xf3
) at data_manager.cpp:224
#16 0x00000000008283b0 in xxx::DataManager::BatchPut (this=0x4bb5820, c=0x7f706e0e89d0, done=0x7f706def8bd0) at data_manager.cpp:113
#17 0x000000000076e64b in tair::request_processor::DoBatchPut (this=0x4b78f10, r_ctx=std::shared_ptr (count 16) 0x7f706e1c8000) at request_processor.cpp:264
#18 0x0000000000754a93 in tair::tair_server::handlePacketQueue (this=0x44a06f0, apacket=0x7f6364740ff0, arg=Unhandled dwarf expression opcode 0xf3
) at tair_server.cpp:772
#19 0x0000000000c8649e in tair::SharedQueueThread::run (this=0x44b13b0, thread=Unhandled dwarf expression opcode 0xf3
) at packetqueuethread.cpp:592
#20 0x00000000007d1359 in tbsys::CThread::hook (arg=0x44b13f8) at /home/zhangbiao11/Code/xxx/tair/src/common/include/tbsys/thread.h:117
#21 0x00007f84104f7aa1 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f840ece9bcd in clone () from /lib64/libc.so.6
```

The last few frames of all these stacks are identical: every one of them is acquiring a bthread mutex. Our question: a coroutine lock is supposed not to block the thread, i.e. thread stacks should rarely if ever show a coroutine-lock acquisition, because a contended coroutine lock should swap the bthread out to queue. Yet all of these thread stacks are blocked acquiring a coroutine lock. Our max_concurrency is 0, which I understand effectively means unlimited.
(screenshot attached)

Is this behaviour expected?

PFZheng (Collaborator) commented Jun 19, 2020

Did you enable usercode_in_pthread, or are you mixing in other thread pools? futex_wait_private means the lock was taken inside an ordinary pthread, in which case the butex degrades to a futex. If you are mixing in other thread pools, you can actively start a bthread to perform the operation, switching the work into the bthread pool.

PFZheng (Collaborator) commented Jun 19, 2020

braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666) looks like the root of it: have you changed anything there?

There is a nesting here: inside the replicator's bthread-id lock, the Segment lock is taken; and a periodic timer in Node takes the replicator's bthread-id lock (to fetch last_rpc_send_timestamp), so a stuck Segment lock drags the whole node into lock contention. This spot has already been changed in our internal version but not yet synced out. Still, that contention is butex-based and should not occupy worker threads; the main question remains why get_segment gets stuck.

kof02guy commented Jun 19, 2020

> Did you enable usercode_in_pthread, or are you mixing in other thread pools? futex_wait_private means the lock was taken inside an ordinary pthread, in which case the butex degrades to a futex. If you are mixing in other thread pools, you can actively start a bthread to perform the operation, switching the work into the bthread pool.

I checked: the only place braft lets the user control usercode_in_pthread is the option passed to init() when creating a node, and we never set that field; its default is false. So usercode_in_pthread should not be enabled for us.
We do not use bthread directly at all; we only use it indirectly through braft, and our use of braft is quite simple: we call the apply interface and implement the state-machine callbacks such as on_apply.
Our service does have other threads, but those threads should have nothing to do with braft bthreads.

PFZheng (Collaborator) commented Jun 19, 2020

```cpp
void NodeImpl::apply(const Task& task) {
    LogEntry* entry = new LogEntry;
    entry->AddRef();
    entry->data.swap(*task.data);
    LogEntryAndClosure m;
    m.entry = entry;
    m.done = task.done;
    m.expected_term = task.expected_term;
    if (_apply_queue->execute(m, &bthread::TASK_OPTIONS_INPLACE, NULL) != 0) {
        task.done->status().set_error(EPERM, "Node is down");
        entry->Release();
        return run_closure_in_bthread(task.done);
    }
}
```

Change _apply_queue->execute(m, &bthread::TASK_OPTIONS_INPLACE, NULL): make the second argument bthread::TASK_OPTIONS_NORMAL. The original option executes the execution queue in place to save one thread switch; here the call appears to originate from your own threads, so the execution does not happen inside a bthread.

kof02guy (Author) replied:

> braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666) looks like the root of it: have you changed anything there?

We do have some optimizations to the log-writing path; that code has been changed.

kof02guy (Author) replied:

> Change _apply_queue->execute(m, &bthread::TASK_OPTIONS_INPLACE, NULL): make the second argument bthread::TASK_OPTIONS_NORMAL. The original option executes the execution queue in place to save one thread switch; here the call appears to originate from your own threads, so the execution does not happen inside a bthread.

I will try that change.
But as I understand it, the apply() we call is the interface braft itself exposes, so this usage should be perfectly normal.

kof02guy commented Jun 19, 2020

> Change _apply_queue->execute(m, &bthread::TASK_OPTIONS_INPLACE, NULL): make the second argument bthread::TASK_OPTIONS_NORMAL. ...

I made that change and it still hangs, but unlike before, the pthread stacks no longer show the braft apply() entry point. Now, while hung, the 33 workers show the following kinds of stacks; the second kind makes up the majority, while the first and the fifth kinds appear only once each:
1
```
Thread 116 (Thread 0x7fc191737700 (LWP 18115)):
#0 0x00007fd48abc4499 in syscall () from /lib64/libc.so.6
#1 0x0000000000ded8a9 in futex_wait_private (this=0x489a420) at xxx/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::TimerThread::run (this=0x489a420) at xxx/deps/incubator-brpc/src/bthread/timer_thread.cpp:439
#3 0x0000000000dee6f9 in bthread::TimerThread::run_this (arg=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/timer_thread.cpp:121
#4 0x00007fd48c3d5aa1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fd48abc7c4d in clone () from /lib64/libc.so.6
```

2
```
Thread 115 (Thread 0x7fc190d36700 (LWP 18116)):
#0 0x00007fd48abc4499 in syscall () from /lib64/libc.so.6
#1 0x0000000000df8db2 in futex_wait_private (pw=..., ptimeout=0x0) at xxx/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::wait_pthread (pw=..., ptimeout=0x0) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:142
#3 0x0000000000df9f34 in butex_wait_from_pthread (arg=0x7fbf810955d0, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:589
#4 bthread::butex_wait (arg=0x7fbf810955d0, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:622
#5 0x0000000000de65e7 in mutex_lock_contended (m=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:635
#6 bthread_mutex_lock (m=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:736
#7 0x0000000000d08452 in bthread::Mutex::lock (this=Unhandled dwarf expression opcode 0xf3
) at /usr/include/bthread/mutex.h:58
#8 0x0000000000cf59df in lock (this=0x7fbf810a0550, tasks=0x7fc190d24bf0, size=1) at /usr/include/c++/4.9.2/mutex:474
#9 unique_lock (this=0x7fbf810a0550, tasks=0x7fc190d24bf0, size=1) at /usr/include/c++/4.9.2/mutex:406
#10 braft::NodeImpl::apply (this=0x7fbf810a0550, tasks=0x7fc190d24bf0, size=1) at /home/zhangbiao11/Code/mraft/mraft/src/braft/node.cpp:1913
#11 0x0000000000cf61fe in braft::NodeImpl::execute_applying_tasks (meta=0x7fbf810a0550, iter=...) at /home/zhangbiao11/Code/mraft/mraft/src/braft/node.cpp:724
#12 0x0000000000def84d in bthread::ExecutionQueueBase::_execute (this=0x7fbf810812e0, head=0x7fc15c0abf90, high_priority=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/execution_queue.cpp:273
#13 0x0000000000df20bc in bthread::ExecutionQueueBase::_execute_tasks (arg=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/execution_queue.cpp:152
#14 0x0000000000dfef61 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#15 0x0000000000dff602 in bthread::TaskGroup::run_main_task (this=0x7fc1880008c0) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:158
#16 0x0000000000df710e in bthread::TaskControl::worker_thread (arg=0x4894ea0) at xxx/deps/incubator-brpc/src/bthread/task_control.cpp:77
#17 0x00007fd48c3d5aa1 in start_thread () from /lib64/libpthread.so.0
#18 0x00007fd48abc7c4d in clone () from /lib64/libc.so.6
```

3
`Thread 112 (Thread 0x7fd34ceff700 (LWP 18119)):

#0 0x00007fd48abc4499 in syscall () from /lib64/libc.so.6
#1 0x0000000000df8db2 in futex_wait_private (pw=..., ptimeout=0x0) at xxx/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::wait_pthread (pw=..., ptimeout=0x0) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:142
#3 0x0000000000df9f34 in butex_wait_from_pthread (arg=0x7fc15826aa60, expected_value=121, abstime=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:589
#4 bthread::butex_wait (arg=0x7fc15826aa60, expected_value=121, abstime=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:622
#5 0x0000000000de1868 in bthread_id_lock_and_reset_range_verbose (id=, pdata=0x7fd34ceeddb8, range=0, location=0x1140468 "/home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:158")
at xxx/deps/incubator-brpc/src/bthread/id.cpp:443
#6 0x0000000000d2b8ec in braft::Replicator::last_rpc_send_timestamp (id=28359669055607) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:158
#7 0x0000000000cfc5d1 in braft::NodeImpl::check_dead_nodes (this=0x7fbf80734e00, conf=..., now_ms=3468297977) at /home/zhangbiao11/Code/mraft/mraft/src/braft/node.cpp:838
#8 0x0000000000cfcaf2 in braft::NodeImpl::handle_stepdown_timeout (this=0x7fbf80734e00) at /home/zhangbiao11/Code/mraft/mraft/src/braft/node.cpp:870
#9 0x0000000000d558b8 in braft::RepeatedTimerTask::on_timedout (this=0x7fbf807352c0) at /home/zhangbiao11/Code/mraft/mraft/src/braft/repeated_timer_task.cpp:63
#10 0x0000000000d55b59 in braft::RepeatedTimerTask::run_on_timedout_in_new_thread (arg=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/repeated_timer_task.cpp:111
#11 0x0000000000dfef61 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#12 0x0000000000dff602 in bthread::TaskGroup::run_main_task (this=0x7fc1740008c0) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:158
#13 0x0000000000df710e in bthread::TaskControl::worker_thread (arg=0x4894ea0) at xxx/deps/incubator-brpc/src/bthread/task_control.cpp:77
#14 0x00007fd48c3d5aa1 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fd48abc7c4d in clone () from /lib64/libc.so.6`

4
`Thread 107 (Thread 0x7fc1797fb700 (LWP 18124)):

#0 0x00007fd48abc4499 in syscall () from /lib64/libc.so.6
#1 0x0000000000df8db2 in futex_wait_private (pw=..., ptimeout=0x0) at xxx/deps/incubator-brpc/src/bthread/sys_futex.h:42
#2 bthread::wait_pthread (pw=..., ptimeout=0x0) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:142
#3 0x0000000000df9f34 in butex_wait_from_pthread (arg=0x7fbfe919e390, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:589
#4 bthread::butex_wait (arg=0x7fbfe919e390, expected_value=257, abstime=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:622
#5 0x0000000000de65e7 in mutex_lock_contended (m=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:635
#6 bthread_mutex_lock (m=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:736
#7 0x0000000000d08452 in bthread::Mutex::lock (this=Unhandled dwarf expression opcode 0xf3
) at /usr/include/bthread/mutex.h:58
#8 0x0000000000d3cef6 in lock_guard (this=0x7fbfe945af60, group_id=3078, index=78426) at /usr/include/c++/4.9.2/mutex:377
#9 braft::SegmentLogStorage::get_segment (this=0x7fbfe945af60, group_id=3078, index=78426) at /home/zhangbiao11/Code/mraft/mraft/src/braft/log.cpp:2247
#10 0x0000000000d3d626 in braft::SegmentLogStorage::get_term (this=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/log.cpp:1910
#11 0x0000000000d17e59 in braft::LogManager::get_term (this=0x7fbfeb6e9a20, index=78426) at /home/zhangbiao11/Code/mraft/mraft/src/braft/log_manager.cpp:813
#12 0x0000000000d2d82d in braft::Replicator::_fill_common_fields (this=0x7fc14c1b2c90, request=0x7fb5b493be40, prev_log_index=78426, is_heartbeat=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:496
#13 0x0000000000d30c03 in braft::Replicator::_send_empty_entries (this=0x7fc14c1b2c90, is_heartbeat=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:527
#14 0x0000000000d32ba2 in braft::Replicator::_send_heartbeat (arg=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:902
#15 0x0000000000dfef61 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#16 0x0000000000dff602 in bthread::TaskGroup::run_main_task (this=0x7fc15c0008c0) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:158
#17 0x0000000000df710e in bthread::TaskControl::worker_thread (arg=0x4894ea0) at xxx/deps/incubator-brpc/src/bthread/task_control.cpp:77
#18 0x00007fd48c3d5aa1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fd48abc7c4d in clone () from /lib64/libc.so.6`

5
`Thread 91 (Thread 0x7fc12abfd700 (LWP 18141)):

#0 0x00007fd48ab8bcbd in nanosleep () from /lib64/libc.so.6
#1 0x00007fd48abc0f14 in usleep () from /lib64/libc.so.6
#2 0x0000000000dfd3f8 in push_rq (this=0x7fc1100008c0, tid=656554470677386, nosignal=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group_inl.h:97
#3 bthread::TaskGroup::ready_to_run (this=0x7fc1100008c0, tid=656554470677386, nosignal=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:649
#4 0x0000000000dfcc5e in bthread::TaskGroup::sched_to (pg=0x7fbc46e31418, next_meta=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:625
#5 0x0000000000dfd16e in sched_to (pg=0x7fbc46e31418) at xxx/deps/incubator-brpc/src/bthread/task_group_inl.h:79
#6 bthread::TaskGroup::sched (pg=0x7fbc46e31418) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:563
#7 0x0000000000df9cc7 in bthread::butex_wait (arg=0x7fbfe919e390, expected_value=257, abstime=0x0) at xxx/deps/incubator-brpc/src/bthread/butex.cpp:660
#8 0x0000000000de65e7 in mutex_lock_contended (m=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:635
#9 bthread_mutex_lock (m=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/mutex.cpp:736
#10 0x0000000000d08452 in bthread::Mutex::lock (this=Unhandled dwarf expression opcode 0xf3
) at /usr/include/bthread/mutex.h:58
#11 0x0000000000d3cef6 in lock_guard (this=0x7fbfe945af60, group_id=10142, index=78405) at /usr/include/c++/4.9.2/mutex:377
#12 braft::SegmentLogStorage::get_segment (this=0x7fbfe945af60, group_id=10142, index=78405) at /home/zhangbiao11/Code/mraft/mraft/src/braft/log.cpp:2247
#13 0x0000000000d45da6 in braft::SegmentLogStorage::get_entry (this=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/log.cpp:1892
#14 0x0000000000d18112 in braft::LogManager::get_entry (this=0x7fbfa33baae0, index=78405) at /home/zhangbiao11/Code/mraft/mraft/src/braft/log_manager.cpp:834
#15 0x0000000000d2be56 in braft::Replicator::_prepare_entry (this=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:570
#16 0x0000000000d332f5 in braft::Replicator::_send_entries (this=0x7fc17007ddf0) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:619
#17 0x0000000000d351bf in braft::Replicator::_on_rpc_returned (id=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:489
#18 0x0000000000d36636 in brpc::internal::FunctionClosure5<unsigned long, brpc::Controller*, braft::AppendEntriesRequest*, braft::AppendEntriesResponse*, long>::Run (this=0x7fb36a3ca500) at /usr/include/brpc/callback.h:339
#19 0x0000000000e08008 in brpc::Controller::EndRPC (this=0x7fb379829910, info=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/controller.cpp:893
#20 0x0000000000e09cd4 in brpc::Controller::OnVersionedRPCReturned (this=0x7fb379829910, info=..., new_bthread=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/controller.cpp:676
#21 0x0000000000eb036a in OnResponse (msg_base=0x7fc1700231e0) at xxx/deps/incubator-brpc/src/brpc/details/controller_private_accessor.h:48
#22 brpc::policy::ProcessRpcResponse (msg_base=0x7fc1700231e0) at xxx/deps/incubator-brpc/src/brpc/policy/baidu_rpc_protocol.cpp:618
#23 0x0000000000e2194a in brpc::ProcessInputMessage (void_arg=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/brpc/input_messenger.cpp:136
#24 0x0000000000dfef61 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#25 0x0000000000eaf571 in bthread_make_fcontext ()
#26 0x0000000000000000 in ?? ()`

It looks like there are still `futex_wait_private` frames... Is any further change needed? All the stacks now stay inside the framework, but they all appear to be dispatched out of `task_runner`.

@kof02guy kof02guy reopened this Jun 20, 2020
@kof02guy
Author

@PFZheng do you have WeChat? I'm also in Beijing. We're under some time pressure; could we meet in person to discuss?

@PFZheng
Collaborator

PFZheng commented Jun 21, 2020

int butex_wait(void* arg, int expected_value, const timespec* abstime) {
    ...
    TaskGroup* g = tls_task_group;
    if (NULL == g || g->is_current_pthread_task()) {
        return butex_wait_from_pthread(g, b, expected_value, abstime);
    }

In the brpc source, these are the only two cases that call butex_wait_from_pthread, and from the stacks the `is_current_pthread_task()` branch is the more likely one. That doesn't look right: on a bthread stack this check should be false. You haven't modified the bthread scheduling code, have you? We need to confirm what is happening inside bthread's `TaskGroup::task_runner`; it looks like the bthread function is being executed in place.

@PFZheng
Collaborator

PFZheng commented Jun 21, 2020

@PFZheng do you have WeChat? I'm also in Beijing. We're under some time pressure; could we meet in person to discuss?

Send an email to my inbox and let me know your WeChat ID; I'll add you: [email protected]

@PFZheng
Collaborator

PFZheng commented Jun 21, 2020

int butex_wait(void* arg, int expected_value, const timespec* abstime) {
    ...
    TaskGroup* g = tls_task_group;
    if (NULL == g || g->is_current_pthread_task()) {
        return butex_wait_from_pthread(g, b, expected_value, abstime);
    }

In the brpc source, these are the only two cases that call butex_wait_from_pthread, and from the stacks the `is_current_pthread_task()` branch is the more likely one. That doesn't look right: on a bthread stack this check should be false. You haven't modified the bthread scheduling code, have you? We need to confirm what is happening inside bthread's `TaskGroup::task_runner`; it looks like the bthread function is being executed in place.

The stack tops you posted earlier were still normal; this is a normal bthread coroutine stack:
...
#19 0x0000000000e3d5b1 in bthread_make_fcontext ()
#20 0x0000000000000000 in ?? ()`

This kind in the stacks you posted now is abnormal; worker_thread is the scheduler's own stack, and the task should not be executing directly on it:
...
#14 0x0000000000d32ba2 in braft::Replicator::_send_heartbeat (arg=Unhandled dwarf expression opcode 0xf3
) at /home/zhangbiao11/Code/mraft/mraft/src/braft/replicator.cpp:902
#15 0x0000000000dfef61 in bthread::TaskGroup::task_runner (skip_remained=Unhandled dwarf expression opcode 0xf3
) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:297
#16 0x0000000000dff602 in bthread::TaskGroup::run_main_task (this=0x7fc15c0008c0) at xxx/deps/incubator-brpc/src/bthread/task_group.cpp:158
#17 0x0000000000df710e in bthread::TaskControl::worker_thread (arg=0x4894ea0) at xxx/deps/incubator-brpc/src/bthread/task_control.cpp:77
#18 0x00007fd48c3d5aa1 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fd48abc7c4d in clone () from /lib64/libc.so.6`

@wangxun155423

braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666): this lock looks like the root cause. Did you change anything here?

There is nesting here. Inside the replicator's bthread id lock, the Segment lock is acquired; the Node's periodic timer also takes the replicator's bthread id lock (to fetch last_rpc_send_timestamp), so a blocked Segment lock causes lock contention across the whole node. This has already been changed in our internal version but has not been synced out yet. That said, this contention is still butex-based and should not pin a worker thread; the real question is why get_segment gets stuck.

We'd like to ask how this lock was changed in the internal version. Was it optimized?

@PFZheng
Collaborator

PFZheng commented Jun 22, 2020

braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666): this lock looks like the root cause. Did you change anything here?
There is nesting here. Inside the replicator's bthread id lock, the Segment lock is acquired; the Node's periodic timer also takes the replicator's bthread id lock (to fetch last_rpc_send_timestamp), so a blocked Segment lock causes lock contention across the whole node. This has already been changed in our internal version but has not been synced out yet. That said, this contention is still butex-based and should not pin a worker thread; the real question is why get_segment gets stuck.

We'd like to ask how this lock was changed in the internal version. Was it optimized?

It's fairly simple: last_rpc_send_timestamp was split out. If you need it, I can extract it as a separate patch this week and sync it over. The code on GitHub is identical to the internal version; changes are only synced after they have proven stable in internal testing, and this change was made recently.

@wangxun155423

braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666): this lock looks like the root cause. Did you change anything here?
There is nesting here. Inside the replicator's bthread id lock, the Segment lock is acquired; the Node's periodic timer also takes the replicator's bthread id lock (to fetch last_rpc_send_timestamp), so a blocked Segment lock causes lock contention across the whole node. This has already been changed in our internal version but has not been synced out yet. That said, this contention is still butex-based and should not pin a worker thread; the real question is why get_segment gets stuck.

We'd like to ask how this lock was changed in the internal version. Was it optimized?

It's fairly simple: last_rpc_send_timestamp was split out. If you need it, I can extract it as a separate patch this week and sync it over. The code on GitHub is identical to the internal version; changes are only synced after they have proven stable in internal testing, and this change was made recently.

Could you post it so we can take a look? get_segment is called quite frequently, and we would like to see the performance improvement.

@PFZheng
Collaborator

PFZheng commented Jul 11, 2020

braft::SegmentLogStorage::get_segment (this=0x7f6f7d45af70, group_id=12638, index=62666): this lock looks like the root cause. Did you change anything here?
There is nesting here. Inside the replicator's bthread id lock, the Segment lock is acquired; the Node's periodic timer also takes the replicator's bthread id lock (to fetch last_rpc_send_timestamp), so a blocked Segment lock causes lock contention across the whole node. This has already been changed in our internal version but has not been synced out yet. That said, this contention is still butex-based and should not pin a worker thread; the real question is why get_segment gets stuck.

We'd like to ask how this lock was changed in the internal version. Was it optimized?

It's fairly simple: last_rpc_send_timestamp was split out. If you need it, I can extract it as a separate patch this week and sync it over. The code on GitHub is identical to the internal version; changes are only synced after they have proven stable in internal testing, and this change was made recently.

Could you post it so we can take a look? get_segment is called quite frequently, and we would like to see the performance improvement.

Sure. But don't expect much overall gain for get_segment; it is called mostly because many places need the term and other metadata.
There should be room for optimization here. The main contention is between the disk thread's writes and the read-only paths: it is a single-writer, multi-reader model. Except for a brief window right after a leader change, when log conflicts can cause already-persisted entries to be deleted, reads are safe in the normal case. If you have performance data to share, please post it; in our own test scenarios this has not shown up as a bottleneck so far.
