Cannot contact delegate daemon. Daemon died? #47

Open
Crispig opened this issue Nov 10, 2022 · 9 comments

Comments

@Crispig
Contributor

Crispig commented Nov 10, 2022

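I set things up according to the technical documentation. The scheduler machine runs the scheduler and the client machine runs the daemon, and the scheduler detected the new servant.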

I1110 14:19:54.431716 1780984 init.cc:114] Flare started.
I1110 14:19:54.432881 1780984 runtime.cc:425] Using fiber scheduling profile [neutral].
I1110 14:19:54.432904 1780984 runtime.cc:220] Starting 8 worker threads per group, for a total of 1 groups. The system is treated as UMA.
I1110 14:19:54.436939 1780984 init.cc:122] Flare runtime initialized.
I1110 14:19:54.702870 1780997 task_dispatcher.cc:205] [192.168.33.5:46518] Discovered new servant at [192.168.33.5:8335]

However, running YADCC_LOG_LEVEL=0 CXX='/home/ubuntu/.yadcc/symlinks/g++' LD='/home/ubuntu/.yadcc/symlinks/g++' CC='/home/ubuntu/.yadcc/symlinks/gcc' make -j8
prints:

[2022-11-10 14:20:33.068854] [TRACE] [yadcc/client/yadcc.cc:261] Started
[2022-11-10 14:20:33.068886] [DEBUG] [yadcc/client/utility.cc:92] Looking up for [gcc] in [/home/ubuntu/.yadcc/symlinks].
[2022-11-10 14:20:33.068912] [DEBUG] [yadcc/client/utility.cc:92] Looking up for [gcc] in [/usr/local/sbin].
[2022-11-10 14:20:33.068919] [DEBUG] [yadcc/client/utility.cc:92] Looking up for [gcc] in [/usr/local/bin].
[2022-11-10 14:20:33.068926] [DEBUG] [yadcc/client/utility.cc:92] Looking up for [gcc] in [/usr/sbin].
[2022-11-10 14:20:33.068932] [DEBUG] [yadcc/client/utility.cc:92] Looking up for [gcc] in [/usr/bin].
[2022-11-10 14:20:33.068942] [TRACE] [yadcc/client/utility.cc:99] Found [gcc] at [/usr/bin].
[2022-11-10 14:20:33.068947] [TRACE] [yadcc/client/yadcc.cc:133] Using compiler: /usr/bin/gcc
[2022-11-10 14:20:33.068980] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:34.064286] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:34.065163] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:34.068817] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:34.069060] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:35.064416] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:35.065259] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:35.068913] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?
[2022-11-10 14:20:35.069152] [ERROR] [yadcc/client/task_quota.cc:61] Cannot contact delegate daemon. Daemon died?

What could be causing this?

@Crispig
Contributor Author

Crispig commented Nov 10, 2022

I found that I also have to start the daemon on the scheduler machine to make this work.

@0x804d8000
Collaborator

The scheduler machine should not need one; it is the client machine that needs a daemon.

For what exactly the daemon does on the client machine, see https://github.com/Tencent/yadcc/blob/master/yadcc/doc/daemon.md#处理本地请求

@Crispig
Contributor Author

Crispig commented Nov 15, 2022

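Thank you very much. But I am launching the compilation on the scheduler machine, so it seems I have to run the daemon on the scheduler machine as well.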

@0x804d8000
Collaborator

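Yes, the machine that launches the compilation needs a live daemon. If the scheduler machine launches the compilation, it is acting as both client and scheduler at that point.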

PS: for a client/server-style deployment, the daemon on the client machine can be configured not to accept tasks from the network; see https://github.com/Tencent/yadcc/blob/master/yadcc/doc/daemon.md#参数
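
Editor's sketch: the requirement above (a live daemon on whichever machine launches the build) can be checked before running make by attempting a TCP connection to the daemon's local port. This is a minimal sketch, assuming the daemon accepts TCP connections on loopback; `daemon_reachable` is a hypothetical helper, not part of yadcc, and the port you pass in must come from your own daemon configuration.

```python
import socket


def daemon_reachable(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: the port is whatever the local daemon is
    configured to listen on; no default is assumed here.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result only says that nothing is listening on that port; it does not distinguish a daemon that was never started from one that has died.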

@Crispig
Contributor Author

Crispig commented Nov 24, 2022

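OK, that is solved now, thanks a lot! The scheduler's default port 8836 can be changed to another port by modifying the code and recompiling, right? Does anything else depend on it?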

@0x804d8000
Collaborator

Yes, it can be changed. The port number in the daemon's --scheduler_uri argument just has to match the scheduler's port.
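
Editor's sketch: the consistency rule above can be checked mechanically by extracting the port from the value given to --scheduler_uri and comparing it with the port the scheduler listens on. This is a minimal sketch, assuming the URI has the usual `scheme://host:port` shape; `scheduler_port_matches` is a hypothetical helper, and the example URI in the comment is illustrative only.

```python
from urllib.parse import urlsplit


def scheduler_port_matches(scheduler_uri: str, scheduler_port: int) -> bool:
    """Compare the port embedded in a --scheduler_uri value with the
    port the scheduler is listening on.

    Assumes a URI of the form scheme://host:port, for example
    "flare://192.168.33.5:9000" (illustrative value, not from the docs).
    A URI without an explicit port never matches.
    """
    return urlsplit(scheduler_uri).port == scheduler_port
```

For instance, `scheduler_port_matches("flare://192.168.33.5:9000", 9000)` holds, while the same URI checked against a scheduler moved to another port does not.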

@Crispig
Contributor Author

Crispig commented Nov 25, 2022

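Solved, thanks!
With 30 cores the CPUs are almost fully utilized, but after scaling out to 130 cores utilization falls short. Is there a way to fix this? Switching to a larger workload did not seem to help.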

@0x804d8000
Collaborator

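First check the load on the client machine. If it is high, local preprocessing may not be keeping up. Following https://github.com/Tencent/yadcc/blob/master/yadcc/doc/client.md#配置 you can consider disabling YADCC_COMPILE_ON_CLOUD_SIZE_THRESHOLD (set it to 0) and enabling YADCC_IGNORE_TIMESTAMP_MACROS (set it to 1).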
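
Editor's sketch: the two settings named above are environment variables read by the client, so they can be applied per build without touching the shell profile. This is a minimal sketch; `tuned_build_env` is a hypothetical helper, and only the two variable names and values come from the comment above.

```python
import os


def tuned_build_env(base=None):
    """Return a copy of the environment with the two suggested settings.

    `base` defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["YADCC_COMPILE_ON_CLOUD_SIZE_THRESHOLD"] = "0"  # disable the size threshold
    env["YADCC_IGNORE_TIMESTAMP_MACROS"] = "1"          # ignore timestamp macros
    return env
```

The result can be passed as `env=` to `subprocess.run` when launching the same make command used earlier in this thread.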

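After that, following https://github.com/Tencent/yadcc/blob/master/yadcc/doc/debugging.md#守护进程 look at the information under distributed_task_dispatcher to see how many tasks the local daemon has received. If there are too few, the build system may be running with too little concurrency for one reason or another (dependencies, build parallelism, ...). If there are many but they are all in the waiting state, check on the scheduler machine whether the reported compiler hashes fail to match, which would prevent the tasks from being dispatched.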

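You can also check whether the client machine has some other unexpected resource bottleneck (for example, disk or network can saturate at very high concurrency, but that generally only shows up at several hundred concurrent tasks).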
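
Editor's sketch: the checks above form a short decision sequence, which can be written down as a function. This is a minimal sketch; `triage` and all of its parameters are hypothetical, the inputs are values you would read off by hand from the client's load and from the distributed_task_dispatcher information, and `few_tasks_threshold` is a judgment call for your build, not a value from the yadcc documentation.

```python
def triage(client_load_high: bool,
           tasks_received: int,
           tasks_waiting: int,
           few_tasks_threshold: int) -> str:
    """Map hand-collected observations to the next thing to check.

    Mirrors the order of the advice above: client load first, then the
    number of tasks the local daemon received, then how many are waiting.
    """
    if client_load_high:
        return "local preprocessing may be the bottleneck"
    if tasks_received < few_tasks_threshold:
        return "build system concurrency may be too low"
    if tasks_waiting >= tasks_received:
        return "check the scheduler for a compiler hash mismatch"
    return "look for other client resource bottlenecks (disk, network)"
```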

@Crispig
Contributor Author

Crispig commented Dec 1, 2022

OK, I will look into it. Thank you very much!
