-
Notifications
You must be signed in to change notification settings - Fork 444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect slow coroutines and freezes in the asyncio loop #7414
Conversation
ca266d8
to
a396ea6
Compare
How much slower? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice improvement. I've left some comments that are more related to style than to the concept itself.
As a wish, I would also like to see these two files in a separate folder because it will be easier to navigate.
fb8fc4d
to
4afc4c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All previous comments have been addressed. It seems that the PR is almost ready to be merged.
I left one comment about a possibly forgotten empty file, and another about enabling stack tracing by default for developers.
src/run_tribler.py
Outdated
@@ -94,6 +95,10 @@ def init_boot_logger(): | |||
from tribler.core.components.reporter.exception_handler import default_core_exception_handler | |||
|
|||
init_sentry_reporter(default_core_exception_handler.sentry_reporter) | |||
|
|||
if os.environ.get('SLOW_CORO_STACK_TRACING') == '1': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it is handier to add an option to always trace a stack for developers.
Something like:
if os.environ.get('SLOW_CORO_STACK_TRACING') == '1': | |
if os.environ.get('SLOW_CORO_STACK_TRACING') == '1' or is_run_from_source(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible, but I'm not sure we really want this, as it slows doun the code and it may be harder to turn the option off when running from sources
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a developer, I really want this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, it seems like a good strategy to enable all debugging tools on each developer's machine and use environment variables as an option to disable a tool.
For example, if you're developing 3 tools, I'm developing 2 tools, and Sandip is developing 3 tools, that means Marcel would need to:
- Know about all 8 tools
- Enable all of them by setting 8 environment variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I changed the code to turn on stack tracking when running from sources by default and allow easier overriding
slow_coro_stack_tracking = os.environ.get('SLOW_CORO_STACK_TRACING', '0' if is_frozen() else '1')
# By default, the stack tracking of slow coroutines is enabled when running the Tribler from sources
# and disabled in the compiled version, as it makes the Python code of Core work slower.
if slow_coro_stack_tracking == '1':
start_main_thread_stack_tracing()
This PR adds functionality to detect slow coroutine step execution in the asyncio loop. If a coroutine step requires too much time (more than 1 second by default), the coroutine is reported to the error log.
A separate thread watches whether the main line is currently inside the coroutine for too long and reports a possible freeze.
It is possible to specify an environment variable
SLOW_CORO_STACK_TRACING=1
; in that case, the coroutine stack is also tracked and reported in case of a possible freeze. With stack tracing enabled, the Core works slower, so this option is intended to be run not on user machines (although this is possible).