Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect slow coroutines and freezes in the asyncio loop #7414

Merged
merged 1 commit into from
May 12, 2023

Conversation

kozlovsky
Copy link
Contributor

This PR adds functionality to detect slow coroutine step execution in the asyncio loop. If a coroutine step requires too much time (more than 1 second by default), the coroutine is reported to the error log.

A separate thread watches whether the main line is currently inside the coroutine for too long and reports a possible freeze.

It is possible to specify an environment variable SLOW_CORO_STACK_TRACING=1; in that case, the coroutine stack is also tracked and reported in case of a possible freeze. With stack tracing enabled, the Core works slower, so this option is intended to be run not on user machines (although this is possible).

@kozlovsky kozlovsky force-pushed the slow_coro_detection branch 2 times, most recently from ca266d8 to a396ea6 Compare May 9, 2023 11:12
@kozlovsky kozlovsky marked this pull request as ready for review May 9, 2023 11:17
@kozlovsky kozlovsky requested review from a team, drew2a and xoriole and removed request for a team May 9, 2023 11:17
@drew2a
Copy link
Contributor

drew2a commented May 9, 2023

the Core works slower, so this option is intended to be run not on user machines (although this is possible).

How much slower?

Copy link
Contributor

@drew2a drew2a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice improvement. I've left some comments that are more related to style than to the concept itself.

As a wish, I would also like to see these two files in a separate folder because it will be easier to navigate.

src/tribler/core/utilities/main_thread_stack_tracking.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/main_thread_stack_tracking.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/main_thread_stack_tracking.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/main_thread_stack_tracking.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/main_thread_stack_tracking.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/slow_coro_detection.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/slow_coro_detection.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/slow_coro_detection.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/slow_coro_detection.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/slow_coro_detection.py Outdated Show resolved Hide resolved
@kozlovsky kozlovsky force-pushed the slow_coro_detection branch 3 times, most recently from fb8fc4d to 4afc4c3 Compare May 12, 2023 07:53
@kozlovsky kozlovsky requested review from xoriole and drew2a May 12, 2023 07:54
Copy link
Contributor

@drew2a drew2a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All previous comments have been addressed. It seems that the PR is almost ready to be merged.

I left one comment about a possibly forgotten empty file, and another about enabling stack tracing by default for developers.

src/tribler/core/utilities/slow_coro_detection.py Outdated Show resolved Hide resolved
@@ -94,6 +95,10 @@ def init_boot_logger():
from tribler.core.components.reporter.exception_handler import default_core_exception_handler

init_sentry_reporter(default_core_exception_handler.sentry_reporter)

if os.environ.get('SLOW_CORO_STACK_TRACING') == '1':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it is handier to add an option to always trace a stack for developers.
Something like:

Suggested change
if os.environ.get('SLOW_CORO_STACK_TRACING') == '1':
if os.environ.get('SLOW_CORO_STACK_TRACING') == '1' or is_run_from_source():

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible, but I'm not sure we really want this, as it slows doun the code and it may be harder to turn the option off when running from sources

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a developer, I really want this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, it seems like a good strategy to enable all debugging tools on each developer's machine and use environment variables as an option to disable a tool.

For example, if you're developing 3 tools, I'm developing 2 tools, and Sandip is developing 3 tools, that means Marcel would need to:

  1. Know about all 8 tools
  2. Enable all of them by setting 8 environment variables

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I changed the code to turn on stack tracking when running from sources by default and allow easier overriding

slow_coro_stack_tracking = os.environ.get('SLOW_CORO_STACK_TRACING', '0' if is_frozen() else '1')
# By default, the stack tracking of slow coroutines is enabled when running the Tribler from sources
# and disabled in the compiled version, as it makes the Python code of Core work slower.
if slow_coro_stack_tracking == '1':
    start_main_thread_stack_tracing()