Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make KnowledgeRulesProcessor less aggressive #7418

Merged
merged 1 commit into from
May 16, 2023

Conversation

drew2a
Copy link
Contributor

@drew2a drew2a commented May 12, 2023

This is a continuation of the previous PR #7417 and the final PR from the KnowledgeRulesProcessor improvements series.

As demonstrated in #7414 by @kozlovsky, process_batch is one of the few "long-running" coroutines. This behavior could lead to a brief freeze of Tribler every 10 seconds. This happens on the core side and should go unnoticed by users. However, it's possible to reduce the time of synchronous process_batch execution by making process_torrent_title an async function.

I've tried to carefully isolate the use of db_session so it won't be affected by the switch to async. @kozlovsky, could you please confirm that I've done everything correctly?

@drew2a drew2a self-assigned this May 12, 2023
@drew2a drew2a added this to the 7.13.0 milestone May 12, 2023
@drew2a drew2a requested a review from kozlovsky May 12, 2023 11:06
@drew2a drew2a marked this pull request as ready for review May 12, 2023 11:06
@drew2a drew2a requested a review from a team May 12, 2023 11:06
kozlovsky

This comment was marked as duplicate.

Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks correct. Now each save_statements call is processed as a separate transaction, and it should be slower, but it is more important to occupy the loop only for a small amount of time, and this code achieves it.

I'd like to see an extended log message, not just "Processed: N titles. Added M tags." but "Processed: N titles in S seconds (K titles / second). Added M tags (Q tags / second)."
It may be convenient to understand the throughput of the processor. The estimated numbers will not take into account parallel tasks, but they should allow an estimation of how long the processing of the entire database can take.

@drew2a drew2a force-pushed the fix/krp_asyncio_batch branch 2 times, most recently from 273e670 to 93147f5 Compare May 15, 2023 08:46
@drew2a drew2a requested a review from kozlovsky May 15, 2023 09:28
@drew2a
Copy link
Contributor Author

drew2a commented May 15, 2023

During the testing of this PR, we realized that async/await might not be used correctly in the Tribler codebase.
Async/await has quite a non-intuitive behavior. See https://stackoverflow.com/questions/59586879/does-await-in-python-yield-to-the-event-loop for more information (credits to @kozlovsky for finding this).

Below is a code example:

import asyncio


async def a():
    async def a_print():
        print('a')

    while True:
        await a_print()


async def b():
    async def b_print():
        print('b')

    while True:
        await b_print()


async def main():
    tasks = {
        asyncio.create_task(a()),
        asyncio.create_task(b())
    }

    await asyncio.wait(tasks)


asyncio.run(main())

Intuitively the code should print

a
b
a
b
a
b

But instead of this the output is:

a
a
a
a
a
a

So, this PR also includes a decorator to actually interrupt a coroutine at the moment of awaiting.

@drew2a drew2a changed the title Make process_torrent_title async Make KnowledgeRulesProcessor less aggressive May 15, 2023
Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I have only a few minor comments.

src/tribler/core/utilities/async_interruption.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/tests/test_async_interuption.py Outdated Show resolved Hide resolved
@drew2a drew2a force-pushed the fix/krp_asyncio_batch branch 4 times, most recently from b62439f to e002568 Compare May 16, 2023 10:17
@drew2a drew2a requested a review from kozlovsky May 16, 2023 10:17
@drew2a drew2a merged commit 8514752 into Tribler:release/7.13 May 16, 2023
@drew2a drew2a deleted the fix/krp_asyncio_batch branch May 16, 2023 15:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants