Optimize query performance and fix the result of the popular torrents page #6064

@kozlovsky kozlovsky commented Apr 22, 2021

This PR:

  1. Speeds up several queries by about 200x (from roughly 60 seconds to 0.3 seconds)
  2. Fixes a query result for the "Popular torrents" page

A few days ago, I loaded a channel with two million torrents to test Tribler performance on a modest machine (a 2014 MacBook with an i5). It turns out that the current Tribler client is extremely slow on it, to the point of being almost unusable. On startup, Tribler freezes for about one minute. Freezes of similar length then recur frequently: when running a full-text search query, when opening the "Popular torrents" page, or periodically when background tasks run. With two million torrents, many different queries take about a minute to execute (60-65 seconds). Because many of these queries are executed directly from the async loop, the Tribler core freezes completely while they run.

So I tracked down the slow queries one by one and investigated the cause of each. I discovered that the current database contains a very large number of indexes on the ChannelNode table, which stores information about channels and all torrents. The SQLite query optimizer is fairly simple, and when it sees multiple ways to execute the same query using different indexes, it can choose a poor plan based on an inefficient index. This was the cause of most of the slow queries: the SQLite optimizer decided to use an index that slowed query execution down instead of speeding it up. After I removed the bad indexes, SQLite switched to a full table scan of the torrents table, which was much faster than using the wrong index.

The most illustrative example is the index on the metadata_type column of the ChannelNode table. In this column, two million rows have the value corresponding to the TorrentMetadata subclass, and only 1800 rows have other values corresponding to channels and folders. This makes the index a poor choice for searching torrents, as a full table scan is much faster. On the other hand, the index is beneficial when searching for channels.

To make the index work efficiently, I replaced it with a partial index that excludes regular torrents. This allows SQLite to efficiently perform queries both for channels (using this partial index) and for torrents (using other "good" indexes).
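A partial index of this kind might look roughly like the sketch below. The type codes (300 for regular torrents, 400 for channels), the index name, and the simplified schema are assumptions for illustration; the real definitions are in the PR's diff.

```python
import sqlite3

TORRENT = 300  # assumed type code for regular torrents
CHANNEL = 400  # assumed type code for channels

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real table.
conn.execute(
    "CREATE TABLE ChannelNode ("
    " id INTEGER PRIMARY KEY,"
    " metadata_type INTEGER NOT NULL,"
    " title TEXT)"
)
# Partial index: rows for regular torrents are not stored in it at all.
conn.execute(
    "CREATE INDEX idx_metadata_type_partial ON ChannelNode (metadata_type) "
    f"WHERE metadata_type <> {TORRENT}"
)


def query_plan(sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


# Channel lookup: the query repeats the index predicate literally,
# so the planner can match it and use the partial index.
channel_plan = query_plan(
    "SELECT id FROM ChannelNode "
    f"WHERE metadata_type <> {TORRENT} AND metadata_type = ?",
    (CHANNEL,),
)

# Torrent lookup: the partial index does not cover these rows,
# so the planner cannot use it here.
torrent_plan = query_plan(
    "SELECT id FROM ChannelNode WHERE metadata_type = ?", (TORRENT,)
)

print(channel_plan)
print(torrent_plan)
```

In this sketch the channel query repeats the index's WHERE term verbatim because SQLite only considers a partial index when it can see that the query condition implies the index condition.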

Also, I added a similar partial index to the TorrentState table. Now it indexes health information only for torrents that actually have it.

In SQLite, a partial index is used only when the query planner can prove that the query's WHERE clause implies the index's WHERE clause, which in practice calls for a fairly strict form such as column = some_value. For example, the condition TorrentState.last_check > some_time will not use the partial index. To handle this, a new boolean column has_data was added so that the condition can be written as TorrentState.has_data = 1, which the SQLite query planner recognizes. The column is maintained by triggers and updated automatically.
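The has_data approach can be sketched as follows. The trigger names, the index name, the rule "has_data = 1 when last_check > 0", and the simplified TorrentState schema are all assumptions for illustration; the actual triggers and index are defined in the PR's diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the real table.
    CREATE TABLE TorrentState (
        id INTEGER PRIMARY KEY,
        infohash BLOB NOT NULL,
        seeders INTEGER NOT NULL DEFAULT 0,
        last_check INTEGER NOT NULL DEFAULT 0,
        has_data INTEGER NOT NULL DEFAULT 0
    );

    -- Keep has_data in sync when a row is inserted.
    CREATE TRIGGER torrentstate_ai AFTER INSERT ON TorrentState
    BEGIN
        UPDATE TorrentState SET has_data = (NEW.last_check > 0)
        WHERE id = NEW.id;
    END;

    -- Keep has_data in sync when last_check changes.
    CREATE TRIGGER torrentstate_au AFTER UPDATE OF last_check ON TorrentState
    BEGIN
        UPDATE TorrentState SET has_data = (NEW.last_check > 0)
        WHERE id = NEW.id;
    END;

    -- Only rows that actually carry health data are indexed.
    CREATE INDEX idx_torrentstate_last_check_partial
        ON TorrentState (last_check, seeders) WHERE has_data = 1;
""")

insert = "INSERT INTO TorrentState (infohash, seeders, last_check) VALUES (?, ?, ?)"
conn.execute(insert, (b"a", 0, 0))            # never checked
conn.execute(insert, (b"b", 5, 1619000000))   # checked once

flags_sql = "SELECT has_data FROM TorrentState ORDER BY id"
before = [row[0] for row in conn.execute(flags_sql)]

# A health check updates the first torrent; the trigger flips has_data.
conn.execute(
    "UPDATE TorrentState SET seeders = 3, last_check = 1619100000 "
    "WHERE infohash = ?",
    (b"a",),
)
after = [row[0] for row in conn.execute(flags_sql)]

# The literal condition has_data = 1 matches the index predicate.
plan = "; ".join(
    row[3]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM TorrentState "
        "WHERE has_data = 1 AND last_check > ?",
        (1619050000,),
    )
)
print(before, after)
print(plan)
```

Note that has_data = 1 is written as a literal in the query text. If the value were passed as a bound parameter, the planner could not match it against the index predicate.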

Logically, this pull request could have been split into several smaller ones. However, it is more convenient to have a single database upgrade for all these changes, so they were combined into a single PR.

As a result of the changes, most queries now complete in under 1 second, and the typical speedup is about 200x for a database of two million torrents. The only queries that still run somewhat slowly are the two queries in the TorrentChecker.torrents_to_check method. They can be optimized later.

This refactoring also made it possible to fix the logic of the "Popular torrents" page query so that it returns the actual list of recently checked healthy torrents.

@kozlovsky kozlovsky requested a review from ichorid April 22, 2021 10:15
@ghost

ghost commented Apr 22, 2021

Congratulations 🎉. DeepCode analyzed your code in 3.064 seconds and we found no issues. Enjoy a moment of no bugs ☀️.


@kozlovsky kozlovsky force-pushed the fix/fix_slow_queries_and_popular_torrents_page branch 3 times, most recently from 9f1a3cf to 5651270 Compare April 22, 2021 12:52
@kozlovsky
Collaborator Author

retest this please

@kozlovsky kozlovsky force-pushed the fix/fix_slow_queries_and_popular_torrents_page branch from 5651270 to 72a4568 Compare April 22, 2021 13:08
@kozlovsky
Collaborator Author

retest this please

@kozlovsky kozlovsky force-pushed the fix/fix_slow_queries_and_popular_torrents_page branch from 72a4568 to 58e04ec Compare April 22, 2021 14:45
@sonarcloud

sonarcloud bot commented Apr 22, 2021

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
1 Code Smell (rating A)

No coverage information
0.0% duplication

@kozlovsky
Collaborator Author

retest this please
