Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Popularity community #6950

Merged
merged 8 commits into from
Sep 22, 2022

Conversation

xoriole
Copy link
Contributor

@xoriole xoriole commented Jun 24, 2022

Major changes

  1. Popular torrents gossip is now pull based and happens on introduction request callback
  2. Popular torrents are selected from random normal variate favoring higher seeders.
    3. Torrents selection for random check also uses random normal variate to find random torrent.

@xoriole xoriole force-pushed the feature/passive-popularity-712 branch from 732bbd6 to 928d970 Compare August 24, 2022 09:02
@Tribler Tribler deleted a comment from tribler-ci Aug 24, 2022
@Tribler Tribler deleted a comment from tribler-ci Aug 24, 2022
@Tribler Tribler deleted a comment from tribler-ci Aug 24, 2022
@xoriole xoriole force-pushed the feature/passive-popularity-712 branch from 1f89155 to 7605843 Compare August 24, 2022 12:48
@xoriole xoriole marked this pull request as ready for review August 24, 2022 13:04
@xoriole xoriole requested review from a team and kozlovsky and removed request for a team August 24, 2022 13:04
@xoriole
Copy link
Contributor Author

xoriole commented Aug 24, 2022

retest this please

Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important and good PR! Requires some additional fixes. In my opinion it should be merged to the main branch and not to the release branch, as it is a new feature that can affect the release stability.

src/tribler/core/utilities/utilities.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/utilities.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/utilities.py Outdated Show resolved Hide resolved
src/tribler/core/utilities/utilities.py Outdated Show resolved Hide resolved
Comment on lines 204 to 213
total_torrents = self.mds.TorrentState.select().count()
random_index = self._select_randomly_likely_popular_torrent_index(total_torrents)

random_popular_torrent = self.mds.TorrentState.select(lambda g: g.last_check < last_fresh_time)\
.order_by(lambda g: (desc(g.seeders), g.last_check))\
.fetch(limit=1, offset=random_index)[:]

old_torrents = list(self.mds.TorrentState.select(lambda g: g.last_check < last_fresh_time).
order_by(lambda g: (g.last_check, desc(g.seeders))).limit(TORRENT_SELECTION_POOL_SIZE))
random_old_torrent = self.mds.TorrentState.select(lambda g: g.last_check < last_fresh_time)\
.order_by(lambda g: (g.last_check, desc(g.seeders)))\
.fetch(limit=1, offset=random_index)[:]
Copy link
Contributor

@kozlovsky kozlovsky Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not a good idea to use normally distributed offsets when selecting random torrents from the database for the check.

Let's consider a real-life situation when a user subscribed to several popular channels and has 3.5 million torrents in the database. In this case, the average offset will be approximately 1 million. So, to select a random torrent from the database, the SQL engine should sort torrents by last_check and seeders columns and then skip the first one million torrents to select a random one for the check. On a decent machine with an SSD drive, it took around 2 seconds just to select a single random torrent. During that time, the event loop is blocked, and it may be much slower on a machine with a traditional HDD drive.

It is not as necessary to select random torrents in a perfectly normalized way. Instead, we can select N completely random torrents from the database and choose the most popular from these torrents. It should be much faster. On the same machine, the query self.mds.TorrentState.select_random(100) takes four milliseconds for execution, much better than 2 seconds for the original query.

So, I suggest doing the following instead:

  1. Select N random torrents
  2. From these torrents, in Python, filter torrents with torrent.last_check < last_fresh_time (most torrents from the database should satisfy this criteria)
  3. Order the resulting list by the number of seeders, and choose the most popular one.

Something like that (not tested):

random_torrents = self.mds.TorrentState.select_random(100)
old_torrents = [t for t in random_torrents if t.last_check < last_fresh_time]
result = []
if old_torrents:
    random_old_torrent = old_torrents[0]
    result.append(random_old_torrent)
    if len(old_torrents) > 1:
        popular_random_torrents = sorted(old_torrents[1:], key = lambda torrent: torrrent.seeders, reverse=True)
        most_popular_random_torrent = popular_random_torrents[0]
        result.append(most_popular_random_torrent)
return result    

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the opinion and the suggestion. I think this needs more in-depth thinking and thus can be its own separate PR that only deals with the changes to the torrent checker and not with the Popularity community. Therefore, I'll remove these changes from PR.

@xoriole xoriole force-pushed the feature/passive-popularity-712 branch from 7605843 to b6289b0 Compare September 14, 2022 07:55
@xoriole xoriole changed the base branch from release/7.12 to main September 14, 2022 07:56
@xoriole xoriole marked this pull request as draft September 14, 2022 08:27
@xoriole xoriole force-pushed the feature/passive-popularity-712 branch from 4c79e26 to 8b46066 Compare September 14, 2022 09:15
@xoriole
Copy link
Contributor Author

xoriole commented Sep 14, 2022

Seems src/tribler/gui/tests/test_gui.py::test_add_deeptorrent test is failing on Windows with stack overflow

src/tribler/gui/tests/test_gui.py::test_add_deeptorrent Windows fatal exception: stack overflow

Thread 0x000020ec (most recent call first):
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 306 in wait
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 558 in wait
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 1252 in run
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 932 in _bootstrap_inner
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\threading.py", line 890 in _bootstrap

Current thread 0x00001bf8 (most recent call first):
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 253 in <lambda>
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 72 in subtree
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 252 in get_selected_items
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\widgets\torrentfiletreewidget.py", line 225 in fill_entries
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\dialogs\startdownloaddialog.py", line 198 in on_received_metainfo
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\utilities.py", line 411 in trackback_wrapper
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tribler_request_manager.py", line 228 in on_finished
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tribler_request_manager.py", line 122 in <lambda>
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\utilities.py", line 411 in trackback_wrapper
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tests\test_gui.py", line 157 in wait_for_list_populated
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\src\tribler\gui\tests\test_gui.py", line 442 in test_add_deeptorrent
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\python.py", line 192 in pytest_pyfunc_call
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\python.py", line 1761 in runtest
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 166 in pytest_runtest_call
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 259 in <lambda>
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 338 in from_call
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 258 in call_runtest_hook
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 219 in call_and_report
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 130 in runtestprotocol
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\runner.py", line 111 in pytest_runtest_protocol
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 347 in pytest_runtestloop
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 322 in _main
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 268 in wrap_session
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\main.py", line 315 in pytest_cmdline_main
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_callers.py", line 39 in _multicall
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_manager.py", line 80 in _hookexec
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\pluggy\_hooks.py", line 265 in __call__
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\config\__init__.py", line 164 in main
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\lib\site-packages\_pytest\config\__init__.py", line 187 in console_main
  File "E:\jenkins\workspace\GH_Tribler_PR_Tests\PR_GUI_win64\tribler\venv\Scripts\pytest.exe\__main__.py", line 7 in <module>
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 87 in _run_code
  File "C:\Users\tribler\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 194 in _run_module_as_main

@xoriole
Copy link
Contributor Author

xoriole commented Sep 14, 2022

retest this please

1 similar comment
@xoriole
Copy link
Contributor Author

xoriole commented Sep 15, 2022

retest this please

@kozlovsky
Copy link
Contributor

#7052 should fix the issue

@xoriole xoriole force-pushed the feature/passive-popularity-712 branch from 8b46066 to 03bb3c8 Compare September 16, 2022 09:16
@xoriole
Copy link
Contributor Author

xoriole commented Sep 16, 2022

retest this please

1 similar comment
@xoriole
Copy link
Contributor Author

xoriole commented Sep 19, 2022

retest this please

Add missing super call in introduction request callback in Popularity community
@xoriole xoriole force-pushed the feature/passive-popularity-712 branch from 03bb3c8 to dfa1901 Compare September 19, 2022 13:54
@xoriole xoriole marked this pull request as ready for review September 19, 2022 14:10
Copy link
Contributor

@kozlovsky kozlovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that sometimes get_normally_distributed_positive_integers function can be really slow. Like, executing get_normally_distributed_positive_integers(9999, 10000) requires more than one minute on my PC.

However, with a low value of size argument, like GOSSIP_RANDOM_TORRENT_COUNT which is 10, it is not a problem.

@xoriole xoriole merged commit 759c445 into Tribler:main Sep 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants