
Speedup channel commits using recursive SQL queries #5469

Closed
hbiyik opened this issue Jul 13, 2020 · 36 comments

@hbiyik

hbiyik commented Jul 13, 2020

Hello

I have added around 300-400k torrents to my channel, and when I issue a commit to the channel, the commit takes far too long, and after some time I got a crash. Too bad that I did not get any logs of it.

Now when I reopen Tribler, it tries to regenerate the channel from the metadata DB, but the process takes literally days just to commit. Funny enough, when the restoration is successful and you are patient enough, the core gets stuck for some reason, but can still close gracefully. After closing and reopening Tribler, the restoration starts again, and it again takes a very long time.

You can find the metadata.db at the link below. Drop it into your Tribler dir and you will see the problem.

https://ufile.io/7gqv5ump

Tested with an AMD FX-8320 CPU, 12 GB of RAM and an NVMe disk, so the setup is quite fast.

@devos50 devos50 added this to the V7.5: core refactoring milestone Jul 14, 2020
@qstokkink
Contributor

This is a known downside of the current implementation; it has been discussed in #3615 (comment)

@qstokkink
Contributor

@ichorid as you made this community, do you feel #3615 sufficiently captures this issue or do you want to keep this one open?

@hbiyik
Author

hbiyik commented Jul 14, 2020

With recursive SQL queries, the select + sort time of 700K torrents takes about 25 seconds.

https://www.sqlite.org/lang_with.html

The problem is that the collection node queries a big dataset, does Pythonic recursive sorts (factorial penalty!), puts them in temporary data structures (thrashing the RAM), and sorts them several times (CPU penalty). This is wrong in so many dimensions, and I am not even talking about the ORM overhead.

@hbiyik
Author

hbiyik commented Jul 14, 2020

Also, the data should be traversed using generators.

@ichorid
Contributor

ichorid commented Jul 14, 2020

I did a recursive SQL implementation of commits half a year ago. Indeed, the performance becomes radically better. The ORM does not hurt the recursive SQL implementation, as it is just that: a single custom SQL query. Also, I have an alternative, pure SQLite->SQLite (SQLite ATTACH-based) channel torrent format in the works. However, because of the lack of manpower during the last half year, I had to defer these changes.

Hopefully, I'll be able to finish the stuff for 7.6.

@ichorid ichorid closed this as completed Jul 14, 2020
@ichorid
Contributor

ichorid commented Jul 14, 2020

@hbiyik , the metadata DB cannot be used with unmodified Tribler source to re-commit the private channel w/o the corresponding private key, BTW.

@hbiyik
Author

hbiyik commented Jul 15, 2020

Please don't close this issue; mark it as an enhancement or something. Issue #3615 is discussing the difference between good and perfect.

This issue is about the difference between terrible and OK. Currently it is not usable at all, and you don't even need to go to 100K torrents to see this. I am working on a way to improve that; below are some SQL snippets to improve the recursive selection.

23 seconds with an index on the id_ column (which is currently missing):

WITH RECURSIVE Nodes(metadata_type, origin_id, id_, title, status, lvl) AS (
    SELECT ChannelNode.metadata_type, ChannelNode.origin_id, ChannelNode.id_,
           ChannelNode.title, ChannelNode.status, 0
    FROM ChannelNode
    WHERE ChannelNode.public_key = x'pk'
      AND ChannelNode.metadata_type = 300
      AND ChannelNode.status IN (0, 1, 6)
    UNION
    SELECT ChannelNode.metadata_type, ChannelNode.origin_id, ChannelNode.id_,
           ChannelNode.title, ChannelNode.status, Nodes.lvl + 1
    FROM ChannelNode, Nodes
    WHERE ChannelNode.id_ = Nodes.origin_id
) SELECT * FROM Nodes ORDER BY lvl DESC;

Orphan Clean Up Recursive SQL

3 seconds

WITH RECURSIVE Nodes(id_) AS (
    SELECT ChannelNode.id_
    FROM ChannelNode
    WHERE ChannelNode.metadata_type = 400
      AND ChannelNode.public_key = x'pk'
    UNION
    SELECT ChannelNode.id_
    FROM ChannelNode, Nodes
    WHERE ChannelNode.origin_id = Nodes.id_
) DELETE FROM ChannelNode
  WHERE ChannelNode.public_key = x'pk'
    AND ChannelNode.id_ NOT IN (SELECT id_ FROM Nodes);

Of course, before cleaning up, that data should be dumped to lz4 blobs as well.

I am currently working on it and may provide irregular feedback, since I am not very available on weekdays.
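
For reference, a rough sketch of how these queries could be driven from Python's sqlite3 module while streaming rows in fixed-size batches (the db path, key bytes and batch size are placeholders; the query is the recursive select above with the x'…' literal replaced by a bound parameter):

import sqlite3

RECURSIVE_SELECT = """
WITH RECURSIVE Nodes(metadata_type, origin_id, id_, title, status, lvl) AS (
    SELECT metadata_type, origin_id, id_, title, status, 0
    FROM ChannelNode
    WHERE public_key = ? AND metadata_type = 300 AND status IN (0, 1, 6)
    UNION
    SELECT c.metadata_type, c.origin_id, c.id_, c.title, c.status, Nodes.lvl + 1
    FROM ChannelNode AS c, Nodes
    WHERE c.id_ = Nodes.origin_id
)
SELECT * FROM Nodes ORDER BY lvl DESC;
"""

def iter_nodes(db_path, public_key, batch_size=10000):
    # Generator that yields rows batch by batch, so the full result set
    # never has to sit in RAM at once.
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(RECURSIVE_SELECT, (public_key,))
        while True:
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            yield from rows
    finally:
        con.close()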

@ichorid ichorid reopened this Jul 15, 2020
@ichorid ichorid changed the title Cant commit bulk torrents to channels Speedup channel commits using recursive SQL queries Jul 15, 2020
@ichorid
Contributor

ichorid commented Jul 15, 2020

Unless it is something completely fool-proof and really simple, like a 1-to-1 replacement of ORM queries with recursive SQL, this is going to go into 7.6.
If you want it in 7.5.2: we expect to release that on July 20.

@synctext
Member

btw @arvidn @hbiyik 3-5 November or 17-19 November we would like to organise "TriblerCon". Small-scale physical meeting in the countryside with high-end Github access. Gathering of you and the 12 Delft people working on Tribler for detailed alignment, nerd talk, 🍺 and roadmap for coming year. My dream is to get VLC-founder and Kodi developers to attend future editions and merge it with our scientific workshop, https://dicg2020.github.io

With recursive SQL queries, the select + sort time of 700K torrents takes about 25 seconds.

Because of your high-quality work we would like to refund your airfare if you could possibly attend TriblerCon during these Corona times.

this is going to go into 7.6.

Looking forward to seeing "recursive SQL" that reduces insert time. Note that the channel format will change at some point in the future. It needs to become easier to parse in real time: each MByte downloaded to disk is parsed, inserted and displayed. That is not easy to realise; it requires adding a logical delete field to magnet links and aligning the roughly 4 MByte torrent block/piece (piece picking) with a roughly 4 MByte metadata storage block with padding. In the even further future we will support storing HD preview pictures of the content inside the metadata swarm for additional GUI appeal. Channel owners can then optionally fill in this content. It enables Tribler to move towards https://trontv.com/ -style thumbnail-based content navigation (we have tried to get this working since the Tribler 4.0 days).

@ichorid
Contributor

ichorid commented Jul 16, 2020

@hbiyik , we will be really glad to see you at Delft in person! 👍

@hbiyik
Author

hbiyik commented Jul 16, 2020

@synctext it would be a pleasure for me to join such an event and to get to know the members more closely as well. It seems KLM is already willing to deliver me from Germany despite the COVID crisis, and a free lecture from Prof. Pouwelse is a nice addition (even though I lose track after the 20th or 30th minute or so :) when he starts to talk about ethics, freedom and high moral ground). I hope several beers can increase my mileage in this context, so count me in!

@ichorid I am sure that replacing the SQL queries won't help by itself; there will be blood for sure. All the database iteration should be synchronized with the rest of the operations via generator iterators to prevent RAM torture. Thus I am working on a fork in my repo; when I build enough confidence that it works without any side effects or regressions, we can discuss which branch to merge. Meanwhile you can keep working on an actual solution with the ATTACH addition and multiple DBs. You never know what you will get with the lifecycle management of a piece of software anyway :).

But I will need some support, especially since I am not very confident about some legacy definitions and old database formats.

@hbiyik
Author

hbiyik commented Jul 16, 2020

@synctext I forgot to mention: I was also going to bring up the possibility of metadata for each channel item and for the channel itself, like the one there is in Kodi. So I am glad that it is somewhere in the back of your head.

I don't want to sidetrack, but the only critical design clue would be providing a media ID for the content, where the metadata is maintained by a third party. For further details I would prefer to discuss offline, for reasons you might guess :)

https://codedocs.xyz/AlwinEsch/kodi/group__python__xbmcgui__listitem.html#ga0b71166869bda87ad744942888fb5f14

@arvidn

arvidn commented Jul 16, 2020

btw @arvidn @hbiyik 3-5 November or 17-19 November we would like to organise "TriblerCon". Small-scale physical meeting in the countryside with high-end Github access. Gathering of you and the 12 Delft people working on Tribler for detailed alignment, nerd talk, beer and roadmap for coming year. My dream is to get VLC-founder and Kodi developers to attend future editions and merge it with our scientific workshop, https://dicg2020.github.io

count me in! (assuming the netherlands will accept a swede :) )

@ichorid
Contributor

ichorid commented Jul 16, 2020

@hbiyik , I should warn you about the biggest problem with Channels stuff so far: reactor hogging. To circumvent the problem, I had to come up with a kind of a congestion control mechanism. It is not enough to just make the code fast. You need to make it asynchronous as well.
For 7.6 I planned to create a separate asynchronous priority queue to handle DB requests. Nonetheless, you are welcome to come up with a better solution!

@hbiyik
Author

hbiyik commented Jul 25, 2020

hello again after a short silence. @ichorid, @qstokkink

https://github.com/hbiyik/tribler/commit/47ac1b8a895760a79b4998d2849ad8b9e97bd225

You can find the proof-of-concept work in the above commit. Please note that it is not a working example at all, but an experiment to probe the performance boundaries. I chose the consolidate_channel_torrent method to work with, to trigger CPU/memory-heavy operations.

This is not complete because I am not using ORM entities. Yes, I have tried to make it work with Pony, but I could not find a way to implement my design choices; if there is a way to do it in Pony as well, that is also fine.

Ideas under test:

  1. Outsource all recursive and costly data-set operations to SQLite, including sorting, timestamping, recursive selection, etc.; don't do such things in Python. The rationale is that no pure Python implementation can be faster than SQLite for data operations.

    Result: SQL recursive selection is quite accurate and fast, maybe 100x or more. However, SQLite operations are blocking; I think using a thread should be the way to overcome this.
    Test data is around 300K recursive nodes to commit; selection time, including timestamp correction, recursive member counting and sorting, is around 20 s.

  2. Use iterators and generators to keep RAM usage as small as possible. This method also synchronizes data traversal, signing, compression, and writing the mdblob. Since it is synchronized, it can easily be made asynchronous in the reactor with the correct task priorities.

    Result:

  • In terms of CPU utilization it uses a full single core, but I think this utilization should be controlled somewhere else (tasks, or the TaskManager?), so to me this is a good result.
  • With this method there is no mdblob size limit, because the writer process does not consume any memory at all.
  • The signing process is very fast, so it is not something to consider as an optimization point; on the contrary, instead of integrity-checking the batch data, signing again is a feasible choice (maybe this is only the case on x86?).
  • Compression and serialization are the equally CPU-intensive operations here. I am amazed that the serialization performance of ipv8 is quite near perfect in terms of Python performance, but I needed to wrap a few methods around it so as not to instantiate a new serializer class for each record. That part of the code is just a placeholder for me to work with tuples instead of bindings.
  • Overall, writing 300K selected records to one single mdblob with new signatures took around 60 s, which is quite fast, and I think every ms of this process can be made async, which is nice.
  • Further updating of the database is not implemented, because that evolves into a complete rewrite of the class. However, if we were to implement it, the correct way would be to batch-update everything with a pivot table instead of updating each record sequentially, i.e. https://stackoverflow.com/a/11564511

TL;DR: the above commit shows a PoC where it is possible to dump 300K recursive nodes into a single 65 MB compressed and signed mdblob in 1.5 minutes, and roughly 70% of this process can be made async.
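
A rough sketch of the generator pipeline from idea 2, where serialize_row and sign are hypothetical stand-ins for the real ipv8 serialization and key-signing calls; lz4.frame's incremental compressor lets the blob grow chunk by chunk without ever buffering the whole thing:

import lz4.frame

def write_mdblob(path, rows, serialize_row, sign):
    # rows can be any generator (e.g. a streaming recursive-select iterator);
    # each row is serialized, signed and compressed on the fly.
    compressor = lz4.frame.LZ4FrameCompressor()
    with open(path, "wb") as blob:
        blob.write(compressor.begin())
        for row in rows:
            payload = sign(serialize_row(row))      # bytes for one signed entry
            blob.write(compressor.compress(payload))
        blob.write(compressor.flush())              # finalize the lz4 frame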

@ichorid
Contributor

ichorid commented Jul 26, 2020

Impressive job, @hbiyik ! We can almost put it into our codebase as it is. However, your implementation lacks two important details:

  • every entry's signature must be updated in the DB as well. After you call sign on a serialized entry, there must be a command to update the corresponding DB entries. Is that what you meant in your words about "Further update of the database"?
  • timestamp field should not be updated from the system time. I agree the name is misleading. Instead, timestamps must use an increasing counter. In the current implementation the timestamp is only initialized from the clock, as an optimization to avoid maintaining a counter in the DB. Using the system clock instead of an integer counter leads to the risk of creating multiple entries with the same timestamp. The Channels system expects timestamps to be unique and increasing (for a given public key).

EDIT:

Further update of the database is not implemented because this evolves to complete re-writing of the class.

You don't have to rewrite the class (if you mean the ORM class) at all. You just need to be sure the SQL <-> Pony objects mapping/cache is flushed before running your direct SQL stuff. Then Pony can proceed to work as normal.

@ichorid
Contributor

ichorid commented Jul 26, 2020

Also, to test the correctness of your code you can try replacing consolidate_channel_torrent with your implementation and run the following two tests:

These two tests thoroughly check all possible situations and content types that can be put into a channel. To visually control the results you can use the pprint_tree method:

@hbiyik
Author

hbiyik commented Jul 28, 2020

You don't have to rewrite the class (if you mean the ORM class) at all. You just need to be sure the SQL <-> Pony objects mapping/cache is flushed before running your direct SQL stuff. Then Pony can proceed to work as normal.

So the only way to do it is to instantiate the entity from_dict with the row, directly from the result of cursor.fetchone(), because under the hood Pony always uses fetchall() or fetchmany(). This will introduce a little overhead (CPU + memory) and hopefully should be OK, or not; I have to test it. But if I understand you correctly on flushing the ORM cache, on each iteration we need to run an ORM cache flush and maybe a manual GC collection after we are done with the entity instance. My concern is that GC in CPython only works when there is no further reference to that particular object; if the instance is still referenced somewhere in Pony (through some caching mechanism), the RAM will still keep stacking up.

Is that what you meant in your words about "Further update of the database"?

Yes, I think that part is rather easy, and the method is similar to the selection but simpler: create a pivot table in SQLite memory space and apply it to the actual table. Thus all the dirty work is handled on the SQLite side again. I expect this would not require more resources than the selection.

timestamp field should not be updated from the system time. I agree the name is misleading. Instead, timestamps must use

Yes, I think that part is a misconception on my end; I need to check in detail how this timestamping works in both the writer and reader directions, and the tests you mentioned would help a lot.
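
One possible shape for that pivot-table update, sketched with plain sqlite3 and assumed column names (timestamp/signature); the new values collected while writing the blob are loaded into a TEMP table and applied in a single UPDATE:

def batch_update(con, public_key, updates):
    # updates: iterable of (id_, new_timestamp, new_signature) tuples
    con.execute("CREATE TEMP TABLE pivot (id_ INTEGER PRIMARY KEY, ts INTEGER, sig BLOB)")
    con.executemany("INSERT INTO pivot VALUES (?, ?, ?)", updates)
    con.execute("""
        UPDATE ChannelNode SET
            timestamp = (SELECT ts  FROM pivot WHERE pivot.id_ = ChannelNode.id_),
            signature = (SELECT sig FROM pivot WHERE pivot.id_ = ChannelNode.id_)
        WHERE public_key = ? AND id_ IN (SELECT id_ FROM pivot)
    """, (public_key,))
    con.execute("DROP TABLE pivot")
    con.commit()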

@ichorid
Contributor

ichorid commented Jul 28, 2020

So the only way to do is, instantiate the entity from_dict, with the row, directly with the result of cursor.fetchone

No, what I meant is you should not rely on Pony to fetch stuff at all, but instead access SQLite directly as you do. It should work like this (in pseudocode):

get the public key and id of the channel through ORM
finish the ORM session
invalidate ORM cache/mapping
do your SQL stuff directly with cursor as a single transaction
continue to work with ORM as normal

You don't need to flush the ORM cache on every iteration, since you do not use Pony objects in your code. Instead, you must ensure that all ORM-induced changes are flushed to the DB before you start manipulating it directly.
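
In code, the flow above could look roughly like this (a sketch only: the entity lookup is hypothetical and RECURSIVE_SELECT is the query string from the snippet earlier in the thread):

import sqlite3
from pony.orm import db_session

def commit_channel(mds, db_path, channel_rowid):
    # 1-3: use the ORM only to fetch identifiers; leaving the db_session
    #      ensures all ORM-induced changes are flushed to the DB file.
    with db_session:
        channel = mds.ChannelMetadata[channel_rowid]     # hypothetical lookup
        public_key = channel.public_key
    # 4: one direct SQLite transaction, no Pony objects involved.
    con = sqlite3.connect(db_path)
    try:
        for row in con.execute(RECURSIVE_SELECT, (public_key,)):
            ...  # serialize / sign / write the mdblob, as in the PoC
    finally:
        con.close()
    # 5: afterwards, continue to work with the ORM as normal.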

@kozlovsky
Collaborator

kozlovsky commented Jul 29, 2020

Hi @hbiyik! I'm the author of PonyORM. @ichorid asked me to look at the code that you suggested regarding its compatibility with PonyORM.

I don't see any problems regarding interaction with PonyORM. It is a good practice to rewrite heavy queries in raw SQL to avoid ORM overhead.

Some small comments:

  1. It is not necessary to wrap consolidate_channel_torrent or consolidate_channel_torrent2 with the @db_session decorator, as these methods are called inside another db_session, and nested db_sessions are ignored by the ORM.
  2. It is not necessary to manually perform flush() before executing raw SQL queries. When you call db._exec_sql(...) PonyORM flushes all previous unsaved changes automatically.
  3. Usually, it is better to use the db.execute(...) method instead of db._exec_sql() to issue raw SQL queries. db.execute is part of the public API, while db._exec_sql() is an implementation detail (db.execute calls db._exec_sql internally). Note that db.execute automatically creates query parameters for embedded Python expressions like $var_name or $(some expression), so the result is safe from SQL injection.
  4. But in the case of the channel commit implementation, it is probably better to keep using db._exec_sql(...) as you do. The reason is that db.execute() starts a transaction, and this locks the SQLite database. We probably don't want to lock the database during this long query execution and the subsequent file creation. If concurrent transactions never update rows involved in the channel commit process, this should be safe.
  5. It is necessary to use ORM objects only if they have some application-specific logic implemented in Python which we don't want to duplicate in raw SQL updates. In the current channel commit implementation there seems to be no such logic.

So, for correct integration with PonyORM the code should look like this:

  1. start db_session
  2. get the public key and id of the channel through ORM
  3. don't do any updates of ORM objects, to prevent starting a transaction and locking the database
  4. Perform raw SQL query
  5. Process the generator result without creating any ORM objects
  6. exit from db_session
  7. if necessary, start new db_session and update some ORM objects (as GigaChannelManager.updated_my_channel currently does)

To me, the current code looks correct regarding SQL <-> ORM interaction. It already does what I have described above.
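
A tiny illustration of point 3, assuming db is the Pony Database instance and public_key is a bytes value in the local scope; the $-expression is turned into a bound query parameter rather than being pasted into the SQL text:

from pony.orm import db_session

with db_session:
    cursor = db.execute("SELECT id_, title FROM ChannelNode "
                        "WHERE public_key = $public_key AND metadata_type = 300")
    rows = cursor.fetchall()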

@hbiyik
Author

hbiyik commented Jul 29, 2020

@ichorid thanks for the tips; @kozlovsky thanks for taking the time to review and to put the suggestions together so clearly. I think I now have the overall picture of how to do it without hurting the ORM.

It is necessary to use ORM objects only if they have some application-specific logic implemented in Python which we don't want to duplicate in raw SQL updates. In current case of channel commit implementation there is no such logic it seems.

Only regarding the above statement: serialization/deserialization is a member of the entity classes. However, I think reimplementing it as an alternative path working on row tuples/dicts is the only solution. That should not be a very difficult task, but it needs to be tested thoroughly against weird pitfalls.

E.g.: in the PoC commit there is no delete serialization, because it is just a PoC. It is also based on tuples; a dict-like object is a cleaner option, with the sqlite3.Row factory defined on the connection, since this is a connection-level configuration in _sqlite.

https://github.com/python/cpython/blob/b146568dfcbcd7409c724f8917e4f77433dd56e4/Modules/_sqlite/cursor.c#L122
https://docs.python.org/3/library/sqlite3.html#sqlite3.Row

@hbiyik hbiyik closed this as completed Jul 29, 2020
GigaChannels automation moved this from In progress to Done Jul 29, 2020
@hbiyik hbiyik reopened this Jul 29, 2020
GigaChannels automation moved this from Done to In progress Jul 29, 2020
@hbiyik
Author

hbiyik commented Aug 4, 2020

@ichorid

Since working with direct rows is the direction, I designed an MDBlob class that serializes, signs, compresses and writes the mdblobs from rows, and also reads, decompresses, deserializes and generates rows back from the mdblobs. Benefits:

  1. there are no congestion or async issues; the whole class is async
  2. there is no RAM consumption; literally the whole class consumes a few bytes, as the reader and writer are implemented in a walker-like design
  3. quite fast: writing 1 million records takes about 2 minutes, reading 1 million records under 1 minute
  4. should also work with the entities' from_dict & to_dict methods (not tested, though)

CONS:

  1. It uses an alternative implementation of serialization rather than what is available in the serialization module of the metadata_store package. This needs to be DRYed, but I think that serialization module needs some refactoring anyhow to optimize memory consumption, since every entity creates a new instance of the payload classes.

You can check the implementation in

https://github.com/hbiyik/tribler/commit/67d93f303b954d6c6e0344646130518038d6d20b

And run some tests at:

https://github.com/hbiyik/tribler/blob/67d93f303b954d6c6e0344646130518038d6d20b/src/tribler-core/tribler_core/modules/metadata_store/tests/test_mdblob.py

Here are some results on my end; the numbers depend on the host CPU and your disk type.

My setup:
CPU: 8-core AMD FX-8320 OCed to 4.2 GHz
RAM: 12 GB of DDR3 RAM (not really relevant)
DISK: Disk 1: 5400 RPM, Disk 2: 7200 RPM, Disk 3: Samsung 970 EVO NVMe SSD

Perf table
Benchmark Type Sample Count Krows / sec MByte / sec Total Sec Total MB Disk Type
Write 500 True 5.324 3.218 0.093 0.302 5400 RPM
Write 500 True 4.669 2.822 0.107 0.302 7200 RPM
Write 500 True 10.883 6.579 0.045 0.302 NVME
Write 1000 True 8.097 4.854 0.123 0.599 5400 RPM
Write 1000 True 8.404 5.038 0.118 0.599 7200 RPM
Write 1000 True 10.880 6.522 0.091 0.599 NVME
Write 10000 True 7.968 4.635 1.254 5.816 5400 RPM
Write 10000 True 7.535 4.383 1.326 5.816 7200 RPM
Write 10000 True 11.287 6.565 0.885 5.816 NVME
Write 1000000 True 8.227 4.806 121.537 584.160 5400 RPM
Write 1000000 True 8.251 4.820 121.187 584.160 7200 RPM
Write 1000000 True 11.416 6.668 87.593 584.160 NVME
Write 500 False 7.352 4.410 0.068 0.299 5400 RPM
Write 500 False 7.874 4.723 0.063 0.299 7200 RPM
Write 500 False 11.678 7.005 0.042 0.299 NVME
Write 1000 False 6.491 3.750 0.154 0.577 5400 RPM
Write 1000 False 6.509 3.761 0.153 0.577 7200 RPM
Write 1000 False 10.856 6.273 0.092 0.577 NVME
Write 10000 False 8.124 4.738 1.230 5.832 5400 RPM
Write 10000 False 8.697 5.072 1.149 5.832 7200 RPM
Write 10000 False 11.663 6.802 0.857 5.832 NVME
Write 1000000 False 8.465 4.916 118.127 580.751 5400 RPM
Write 1000000 False 8.426 4.893 118.666 580.751 7200 RPM
Write 1000000 False 12.071 7.010 82.841 580.751 NVME
Read 500 True 7.928 4.792 0.063 0.302 5400 RPM
Read 500 True 8.194 4.953 0.061 0.302 7200 RPM
Read 500 True 12.197 7.373 0.040 0.302 NVME
Read 1000 True 12.109 7.259 0.082 0.599 5400 RPM
Read 1000 True 10.862 6.511 0.092 0.599 7200 RPM
Read 1000 True 12.600 7.553 0.079 0.599 NVME
Read 10000 True 11.891 6.916 0.840 5.816 5400 RPM
Read 10000 True 10.797 6.280 0.926 5.816 7200 RPM
Read 10000 True 12.937 7.525 0.772 5.816 NVME
Read 1000000 True 10.905 6.370 91.700 584.160 5400 RPM
Read 1000000 True 10.913 6.375 91.628 584.160 7200 RPM
Read 1000000 True 13.038 7.616 76.695 584.160 NVME
Read 500 False 22.893 13.732 0.021 0.299 5400 RPM
Read 500 False 22.832 13.696 0.021 0.299 7200 RPM
Read 500 False 26.157 15.690 0.019 0.299 NVME
Read 1000 False 23.731 13.712 0.042 0.577 5400 RPM
Read 1000 False 23.685 13.685 0.042 0.577 7200 RPM
Read 1000 False 27.311 15.781 0.036 0.577 NVME
Read 10000 False 22.017 12.841 0.454 5.832 5400 RPM
Read 10000 False 23.288 13.582 0.429 5.832 7200 RPM
Read 10000 False 26.712 15.579 0.374 5.832 NVME
Read 1000000 False 23.587 13.698 42.394 580.751 5400 RPM
Read 1000000 False 23.693 13.760 42.205 580.751 7200 RPM
Read 1000000 False 26.733 15.525 37.405 580.751 NVME

@ichorid
Contributor

ichorid commented Aug 4, 2020

First of all, bravo for the performance 👏

Now, some food for thought:
aiofile: "Async file access is always slower than sync one (because of the thread pool overhead)". Except for Linux, where aiofile uses native Linux filesystem AIO interface.

What you did is optimize the file part of mdblob processing for memory usage (with generators). However, the biggest bottleneck is not file reading performance, but entry insertion into the main DB.
The whole process looks like this:

"file stuff": [read file bytes into memory] -> [decompress bytes] -> [check signature (optional?)] -> [parse bytes (deserialize)] -> 
"db stuff": [check DB invariants (e.g. duplicates and/or older entries)] -> [insert entry into the main DB]

You optimized the "file stuff" part. The biggest bottleneck is in the "db stuff", namely the checks and DB WAL, indexes creation, supporting entry creation (like torrent checker table), etc.
Actually, there is a much simpler and faster way to solve the problem you tried to solve in your latest performance test - instead of serialized entries blobs, use SQLite DBs as a storage format. That will allow us to skip Python completely (except for signature checks - but that's a different story). As I have stated above, using mdblobs for storage was mostly a way to simplify development by unifying the routines and format for torrent-based storage and network transmission.

Anyway, it would be great if, using your new routines, you would measure processing/committing some big channel that is already in the system.

@hbiyik
Author

hbiyik commented Aug 4, 2020

Actually, there is a much simpler and faster way to solve the problem you tried to solve in your latest performance test - instead of serialized entries blobs, use SQLite DBs as a storage format.

That is most definitely the best way to do this, but it also requires a redesign of the database (maybe multiple tables, maybe even multiple databases) and almost a rewrite of the channels. So I will be watching that approach closely as well. But again, my target here is to make things "good", not "perfect". And generally "perfect" is the best enemy of the "good", which is quite "bad".

Let me elaborate on the discussion a little bit.

aiofile: "Async file access is always slower than sync one (because of the thread pool overhead)". Except for Linux, where aiofile uses native Linux filesystem AIO interface.

The bottleneck when interfacing with the mdblob is on the CPU side, so making the IO calls to the disk async would have some merit, but not that much.

The idea here is to slice the overall routine (by routine I mean DB -> some stuff -> FILE, or vice versa, as a whole) into sub-coroutines, so a long operation would not block the reactor and could be cancelled on demand, while also balancing resources in the meantime (CPU, RAM, IO).
On the DB side I am using generators (with recursive selection), which are semi-coroutines whose scopes can be switched by their nature; on the file side this class uses coroutines directly, so on each row iteration the reactor can switch to another coro or cancel the task.

Now the next and final step is updating the DB. For this I have several ideas, and it should be fine as long as I can access the correct locks/semaphores in Pony. Let's see how that works out; I am expecting performance similar to the recursive selection.

Anyways, it would be great if, using your new routines, you will measure processing/committing some big channel that is already in the system.

If I had such a thing I would have put up a PR already :)

@ichorid
Contributor

ichorid commented Aug 5, 2020

Generators in Python always run on the same thread where they were called. They do not use threading implicitly. Ergo, stuff in a generator will still block the reactor. Essentially, generators are just syntactic sugar to make up for Python's lack of pointers and a goto statement. To make your generator stuff truly async, you have to use the asynchronous generator protocol.

Even more importantly, the reactor does not know when it can switch to another coroutine. The programmer must tell it when to do that, by using the await keyword. However, it only makes sense to await on async methods. Yep, that means the thing should be async all the way down! Well, you may ask, how does the chain of async calls terminate? At the top, it terminates by wrapping it in an asyncio.Task. And at the bottom, it terminates either by a call to a library function that adheres to the async protocol (e.g. aiofile.Reader) or by pushing your (synchronous) code onto a thread by using loop.run_in_executor(your_sync_callable).

Task -> await-async -> run_in_executor

Correct me if I'm wrong, but I see neither run_in_executor calls in your generator code, nor bottom awaits for calls to external library functions.
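
As a concrete illustration of that Task -> await -> run_in_executor chain (the function names here are hypothetical):

import asyncio

def select_nodes_sync(db_path, public_key):
    # Ordinary blocking sqlite3 work goes here (the bottom of the chain).
    ...

async def commit_channel(db_path, public_key):
    loop = asyncio.get_event_loop()
    # The blocking call runs on the default thread pool, so the reactor
    # stays responsive while SQLite grinds through the recursive query.
    return await loop.run_in_executor(None, select_nodes_sync, db_path, public_key)

# Top of the chain: wrap the coroutine in a Task so the loop can schedule it.
task = asyncio.ensure_future(commit_channel("metadata.db", b"..."))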

@ichorid
Contributor

ichorid commented Aug 5, 2020

There is a simple way to test if the reactor is hogged: create a Task at the top level of your app that will loop endlessly with asyncio.sleep and print something to the console. That is your "heartbeat". If the heartbeat is steady during the performance tests, the reactor is free enough.
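
For example, a minimal version of such a heartbeat task:

import asyncio, time

async def heartbeat(interval=1.0):
    # If these lines keep appearing at a steady pace during the benchmark,
    # the reactor is not being hogged.
    while True:
        print("heartbeat", time.time())
        await asyncio.sleep(interval)

# At the top level of the app:
asyncio.ensure_future(heartbeat())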

@hbiyik
Author

hbiyik commented Aug 6, 2020

Correct me if I'm wrong, but I see neither run_in_executor calls in your generator code, nor bottom awaits for calls to external library functions.

Yep, I wasn't very clear on that; I shouldn't write messages at midnight. The reason is that I have not made them async yet. My point about generators is that, in terms of synchronous programming, generators are the closest thing to coroutines in async code, because they are a kind of semi-coroutine running synchronously.

The reason I have not done it yet is DB execution and cursor movement: at the end of the day those calls should run in a thread (at least the selection part, db.execute), because I don't want to block the spinning reactor with those costly DB calls (around 20-30 s). But the real problem here is:

Python's _sqlite module is not thread-safe at all (even though the C-level sqlite3 is). Possible solutions:

  1. you can use _sqlite in threaded mode if you serialize cursor and connection access with mutexes. This will run in a serialized single-thread mode, but the Python call will not block the main reactor. You will still block the reactor if a coro tries to access the DB while the mutex is held: it will wait for the mutex to unlock.
  2. you can queue all the SQL commands and access the DB in a separate single thread; this way you truly get an async flavour in your main reactor. But now the problem is that you have to somehow integrate this model with Pony as well. Actually, there is already a wrapper doing this: https://github.com/omnilib/aiosqlite

Since I am using the fetchone method, in either case (method 1: use a mutex to serialize access; method 2: queue the SQL statements and run them sequentially) the cursor will be blocked, because SQLite, to my knowledge, has no concept of cell/row/column/table locks (https://sqlite.org/threadsafe.html), and even if it did, no Python library exposes them (maybe option 2 does, but I am not sure; option 3 is somewhat the same concept as above but inside the SQLite library itself).
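
For option 2, a sketch of how aiosqlite could be used to consume the recursive select row by row from its dedicated DB thread (the db path and the RECURSIVE_SELECT string are placeholders from the earlier snippets):

import aiosqlite

async def iter_nodes_async(db_path, public_key):
    # aiosqlite runs the connection in its own worker thread and exposes
    # the cursor as an async iterator, so the reactor only waits per row.
    async with aiosqlite.connect(db_path) as db:
        async with db.execute(RECURSIVE_SELECT, (public_key,)) as cursor:
            async for row in cursor:
                yield row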

So, bottom line: when the cursor is moved across the DB, or a costly statement is executed in SQLite, the rest of the accesses from the app must be blocked. The rationale, again, is that the costly operations we do inside SQLite can never be done faster in Python, and doing them there also introduces a bunch of memory issues. What I am targeting is to leave the reactor free during that time so it can handle the rest of the tasks related to ipv8, or things not related to the DB in general.

At the end of the day I target a design as below.

async def commitnodes():
    with Pony.getcon() as con:
        iterator = await costlydbselect(con)
        upd_cur = con.cursor()
        for row in iterator:
            await MDBlob.write(row)
            upd_cur.execute("sql statement to update stuff")

And currently I am focusing on slicing the iteration at the minimal and most effective points; thus it is currently sync with generators. When I deal with the SQLite issues, I can easily convert them to coros and tasks.

@hbiyik
Author

hbiyik commented Aug 6, 2020

By the way, the above is just pseudocode; with context managers this stuff can be made prettier without exposing cursors and connections all over the place.

Also, in the case of the new design with multiple DBs and ATTACH, there will still be an option to receive old mdblobs from the older DBs, right? So if this works, it will be useful in the future as well, regardless of what the next redesign is.

@hbiyik
Author

hbiyik commented Aug 10, 2020

To be practical:

something like https://github.com/hbiyik/tribler/commit/ea449281e1ddbc55e9c8d0c18abe6f31085ba4e0, plus transaction control when updating.

@ichorid
Contributor

ichorid commented Aug 11, 2020

Well, the DBExecutor class you provided should work nicely for direct interactions with the database. Basically, you combined a ThreadPoolExecutor with sqlite3.Row to create a thing that produces SQLite rows from a background thread asynchronously. However, I don't see how we can use it in its current form, aside from adding some syntactic sugar to your recursive SQL channel commit procedure or running some performance tests. It is a step in the right direction, though. Last year I was thinking about subclassing the executor myself, until I realized that we would still have transactions blocking each other. Then we discussed it with @kozlovsky, and he formulated the only viable architecture:
an asynchronous "fetcher" for Pony objects that should internally run two queues:

  • a queue for blocking (e.g. read-write) transactions
  • a queue for non-blocking (read-only) transactions

It should work similarly to the thing you developed: you put a transaction in, you get Pony objects out, asynchronously. If we want to be able to do direct SQL transactions, we want them to use the same two queues; otherwise there will be deadlocks.

One of the reasons why I am not doing the thing at the moment is that I wanted to wait for @kozlovsky to get here in September and do it together. Nonetheless, your work on recursive commits is very appreciated because it does not touch this topic directly and can be later integrated easily in the queue-based framework.
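
Not the planned implementation, but as an illustration of the idea: the two queues could be approximated with a single asyncio.PriorityQueue in which read-only jobs take precedence, while every job still runs off the reactor thread:

import asyncio
from itertools import count

class AsyncDBFetcher:
    READ, WRITE = 0, 1          # read-only jobs are drained first

    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        self._seq = count()     # FIFO tie-breaker for equal priorities
        self._worker = asyncio.ensure_future(self._run())

    async def _run(self):
        loop = asyncio.get_event_loop()
        while True:
            _prio, _n, job, future = await self._queue.get()
            try:
                # job is an ordinary blocking callable (e.g. a db_session
                # transaction); it is executed off the reactor thread.
                future.set_result(await loop.run_in_executor(None, job))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, job, readonly=False):
        future = asyncio.get_event_loop().create_future()
        prio = self.READ if readonly else self.WRITE
        self._queue.put_nowait((prio, next(self._seq), job, future))
        return future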

@synctext
Member

quite fast: 1 Million record writer in about 2 minutes, 1 million record reader under 1 minute.

@hbiyik impressive work! It has been a month since there was activity in this thread. @kozlovsky has also joined the team now. Can this work be turned into production code which does not block the reactor?
Please get "Good Code" into production before re-doing everything from the ground up to make it perfect. The current channel implementation is too broken for users to use. It's not about insert time: downloads should start instantaneously.

@hbiyik
Author

hbiyik commented Sep 11, 2020

@synctext I was on vacation for the last month, so I couldn't really focus on this one because of sand, summer sun, and beaches. :)

However, from my POV and understanding, I can provide an executive summary as below:

The best way to manage this is to use multiple DBs/tables plus ATTACHed DBs; I think this is the consensus.

Nevertheless, the old DB design will still be in use in the wild, so it would still be beneficial for the redesign to be able to commit or merge legacy mdblobs from/to the torrent transport and from/to the SQLite DB.

To answer the question precisely: the async part discussed here is better implemented in Pony with a proper queue or locking methodology. An iterative way of traversing the cursor should also be provided by Pony to prevent memory issues; I think this part is as vital as ever. So the snippets here are not close to PR quality, because this effectively introduces a redesign of several components (even Pony itself). But the recursive SQL code is quite solid at production level, even though this SQL snippet is just the tip of the iceberg.

@qstokkink
Contributor

There are some great insights in this issue, like a deep understanding of IPv8 perfection:

i am amazed that serialization performance from ipv8 is quite near perfect in terms python performance

Even though I would like to keep the IPv8 appreciation going, I believe this issue is now mostly out of date and should be closed:

  • The channels have been removed.
  • The remaining problematic database interactions have since received their own issues (or should be in their own issue).
