Couldn't generate valid v2/hybrid torrent file from torrent_handle::torrent_file()
#6283
Comments
yes, this is a complicated aspect of v2 torrents and the current split between The issue boils down to so when you ask for the https://libtorrent.org/reference-Torrent_Handle.html#torrent-file-torrent-file-with-hashes I would like to work towards a clearer separation of the immutable parts of a torrent (i.e. the That way it would be a lot clearer what belongs where and what's mutable and immutable. I think the first step towards this goal is to add a function that loads a .torrent file and returns an |
this should be mentioned in the |
OK, I see. |
Piece layers are mandatory in .torrent files, and are sufficient to start downloading. The fact that the internal torrent_info object has the piece layers stripped is just because the hashes are already using quite a lot of memory, so it can't really be left in there as a copy. It's unfortunate that this fact is exposed to clients, I wish it wasn't. However, if you just change the call to torrent_file_with_hashes(), it should work. |
But the documentation says that this may not be enough:
|
right. it has always been the case that a magnet link may not have the metadata, and not be able to create a .torrent file. There isn't a huge difference there, except that the piece hashes are downloaded lazily for the most part. |
So, do I have it right that
The v1/v2 difference is surprising, I didn't realize it until this thread as I've only worked with v1 so far |
that's right, except that the merkle hash trees in v2 are downloaded on-demand, so the piece layers may not be available immediately after the metadata is received. |
I am talking about the case when the metadata has been received. Previously we gave the user the ability to save a .torrent file at that point.
Sorry for the confusion. One more problem is downloading a torrent added via magnet link across several sessions. Previously we just stored the metadata in a file once it was received and restored the torrent next time using this metadata. It looks like this won't work the same way now: if the metadata may be saved without "piece layers", then when the torrent is restored (added with this metadata and the stored "resume data"), we won't be able to parse the metadata successfully, will we? |
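To make the concern above concrete, here is a minimal sketch of a check one could run on an already-decoded .torrent dictionary before writing it to disk. It assumes the BEP 52 layout (a "file tree" in the info dictionary, and a top-level "piece layers" dictionary keyed by each file's pieces root, with one 32-byte hash per piece). The function names and the sample data are hypothetical and are not part of libtorrent's API; string keys are used for readability, whereas a real decoder would typically yield bytes keys.

```python
import hashlib


def iter_files(tree, path=()):
    """Yield (path, attrs) for every file in a BEP 52 style file tree."""
    for name, node in tree.items():
        if name == "":
            # An empty key marks a file entry; its value holds the attributes.
            yield path, node
        else:
            yield from iter_files(node, path + (name,))


def missing_piece_layers(torrent):
    """Return the paths of files that need a piece layer but lack a full one."""
    info = torrent["info"]
    piece_length = info["piece length"]
    layers = torrent.get("piece layers", {})
    missing = []
    for path, attrs in iter_files(info["file tree"]):
        length = attrs["length"]
        if length <= piece_length:
            # Empty and single-piece files have no piece layer entry.
            continue
        expected = 32 * -(-length // piece_length)  # 32 bytes per piece
        if len(layers.get(attrs["pieces root"], b"")) != expected:
            missing.append("/".join(path))
    return missing


# Placeholder roots: these are NOT real merkle roots, only unique dict keys.
root_a = hashlib.sha256(b"a").digest()
root_b = hashlib.sha256(b"b").digest()
example = {
    "info": {
        "piece length": 16384,
        "file tree": {
            "big.bin": {"": {"length": 40000, "pieces root": root_a}},
            "small.txt": {"": {"length": 100, "pieces root": root_b}},
            "empty": {"": {"length": 0}},
        },
    },
}
# 40000 bytes at 16 KiB per piece is 3 pieces, so the layer is 96 bytes long.
fixed = {**example, "piece layers": {root_a: bytes(96)}}

print(missing_piece_layers(example))
print(missing_piece_layers(fixed))
```

A client could use a check like this to decide whether the metadata it is about to save would later be accepted by a parser that enforces the piece layers requirement.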
👍
It would be best if
👍 |
@arvidn |
yes, I agree. I will look into addressing that. I overlooked this use case and just focused on saving metadata along with the resume data. |
I've just been a bit busy with my day job lately. |
@ssiloti Do you have any comments on this? It seems unnecessary to require .torrent files have To quote glassez:
|
Libtorrent can handle such torrents just fine. The8472 argued in favor of requiring them:
|
Requiring |
Anyway, we need a convenient way to store immutable torrent data (e.g. info section, author etc.) separately from other "resume" data. Otherwise we have to use some sort of workaround like qbittorrent/qBittorrent#15191. |
You might want to add an option to eagerly download the piece layers so a full torrent can be created. |
I ran into this issue a couple of days ago, where I noticed that anytime I'd try to download a v2 only torrent file from a magnet URL I could not generate a torrent file. Moreover, since the torrent info hash is only ever a sha1 or sha256 on the info section and never includes the piece layers or anything outside the info section, all information outside of that can be changed or omitted. Due to this limitation and the way this should be implemented via BEP9 all other information is missing from a magnet torrent anyway. Wouldn't it make sense to just use the info section then anyway? At least for now that is my workaround, where I just use the info_section() function on the torrent_info object to get a bencoded representation of the info section and then add a libtorrent is absolutely fine with parsing v2 torrents that are missing the piece layers. Integrity of the files is ensured through the pieces root anyway. |
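The point that the info-hash covers only the info section can be illustrated with a small, self-contained sketch. The bencoder below is a minimal stand-in written for this example, not libtorrent's implementation, and the info dictionary uses placeholder values (the all-zero pieces root is not a real hash). It shows that a file with and without "piece layers" differs on disk while the v2 info-hash stays the same.

```python
import hashlib


def bencode(value):
    """Minimal bencoder for ints, bytes, str, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def v2_info_hash(torrent):
    """SHA-256 over the bencoded info dictionary only."""
    return hashlib.sha256(bencode(torrent["info"])).digest()


# Hypothetical info section with placeholder hashes.
info = {
    "name": "example",
    "meta version": 2,
    "piece length": 16384,
    "file tree": {"a.bin": {"": {"length": 40000, "pieces root": bytes(32)}}},
}
bare = {"info": info}
full = {"info": info, "piece layers": {bytes(32): bytes(96)}}

print(bencode(bare) == bencode(full))            # the files differ
print(v2_info_hash(bare) == v2_info_hash(full))  # the info-hash does not
```

Whether a file without "piece layers" counts as a valid .torrent is a separate question, which is what the rest of this thread debates.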
if the reason you're saving a .torrent file is to be able to re-add it if you restart, the resume data is much more practical for that. Then you'll get the partial merkle trees included as well.
Yes, iirc that was a recent change to mitigate issues around this. |
I see. I may not have reproduced the same issue then. The exception I get from a v2-only torrent is in the call to |
I can't actually reproduce this. For me, it either works or |
I suspect it's related to v2 torrents that are so small that not all files have a merkle tree, i.e. when a file is smaller than one piece. |
Exactly! I've used this hash (AABB9549B326A811616DD0D6617EAC3353A2A8AE) to test to see if there is anything that could tell me if this assertion would fail, but so far I couldn't find anything. My only solution currently is to manually add these hashes to a blacklist so they won't crash the program... As I said I'm sadly not primarily a C++ developer, so this is the best I was able to come up with so far, but if I can get any more information about this issue then I'm happy to help! This is my current list of hashes that are causing this assertion to fail: |
Interesting, I will look into that with my examples! |
I've run through them all, and for sure, all of them seem to have at least one file with the size of 0 bytes, I guess that would be smaller than one piece :) I will make some changes to my program so that it logs the hashes that are like this, to see if I can find any that do not fail in this case! |
Really? Interesting, maybe it has to do something with GCC then? |
I wrote this to try to reproduce it: #6856 |
This part of BEP52 contradicts common sense, IMO. Why require something that is inherently optional?.. |
Built your sample and ran it, same result: Running on Arch (linux 5.11.10-arch1-1) Also, per your recommendation, this workaround seems to be able to detect which torrents would cause a crash: It also wrote down all the hashes that I collected over the past couple of weeks, and they seem to align. So you were right, there should be a problem with small files (although I would argue that it has to be 0 bytes exactly). |
I don't think that it's "optional". As I understand it, we only take the merkle tree out of the info section for compatibility/performance reasons; the whole idea behind this is that we should be able to identify the same files between different torrents, without downloading them and checking the file hashes manually. Although, I have to agree that it is kind of contradictory that we have to download the files in order to calculate the tree itself, which defeats the purpose. I guess this is the reason why @arvidn is reluctant to make it optional, because then nobody would write them down to the torrent files, and we would eventually lose the data. I prefer the idea of making a function that forces the client to download the whole merkle tree without having to download the files, and failing to save it until it is downloaded. |
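The commenter's actual workaround is not preserved in this thread. As an illustration only, a check of the kind described (flagging torrents that contain zero-byte files, or files no larger than one piece) could look like the sketch below. It works on an already-decoded info dictionary in the BEP 52 "file tree" layout; the function names and sample data are hypothetical.

```python
def iter_files(tree, path=()):
    """Yield (path, attrs) for every file in a BEP 52 style file tree."""
    for name, node in tree.items():
        if name == "":
            # An empty key marks a file entry; its value holds the attributes.
            yield path, node
        else:
            yield from iter_files(node, path + (name,))


def small_files(info):
    """Split out empty files and files that fit within a single piece."""
    piece_length = info["piece length"]
    empty, single_piece = [], []
    for path, attrs in iter_files(info["file tree"]):
        name = "/".join(path)
        if attrs["length"] == 0:
            empty.append(name)          # no merkle tree at all
        elif attrs["length"] <= piece_length:
            single_piece.append(name)   # no piece layer entry
    return empty, single_piece


# Hypothetical info dictionary; hash fields are omitted because they are unused here.
info = {
    "piece length": 32768,
    "file tree": {
        "docs": {
            "readme.txt": {"": {"length": 512}},
            ".keep": {"": {"length": 0}},
        },
        "video.mkv": {"": {"length": 10_000_000}},
    },
}

empty, single_piece = small_files(info)
print("empty:", empty)
print("single piece:", single_piece)
```

Logging both lists separately would help test the hypothesis above that only files of exactly 0 bytes trigger the failure.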
@hekkr000 that output you took a screenshot of suggests it's not the |
|
Alright, nevermind. I tried to build the project from source so I could do a bit more investigation, but it looks like the official arch repository was a bit out of date not so long ago. It looks like you have already fixed this issue in 2.0.6. I bet this is the one:
Apparently they only uploaded 2.0.6 last Wednesday and I haven't updated since posting this issue last Monday. |
heh, right. I should have remembered that :) |
Piece layers are not optional in v2 torrents, so where's the contradiction @glassez ? |
Piece layers are inherently optional but they are not optional in v2 torrents (BEP52). That's what the contradiction with common sense is. |
Hmm, yes. I do not find @the8472's reasoning, that a v2 torrent should only be considered valid if it includes piece layers, to be valid itself:
What is necessary (and sufficient) for partial resumes is the root hash. I am not quite sure what is meant by stateless torrent clients, though up/downloading data requires state. On the other hand Vladimir, aren't the file hashes in a v1 torrent also optional given your reasoning? Only the infohash would be necessary for a torrent file. |
I think he means that anything that isn't part of the info-dictionary is (inherently) optional. It's not committed to by the info-hash. In that sense, the piece hashes in a v1 torrent are not optional, but the piece layers in a v2 torrent are. The rationale for this decision was to allow shorter start-up times when downloading a v2 magnet link, where the info-dict is very small and you download portions of the merkle tree as you go. |
Oh yes I forgot that the infohash in a v1 torrent only commits to the piece hash data (and other metadata) and not the file itself. This is unlike with a v2 torrent where the root hash commits to the file data, hence the piece layer hashes are not necessary. But that means the reason for requiring them in a v2 torrent needs reexamination. Resumption of downloading only needs the root hash... (Maybe the piece layers are also there and can be used as an integrity check on the root hash, to ensure it hasn't experienced a bit flip? If so then there could be a much cheaper integrity check on the root.) |
Partial resume here means partially downloaded files. You can only verify which pieces in a file are already complete if you have the piece hashes. The root hash can only verify a complete file, not a partial one.
Clients that can be pointed at bunch of .torrent files and a filesystem and figure out the rest by looking for partial or complete matches in the filesystem. This way no client-specific state is needed. |
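The distinction drawn above, between what the root hash can verify and what the piece layer can verify, can be sketched in a few lines. This is a simplified model, not libtorrent code: it assumes a piece length of 16 KiB so that each piece hash is simply the SHA-256 of that piece, and it pads the leaf layer with all-zero hashes up to a power of two. All names are hypothetical.

```python
import hashlib

PIECE = 16 * 1024  # assumed piece length, equal to the 16 KiB block size


def sha256(data):
    return hashlib.sha256(data).digest()


def piece_hashes(data, piece_length=PIECE):
    """Hash each piece of the file; the last piece may be short."""
    return [sha256(data[i:i + piece_length])
            for i in range(0, len(data), piece_length)]


def merkle_root(leaves):
    """Root of a binary merkle tree, padding leaves with zero hashes."""
    size = 1
    while size < len(leaves):
        size *= 2
    layer = list(leaves) + [bytes(32)] * (size - len(leaves))
    while len(layer) > 1:
        layer = [sha256(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]


def complete_pieces(on_disk, piece_layer, piece_length=PIECE):
    """Per-piece verdict for a file on disk, given the full piece layer."""
    have = piece_hashes(on_disk, piece_length)
    return [i < len(have) and have[i] == expected
            for i, expected in enumerate(piece_layer)]


original = bytes(range(256)) * 200           # 51200 bytes, 4 pieces
layer = piece_hashes(original)               # what "piece layers" would store
root = merkle_root(layer)                    # what "pieces root" would store

# A partially downloaded file: piece 1 is still zero-filled.
partial = original[:PIECE] + bytes(PIECE) + original[2 * PIECE:]

print(merkle_root(piece_hashes(partial)) == root)  # root alone: whole file fails
print(complete_pieces(partial, layer))             # layer: per-piece result
```

With only the root, the check on the partial file yields a single failure; with the piece layer, the client learns exactly which pieces it already has without contacting any peer.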
But through the Merkle root hash you can get the piece hash, and through the piece hash you can verify the piece of a partial file you have received, correct? So you can still incrementally verify the pieces of a file as it is downloaded; you don't need the complete file to start verifying it.
In this scenario of the filesystem for complete files the client can verify files simply from the Merkle root hashes (and can always recreate the piece layers.) For partially downloaded files they would likely already be augmented with some saved resume data anyway (corresponding to the partially downloaded piece layer hashes) for the client to "figure out the rest." Otherwise you'd be groping around in the dark since "stateless" partial files are what people call corrupt files. |
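The incremental verification described above relies on a merkle proof: given one piece hash and the sibling hashes along its path, a client can check that piece against the root without holding the whole layer. The sketch below shows the generic technique only. It is not the BEP 52 wire format for hash requests and not libtorrent's implementation, and all names are hypothetical.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def build_layers(leaves):
    """All layers of the tree, leaves first, padded with zero hashes."""
    size = 1
    while size < len(leaves):
        size *= 2
    layers = [list(leaves) + [bytes(32)] * (size - len(leaves))]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([sha256(prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return layers


def proof_for(layers, index):
    """Sibling hashes from the leaf layer up to (not including) the root."""
    proof = []
    for layer in layers[:-1]:
        proof.append(layer[index ^ 1])  # the other child of the same parent
        index //= 2
    return proof


def verify(leaf, index, proof, root):
    """Recompute the path to the root from one leaf and its siblings."""
    node = leaf
    for sibling in proof:
        node = sha256(sibling + node) if index & 1 else sha256(node + sibling)
        index //= 2
    return node == root


leaves = [sha256(bytes([i])) for i in range(5)]  # stand-ins for piece hashes
layers = build_layers(leaves)
root = layers[-1][0]

proof = proof_for(layers, 3)
print(verify(leaves[3], 3, proof, root))        # genuine piece hash
print(verify(sha256(b"bad"), 3, proof, root))   # tampered piece hash
```

Note that the proof has to come from somewhere: either a peer that holds the relevant part of the tree, or locally stored hashes. That dependency is what the following comments argue about.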
Get? Over the network, if there's another peer, yes. But you wouldn't know which ones to get, so you'd have to get all of them if a file doesn't verify as a whole, which means you need all piece hashes to be available. Having the
I'm not talking about verifying-while-downloading. I'm talking about having a partially downloaded file on disk and wanting to continue downloading it by importing it to a client. But to do so you need to determine how much you already have.
That "resume data" is part of the client-specific state that's excluded by being stateless.
Not at all. The |
For incrementally downloading a file through the network you would know which further piece hashes to get since you're the one downloading it, as you already have the partial piece layer hashes (this is not your stateless client scenario.) I gather the scenario you have in mind for stateless clients is that you're handed a filesystem with (complete or otherwise) files (it's a mystery as there is no saved resume data) as well as a bunch of torrent files, and no network access. So for every file you have to hash check every piece of it to figure out if it is complete or not, and if it is not which pieces are missing. So for the incomplete files you'd need the all piece layer data to be there already in the torrent files to know this. That is a rather specific circumstance. How common is this? Like how often would one run into this stateless client scenario?
Would that not be partially conflating the partial resumes scenario with the stateless client scenario? (Unless you already meant to equate them in your original remark years ago?) Regardless, if you have network access you can always ask for the piece layers from peers through the root hash and then determine how much of the incomplete file's data you already have. See, that's the thing: network access is key, and bittorrenting inherently assumes a network. Without it you can indeed determine (with the full piece layers) which pieces of an incomplete file are there, but you can't do anything about it; you can't download and then use the file. And after all that's what you want to do in the end, not merely to feel satisfied that you know which parts of an incomplete file you have. |
It's not really one single scenario. It's multiple scenarios that are rather similar. It's moving terabytes of storage between machines. It's renaming files. It's switching clients. Currently clients keep implementation-specific state. The file renames. The progress information. With bittorrent v2 (including Yes, sure, you could try to recover without that information. But the point is to make it dead-simple so one doesn't have to worry about all that junk anymore. Another minor concern was just creating another availability-bottleneck. In addition to pieces dropping below availability of 1 you'd also have to worry about hash information dropping below 1 in some edge-cases. Clients cannot share proof-hashes unless they either have the complete file or store them in yet another client-specific format. Having the piece hashes in the torrent file provides a portable format for information you need to store anyway. Sure, for a magnet transfer you only need the info-dictionary. But eventually you have to transfer the piece hashes in one form or another anyway. So a .torrent provides a more complete version of that. The spec just makes it a guarantee so that clients don't store it in a non-portable format and other clients can rely on it. |
Yeah I see your point of view here. Having the piece layer requirement in the torrent file guarantees they will be stored somewhere, so that other clients can readily transmit it to you. Without it clients might skip it to save space when finished then you'll not have it easily available, or take a long time to get it. This data should still be stored somewhere after a file is complete so this guarantees it'll be there, and as you said, in a portable format to boot. |
In any case, this does not change the fact that this requirement is "unnatural". It looks more like a way to achieve some unification of
Of course, all these problems should go away with improved support for everything that BitTorrent v2 has brought us. It also revealed a number of shortcomings in libtorrent, which should also be eliminated (for example, the need for a clearer separation of immutable and mutable torrent data). |
Existing applications only know the v1 spec. From their perspective a hybrid torrent without the piece layers is valid. To be affected by the validation logic you must make changes to add v2 logic to comply with the v2 spec. This is one part of that.
You can still do that, it just doesn't form a proper torrent file. I'd recommend naming it differently though. Maybe Another option is to fetch the piece layers eagerly after downloading the metadata.
Agreed |
I mean, whether it is natural or not is subjective. I agree that piece layers are strictly unnecessary if you just have the root hash. It is just that, as a practical matter, having the piece layer requirement guarantees the benefits that were discussed, which arise from common scenarios of people torrenting in the wild. |
libtorrent version (or branch): RC_2_0 latest
When I try to generate a torrent file from a torrent_info obtained from a torrent_handle (i.e. torrent_handle::torrent_file()), it either fails in the case of a "pure" v2 torrent or produces an invalid file (with a missing piece layers field) in the case of a hybrid torrent. The issue is caused by the following line:
libtorrent/src/torrent.cpp, line 7416 in abd51db