-
-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pikpak: implement custom hash to replace wrong sha1 #7905
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the hash is quite right yet - or there is something I don't undersdand!
Please see inline.
Thank you :-)
backend/pikpak/pikpak.go
Outdated
@@ -128,6 +129,7 @@ func pikpakAuthorize(ctx context.Context, opt *Options, name string, m configmap | |||
|
|||
// Register with Fs | |||
func init() { | |||
pphashType = hash.RegisterHash("pikpak", "PikPakHash", 40, sha1.New) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just the same as hash.SHA1
isn't it?
In order for hash checking to work with I think you need to implement hash.Hash
here and pass that in instead for sha1.New
.
There are some examples in the rclone code
Either that or I don't understand what is going on here!
This is probably working becuase rclone will choose MD5
by default.
Check the hashes match properly with rclone hashsum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick review! You're absolutely right, I'm having some difficulty registering a new hash type that requires a hash.New()
.
In the calcGcid()
function, Pikpak's custom hash algorithm needs an additional variable, the file size, to calculate the size of each block within the file. While the Dropbox hash seems like the closest comparison in this case, it uses a fixed value of 4MB for bytesPerBlock
.
My question is, how can I modify the custom hasher to be compatible with hash.RegisterHash()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hashes don't take a length function so short answer would be that you can't :-(
You could read all the data in your hash function before choosing the block size - that doesn't sound like a good solution to me though.
However I see that there are only 8 possible block sizes from 0x40000 to 0x200000 so you could register 8 hash functions but I don't think this will work well with the rclone core as it won't know which hash function to choose.
We've thought about size dependent hashes before when thinking about multipart Etags for S3. We decided not to use them though.
I don't think this is possible with rclone's hashing framework as it stands today :-(
Looking at the code for pikpak I see we have an MD5 which seems to work, so I'd say we don't need the support for the pikpak hash in the rclone core as MD5 is very good for detecting errors.
What is this hash used for? If it is just a content hash then I think we should probably remove the broken sha1 implementation from pikpak and just use MD5. It may have other uses I'm not aware of though.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the revised version and the comment.
As suggested, we have opted to make |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine now :-)
Happy for you to squash to one commit and merge when it passes the CI and the integration tests.
Unfortunately, many fails in the integration tests. This might be a breaking change to |
2024/06/20 00:25:49 PASS: All tests finished OK in 6m14.431740163s Merging it now! |
What is the purpose of this change?
This PR improves PikPak's file integrity verification by implementing a custom hash function named
gcid
and replacing the previously used SHA-1 hash.Was the change discussed in an issue or in the forum before?
Fixes #7838
Checklist