Compute checksum in checks #32

Merged: @maudetes merged 32 commits into main from feat/add-checksum on Nov 22, 2022


@maudetes (Contributor) commented on Nov 16, 2022

Closes #26

Computes a checksum when downloading a file.

Also adds some enhancements:

  • update the resource URL in the catalog when a resource is modified
  • store the MIME type, checksum, additional errors and file size in checks

⚠️ You should reinitialize your DBs with the new columns using udata-hydra init-db --drop
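
To illustrate the idea of computing the checksum while the file is being downloaded, here is a minimal sketch. It is not the code from this PR: the function name `checksum_and_size` and the chunk-iterable interface are assumptions made for the example, and the list of byte strings stands in for chunks read from an HTTP response body. SHA1 is used as the default because that is the algorithm mentioned in the discussion.

```python
import hashlib
from typing import Iterable, Tuple


def checksum_and_size(chunks: Iterable[bytes], algorithm: str = "sha1") -> Tuple[str, int]:
    """Hash a download chunk by chunk and return (hex digest, total bytes).

    Hypothetical helper: the file never has to be held fully in memory,
    since each chunk is fed to the hasher as soon as it arrives.
    """
    hasher = hashlib.new(algorithm)
    size = 0
    for chunk in chunks:
        hasher.update(chunk)
        size += len(chunk)
    return hasher.hexdigest(), size


# Stand-in for chunks streamed from a response body (assumption for the example).
digest, size = checksum_and_size([b"hello ", b"world"])
```

Hashing incrementally gives the same digest as hashing the whole payload at once, so the checksum and the file size can both be collected in the same pass as the download.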

@maudetes changed the base branch from main to Kafka-eventectomie on November 16, 2022 17:43
Review thread on udata_hydra/config.py (outdated, resolved)
Base automatically changed from Kafka-eventectomie to main on November 21, 2022 17:36
@maudetes changed the title from "WIP Compute checksum in checks" to "Compute checksum in checks" on Nov 21, 2022
@maudetes requested a review from abulte on November 22, 2022 09:19
Review threads (outdated, resolved): udata_hydra/app.py, udata_hydra/crawl.py, udata_hydra/datalake_service.py (8 threads)
@maudetes requested a review from abulte on November 22, 2022 11:23
@abulte (Contributor) left a comment

I wonder if we should use MD5 instead of SHA1. That way we could leverage the Content-MD5 header (https://www.rfc-editor.org/rfc/rfc1864.html). But I don't think it's widely used, and we can iterate on that later.

@maudetes (Contributor, Author) replied:

> I wonder if we should use MD5 instead of SHA1. That way we could leverage the Content-MD5 header (https://www.rfc-editor.org/rfc/rfc1864.html). But I don't think it's widely used, and we can iterate on that later.

Indeed. We chose SHA1 because it is the default in udata, though I don't know whether we would want to compare against the values stored in udata. Let's merge this and iterate when we improve content modification detection!
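
For reference, the Content-MD5 idea raised above could look roughly like the sketch below. Per RFC 1864, the header value is the base64 encoding of the raw 128-bit MD5 digest of the body, not the hex digest. The function names `content_md5` and `matches_content_md5` are hypothetical and nothing here is part of this PR.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the RFC 1864 Content-MD5 value: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def matches_content_md5(body: bytes, header_value: str) -> bool:
    """Check a downloaded body against a Content-MD5 header value (hypothetical helper)."""
    return content_md5(body) == header_value.strip()
```

A check like this only helps when the server actually sends the header, which, as noted in the comment above, may not be common.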

@maudetes merged commit 37b3772 into main on Nov 22, 2022
@maudetes deleted the feat/add-checksum branch on November 22, 2022 14:26
Successfully merging this pull request may close these issues.

Dig into the hydra code to assess future modifications
3 participants