
Add checksum for data from huggingface #133

Merged: 8 commits, Nov 8, 2023


segyges (Contributor) commented Nov 8, 2023

Adds a scraper that fetches checksum ("parity") data, which might be unnecessary, and a check script to verify downloads. I had a bad `git lfs pull` and didn't want to have to debug anything else related to it.

Could use some cleanup so that it assumes less about the folder structure; the scraper may not need to be included at all.
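A minimal sketch of the kind of download check described above, assuming the expected SHA-256 digests have already been collected into a `sha256sum`-style manifest (`<hex digest>  <relative path>` per line). The manifest format and the function names `sha256_of_file`, `parse_manifest`, and `verify_downloads` are hypothetical; this is not the script from this PR. SHA-256 is a natural choice here because Git LFS pointer files record each object's `oid` as a SHA-256 digest, so a mismatch is one way to spot a bad `git lfs pull`.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_manifest(text):
    """Parse hypothetical sha256sum-style lines into {relative path: hex digest}.

    Accepts both text-mode ('<digest>  <name>') and binary-mode
    ('<digest> *<name>') lines; blank lines are skipped.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_downloads(root, expected):
    """Return the sorted relative paths that are missing or fail the SHA-256 check."""
    root = Path(root)
    bad = []
    for name, digest in sorted(expected.items()):
        path = root / name
        if not path.is_file() or sha256_of_file(path) != digest:
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Usage example: one intact shard and one whose bytes differ from the manifest.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "shard-00000.bin").write_bytes(b"hello")
        Path(tmp, "shard-00001.bin").write_bytes(b"truncated")
        manifest = (
            f"{hashlib.sha256(b'hello').hexdigest()}  shard-00000.bin\n"
            f"{hashlib.sha256(b'hello world').hexdigest()}  shard-00001.bin\n"
        )
        print(verify_downloads(tmp, parse_manifest(manifest)))  # → ['shard-00001.bin']
```

Reading in fixed-size chunks keeps memory use flat regardless of shard size, and reporting every failing path (rather than stopping at the first) makes it easy to re-fetch only the broken files. Resolving paths relative to a caller-supplied `root` is one way to avoid hard-coding a folder layout.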

segyges changed the title from "Add parity check for data from huggingface" to "Add checksum for data from huggingface" on Nov 8, 2023
Adds trailing space
haileyschoelkopf (Collaborator) left a comment


Thanks so much for this! Will approve once I check the hash of the unsharded deduped file.

haileyschoelkopf (Collaborator) commented

Thanks a bunch for the contribution!

haileyschoelkopf merged commit 0c3f210 into EleutherAI:main on Nov 8, 2023
1 check passed

2 participants