Preconfigured Datasets are not available #949

Closed
vogt31337 opened this issue May 23, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@vogt31337

Describe the bug
The step:
"Several preconfigured datasets are available, including most components from the Pile, as well as the Pile train set itself, for straightforward tokenization using the prepare_data.py entry point.

E.G, to download and tokenize the enwik8 dataset with the GPT2 Tokenizer, saving them to ./data you can run:

python prepare_data.py -d ./data

"
results in a download error: https://data.deepai.org/enwik8.zip is down or otherwise unavailable.

To Reproduce
Steps to reproduce the behavior:

  1. Fresh install / git clone
  2. python prepare_data.py -d ./data

Expected behavior
The enwik8 dataset is downloaded and tokenized into ./data.

Proposed solution
Provide an alternative download source, or document how to obtain the dataset manually.
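One way the "alternative source" idea could look is a small helper that tries several URLs in order and stops at the first that works. This is only a sketch: `download_first_available` and the `fetch` parameter are hypothetical names, not part of prepare_data.py, and any mirror URLs would have to be supplied by the caller.

```python
import urllib.request
from typing import Callable, Iterable, Optional


def _fetch(url: str, dest: str) -> None:
    """Default fetcher: download url to dest using the standard library."""
    urllib.request.urlretrieve(url, dest)


def download_first_available(
    urls: Iterable[str],
    dest: str,
    fetch: Optional[Callable[[str, str], None]] = None,
) -> str:
    """Try each URL in order; return the first one that downloads successfully.

    Hypothetical helper, not part of the project. Network and TLS failures
    raised by urllib are OSError subclasses, so catching OSError covers them.
    """
    fetch = fetch or _fetch
    errors = []
    for url in urls:
        try:
            fetch(url, dest)
            return url
        except OSError as exc:
            # Remember why this source failed and move on to the next one.
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all sources failed:\n" + "\n".join(errors))


# Example call (the second URL is a placeholder, not a known mirror):
# download_first_available(
#     ["https://data.deepai.org/enwik8.zip", "https://example.org/enwik8.zip"],
#     "enwik8.zip",
# )
```

The fetcher is injectable so the fallback logic can be exercised without touching the network.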


Environment (please complete the following information):

  • GPUs: 1
  • Configs: 20B.yml


vogt31337 added the bug label May 23, 2023
@vogt31337
Author

wget reports a certificate error for data.deepai.org: the certificate has expired.

@daneren

daneren commented May 24, 2023

I was able to download it by opening the link in Chrome.

@StellaAthena
Member

StellaAthena commented May 24, 2023

Clicking the link that you say is down downloads it for me as well. Perhaps they had a short outage?
