Error when running unshard_memmap.py #114

Closed
ShaneeyS opened this issue Jul 31, 2023 · 2 comments

@ShaneeyS

Hi, when I try to run the following command:

python utils/unshard_memmap.py --input_file ./pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00000-of-00082.bin --num_shards 83 --output_dir ./pythia_pile_idxmaps/

the following error is always raised:

pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00023-of-00082.bin
29% | 24/83 [6:09:46<15:01:09, 916.43s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00024-of-00082.bin
30% | 25/83 [6:25:14<14:49:06, 919.76s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00025-of-00082.bin
31% | 26/83 [6:40:51<14:38:46, 925.03s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00026-of-00082.bin
33% | 27/83 [6:56:36<14:28:56, 931.02s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00027-of-00082.bin
34% | 28/83 [7:12:14<14:15:21, 933.12s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00028-of-00082.bin
35% | 29/83 [7:28:12<14:06:25, 940.47s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00029-of-00082.bin
36% | 30/83 [7:44:13<13:56:16, 946.72s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00030-of-00082.bin
37% | 31/83 [8:00:12<13:43:41, 950.42s/it]
pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00031-of-00082.bin
Bus error (core dumped)

Could you please tell me how to solve this problem?

@Lisennlp

Lisennlp commented Nov 3, 2023

I also encountered this problem; it was caused by insufficient hard disk space.
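A likely mechanism, offered as an assumption since the script's internals are not shown in this thread: if unshard_memmap.py writes its output through a NumPy memmap (as its name suggests), then on Linux a write into a memory-mapped page that the filesystem can no longer back with a disk block is typically delivered as a SIGBUS signal. That kills the process with "Bus error (core dumped)" rather than raising a readable Python exception. One way to get a clear error instead is to reserve the whole output file before mapping it. This is a Linux-only sketch; `create_preallocated_memmap` is a hypothetical helper, not part of the repository, and `np.uint16` below is only an example dtype.

```python
import os
import tempfile

import numpy as np


def create_preallocated_memmap(path, dtype, length):
    """Create a writable 1-D memmap whose disk blocks are reserved up front.

    Hypothetical helper. os.posix_fallocate asks the filesystem to allocate
    every block immediately, so a full disk raises OSError (ENOSPC) here
    instead of SIGBUS later, when a page of the mapping is first written.
    """
    nbytes = np.dtype(dtype).itemsize * length
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Reserve bytes [0, nbytes); raises OSError if space is insufficient.
        os.posix_fallocate(fd, 0, nbytes)
    finally:
        os.close(fd)
    # The file already has its final size, so map it in read/write mode.
    return np.memmap(path, dtype=dtype, mode="r+", shape=(length,))
```

For example, `create_preallocated_memmap("merged.bin", np.uint16, n_tokens)` (with `n_tokens` standing in for the total token count) either returns a mapping backed by reserved blocks or fails right away with `OSError: [Errno 28] No space left on device`, before hours of shards have been copied.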

@StellaAthena
Member

The dataset is very big and unpacking it requires even more space. We recommend using a drive with 2 TB of available space (the final product takes up about 1.6 TB).
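A quick pre-flight check against this recommendation can be done with the standard library before starting a multi-hour run. This is a sketch under stated assumptions: the helper name `has_enough_space` is hypothetical, "2 TB" is read as 2 × 10^12 bytes (the unit is an assumption), and the path passed in must already exist, because `shutil.disk_usage` queries the filesystem containing an existing path.

```python
import shutil

# Roughly 2 TB of free space, per the recommendation above.
# Decimal terabytes (10**12 bytes) are an assumption about the unit.
RECOMMENDED_FREE_BYTES = 2 * 10**12


def has_enough_space(path, required_bytes=RECOMMENDED_FREE_BYTES):
    """Return (ok, free_bytes) for the filesystem that contains `path`.

    Hypothetical helper: `path` must already exist.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes, free
```

Calling `has_enough_space("./pythia_pile_idxmaps/")` before launching the command from the original question returns `ok == False` on a drive that is too small, so the run can be stopped before any shards are processed.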
