Commit

Merge branch 'master' of github.com:google-research/deduplicate-text-datasets
carlini committed May 1, 2022
2 parents 0008e61 + 2407ab0 commit 6ba5f99
Showing 3 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -249,7 +249,7 @@ Okay so maybe you don't like reading. You skipped the entire section above. (Hon
Then just do this

```
-bash scripts/scripts/run_pipeline.sh
+bash scripts/run_pipeline.sh
python3 scripts/finish_dedup_wiki40b.py --data_dir ~/tensorflow_datasets/ --save_dir /tmp/dedup --name wiki40b --split test --suffixarray_dir data --remove /tmp/wiki40b.test.remove.byterange
```

2 changes: 1 addition & 1 deletion scripts/finish_dedup_wiki40b.py
@@ -110,7 +110,7 @@ def _generate_examples(self, split):
data_dir=args.data_dir)


-p = mp.Pool(96)
+p = mp.get_context("fork").Pool(mp.cpu_count())
 i = -1
 for batch in ds:
     i += 1
2 changes: 1 addition & 1 deletion scripts/load_dataset.py
@@ -74,7 +74,7 @@ def tok(x):

fout = open(os.path.join(save_dir, dataset_name+"."+split), "wb")

-with mp.Pool(mp.cpu_count()) as p:
+with mp.get_context("fork").Pool(mp.cpu_count()) as p:
     i = 0
     sizes = [0]
     for b in ds:
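
Both script changes replace a pool built from the platform's default start method with one built from an explicit "fork" context, and `finish_dedup_wiki40b.py` additionally sizes the pool from `mp.cpu_count()` instead of a fixed 96 workers. The commit does not say why. One plausible reason (an assumption) is that on platforms where the default start method is "spawn", such as macOS since Python 3.8, workers do not inherit state set up in the parent process. The sketch below shows the same pattern in isolation; `square` and `run` are hypothetical stand-ins and are not part of the repository, and the fallback to the default context is an addition for portability that the scripts themselves do not have.

```python
import multiprocessing as mp


def square(x):
    # Hypothetical stand-in for the real per-example work done in the scripts.
    return x * x


def run(values):
    # "fork" is only available on POSIX systems; passing None to get_context
    # selects the platform default (this fallback is not in the commit).
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    # Size the pool from the machine rather than a hard-coded worker count.
    with ctx.Pool(mp.cpu_count()) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(run(range(8)))
```

Using `mp.get_context(...)` rather than `mp.set_start_method(...)` keeps the choice local to the pool being created, so other code in the same process that relies on the default start method is not affected.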