Commit

Print name when sampling
leogao2 committed Nov 10, 2020
1 parent 07edce4 commit 0776650
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions the_pile/pile.py
@@ -315,6 +315,7 @@ def lang_stats(pile):
 def sample_from_sets(datasets, n_docs):
     random.seed(42)
     for dset, _ in datasets:
+        print(dset.name())
         fname = 'dataset_samples/{}.json'.format(dset.name().replace(' ', '_'))
         if os.path.exists(fname): continue
