Commit

Merge pull request EleutherAI#783 from Keith-Hon/patch-1
Update README.md
StellaAthena committed Feb 7, 2023
2 parents 26ef16d + c9bc330 commit e48b0c4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
@@ -181,7 +181,7 @@ The tokenized data will be saved out to two files: `[data-dir]/[dataset-name]/[d

## Using Custom Data

-To prepare your own dataset for training with custom data, format it as one large [jsonl](https://jsonlines.org/)-formatted file with each item in the list of dictionaries being a separate document. The document text should be grouped under one JSON key, i.e `"text"`. Any auxiliary data stored in other fields will not be
+To prepare your own dataset for training with custom data, format it as one large [jsonl](https://jsonlines.org/)-formatted file with each item in the list of dictionaries being a separate document. The document text should be grouped under one JSON key, i.e `"text"`. Any auxiliary data stored in other fields will not be used.

Next make sure to download the GPT2 tokenizer vocab, and merge files from the following links:

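The custom-data format described in this hunk (one JSON object per line, document text under a single key such as `"text"`, other fields ignored) can be sketched in Python as follows. This is a minimal illustration, not the repository's own tooling: the file name `my_dataset.jsonl`, the extra `source` field, and the helpers `write_jsonl` and `read_texts` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical example documents. Only the "text" field carries the document;
# auxiliary fields such as "source" are allowed but would not be used.
documents = [
    {"text": "The first document goes here.", "source": "example-a"},
    {"text": "A second, separate document.", "source": "example-b"},
]


def write_jsonl(docs, path):
    """Write one JSON object per line (jsonl), one document per object."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            if "text" not in doc:
                raise ValueError("each document needs a 'text' field")
            f.write(json.dumps(doc, ensure_ascii=False) + "\n")


def read_texts(path, key="text"):
    """Read the jsonl file back, keeping only the document text under `key`."""
    texts = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                texts.append(json.loads(line)[key])
    return texts


path = Path("my_dataset.jsonl")  # hypothetical output file name
write_jsonl(documents, path)
print(read_texts(path))
```

Each document occupies exactly one line, so a large dataset can be streamed line by line rather than loaded as a single JSON array.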

0 comments on commit e48b0c4
