Update gpt2_dataset.py (#974)
* Update gpt2_dataset.py

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
StellaAthena and github-actions committed Jun 22, 2023
1 parent cfce548 commit 2534e3d
Showing 2 changed files with 4 additions and 3 deletions.
5 changes: 3 additions & 2 deletions configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments

 - **git_hash**: str

-Default = 07da9fc
+Default = 6b22c39

 current git hash of repository

@@ -1061,11 +1061,12 @@ Training Arguments
 List of paths to train datasets.



 - **label_data_paths**: list

 Default = None

-List of paths to label datasets (should be fully in sync with train data, not shifted by 1!).
+List of paths to label datasets (not shifted by 1 yet!).



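The reworded `label_data_paths` entry says the label files are "not shifted by 1 yet": they line up token-for-token with the training data, and the one-position shift for next-token prediction is applied later. A minimal sketch of that convention, assuming standard next-token prediction where the target at position i comes from position i + 1. `make_inputs_and_targets` is a hypothetical helper for illustration, not a GPT-NeoX function.

```python
import numpy as np


def make_inputs_and_targets(tokens, labels=None):
    """Hypothetical helper: shift one sample by a position for next-token prediction.

    Assumes `labels`, when given, is aligned position-for-position with
    `tokens` (i.e. not shifted by 1 yet), so the shift is applied here.
    """
    tokens = np.asarray(tokens)
    inputs = tokens[:-1]
    # Without a label dataset, targets come from the tokens themselves.
    source = tokens if labels is None else np.asarray(labels)
    if source.shape != tokens.shape:
        raise ValueError("labels must have the same length as tokens")
    # Target for input position i is the label stored at position i + 1.
    targets = source[1:]
    return inputs, targets


inputs, targets = make_inputs_and_targets([5, 6, 7, 8], labels=[50, 60, 70, 80])
print(inputs.tolist(), targets.tolist())  # [5, 6, 7] [60, 70, 80]
```

If the label files were stored already shifted, applying this shift again would misalign every target by one position, which is the mistake the doc string warns against.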
2 changes: 1 addition & 1 deletion megatron/data/gpt2_dataset.py
@@ -64,7 +64,7 @@ def __init__(
 self.shuffle_idx_len = self.shuffle_idx.shape[0] - 1
 self.sample_idx_len = self.sample_idx.shape[0] - 1

-if self.shuffle_idx_len != self.sample_idx_len:
+if self.shuffle_idx_len != self.sample_idx_len - 1:
     print(
         f"WARNING: shuffle index length ({self.shuffle_idx_len}) is not equal to sample index length ({self.sample_idx_len})"
     )
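Why the old comparison misfired can be seen with toy arrays. This is a minimal sketch, not GPT-NeoX code: it assumes (from the Megatron-style index builders this dataset derives from, not from the diff itself) that `sample_idx` holds one more row than there are samples, because sample i spans `[sample_idx[i], sample_idx[i + 1])`, and that `shuffle_idx` is a permutation with one entry per sample. `build_toy_indices` and `length_checks` are hypothetical helpers.

```python
import numpy as np


def build_toy_indices(num_samples: int, seed: int = 0):
    """Hypothetical stand-in for the real index builders.

    Assumes sample_idx stores num_samples + 1 boundary rows, so sample i spans
    [sample_idx[i], sample_idx[i + 1]), and that shuffle_idx is a permutation
    of range(sample_idx.shape[0] - 1).
    """
    rng = np.random.RandomState(seed)
    # Each row would be (document index, offset); zeros are placeholders.
    sample_idx = np.zeros((num_samples + 1, 2), dtype=np.int32)
    shuffle_idx = np.arange(sample_idx.shape[0] - 1, dtype=np.uint32)
    rng.shuffle(shuffle_idx)
    return sample_idx, shuffle_idx


def length_checks(num_samples: int):
    sample_idx, shuffle_idx = build_toy_indices(num_samples)
    # Same arithmetic as the two assignments in the hunk above.
    shuffle_idx_len = shuffle_idx.shape[0] - 1
    sample_idx_len = sample_idx.shape[0] - 1
    old_warns = shuffle_idx_len != sample_idx_len      # pre-commit condition
    new_warns = shuffle_idx_len != sample_idx_len - 1  # post-commit condition
    return shuffle_idx_len, sample_idx_len, old_warns, new_warns


print(length_checks(10))  # (9, 10, True, False)
```

Under those assumptions the two `- 1` adjustments leave `shuffle_idx_len` exactly one below `sample_idx_len`, so the pre-commit `!=` test would warn on every dataset built this way, while the post-commit test only warns when the two index lengths actually drift apart.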