fix ZeroDivisionError while only one train data (EleutherAI#810)
HuangLK authored Mar 4, 2023
1 parent 192022f commit a5c2229
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion megatron/data/data_utils.py
@@ -249,7 +249,7 @@ def build_weighted_datasets(
     return train_datasets, valid_datasets, test_datasets
 
 
-def weights_by_num_docs(l, alpha=0.3):
+def weights_by_num_docs(l: list, alpha=0.3):
     """
     Builds weights from a multinomial distribution over groups of data according to the number of
     samples in each group.
@@ -263,6 +263,9 @@ def weights_by_num_docs(l, alpha=0.3):
     See https://arxiv.org/abs/1911.02116 for more details
     """
+    if len(l) == 1:
+        return [1.0]
+
     total_n_docs = sum(l)
     unbiased_sample_probs = [i / total_n_docs for i in l]
 
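For context, here is a minimal sketch of why a single dataset could trigger the ZeroDivisionError named in the commit title. The diff only shows the first two statements of the function body; everything after them in the sketch (the `alpha` smoothing, the `1 - p` inverse term, and the final normalization) is an assumption about the rest of the function, and the name `weights_by_num_docs_sketch` and the `guard_single` flag are hypothetical, added only to compare behavior with and without the new early return.

```python
def weights_by_num_docs_sketch(l: list, alpha=0.3, guard_single=True):
    # Early return added by this commit: a lone dataset gets all the weight.
    if guard_single and len(l) == 1:
        return [1.0]

    # These two statements appear in the diff.
    total_n_docs = sum(l)
    unbiased_sample_probs = [i / total_n_docs for i in l]

    # ASSUMPTION: the remainder is not shown in the diff. It is a plausible
    # reconstruction of alpha-smoothed, inverse-weighted sampling.
    probs = [p**alpha for p in unbiased_sample_probs]
    total = sum(probs)
    probs = [p / total for p in probs]

    # With one group, its probability is 1.0, so (1 - p) is 0.0 ...
    inverse = [1 - p for p in unbiased_sample_probs]
    weights = [p * q for p, q in zip(probs, inverse)]

    # ... which makes this sum 0.0 and the division below fail.
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(weights_by_num_docs_sketch([100]))  # guarded path
    try:
        weights_by_num_docs_sketch([100], guard_single=False)
    except ZeroDivisionError as exc:
        print("without the guard:", exc)
```

Under these assumptions the guard is the smallest possible fix: it leaves the multi-dataset path untouched and only short-circuits the degenerate case where the normalizing sum would be zero.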
