
add dynamic dataset for processing/tokenizing examples lazily #46

Closed
wants to merge 7 commits into from
add exception
trisongz committed Jan 12, 2021
commit abb5e46909e1886d110f13e6b14cc2f949ee0287
4 changes: 2 additions & 2 deletions gpt_neox/datasets.py
@@ -229,7 +229,7 @@ def tokenize_example(self, ex):
             if len(out) < self.max_seq_len:
                 _to_pad = self.max_seq_len - len(out)
                 out.extend([self.pad_token for i in range(_to_pad)])
-            if tokenized['overflowing_tokens']:
+            if tokenized.get('overflowing_tokens', None):
                 self.token_cache = tokenized['overflowing_tokens'].append(self.sep_token)

         else:
@@ -238,7 +238,7 @@ def tokenize_example(self, ex):
             if len(out) < self.max_seq_len:
                 _to_pad = self.max_seq_len - len(out)
                 out.extend([self.pad_token for i in range(_to_pad)])
-            if tokenized['overflowing_tokens']:
+            if tokenized.get('overflowing_tokens', None):
                 self.token_cache = tokenized['overflowing_tokens'].append(self.sep_token)

         return torch.tensor(out, dtype=torch.long)
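For anyone reading the diff out of context: the commit swaps direct indexing for dict.get, so the overflow check no longer raises a KeyError when the tokenizer output has no 'overflowing_tokens' entry, which is the exception the commit message refers to. Below is a minimal, standalone sketch of the difference; the hand-built dicts stand in for the real tokenizer output and their contents are illustrative, not taken from the PR.

# Simulated tokenizer outputs: one with overflow, one without (illustrative values only).
with_overflow = {"input_ids": [1, 2, 3], "overflowing_tokens": [4, 5]}
without_overflow = {"input_ids": [1, 2, 3]}

# Old code path: direct indexing raises KeyError when the key is missing.
try:
    if without_overflow["overflowing_tokens"]:
        pass
except KeyError as e:
    print("direct indexing raised:", e)  # KeyError: 'overflowing_tokens'

# New code path: .get() returns None (falsy) instead of raising,
# so the branch is simply skipped when there is no overflow.
if without_overflow.get("overflowing_tokens", None):
    print("never reached for this example")

if with_overflow.get("overflowing_tokens", None):
    print("overflow present:", with_overflow["overflowing_tokens"])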