Remove the NeoX implementation of GPT2Tokenizer (#1042)
* Try out just using the HF implementation

Signed-off-by: Dashiell Stander <[email protected]>

* Rely solely on HF tokenizer.

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
dashstander and github-actions committed Sep 25, 2023
1 parent e431ff5 commit 2ab05be
Showing 3 changed files with 1 addition and 370 deletions.
2 changes: 1 addition & 1 deletion configs/neox_arguments.md
@@ -111,7 +111,7 @@ Logging Arguments

- **git_hash**: str

- Default = 1f832c1
+ Default = 3f8c63c

current git hash of repository

368 changes: 0 additions & 368 deletions megatron/tokenizer/gpt2_tokenization.py

This file was deleted.

1 change: 0 additions & 1 deletion megatron/tokenizer/tokenizer.py
@@ -25,7 +25,6 @@
import numpy as np
import sentencepiece as spm
from typing import List, Union
-from .gpt2_tokenization import GPT2Tokenizer


def build_tokenizer(args):