[CLS] [SEP] bug #3

Closed
dzabraev opened this issue Nov 11, 2020 · 2 comments

Comments

@dzabraev

if not hasattr(self.tokenizer, "cls_token_ids"):

The class transformers.tokenization_bert.BertTokenizer has no attribute cls_token_ids; the name here should be cls_token_id.
Because of this mistake, none of the text examples get the SEP and CLS tokens.
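
To illustrate how a misspelled attribute name in a hasattr check can silently drop the special tokens, here is a minimal sketch. It does not reproduce the repository's code: StubTokenizer, encode, and the token ids are hypothetical stand-ins, and the stub only mimics the cls_token_id / sep_token_id attribute names that BertTokenizer exposes.

```python
class StubTokenizer:
    """Hypothetical stand-in for BertTokenizer (same attribute names only)."""
    cls_token_id = 101  # illustrative id, assumed for this sketch
    sep_token_id = 102  # illustrative id, assumed for this sketch

    def convert_tokens_to_ids(self, tokens):
        # Toy vocabulary: ids are made up for illustration.
        return [1000 + i for i, _ in enumerate(tokens)]


def encode(tokenizer, tokens, attr_name):
    """Wrap ids with CLS/SEP only if the tokenizer exposes `attr_name`."""
    ids = tokenizer.convert_tokens_to_ids(tokens)
    if not hasattr(tokenizer, attr_name):
        # No error is raised: the special tokens are just skipped.
        return ids
    return [tokenizer.cls_token_id] + ids + [tokenizer.sep_token_id]


tok = StubTokenizer()
buggy = encode(tok, ["a", "cat"], "cls_token_ids")  # misspelled name
fixed = encode(tok, ["a", "cat"], "cls_token_id")   # correct name
print(buggy)  # [1000, 1001]
print(fixed)  # [101, 1000, 1001, 102]
```

Since hasattr returns False instead of raising, a typo like this produces no error and only shows up in the encoded output.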

@gabeur
Owner

gabeur commented Nov 13, 2020

Good catch, thanks!
Fixing this bug seems to improve the results a bit. I will re-run pretraining and finetuning experiments and update the repo once the checkpoints are ready.

gabeur closed this as completed in e670f70 on Dec 3, 2020
@gabeur
Owner

gabeur commented Dec 3, 2020

I have pushed a fix, re-run the experiments and updated the checkpoints accordingly.
