
Pythia ckpts: vocab_size in the HF config (50304 for the 70m ckpt) and tokenizer.json (50257) are mismatched #65

Closed
zhangyichang opened this issue Jan 31, 2023 · 2 comments

Comments

@zhangyichang

I modified https://github.com/EleutherAI/lm-evaluation-harness/blob/f9eca2c8160be8c20ecc956b7ff545f880160d0e/lm_eval/models/gpt2.py#L50 to add transformers.GPTNeoXTokenizerFast.
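
A minimal sketch of the kind of change described above (illustrative only; the exact assertion at that line in gpt2.py may differ):

    import transformers

    # Illustrative sketch, not the exact harness code: widen the tuple of
    # accepted tokenizer classes so that GPTNeoXTokenizerFast, which Pythia
    # checkpoints ship with, passes the compatibility check.
    ACCEPTED_TOKENIZERS = (
        transformers.GPT2Tokenizer,
        transformers.GPT2TokenizerFast,
        transformers.GPTNeoXTokenizerFast,  # added for Pythia
    )

    tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
    assert isinstance(tokenizer, ACCEPTED_TOKENIZERS), (
        "this tokenizer has not been checked for compatibility yet!"
    )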

The command is:
python main.py --model gpt2 --model_args pretrained=/work/lm-evaluation-harness/ckpts/pythia-70m/step143000/models--EleutherAI--pythia-70m/snapshots/1c607732430c35e6387a86528d857887e87cae1f --tasks lambada_openai,hellaswag --device 1

The traceback is:

Running loglikelihood requests
0%| | 8/45296 [00:01<1:46:22, 7.10it/s]
Traceback (most recent call last):
File "/work/lm-evaluation-harness/main.py", line 108, in
main()
File "/work/lm-evaluation-harness/main.py", line 79, in main
results = evaluator.simple_evaluate(
File "/work/lm-evaluation-harness/lm_eval/utils.py", line 161, in _wrapper
return fn(*args, **kwargs)
File "/work/lm-evaluation-harness/lm_eval/evaluator.py", line 86, in simple_evaluate
results = evaluate(
File "/work/lm-evaluation-harness/lm_eval/utils.py", line 161, in _wrapper
return fn(*args, **kwargs)
File "/work/lm-evaluation-harness/lm_eval/evaluator.py", line 247, in evaluate
resps = getattr(lm, reqtype)([req.args for req in reqs])
File "/work/lm-evaluation-harness/lm_eval/base.py", line 820, in fn
rem_res = getattr(self.lm, attr)(remaining_reqs)
File "/work/lm-evaluation-harness/lm_eval/base.py", line 185, in loglikelihood
return self._loglikelihood_tokens(new_reqs)
File "/work/lm-evaluation-harness/lm_eval/base.py", line 317, in _loglikelihood_tokens
logits = torch.gather(logits, 2, cont_toks.unsqueeze(-1)).squeeze(
RuntimeError: index 50276 is out of bounds for dimension 2 with size 50257

@haileyschoelkopf
Collaborator

haileyschoelkopf commented Feb 4, 2023

Hi! I believe this is a bug in the evaluation harness.

If you replace the following function (https://github.com/EleutherAI/lm-evaluation-harness/blob/f9eca2c8160be8c20ecc956b7ff545f880160d0e/lm_eval/models/gpt2.py#L121):

    def _model_call(self, inps):
        """
        inps: a torch tensor of shape [batch, sequence]
        the size of sequence may vary from call to call
        returns: a torch tensor of shape [batch, sequence, vocab] with the
        logits returned from the model
        """
        with torch.no_grad():
            return self.gpt2(inps)[0][:, :, :50257]

With this instead:

    def _model_call(self, inps):
        """
        inps: a torch tensor of shape [batch, sequence]
        the size of sequence may vary from call to call
        returns: a torch tensor of shape [batch, sequence, vocab] with the
        logits returned from the model
        """
        with torch.no_grad():
            return self.gpt2(inps)[0]

All should work! cc @jon-tow
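
For context, a minimal standalone illustration of why the original slice fails (assumed shapes and example token ids, not harness code): the GPT-NeoX/Pythia tokenizer can emit token ids above 50257 (e.g. 50276), and the model's logit dimension is actually 50304, so slicing the logits down to GPT-2's 50257-token vocab makes the later torch.gather go out of bounds:

    import torch

    # Assumed shapes for illustration: batch=1, seq=4, Pythia logit dim=50304.
    logits = torch.randn(1, 4, 50304)              # full GPT-NeoX/Pythia logits
    sliced = logits[:, :, :50257]                  # GPT-2-sized slice from the old _model_call
    cont_toks = torch.tensor([[50276, 12, 7, 3]])  # example ids from GPTNeoXTokenizerFast

    torch.gather(logits, 2, cont_toks.unsqueeze(-1))  # fine: 50276 < 50304
    torch.gather(sliced, 2, cont_toks.unsqueeze(-1))  # RuntimeError: index 50276 is out of bounds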

@StellaAthena
Member

The eval harness has been patched, so this should work fine now.
