
Running on a single GPU #612

Closed
huey2531 opened this issue Apr 24, 2022 · 22 comments
Labels
bug Something isn't working

Comments

@huey2531
Contributor

I tried merging the checkpoints as described for single-GPU use:
python tools/merge20b.py --input_dir ./20B_checkpoints --output_dir ./20B_checkpoints_merged

However, I'm getting this error when generating:
RuntimeError: Error(s) in loading state_dict for EmbeddingPipe:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([50432, 6144]) from checkpoint, the shape in current model is torch.Size([50304, 6144]).

How can I adjust the current model to match size 50432, or is it the other way around?

huey2531 added the bug (Something isn't working) label Apr 24, 2022
@StellaAthena
Member

@zphang

@StellaAthena
Member

@huey2531 Does #613 solve your problem?

@huey2531
Contributor Author

@StellaAthena It does not. I had to uncomment those lines, or it wouldn't even merge the layer checkpoints.

@huey2531
Contributor Author

Before running merge20b.py, the error was:

RuntimeError: Error(s) in loading state_dict for EmbeddingPipe:
size mismatch for word_embeddings.weight: copying a param with shape torch.Size([25216, 6144]) from checkpoint, the shape in current model is torch.Size([50304, 6144]).

@HughPH
Contributor

HughPH commented May 5, 2022

I would suggest redownloading the slim weights into a new directory, to be sure that you're starting from a known point.

@StellaAthena
Member

I’ve been trying to reproduce your issue and failing… I concur with @HughPH that it’s probably worth deleting everything and starting again.

@HughPH
Contributor

HughPH commented May 11, 2022

@huey2531 Did you make any progress with this?

@huey2531
Contributor Author

OK, I will delete everything and start from scratch.

@StellaAthena
Member

@huey2531 I can confirm that another individual got this running last week without hitting that error.

@HughPH
Contributor

HughPH commented May 24, 2022

@huey2531 Did you get it working?

@igor0

igor0 commented May 25, 2022

This looks like a tokenizer mismatch to me:

  • Checkpoint assumes 50432 tokens
  • Model assumes 50304 tokens

Do you have the right tokenizer for the 20B model configured when running generate.py? It should be something like this:

    "tokenizer_type": "HFTokenizer",
    "vocab-file": "/mnt/data/20B_tokenizer.json",

@HughPH
Contributor

HughPH commented Jun 7, 2022

@huey2531 are you still trying to get this going?

@StellaAthena It's been 3 weeks; I'd suggest this could probably be closed if another week passes without activity.

@StellaAthena
Member

Closing due to inactivity.

@huey2531
Contributor Author

huey2531 commented Jun 8, 2022

I'm still working on this. I'm stuck on some dependency issues and need to reinstall the OS...

@huey2531
Contributor Author

huey2531 commented Jun 8, 2022

This looks like a tokenizer mismatch to me:

  • Checkpoint assumes 50432 tokens
  • Model assumes 50304 tokens

Do you have the right tokenizer for the 20B model configured when running generate.py? It should be something like this:

    "tokenizer_type": "HFTokenizer",
    "vocab-file": "/mnt/data/20B_tokenizer.json",

Yes, I have the correct tokenizer.

@zphang
Contributor

zphang commented Jun 8, 2022

Do you have make_vocab_size_divisible_by set to 50432 in the config?
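
(For illustration only: that would be a line in the model config alongside the tokenizer settings quoted earlier, something like the snippet below; match the key spelling used by the rest of your config. Forcing the divisor to the full padded size should make the model pad the embedding to 50432 regardless of the model-parallel degree.)

    "make_vocab_size_divisible_by": 50432,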

@huey2531
Contributor Author

huey2531 commented Jun 8, 2022

Do you have make_vocab_size_divisible_by set to 50432 in the config?

Probably not. I simply changed the path in my original config to 20B_checkpoints_merged.
I will test it after I reinstall the OS.

StellaAthena reopened this Jun 8, 2022
@huey2531
Contributor Author

huey2531 commented Jun 8, 2022

Now I'm stuck at #628 during installation. I did not encounter this issue a few weeks ago.

@StellaAthena
Member

@huey2531 This seems to be something that broke recently inside of Triton. I can't install on a fresh machine, but my previously existing installations (from a couple of weeks ago) work fine.

@huey2531
Contributor Author

huey2531 commented Jun 9, 2022

@StellaAthena What version of Triton do you have? What does pip show triton say?

@StellaAthena
Member

It says

Name: triton
Version: 1.0.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/ptillet/triton/
Author: Philippe Tillet
Author-email: [email protected]
License: UNKNOWN
Location: /home/mchorse/.local/lib/python3.8/site-packages
Requires: torch
Required-by: deepspeed

@StellaAthena
Member

Closing due to inactivity
