Weights of "step0" and "step1" checkpoints are identical for all pythia models #83

Closed
byungdoh opened this issue Apr 5, 2023 · 6 comments


byungdoh commented Apr 5, 2023

Dear EleutherAI team,

I've noticed that the weights associated with the recently added "step0" and "step1" checkpoints are identical for all pythia models:

import sys
import torch
from transformers import GPTNeoXForCausalLM

def main():
    print(f"========== {sys.argv[1]} ==========")
    model_step0 = GPTNeoXForCausalLM.from_pretrained(sys.argv[1], revision="step0", cache_dir="./test")
    model_step1 = GPTNeoXForCausalLM.from_pretrained(sys.argv[1], revision="step1", cache_dir="./test")
    # Compare each parameter tensor of the two revisions element-wise.
    for (name0, param0), (name1, param1) in zip(model_step0.named_parameters(), model_step1.named_parameters()):
        print(name0, name1, name0 == name1, torch.all(param0 == param1))

if __name__ == "__main__":
    main()

This yields something like the following for all eight pythia models:

========== EleutherAI/pythia-70m ==========
gpt_neox.embed_in.weight gpt_neox.embed_in.weight True tensor(True)
gpt_neox.layers.0.input_layernorm.weight gpt_neox.layers.0.input_layernorm.weight True tensor(True)
...
gpt_neox.final_layer_norm.weight gpt_neox.final_layer_norm.weight True tensor(True)
gpt_neox.final_layer_norm.bias gpt_neox.final_layer_norm.bias True tensor(True)
embed_out.weight embed_out.weight True tensor(True)

Would it be possible for you to clarify whether these identical weights correspond to "step0" or "step1"? I've noticed that the conditional probabilities calculated using these weights aren't perfectly uniform, which leads me to believe they are actually the weights from "step1".

Thanks!
Byung-Doh

@haileyschoelkopf (Collaborator)

Hi, thanks very much for reporting this! I'll look into it and get back to you as soon as I'm able.

@StellaAthena (Member)

@haileyschoelkopf did you end up looking into this?

@haileyschoelkopf (Collaborator)

I have not yet, unfortunately; I'll look at this tomorrow and report back!

@StellaAthena (Member)

Looking around the checkpointing code, it looks to me like we should be saving the 0th checkpoint before doing any weight updates; saving it after an update would be the obvious failure mode that could cause this.

@haileyschoelkopf (Collaborator)

Continuing to investigate, but upon digging in I'm finding that DeepSpeed's checkpoint metadata for the NeoX-library checkpoints reports that all is OK! For the EleutherAI/pythia-160m model, the step0 checkpoint reports global_samples: 0 and global_steps: 0, while the step1 checkpoint reports global_samples: 1024 and global_steps: 1.

I therefore suspect that this is an artifact of LR warmup starting from 0, causing the weights not to update on the first step, but I am looking into this further. On a scan of a couple of parameters in one layer of the 160M model, many (but not all) of the individual floating-point parameters printed as identical to their step-0 values even at a later checkpoint, indicating that some parameters will show up as equal to the step-0 checkpoint even after multiple training steps during these very early warmup steps.

I'm therefore pretty confident I did in fact save and upload the correct early checkpoints.

Hope this answers your question @byungdoh !

(Aside: note that there was an issue in which the "step0" checkpoint in NeoX would be overwritten by the checkpoint of the step being resumed from if a job was resumed, but that issue was patched before these models were trained.)
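To make the warmup-from-zero explanation concrete, here is a minimal PyTorch sketch (a toy linear model and an illustrative LambdaLR schedule, not the actual Pythia/GPT-NeoX training setup): when the first step runs at a learning rate of 0.0, one Adam update leaves every parameter bitwise unchanged, so a checkpoint saved after that step is identical to the step-0 weights.

import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 4)  # stand-in for a real transformer
before = {n: p.detach().clone() for n, p in model.named_parameters()}

opt = torch.optim.Adam(model.parameters(), lr=6e-4)
# Linear warmup from zero: the LR multiplier is step/100, i.e. 0.0 at step 0.
sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda step: min(step / 100, 1.0))

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()    # runs with lr = 0.0, so no parameter moves
sched.step()  # the LR becomes non-zero only from the next step onward

for name, param in model.named_parameters():
    print(name, torch.equal(before[name], param))  # True for every parameter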

@byungdoh (Author)

I think it is indeed because the learning rate is 0.0 at the first step, as self.num_iters (and thereby num_iters_) here is initialized to 0. Thank you both @haileyschoelkopf @StellaAthena!
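For reference, a simplified sketch of the linear-warmup logic being pointed to (the actual GPT-NeoX AnnealingLR implementation differs in its details, and the numbers below are placeholders): with num_iters starting at 0, the schedule returns a learning rate of 0.0 for the very first step, so the optimizer leaves the step-0 weights untouched.

def warmup_lr(start_lr: float, num_iters: int, warmup_iters: int) -> float:
    # Scale the target LR linearly by num_iters / warmup_iters during warmup.
    if num_iters < warmup_iters:
        return start_lr * num_iters / warmup_iters
    return start_lr

print(warmup_lr(6e-4, 0, 1430))  # 0.0 -> the first optimizer step is a no-op
print(warmup_lr(6e-4, 1, 1430))  # small but non-zero from the next step on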
