
Implement Generation / Eval with deepspeed model engine #58

Closed
sdtblck opened this issue Jan 13, 2021 · 6 comments
Labels: feature request (New feature or request)

@sdtblck (Contributor) commented Jan 13, 2021

Currently, Generation / Eval are happening with the PyTorch model, not the model engine. This is already causing memory problems and won't allow us to scale up; we'll need to implement this with the DeepSpeed model engine.
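For reference, a minimal sketch of what routing eval through the engine could look like, assuming the engine comes from deepspeed.initialize; model, ds_config, and val_loader are illustrative names, not identifiers from this repo:

```python
import torch
import deepspeed

# Illustrative sketch: `model`, `ds_config`, and `val_loader` are assumed
# to exist here; they are not identifiers taken from this repo.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config_params=ds_config,
)

model_engine.eval()
with torch.no_grad():
    val_data = next(val_loader).cuda()
    # forward passes go through the engine, not the raw PyTorch model
    loss = model_engine(val_data)
```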

StellaAthena added the feature request label on Jan 14, 2021
@StellaAthena (Member) commented
Where in the code does this happen?

@srulikbd (Contributor) commented Jan 24, 2021

At the end of train_enwik8.py, for example: the commented-out code.
I can try doing it, but I'm not an expert in DeepSpeed yet.
Should we use these examples?
https://github.com/microsoft/DeepSpeedExamples/blob/master/Megatron-LM/evaluate_gpt2.py
https://github.com/microsoft/DeepSpeedExamples/blob/master/Megatron-LM/generate_samples.py

@srulikbd (Contributor) commented Jan 24, 2021

But actually, I see that it's already changed in train.py:
```python
if params.get("validate_every") is not None:
    if is_main and i % params["validate_every"] == 0:
        model_engine.eval()
        with torch.no_grad():
            val_data = next(val_loader).cuda()
            loss = model_engine(val_data)
            pbar.write(f'Validation Loss: {loss.item()}')
```
but not in:

```python
if params.get("generate_every") is not None:
    if is_main and i % params["generate_every"] == 0:
        model.eval()
        val_data = next(val_loader).cuda()
        inp = random.choice(val_data)[:-1]
        prime = tokenizer.decode(inp)
        pbar.write(f"{prime} \n\n {'*' * 100}")
        sample = model.generate(inp.cuda(), params["generate_length"])
        output_str = tokenizer.decode(sample)
        pbar.write(output_str)
```

@StellaAthena (Member) commented

Huh, funny oversight. Yeah, push a patch to the generate function and we’ll close this issue.

StellaAthena added this to To do in 1T or BUST via automation on Jan 24, 2021
@srulikbd (Contributor) commented
Actually, it's not possible to just change model to model_engine for generation.
Is it implemented here,
https://github.com/microsoft/DeepSpeedExamples/blob/master/Megatron-LM/generate_samples.py
to generate using multiple GPUs?
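One plausible reason the direct swap fails is that generate() is a method of the underlying model, which the DeepSpeed engine wraps; the wrapped module is reachable as model_engine.module. A minimal sketch under that assumption (not the actual patch that landed), reusing the names from the snippet above:

```python
# Hedged sketch: assumes DeepSpeedEngine exposes the wrapped nn.Module as
# `model_engine.module`, so the model's own generate() can be called on it.
if params.get("generate_every") is not None:
    if is_main and i % params["generate_every"] == 0:
        model_engine.eval()
        with torch.no_grad():
            val_data = next(val_loader).cuda()
            inp = random.choice(val_data)[:-1]
            prime = tokenizer.decode(inp)
            pbar.write(f"{prime} \n\n {'*' * 100}")
            # generate() lives on the wrapped model, not on the engine itself
            sample = model_engine.module.generate(inp.cuda(), params["generate_length"])
            output_str = tokenizer.decode(sample)
            pbar.write(output_str)
```

Note this only covers the single-process case; generating across multiple model-parallel GPUs, as in the linked Megatron-LM example, would require coordinating the forward pass across ranks.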

@StellaAthena (Member) commented
Superseded by codebase refactoring.

1T or BUST automation moved this from To do to Done on Feb 15, 2021