
Support Mistral Models #1050

Closed
Quentin-Anthony opened this issue Sep 29, 2023 · 8 comments
Labels: feature request (New feature or request)

Comments

@Quentin-Anthony (Member)

Mistral just released a nice 7B. Let's support loading it into gpt-neox.

@Quentin-Anthony added the feature request label on Sep 29, 2023
@malteos commented Oct 27, 2023

Is somebody already actively working on this?

@AIproj (Contributor) commented Oct 27, 2023

Hello, yes, I have the model implemented in the adding-mistral-0.1 branch of my fork, but I'm still testing it. The remaining items are:

  • adapt the convert_hf_to_sequential.py script, which currently does not support Mistral.
  • adapt the convert_to_hf.py script, which currently does not support Mistral.
  • eval the original hf model, the model converted to neox, and the neox model converted back to hf, to ensure nothing broke (see the sketch after this list).
  • train on some dummy task.
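
A minimal sanity check for that round-trip eval step could look like the sketch below. This is a hypothetical illustration, not code from the branch: the ./mistral-roundtrip path is a placeholder for wherever the hf -> neox -> hf export lands, and it only compares logits on a single prompt.

    # Hypothetical round-trip sanity check (not from the adding-mistral-0.1 branch):
    # compare logits of the original HF checkpoint against the hf -> neox -> hf
    # re-export. "./mistral-roundtrip" is a placeholder path.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
    orig = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
    )
    trip = AutoModelForCausalLM.from_pretrained(
        "./mistral-roundtrip", torch_dtype=torch.float16
    )

    inputs = tok("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        diff = (orig(**inputs).logits - trip(**inputs).logits).abs().max()
    print("max |logit diff|:", diff.item())  # should be ~0 up to fp16 noise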

@StellaAthena (Member)

@AIproj Sounds great! Good progress :)

@StellaAthena (Member)

@AIproj any updates on this?

@AIproj (Contributor) commented Nov 26, 2023

Yep, had a meeting today with @haileyschoelkopf to figure out some bugs and test training. By the way, one of the bugs we ran into has a PR awaiting merge in the DeeperSpeed repo (link). We're hopefully meeting again tomorrow to wrap this up ASAP.

@AIproj (Contributor) commented Nov 27, 2023

Training works. The current issues revolve around running lm-eval on a neox model (I haven't converted to hf yet): I'm using the DS 0.12-based DeeperSpeed, and it seems some things broke.

To give more details, attributes like self.model.is_pipe_parallel, self.model.is_data_parallel, and self.model.micro_batches have been moved to the PipelineEngine class, which inherits from the DeepSpeedEngine class. But (I'm guessing) since I'm using pp=0, the model gets initialised as a DeepSpeedEngine, leading to errors. I found workarounds for the first two, but haven't given much thought to the third. We'll need to think about how best to preserve backward compatibility, e.g. forcing gpt-neox to initialise self.model as a PipelineEngine even with pp=0, since PipelineEngine sets these flags itself in deepspeed's engine.py:

        self.is_pipe_parallel = self.grid.pipe_parallel_size > 1
        self.is_data_parallel = self.grid.data_parallel_size > 1
        self.is_model_parallel = self.grid.model_parallel_size > 1

I had no errors during training, since training doesn't access the self.model attributes I mentioned; it's really gpt-neox/eval_tasks/eval_adapter.py that breaks.
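
For reference, a compatibility shim along these lines could paper over the missing attributes. This is a sketch under the assumptions above, not the actual fix; engine_parallel_state and its size arguments are hypothetical names standing in for the run's real parallelism config.

    # Sketch of a backward-compat workaround (hypothetical, not the gpt-neox fix):
    # PipelineEngine defines is_pipe_parallel / is_data_parallel / micro_batches,
    # but a plain DeepSpeedEngine (pp=0) does not, so fall back to values derived
    # from the run configuration.
    def engine_parallel_state(model, pipe_parallel_size=0, data_parallel_size=1):
        is_pipe = getattr(model, "is_pipe_parallel", pipe_parallel_size > 1)
        is_data = getattr(model, "is_data_parallel", data_parallel_size > 1)
        # On PipelineEngine, micro_batches equals the gradient accumulation
        # steps, so use that as the fallback (assumption) on DeepSpeedEngine.
        micro_batches = getattr(
            model, "micro_batches", model.gradient_accumulation_steps()
        )
        return is_pipe, is_data, micro_batches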

@StellaAthena (Member)

> Training works. The current issues revolve around running lm-eval on a neox model (I haven't converted to hf yet) […] it's really gpt-neox/eval_tasks/eval_adapter.py that breaks.

We have updated main to be compatible with the latest eval harness version.

@haileyschoelkopf (Contributor)

Closed by #1131, which allows Mistral-7B-v0.1 and the Instruct versions 0.1 and 0.2 to be converted from the Meta / Mistral distributed-weights format, trained in NeoX, and exported to HF.
