
Support Mistral Models #1050

Closed
Quentin-Anthony opened this issue Sep 29, 2023 · 8 comments
Labels: feature request (New feature or request)

Comments

@Quentin-Anthony (Member)

Mistral just released a nice 7B. Let's support loading it into gpt-neox.

@Quentin-Anthony added the feature request label on Sep 29, 2023
@malteos commented Oct 27, 2023

Is somebody already actively working on this?

@AIproj (Contributor) commented Oct 27, 2023

Hello, yes, I have the model implemented in the adding-mistral-0.1 branch of my fork, but I'm still testing it. The remaining items are:

  • adapt the convert_hf_to_sequential.py script, which currently does not support Mistral.
  • adapt the convert_to_hf.py script, which currently does not support Mistral.
  • eval the original hf model, the model converted to neox, and the neox model converted back to hf, to ensure nothing broke (see the sketch after this list).
  • train on some dummy task.
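
A minimal sanity check for that round-trip eval step could look like the sketch below. This is a hypothetical illustration, not code from the branch: the ./mistral-roundtrip path is a placeholder for wherever the hf -> neox -> hf export lands, and it only compares logits on a single prompt.

    # Hypothetical round-trip sanity check (not from the adding-mistral-0.1 branch):
    # compare logits of the original HF checkpoint against the hf -> neox -> hf
    # re-export. "./mistral-roundtrip" is a placeholder path.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
    orig = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16
    )
    trip = AutoModelForCausalLM.from_pretrained(
        "./mistral-roundtrip", torch_dtype=torch.float16
    )

    inputs = tok("The quick brown fox", return_tensors="pt")
    with torch.no_grad():
        diff = (orig(**inputs).logits - trip(**inputs).logits).abs().max()
    print("max |logit diff|:", diff.item())  # should be ~0 up to fp16 noise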

@StellaAthena (Member)

@AIproj Sounds great! Good progress :)

@StellaAthena (Member)

@AIproj any updates on this?

@AIproj (Contributor) commented Nov 26, 2023

Yep, had a meeting today with @haileyschoelkopf to figure out some bugs and test training. By the way, one of the bugs we ran into has a PR awaiting merge in the DeeperSpeed repo (link). We're hopefully meeting again tomorrow to wrap this up ASAP.

@AIproj (Contributor) commented Nov 27, 2023

Training works. The current issues revolve around running lm-eval on a neox model (I haven't converted to hf yet): I'm using the DS 0.12-based DeeperSpeed, and it seems some things broke.

To give more details, attributes like self.model.is_pipe_parallel, self.model.is_data_parallel, and self.model.micro_batches have been moved to the PipelineEngine class, which inherits from the DeepSpeedEngine class. But (I'm guessing) since I'm using pp=0, the model gets initialised as a DeepSpeedEngine, leading to errors. I found workarounds for the first two, but haven't given much thought to the third. We'll need to think about how best to preserve backward compatibility, e.g. forcing gpt-neox to initialise self.model as a PipelineEngine even with pp=0, since PipelineEngine sets these flags itself in deepspeed's engine.py:

        self.is_pipe_parallel = self.grid.pipe_parallel_size > 1
        self.is_data_parallel = self.grid.data_parallel_size > 1
        self.is_model_parallel = self.grid.model_parallel_size > 1

I had no errors during training, since training doesn't access the self.model attributes I mentioned; it's really gpt-neox/eval_tasks/eval_adapter.py that breaks.
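
For reference, a compatibility shim along these lines could paper over the missing attributes. This is a sketch under the assumptions above, not the actual fix; engine_parallel_state and its size arguments are hypothetical names standing in for the run's real parallelism config.

    # Sketch of a backward-compat workaround (hypothetical, not the gpt-neox fix):
    # PipelineEngine defines is_pipe_parallel / is_data_parallel / micro_batches,
    # but a plain DeepSpeedEngine (pp=0) does not, so fall back to values derived
    # from the run configuration.
    def engine_parallel_state(model, pipe_parallel_size=0, data_parallel_size=1):
        is_pipe = getattr(model, "is_pipe_parallel", pipe_parallel_size > 1)
        is_data = getattr(model, "is_data_parallel", data_parallel_size > 1)
        # On PipelineEngine, micro_batches equals the gradient accumulation
        # steps, so use that as the fallback (assumption) on DeepSpeedEngine.
        micro_batches = getattr(
            model, "micro_batches", model.gradient_accumulation_steps()
        )
        return is_pipe, is_data, micro_batches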

@StellaAthena (Member)

> Training works. The current issues revolve around running lm-eval on a neox model (I haven't converted to hf yet) […] it's really gpt-neox/eval_tasks/eval_adapter.py that breaks.

We have updated main to be compatible with the latest eval harness version.

@haileyschoelkopf (Contributor)

Closed by #1131, which allows Mistral-7B-v0.1 and the Instruct versions 0.1 and 0.2 to be converted from the Meta / Mistral distributed-weights format, trained in NeoX, and exported to HF.
