
Remove GUI and Update Docs #48

Merged: 4 commits merged into main on Apr 22, 2024

Conversation

@hamelsmu (Contributor):

This PR is in lieu of #43. I am also removing the GUI component, per discussion in Slack.

This PR closes #45

(reopening this from a branch instead of a fork).

@hamelsmu mentioned this pull request on Apr 22, 2024
@hamelsmu (Contributor, Author) commented Apr 22, 2024:

Ok looks like an error with wandb ... 👀

EDIT: I see that this was already fixed in #46.

@hamelsmu requested a review from @mwaskom on April 22, 2024 06:09
@charlesfrye merged commit 1280fa1 into main on Apr 22, 2024 (5 checks passed)
@mwaskom (Collaborator) commented Apr 22, 2024:

Can we please not vary parameters across base models where we don't mean to communicate that the parameterizations are model-specific? I.e., if we're going to switch to DeepSpeed ZeRO-1, let's do it for all of the models, not just mistral.

@mwaskom (Collaborator) left a review comment:

Some nits on the new README text.

Review thread on README.md, quoting the new text:

Some important caveats about the `train` command:

- The `--data` flag is used to pass your dataset to axolotl. This dataset is then written to the `datasets.path` specified in your config file. If you already have a dataset at `datasets.path`, be careful to also pass the same path to `--data` to ensure the dataset is correctly loaded (see the sketch below).
- Unlike axolotl, you cannot pass additional flags to the `train` command. However, you can specify all your desired options in the config file instead.
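
To make the first caveat concrete, here is a minimal sketch of how the two settings line up. The file names and the exact shape of the train invocation are illustrative assumptions, not taken from this repo:

# config/my-model.yml (hypothetical name, for illustration)
datasets:
  - path: my_data.jsonl   # axolotl loads the dataset from this path
    type: alpaca          # dataset format; alpaca is just an example
# The matching train invocation would pass the same file, e.g.:
#   train --config=config/my-model.yml --data=my_data.jsonl
# Because the file given to --data is written to datasets.path before
# training starts, passing a different file there would silently replace
# the dataset you expected axolotl to load.
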
@mwaskom (Collaborator):

We already say this above; we probably only need to say it once (most people coming to this repo will not be very experienced axolotlians).

@hamelsmu (Contributor, Author):
My observations:

  • I think this repo is pretty difficult to reason about if you aren't familiar with axolotl. Like, what are these configs? How does it work? How are my prompts assembled, exactly? What does the dataset format need to be? Are there other dataset formats? How do I check the prompt construction? Etc. I was actually assuming that the user is indeed familiar with axolotl.
  • Even though I am very familiar with axolotl, this --data flag was really confusing to me, because a key parameter in my config that I am used to using is completely ignored, with an extra layer of indirection. I actually got stuck on this personally as an experienced axolotl user, which is why I felt the need to provide these two caveats.

cc: @charlesfrye @winglian, curious what you think.

@hamelsmu (Contributor, Author) commented Apr 22, 2024:

@mwaskom, about your question:

> Can we please not vary parameters across base models where we don't mean to communicate that the parameterizations are model-specific? I.e., if we're going to switch to DeepSpeed ZeRO-1, let's do it for all of the models, not just mistral.

I checked, and all the other example configs in this repo either do not use any DeepSpeed config or are Mixtral 8x7B models, which are too big to use DeepSpeed ZeRO-1. DeepSpeed parameterizations are indeed model-specific, IMO, in that they are specific to the model's size relative to the GPUs being used.

  • codellama: no DeepSpeed config was present
  • llama-2: no DeepSpeed config was present
  • mistral: already had a DeepSpeed config, but I think it was incorrect, which is why I changed it. I don't think we need ZeRO-3, and all the additional communication overhead that comes with it, for a 7B model on an H100 GPU (see the sketch after this list).
  • mixtral (there are two configs): both are OK in that they use ZeRO-2
  • pythia: no DeepSpeed config was present
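
As a sketch of the change under discussion: in an axolotl config, the DeepSpeed stage is chosen by pointing the `deepspeed` key at a JSON config file. The "before" line below matches the grep output later in this thread; the zero1 file name is an assumption based on the naming pattern of the other configs:

# config/mistral.yml, before (ZeRO-3: full parameter/gradient/optimizer
# sharding, with the extra communication that entails):
deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
# after (ZeRO-1: shards only optimizer state, which should be plenty
# for a 7B model that fits comfortably on an H100):
# deepspeed: /root/axolotl/deepspeed_configs/zero1.json  # assumed file name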

I was inferring from this prior structure that we didn't always assume multi-GPU training and that it's just the example in the README.

Let me know if I'm misunderstanding or missing something! cc: @charlesfrye

@mwaskom (Collaborator) commented Apr 22, 2024:

Hm are we looking in different places?

$ git grep zero config
config/codellama.yml:deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
config/mistral.yml:deepspeed: /root/axolotl/deepspeed_configs/zero3_bf16.json
config/mixtral.yml:deepspeed: /root/axolotl/deepspeed_configs/zero2.json
config/mixtral_out_of_box.yml:deepspeed: /root/axolotl/deepspeed_configs/zero2.json

That said, it's already kind of a mess, I guess.

@mwaskom (Collaborator) commented Apr 22, 2024:

But as I mentioned above, I think that "what DeepSpeed config to use" is a tricky question. I.e., I think the original intention here was to demonstrate that ZeRO-3 works on Modal even if it's not strictly necessary for the model size that we include in the quickstart. But I appreciate that many users will just pick up that config and run with it. So it's hard; this is a broader question about the point of this repo, IMO.

@hamelsmu (Contributor, Author):

> Hm are we looking in different places?

I was looking at some old commit on GitHub, sorry about that; not sure how that happened.

@hamelsmu (Contributor, Author):

> But I appreciate that many users will just pick up that config and run with it. So it's hard; this is a broader question about the point of this repo, IMO.

That is an interesting point; perhaps we should huddle and discuss who we think the audience is, what we assume they come with, etc. I probably have a much different mental model. I think it would be good to understand that so I can make better choices!

@hamelsmu (Contributor, Author):

re: DeepSpeed configs

To resolve the gridlock, I can just make everything ZeRO-3 for now; it's not like it's going to kill anyone even if it is suboptimal.

@mwaskom (Collaborator) commented Apr 22, 2024:

> it's not like it's going to kill anyone even if it is suboptimal

Famous last words in LLM finetuning ;)

@hamelsmu (Contributor, Author) commented Apr 22, 2024:

> Famous last words in LLM finetuning ;)

I'll actually just revert that one specific change :) I don't want to screw up the other configs. They can be quite fiddly, and I'm not sure about bf16, etc.

@mwaskom deleted the rm-gui branch on April 22, 2024 19:23
Merging this pull request may close: Proposal: Get Rid of Gradio Interface