Remove GUI and Update Docs #48
Conversation
Ok, looks like an error. EDIT: I see this was already fixed in #46.
Can we please not vary parameters across base models where we don't mean to communicate that the parameterizations are model-specific? I.e., if we're going to switch to DeepSpeed ZeRO-1, let's do it for all of the models, not just Mistral.
Some nits on the new readme text.
Some important caveats about the `train` command:

- The `--data` flag is used to pass your dataset to axolotl. This dataset is then written to the `datasets.path` as specified in your config file. If you already have a dataset at `datasets.path`, you must be careful to also pass the same path to `--data` to ensure the dataset is correctly loaded.
- Unlike axolotl, you cannot pass additional flags to the `train` command. However, you can specify all your desired options in the config file instead.
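To make the first caveat concrete, here is a minimal sketch of how the two settings need to line up. The config excerpt follows axolotl's YAML format; the file names, paths, and dataset type are hypothetical, and the exact `train` invocation in this repo may differ:

```yaml
# config/mistral.yml (hypothetical path) -- excerpt of an axolotl-style config
datasets:
  # The dataset passed via `--data` gets written to this path before training,
  # so if a file already exists here, the path given to `--data` must match it.
  - path: data/sqlqa.jsonl
    type: alpaca

# Invocation sketch (the CLI shape is an assumption): only the config file is
# read for training options; extra axolotl flags are not forwarded.
#   train --config=config/mistral.yml --data=data/sqlqa.jsonl
```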
We already say this above, probably only need to say it once (most people coming to this repo will not be very experienced axolotilians).
My observations:
- I think this repo is pretty difficult to reason about if you aren't familiar with axolotl. Like, what are these configs? How does it work? How are my prompts assembled exactly? What does the dataset format need to be? Are there other dataset formats? How do I check the prompt construction? etc. I was actually assuming that the user is indeed familiar with axolotl.
- Even as someone very familiar with axolotl, this `--data` flag was really confusing to me, because a key parameter in my config that I am used to using is being completely ignored with an extra layer of indirection. I actually got stuck on this personally as an experienced axolotl user, so I felt the need to provide these two caveats.
cc: @charlesfrye @winglian curious what you think
@mwaskom, about your question:
I checked, and all the other example configs in this repo either do not use any deepspeed config or are for Mixtral 8x7B, which is too big to use DeepSpeed ZeRO-1. DeepSpeed parameterizations are indeed model-specific IMO, in that they are specific to the model's size relative to the GPUs being used.
I was inferring from this prior structure that we didn't always assume multi-GPU training, and it's just the example in the README. Let me know if I'm misunderstanding or missing something! cc: @charlesfrye
Hm, are we looking in different places?
That said, it's already kind of a mess, I guess.
But as I mentioned above, I think that "what deepspeed to use" is a tricky question. I.e., I think the original intention here was to demonstrate that ZeRO-3 works on Modal even if it's not strictly necessary for the model size that we include in the quickstart. But I appreciate that many users will just pick up that config and run with it. So it's hard; it's a broader question about the point of this repo, IMO.
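For readers following along, the knob under discussion comes down to roughly one line per model config. A sketch, assuming axolotl's `deepspeed` key and the stock ZeRO JSON configs that ship with axolotl (file names may differ in this repo):

```yaml
# ZeRO-1: shards only optimizer state across GPUs; typically enough for a
# ~7B model and keeps communication overhead low.
deepspeed: deepspeed_configs/zero1.json

# ZeRO-3: additionally shards gradients and parameters; needed when the model
# (e.g. Mixtral 8x7B) doesn't fit otherwise, at the cost of more communication.
# deepspeed: deepspeed_configs/zero3.json
```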
I was looking at some old commit on GitHub, sorry about that; not sure how that happened.
That is an interesting point. Perhaps we should huddle and discuss who we think the audience is, what we assume they come with, etc. I probably have a much different mental model. I think it would be good to understand that so I can make better choices!
re: DeepSpeed configs: to resolve the gridlock, I can just make everything z3 for now; it's not like it's going to kill anyone even if it is suboptimal.
Famous last words in LLM finetuning ;)
I'll actually just revert that one specific change :) I don't want to screw up the other configs. They can be quite fiddly and I'm not sure about bf16, etc.
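For context on why these configs are fiddly: precision is configured in two places that have to agree, the axolotl YAML and the DeepSpeed JSON it points to, so a blanket swap can break a config that worked before. A hedged sketch using axolotl's precision flags (the DeepSpeed file name here is hypothetical):

```yaml
# Axolotl-side precision flags; these should be consistent with whatever the
# referenced DeepSpeed JSON enables (its bf16/fp16 sections), otherwise
# training can error out or run in an unintended precision.
bf16: true
fp16: false
deepspeed: deepspeed_configs/zero3_bf16.json  # hypothetical file name
```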
This PR is in lieu of #43. I am also removing the GUI component per discussion in Slack.
This PR closes #45
(reopening this from a branch instead of a fork).