WIP: Implement FSDP, drop usage of GeneratorBuilder, DIY caching #221
```python
        result_queue_dict=self._result_queues,
    )
)
return result[0]
```
Reviewer comment: here
```python
raise ValueError(f"Invalid token_loc: {token_loc}")
converted_hiddens = pytree_map(lambda x: x.cpu().share_memory_(), hiddens)

return SmallerOutput(lm_logits=returned_logits, hidden_states=converted_hiddens)
```
Reviewer comment: This is the function to run on the server. Note that you need to move the shared tensor to the worker's device before running, and then remember to move the output back to shared CPU memory so it can be added back to the queue.
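A minimal sketch of that round trip, assuming a `pytree_map` helper like the one in the diff above; the `run_on_worker` name and the simplified `pytree_map` here are illustrative, not this PR's actual code:

```python
import torch

def pytree_map(fn, tree):
    # Simplified stand-in for the project's pytree_map: apply fn to every
    # tensor leaf of a nested dict/list/tuple structure.
    if isinstance(tree, torch.Tensor):
        return fn(tree)
    if isinstance(tree, dict):
        return {k: pytree_map(fn, v) for k, v in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(pytree_map(fn, v) for v in tree)
    return tree

def run_on_worker(model, shared_inputs, device):
    # Shared-memory tensors arrive on CPU; move them onto this worker's GPU.
    inputs = pytree_map(lambda x: x.to(device), shared_inputs)
    with torch.no_grad():
        outputs = model(**inputs)
    # Move the outputs back to CPU and re-share them, so they can be put on
    # the multiprocessing result queue, mirroring the diff above.
    return pytree_map(lambda x: x.cpu().share_memory_(), outputs)
```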
Unfortunately this PR is pretty big. I had to change a couple of things to get FSDP working.
You can try it out by following the QA instructions below.
## Issues
### Figuring out the memory required

The `min_gpu_mem` option can be passed; it indicates the memory required for the whole model.
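For instance, something like the following sketch; the exact flag spelling, the `imdb` dataset, and the byte value are assumptions for illustration, not taken from this PR:

```bash
# Tell elicit that the whole model needs roughly 40 GB of GPU memory.
# Flag spelling and units are assumptions; check `elk elicit --help`.
elk elicit huggyllama/llama-7b imdb --min_gpu_mem 40000000000
```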
### mkl

You may encounter an MKL threading-layer error (see the related GitHub issue). To fix it, run this before launching. I'm still figuring out why this happens; it's supposed to be fixed with the latest mkl package, but it isn't for me:

```
export MKL_THREADING_LAYER=GNU
```
### too many open files

Sometimes it'll complain about too many open files. Increase the ulimit:

```
ulimit -n 4048
```
## ❤️ QA instructions
Check out this branch: `refactor-datasets-usage`.
Run elicit with huggyllama/llama-7b in each of the variations below (a sketch of the commands follows the list). For each run, check that the `eval.csv` outputs are roughly the same, and let me know if it crashes. Note that we are disabling the extraction cache here; otherwise subsequent elicit runs won't actually rerun extraction with llama, they'll just reuse the cached results.

- With FSDP. This shards the model across the devices.
- Without FSDP, but multi-GPU. This duplicates the model on each device.
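A sketch of those two runs; flag names like `--use_fsdp`, `--num_gpus`, and `--disable_cache`, plus the `imdb` dataset, are assumptions for illustration, not necessarily this branch's actual CLI:

```bash
git checkout refactor-datasets-usage

# Run 1: FSDP on -- llama-7b is sharded across all visible GPUs.
elk elicit huggyllama/llama-7b imdb --use_fsdp --disable_cache

# Run 2: no FSDP, plain multi-GPU -- the full model is copied to each device.
elk elicit huggyllama/llama-7b imdb --num_gpus 2 --disable_cache

# Compare the eval.csv from both runs; the numbers should roughly agree.
```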
Now compare this to the main branch. Is llama-7b significantly slower?
If the above works without crashing, and you're feeling ambitious, you can merge the latest changes from main into this branch and fix the conflicts. It may be confusing, though.