WIP: Implement FSDP, drop usage of GeneratorBuilder, DIY caching #221

Closed
wants to merge 168 commits into main from refactor-datasets-usage

Conversation

@thejaminator (Collaborator) commented Apr 29, 2023

Unfortunately this is pretty big. I had to change a few things to get FSDP working.

  • We now call an InferenceServer which serves the model predictions. It works for both FSDP and one-model-per-GPU serving.
  • We no longer depend on Hugging Face's dataset builder. Previously we used it for multiprocessing with one model per process, but now our InferenceServer manages that, and you can't use multiprocessing to call the InferenceServer from separate processes.
  • Instead we just manually collect the results and build our dataset ourselves.
  • Because of that, we need to DIY our own cache.
  • And to make sure the workers in the InferenceServer are fully utilized, we call the inference server from multiple threads. The InferenceServer is designed to be thread-safe, so hopefully it works — see the sketch after this list.
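As a rough illustration of the threaded client side (the method name `infer` and the exact interface are assumptions here, not the real InferenceServer API), the shape of it is something like:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: `server.infer` stands in for whatever method the
# InferenceServer actually exposes; the real signature may differ.
def run_extraction(server, batches, num_threads: int = 8):
    """Feed batches to the (thread-safe) InferenceServer from several threads
    so its GPU workers stay busy while each thread blocks on its result."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map preserves input order, so outputs line up with `batches`
        return list(pool.map(server.infer, batches))
```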

You can try it out with:

elk elicit huggyllama/llama-{7b,13b,30b,65b} imdb --fsdp_enabled --num_gpus {2-8}

Issues

Figuring out the memory required

The --min_gpu_mem flag can be passed. It indicates the memory required for the whole model.

--min_gpu_mem {memory_required_for_whole_model}
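As a rough rule of thumb (my own back-of-the-envelope sketch, not the project's exact accounting), the whole-model figure is roughly parameter count times bytes per parameter, plus some headroom:

```python
# Back-of-the-envelope estimate; the real requirement also depends on
# activations, buffers, and dtype, so treat this as a lower bound with headroom.
def estimate_model_bytes(num_params: float, bytes_per_param: int = 2, overhead: float = 1.2) -> int:
    return int(num_params * bytes_per_param * overhead)

# llama-7b in fp16: ~7e9 params * 2 bytes ≈ 14 GB, ~16.8 GB with 20% headroom
print(estimate_model_bytes(7e9))
```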

mkl

You may encounter an error like this (there's a related GitHub issue):

Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.

To fix it, run this before launching. I'm still figuring out why this happens; it's supposed to be fixed in the latest mkl package, but it isn't for me.

export MKL_THREADING_LAYER=GNU
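Alternatively (as the error message itself hints), you can force the threading layer from Python before numpy or torch gets imported; a minimal sketch:

```python
import os

# Must run before numpy (or anything that imports it, e.g. torch) is loaded,
# otherwise MKL has already chosen its threading layer.
os.environ.setdefault("MKL_THREADING_LAYER", "GNU")

import numpy as np  # noqa: E402  (deliberately imported after setting the env var)
```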

too many open files


Sometimes it'll complain about too many open files. Increase the ulimit:

ulimit -n 4048
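If you'd rather not rely on the shell, the same soft limit can be raised from inside the Python process on Linux/macOS (a sketch, capped at whatever hard limit the OS allows):

```python
import resource

# In-process equivalent of `ulimit -n`: raise the soft limit on open file
# descriptors, but never above the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (min(4048, hard), hard))
```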

❤️ QA instructions

Check out this branch: refactor-datasets-usage.
Run elicit with huggyllama llama-7b with the variations below.
For each run, check that the eval.csv files are roughly the same, and let me know if it crashes.
Note that we disable the extraction cache here. Otherwise subsequent elicit runs won't actually run the extraction with llama, they will just reuse the cached results.

With FSDP. This shards the model across the devices.

elk elicit huggyllama/llama-7b imdb --fsdp_enabled --num_gpus 2 --disable_cache

Without FSDP, but multi-GPU. This duplicates the model on each device.

elk elicit huggyllama/llama-7b imdb --num_gpus 2 --disable_cache

Now compare this to the main branch. Is llama-7b significantly slower?

If the above works without crashing, and you're feeling ambitious, you can merge the latest changes into this branch and fix the conflicts. It may be confusing though.

@thejaminator force-pushed the refactor-datasets-usage branch 2 times, most recently from fbf58dd to 7fabb36 on April 29, 2023 at 20:11
Review comment on this snippet:

```python
result_queue_dict=self._result_queues,
    )
)
return result[0]
```
@thejaminator (Collaborator, Author): here

Review comment on this snippet:

```python
raise ValueError(f"Invalid token_loc: {token_loc}")
converted_hiddens = pytree_map(lambda x: x.cpu().share_memory_(), hiddens)

return SmallerOutput(lm_logits=returned_logits, hidden_states=converted_hiddens)
```
@thejaminator (Collaborator, Author) commented Apr 30, 2023:


This is the function to run on the server. Note that you need to move the shared tensor to the worker's device to run it, and then remember to bring the result back to CPU shared memory so it can be added back to the queue.
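A minimal sketch of that round trip (the surrounding details are assumptions about the real worker function; `pytree_map` and `SmallerOutput` are the names from the snippet above):

```python
import torch

def run_on_worker(model, shared_batch: dict, worker_device: torch.device):
    # 1. Move the CPU shared-memory tensors onto this worker's device.
    batch = {k: v.to(worker_device) for k, v in shared_batch.items()}

    with torch.no_grad():
        outputs = model(**batch, output_hidden_states=True)

    # 2. Move results back to CPU shared memory so they can go on the result queue.
    hiddens = pytree_map(lambda x: x.cpu().share_memory_(), outputs.hidden_states)
    logits = outputs.logits.cpu().share_memory_()
    return SmallerOutput(lm_logits=logits, hidden_states=hiddens)
```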
