Robust testing suite #957

Closed
25 tasks
StellaAthena opened this issue May 28, 2023 · 8 comments · Fixed by #1080
Labels
feature request New feature or request good first issue Good for newcomers help wanted This issue needs assistance

Comments

@StellaAthena
Member

StellaAthena commented May 28, 2023

Recently we’ve been having issues with new features breaking old ones. We do not currently have a detailed testing suite that validates all of the features of GPT-NeoX, but the codebase is large and complicated enough that we definitely should. Below is the start of a list of tests we should support, written off the top of my head. Feel free to post comments with additional ones and I’ll add them. I’ll also start going through things and building the list up systematically.

We have some tests here but they are missing notable new features and don’t seem to be updated as the library changes.

Data Processing

  • Download scripts run
  • Preprocessing with each supported tokenizer works
  • Training a new tokenizer

Primary Functions

  • Launcher scripts
  • Training (on one GPU, one node, and one pod)
  • Finetuning (especially loading and training without optimizer states)
  • Inference
  • Evaluation

Optimizations and Parallelizations

  • ZeRO works and memory usage is within prescribed limits
  • fp16 and bf16
  • Various MP and PP values
  • Flash Attention
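
One way a "memory usage is within prescribed limits" check could be structured is sketched below. This is only an illustration under loud assumptions: it measures Python heap allocations with the standard-library `tracemalloc` module as a stand-in for GPU memory, and `peak_bytes`, `within_budget`, and `allocate` are hypothetical helper names, not part of GPT-NeoX or DeepSpeed. A real ZeRO test would more likely read a counter such as `torch.cuda.max_memory_allocated()` after resetting it with `torch.cuda.reset_peak_memory_stats()`.

```python
import tracemalloc


def peak_bytes(fn, *args, **kwargs):
    """Run fn and return (result, peak traced Python heap bytes during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        # get_traced_memory() returns (current, peak) since tracing started.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def within_budget(fn, budget_bytes, *args, **kwargs):
    """Return True if fn's peak traced allocation stays at or below budget_bytes."""
    _, peak = peak_bytes(fn, *args, **kwargs)
    return peak <= budget_bytes


def allocate(n_bytes):
    """Hypothetical stand-in for a training step: allocate a single buffer."""
    return bytearray(n_bytes)
```

A test would then assert something like `within_budget(allocate, 50_000_000, 1_000_000)`, with the budget taken from whatever limit the configuration under test is supposed to respect.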

Model Options

  • GPT-J residual
  • LLaMA MLP
  • Positional embeddings
  • Sparse attention
  • Dropout and Weight decay
  • Kernel fusions
  • With / without bias terms
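
Since the model options above multiply quickly, one possible approach is to generate the test cases from a grid rather than writing each combination by hand. The sketch below is an assumption about how that might look: the option names and values in `OPTION_GRID` are made up for illustration and are not guaranteed to match real GPT-NeoX config keys, and `option_matrix` and `case_id` are hypothetical helpers.

```python
import itertools

# Hypothetical option names and values, for illustration only.
OPTION_GRID = {
    "gpt_j_residual": [False, True],
    "pos_emb": ["learned", "rotary", "alibi"],
    "use_bias": [False, True],
}


def option_matrix(grid):
    """Yield one config dict per combination of option values."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def case_id(config):
    """Build a readable id such as 'gpt_j_residual=True-pos_emb=rotary-use_bias=False'."""
    return "-".join(f"{k}={v}" for k, v in sorted(config.items()))


# 2 * 3 * 2 = 12 combinations for the grid above.
CASES = list(option_matrix(OPTION_GRID))
```

With pytest, a list like this could be passed to `pytest.mark.parametrize("config", CASES, ids=case_id)` so that each combination shows up as its own named test. For a small and reliable first suite, the grid can be trimmed to the handful of combinations that matter most.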

Conversion Scripts

  • NeoX -> HF transformers library
  • NeoX -> Megatron-DS
  • NeoX -> SafeTensors
  • NeoX V1 -> NeoX V2
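
For the conversion scripts, a common pattern is a round-trip or equivalence check on parameter names and values. The following is a minimal sketch under stated assumptions: checkpoints are represented as plain dicts of float lists rather than real tensors, and `diff_state_dicts` and `rename_keys` are hypothetical helpers, not functions from any of the libraries named above.

```python
import math


def diff_state_dicts(expected, actual, rel_tol=1e-5, abs_tol=1e-8):
    """Return a list of readable mismatches; an empty list means the dicts agree."""
    problems = []
    for name in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing: {name}")
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected: {name}")
    for name in sorted(expected.keys() & actual.keys()):
        want, got = expected[name], actual[name]
        if len(want) != len(got):
            problems.append(f"shape: {name} ({len(want)} vs {len(got)})")
        elif not all(
            math.isclose(w, g, rel_tol=rel_tol, abs_tol=abs_tol)
            for w, g in zip(want, got)
        ):
            problems.append(f"values: {name}")
    return problems


def rename_keys(state, mapping):
    """Hypothetical converter: rename parameters per mapping, leaving values untouched."""
    return {mapping.get(name, name): values for name, values in state.items()}
```

A test could convert a tiny checkpoint, convert it back with the inverse mapping, and assert that `diff_state_dicts(original, round_tripped)` is empty. Returning a list of problems instead of raising on the first one tends to make failures easier to read when many parameters are affected.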

Misc Features

@Quentin-Anthony
Member

The DeepSpeed unit test suite is very good, and I suggest that whoever picks this up use it as a template.

Once we have a solid test suite, it should be run on every PR as a GitHub Action, similar to how DeepSpeed does it.

@whiz-Tuhin

@StellaAthena @Quentin-Anthony I can pick this up. I'll need a day or two to go through the codebase as well as the DeepSpeed unit test suite. Will keep this thread updated.

@StellaAthena
Member Author

StellaAthena commented Jun 5, 2023

@whiz-Tuhin That's awesome, welcome to the team! I went ahead and edited the OP with some additional things that should probably be incorporated though I'm sure it's still not totally comprehensive. That said, don't get overwhelmed by the number of tests to write. A small and reliable test suite that doesn't cover every feature but does cover major ones would be a huge value add. We can then build on that over time to be more and more comprehensive.

Quentin knows what he's talking about, so I would definitely start by working on porting the DeepSpeed tests over, removing the stuff we don't need and adding tests for things DeepSpeed doesn't support.

I've sent you an invite to the EleutherAI Org that will allow you to work on a non-main branch without having to mess around with forking the library.

@Quentin-Anthony
Member

Looking again for someone to pick this up!

@mkerin
Contributor

mkerin commented Nov 7, 2023

Hi @Quentin-Anthony @StellaAthena - just checking if you're still looking for a volunteer here? I have good bandwidth to work on this over the next 2 weeks, which I think should be more than enough time (& I've been looking for an opportunity to contribute for a while!).

@StellaAthena
Member Author

Yes! That would be phenomenal

@Quentin-Anthony
Member

@mkerin -- Great to hear! Please reach out to me here or over Discord if you need any help or have any questions. A good first step would be to just run our existing CPU tests locally, then start with some of the simple model option tests.

@mkerin
Contributor

mkerin commented Nov 7, 2023

Hey Quentin - thank you for the tips!

I have got the existing CPU tests running & started working on integration tests for the data processing functionality (I didn’t see your message about focussing on model options until just now).

I have dropped you a line on Discord. My handle there is mkez.
