
Best practice for train and validation set separation #1181

Closed
denisergashbaev opened this issue Jun 20, 2024 · 5 comments
denisergashbaev commented Jun 20, 2024

Hello! I would like to know how we should approach the compilation step. I thought of the following, but I am not sure whether it is correct practice:

  • provide the optimizer with the train and validation datasets
  • then use the compiled module and an evaluator to run it on the validation dataset; this should be the reported metric (see the sketch below)
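Roughly, in code (a sketch only; MyProgram, metric, train_set, and val_set are placeholders from my own setup, and I picked BootstrapFewShotWithRandomSearch just as an example optimizer):

    # Sketch of the workflow I have in mind; MyProgram, metric, train_set,
    # and val_set are placeholders from my own setup.
    from dspy.evaluate import Evaluate
    from dspy.teleprompt import BootstrapFewShotWithRandomSearch

    optimizer = BootstrapFewShotWithRandomSearch(metric=metric)
    compiled = optimizer.compile(MyProgram(), trainset=train_set, valset=val_set)

    # Run the compiled module on the validation set and report this score.
    evaluator = Evaluate(devset=val_set, metric=metric, num_threads=4)
    score = evaluator(compiled)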

Now, some questions:

  • I have noticed that the BootstrapFewShot optimizer added examples from the validation dataset to the demos in the JSON file. Isn't that overfitting?
  • Is it OK if the training set examples come from a different distribution than the validation dataset? My training set examples are shorter.
  • How do we prevent overfitting to the validation set, given that the dataset is not split randomly into train/validation?

Thank you!

okhat (Collaborator) commented Jun 22, 2024

Hey @denisergashbaev, which version of DSPy are you on?

In general, when using DSPy there are four data splits to keep in mind: Train, Validation, Development, and Test. (Many optimizers only take 'train' and then internally re-split it into train and validation.)

In the most general case, optimizers are free to do anything with Train and Validation, because they're either 'training' (on Train) or 'hyperparameter tuning' (on Validation). Typically, optimizers should not be given any access to Development (which you can use to tweak your algorithm), except in very low-data regimes where Validation = Development. Test is test: it's held out for final evaluation.
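As a rough sketch in plain Python (data here is a hypothetical list of dspy.Example objects, and the proportions are arbitrary):

    # Rough sketch of the four splits; `data` is a hypothetical list of
    # dspy.Example objects, and the 40/20/20/20 proportions are arbitrary.
    import random

    random.Random(0).shuffle(data)
    n = len(data)
    trainset = data[: int(0.4 * n)]             # optimizers 'train' on this
    valset = data[int(0.4 * n) : int(0.6 * n)]  # optimizers 'tune' on this
    devset = data[int(0.6 * n) : int(0.8 * n)]  # you tweak your program on this
    testset = data[int(0.8 * n) :]              # held out for final evaluation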

In practice, optimizers will generally not use Validation for direct fitting, but only for blackbox optimization. This is not guaranteed, but it's the case for every current optimizer. The instance you saw of BootstrapFewShot using validation is just a bug resulting from using an undocumented path in the code. (Unlike other teleprompters, BootstrapFewShot is not an optimizer; it's just a meta-prompting approach, so it shouldn't even be given a validation set. The bugfix was to remove the ability to even provide a valset to it. The name valset was being overridden for other uses.)

This was fixed a long time ago, though, so make sure you're on a recent version of DSPy.

> Is it OK if the training set examples come from a different distribution than the validation dataset? My training set examples are shorter.

Which distribution do you ultimately care about? Make sure you have that in your dev set and track progress until you're satisfied.
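Concretely, something like this (a sketch; devset is whatever sample matches the distribution you care about, and metric and your_program are your own):

    # Sketch: measure progress on the distribution you actually care about.
    from dspy.evaluate import Evaluate

    evaluate_dev = Evaluate(devset=devset, metric=metric, num_threads=8,
                            display_progress=True)
    evaluate_dev(your_program)  # re-run as you iterate until you're satisfied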

denisergashbaev (Author) commented Jun 23, 2024

Hello @okhat, thank you very much for your response!

> Hey @denisergashbaev, which version of DSPy are you on?

I am using DSPy v2.4.9 and could reproduce the above behavior with it. Here is the code I used:

    from dspy.teleprompt import BootstrapFewShot

    bfs_optimizer = BootstrapFewShot(
        metric=metric,
        teacher_settings=teacher_settings,
        max_bootstrapped_demos=3,
        max_labeled_demos=len(train_set),
        max_rounds=1,
        max_errors=0,
    )
    page_data_extractor = bfs_optimizer.compile(
        page_data_extractor, trainset=train_set, valset=val_set
    )

If I inspect the JSON file for the compiled prompt, I can see that some examples from the validation set end up in there.
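For reference, this is roughly how I checked (a sketch; the exact layout of the saved JSON may differ between DSPy versions, and "question" stands in for my input field):

    # Rough sketch of how I found the overlap. The saved JSON layout may
    # differ between DSPy versions; "question" stands in for my input field.
    import json

    page_data_extractor.save("compiled.json")
    with open("compiled.json") as f:
        state = json.load(f)

    demo_inputs = {
        demo.get("question")
        for predictor in state.values()
        for demo in predictor.get("demos", [])
    }
    val_inputs = {example.question for example in val_set}
    print("demos taken from the validation set:", demo_inputs & val_inputs)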

The BootstrapFewShot documentation also mentions valset explicitly.

> train, validation, dev, test datasets

Let me rephrase your answer to make sure I understand it properly. Could you please correct me if I am wrong:

  • to most optimizers we should provide only a trainset; they will split it into train (for training the optimizer) and validation (for hyperparameter optimization). Unless we explicitly provide a validation set to the optimizer, it will do the split by itself
  • the dev set should be used by the developer only, for instance for manual prompt adjustments or for architecting the modules (i.e., splitting the program into modules or deciding how many signatures a module comprises)
    • how, in this case, should we validate the work performed on the dev set? Via another validation set?

> very low-data regimes where Validation = Development

What would be your estimate of a low-data regime that would necessitate validation = dev?

Thank you

Gsizm commented Jun 26, 2024

Omar (@okhat), could you please look at @denisergashbaev's query above? I happen to have a similar question.

okhat (Collaborator) commented Jun 27, 2024

Hey @denisergashbaev and @Gsizm,

Thanks for the note on the docs page. I've removed the mention of valset from there: it wasn't a correct reference. BootstrapFewShot is not an optimizer (it's an auto-prompting technique, or a non-optimizing teleprompter), and as such it has no proper use for a validation set.

BootstrapFewShotWithRandomSearch, on the other hand, is an optimizer. You can and should give it separate trainset and valset. It will build examples from trainset and will score candidate programs on the valset.
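For example (a sketch; metric, trainset, valset, and program are your own, and the hyperparameters shown are arbitrary):

    # Sketch: an optimizer that takes separate train and validation sets.
    from dspy.teleprompt import BootstrapFewShotWithRandomSearch

    optimizer = BootstrapFewShotWithRandomSearch(
        metric=metric,
        max_bootstrapped_demos=3,
        num_candidate_programs=8,  # arbitrary; more candidates = more LM calls
    )
    # Demos are bootstrapped from trainset; candidates are scored on valset.
    compiled = optimizer.compile(program, trainset=trainset, valset=valset)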

If you have several hundred examples, I recommend using a devset != valset and not passing the devset to any optimizers. That way, you have a way to measure progress before you eventually evaluate on the held-out test set. That said, using valset == devset is often OK too, especially if your total data is less than 200-400 examples.

Only two rules are crucial:

  • The final test set should not overlap with any sets used for training, validation, or development.
  • Don't have trainset == valset == devset; if you know what you're doing, you can have any two of the three be equal (a quick sanity check is sketched below).
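For instance (a sketch; it assumes each example has a unique input field, here called question):

    # Sketch: sanity-check the two rules above. Assumes each example has a
    # unique input field, here called `question`.
    def keys(examples):
        return {ex.question for ex in examples}

    # Rule 1: the test set overlaps with nothing else.
    assert not keys(testset) & (keys(trainset) | keys(valset) | keys(devset))

    # Rule 2: the three non-test sets must not all be identical.
    assert not (keys(trainset) == keys(valset) == keys(devset))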

okhat closed this as completed Jun 27, 2024
okhat (Collaborator) commented Jun 27, 2024

I assume this is resolved. Feel free to re-open if necessary. I forgot to add that DSPy 2.4.10 should certainly not have a valset argument in BootstrapFewShot; we removed that field in April iirc. Let me know if it's still there or if you have any other thoughts or questions.
