Default behavior bootstrapping #1121

tom-doerr · 2024-06-07T18:27:12Z

I think during bootstrapping there should be a special case for situations where all scores and scalars are non-zero.
To me it seems that bootstrapping fails completely and decreases performance when the metric used is never or almost never zero.

arnavsinghvi11 · 2024-06-19T23:01:57Z

Hey @tom-doerr, I think I understand the issue here, but I'd love to hear more about what you mean, ideally with an example if possible.

Is the idea that there should be more dynamic feedback during the bootstrapping to ensure the demonstration selection will lead to equal or better performance compared to the uncompiled program, and avoid such cases where performance decreases? We could definitely explore some improvements to the existing BootstrapFewShot optimizer.

tom-doerr · 2024-06-19T23:15:36Z

I'm not sure what a good solution would be, but the current behavior isn't optimal, in my opinion.

Example:
My objective is to generate great tweets.

tweet, score
Hdhuhhdhdh, 0.01
U88hdju, 0.1
Jhdjdjdjjd if d, 0.05
Hdhjjd, 0.02
Good morning to all my followers! 🌞, 0.8

Bootstrap will use the first 4 nonsense samples in its prompt, making performance much worse. This also happened to me on real data.
Sure, you can do random search, but in this case it would also deliver worse performance than the uncompiled program.

arnavsinghvi11 · 2024-06-19T23:19:52Z

I see! Ironically, as I was updating documentation for Bootstrap from your other issue #1118, the arg metric_threshold would be very useful in this case.
You could set a threshold of 0.75, for example, so that an example is no longer selected just because its score is non-zero; the bootstrapping would then only consider examples that meet the threshold as "passing examples". Let me know if that makes sense!
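
To make the effect of the threshold concrete, here is a minimal sketch in plain Python. It does not call DSPy; `select_demos` and `max_demos` are hypothetical names used only for illustration, and the selection logic is an assumption about how a pass/fail check behaves with and without a threshold, not a copy of the optimizer's code. The scores are the ones from the tweet example above.

```python
# Scores copied from the tweet example earlier in this thread.
scored_tweets = [
    ("Hdhuhhdhdh", 0.01),
    ("U88hdju", 0.1),
    ("Jhdjdjdjjd if d", 0.05),
    ("Hdhjjd", 0.02),
    ("Good morning to all my followers! 🌞", 0.8),
]


def select_demos(scored, metric_threshold=None, max_demos=4):
    """Hypothetical helper: keep the first `max_demos` examples that pass.

    Assumption: without a threshold, a score passes if it is truthy,
    so any non-zero float counts as a success.
    """
    demos = []
    for tweet, score in scored:
        if metric_threshold is not None:
            passed = score >= metric_threshold
        else:
            passed = bool(score)  # 0.01 is truthy, so nonsense passes too
        if passed:
            demos.append(tweet)
        if len(demos) >= max_demos:
            break
    return demos


# No threshold: the first four nonsense tweets fill all the demo slots.
without_threshold = select_demos(scored_tweets)

# Threshold of 0.75: only the tweet scored 0.8 qualifies.
with_threshold = select_demos(scored_tweets, metric_threshold=0.75)

print(without_threshold)
print(with_threshold)
```

With the real optimizer, the same value would go in through the `metric_threshold` argument of `BootstrapFewShot` alongside your own metric; please check the current Bootstrap documentation for the exact behavior.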

okhat closed this as completed Jun 22, 2024
