
How do I optimise for F1 score? #556

Open
umarbutler opened this issue Mar 5, 2024 · 4 comments

Comments

@umarbutler
Contributor

The documentation states that '[f]or simple tasks, [a metric] could be just "accuracy" or "exact match" or "F1 score". This may be the case for simple classification or short-form QA tasks', yet it does not clarify how F1 scores can be used to optimize programs. Is that possible? And if so, how?

The metric function that is passed to teleprompter.compile seems to take gold and pred as inputs, which are single classifications, so I am unable to see how you could calculate an F1 score based on that. It would be really helpful to have that ability, since not all tasks should be optimised based on accuracy.
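
To make my confusion concrete: the closest thing I can write today is a per-example token-overlap F1 (a rough sketch, not an API documented anywhere; the `answer` field name and the `BootstrapFewShot` usage are assumptions about the signature and optimiser), but that still only scores one example at a time rather than giving a dataset-level F1:

```python
from collections import Counter

def token_f1(gold, pred, trace=None):
    # Token-overlap F1 between the gold and predicted answers for a single
    # example. Assumes both expose an `answer` field (illustrative only).
    gold_tokens = gold.answer.lower().split()
    pred_tokens = pred.answer.lower().split()
    if not gold_tokens or not pred_tokens:
        return float(gold_tokens == pred_tokens)
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# This plugs into e.g. BootstrapFewShot(metric=token_f1), but it is still a
# per-example score, not an F1 computed over a confusion matrix.
```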

@umarbutler
Contributor Author

umarbutler commented Mar 5, 2024

It would be great if we could define a function that indicates whether a prediction is a true positive, false positive, true negative or false negative, and then pass another function that takes a confusion matrix and calculates a score from it. I don't just want to optimise F1 scores; I also want to be able to optimise for F-beta scores, MCC, precision, recall and other metrics that can be calculated from a confusion matrix.

Another way would be to accept an sklearn.metrics function together with a function for normalising values into True and False, and run the normalising function before the data is passed to the sklearn.metrics function, as sketched below. That would also be very helpful.
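
Something like this is what I have in mind (purely hypothetical: `corpus_metric` is not an existing compile argument, and the `label` field and `normalise` helper are illustrative):

```python
from sklearn.metrics import f1_score

def normalise(value):
    # Map a raw gold label or model output to a boolean (hypothetical helper).
    return str(value).strip().lower() in {"true", "yes", "1"}

def corpus_f1(golds, preds):
    # Dataset-level F1 over the whole evaluation set, i.e. computed from the
    # confusion matrix rather than averaged example by example.
    y_true = [normalise(example.label) for example in golds]
    y_pred = [normalise(prediction.label) for prediction in preds]
    return f1_score(y_true, y_pred)

# Hypothetical usage: today teleprompter.compile only accepts a per-example
# metric, so an argument like this does not exist yet.
# compiled = teleprompter.compile(program, trainset=trainset,
#                                 corpus_metric=corpus_f1)
```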

@okhat
Collaborator

okhat commented Apr 29, 2024

@umarbutler
Contributor Author

@okhat This does not answer my question. Please see:

> The documentation states that '[f]or simple tasks, [a metric] could be just "accuracy" or "exact match" or "F1 score". This may be the case for simple classification or short-form QA tasks', yet it does not clarify how F1 scores can be used to optimize programs. Is that possible? And if so, how?
>
> The metric function that is passed to teleprompter.compile seems to take gold and pred as inputs, which are single classifications, so I am unable to see how you could calculate an F1 score based on that. It would be really helpful to have that ability, since not all tasks should be optimised based on accuracy.
