Update Zeno Integration #1175

Open
4 tasks
haileyschoelkopf opened this issue Dec 20, 2023 · 4 comments

Comments

@haileyschoelkopf
Contributor

There are a few things we'll need to do to fix edge cases in our Zeno integration.

  • Log, in the results.json file we save, which groups (if any) each task was called as part of, as well as which task names are themselves groups
  • Ensure Winograd Schema-style / minimal-pair-like tasks (Winogrande, BLiMP, CrowS-Pairs) are handled intelligently in the Zeno export script
  • Any other changes on our end to beautify Zeno projects created?
  • Any tests we can set up to ensure Zeno support is not broken?
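
As a rough illustration of the first item, group membership could be recorded next to the scores along these lines. This is only a sketch: `build_group_info`, the `task_groups` and `groups` keys, and the example task names are hypothetical and do not reflect the current results.json layout.

```python
import json
from collections import defaultdict


def build_group_info(task_to_groups):
    """Summarize group membership for a results file.

    task_to_groups: dict mapping each task name to the list of groups
    it was called as part of (an empty list if it was run standalone).
    Both returned keys are hypothetical names, not an existing schema.
    """
    group_to_tasks = defaultdict(list)
    for task, groups in task_to_groups.items():
        for group in groups:
            group_to_tasks[group].append(task)
    return {
        # which groups each task was run under
        "task_groups": {t: sorted(g) for t, g in task_to_groups.items()},
        # which names are themselves groups, and their member tasks
        "groups": {g: sorted(t) for g, t in group_to_tasks.items()},
    }


# Hypothetical example input
info = build_group_info({
    "mmlu_anatomy": ["mmlu"],
    "mmlu_virology": ["mmlu"],
    "winogrande": [],
})
serialized = json.dumps(info, indent=2)
```

With something like this in the saved file, the export script could tell group names apart from leaf tasks without guessing from naming patterns.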

cc @lintangsutawika for your awareness: changing how metrics / aggregations are computed in order to unify them might break the Zeno integration, which relies on per-example metrics.

cc @Sparkier !

@haileyschoelkopf haileyschoelkopf added the feature request A feature that isn't implemented yet. label Dec 20, 2023
@haileyschoelkopf haileyschoelkopf self-assigned this Dec 20, 2023
@Sparkier
Contributor

Regarding tests, we could run one on our end: we already have integration tests set up that create a project on push. That would not alert you if anything breaks, though. I can also help set up a test in your codebase that uploads a project to Zeno, if you want.

@Sparkier
Contributor

Any other changes on our end to beautify Zeno projects created?

We've seen some patterns in useful metadata recently. For example, the model output length in freeform answers is often interesting. We could think about additional metadata that would make sense per task to further enhance the created Zeno project.
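
A minimal sketch of the output-length idea, assuming each per-example record is a dict with the model's answer stored under an `output` key. The function name, the key, and the added column names are hypothetical and are not part of the current export script.

```python
def add_output_length(records, output_key="output"):
    """Return copies of the records with output-length metadata added.

    records: list of per-example dicts (assumed shape).
    output_key: key holding the model's freeform answer (an assumption).
    """
    enriched = []
    for record in records:
        text = record.get(output_key) or ""
        enriched.append({
            **record,
            "output_length": len(text),            # length in characters
            "output_num_words": len(text.split()),  # whitespace-separated words
        })
    return enriched


# Hypothetical example records
examples = add_output_length([
    {"id": 0, "output": "Paris is the capital."},
    {"id": 1, "output": ""},
])
```

Columns like these could then be uploaded with the rest of the per-example data so that a project can be sliced by answer length.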

@Sparkier
Contributor

For tests, see #1221

@Sparkier
Contributor

For additional metadata, see #1222
