
Result evaluation and comparison #16

Open
DonggeLiu opened this issue Feb 4, 2020 · 1 comment

@DonggeLiu
Owner

  1. Can we access the results of the other tools in the competition?
    It would be best to have them as CSV files (e.g. like the ones we generated in the pre-competition experiments) so that we can cherry-pick benchmarks according to Legion's compatibility and compare only those scores (see the sketch after this list).

  2. I could not reproduce each tool's final competition score from its per-category scores using the formula from the Google Sheets of our pre-competition experiments:

    • This matters because we want to compute the score of our new experiments.
    • How are the final scores computed from the per-category scores?
    • Did they remove the results of some benchmarks? For example, SQLite-MemSafety has only 1 task, on which every tool scored 0; some benchmarks in other sets have the same problem. How did they handle these?
    • By normalisation, do they mean simply taking averages (i.e. as we did in our pre-competition experiments)? The sketch after this list shows that interpretation.
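
To make items 1 and 2 concrete, here is a minimal sketch of the cherry-picking and of the "normalisation = averaging" interpretation. The file name `results.csv`, the column names (`tool`, `category`, `benchmark`, `score`), and the benchmark names are assumptions for illustration only; this is not the competition's official scoring formula.

```python
import pandas as pd

# Hypothetical input: one row per (tool, category, benchmark) with a score.
results = pd.read_csv("results.csv")  # assumed columns: tool, category, benchmark, score

# Item 1: keep only benchmarks Legion can run (hypothetical list).
legion_compatible = {"array-examples", "loops", "recursive"}
results = results[results["benchmark"].isin(legion_compatible)]

# Drop degenerate benchmarks where every tool scored 0
# (e.g. the single SQLite-MemSafety task mentioned above).
all_zero = results.groupby("benchmark")["score"].transform("max") == 0
results = results[~all_zero]

# Item 2, "normalisation = averaging" interpretation:
# average the scores within each category, then sum the category averages.
per_category = results.groupby(["tool", "category"])["score"].mean()
final_scores = per_category.groupby(level="tool").sum().sort_values(ascending=False)
print(final_scores)
```

If the official procedure weights categories or normalises per-benchmark rather than per-category, the last two lines would need to change accordingly.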
@DonggeLiu added the Benchmarking label on Feb 4, 2020
@DonggeLiu
Owner Author

Issue 2 is explained by rounding errors.
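
For intuition, here is a small, purely illustrative example (made-up numbers) of how rounding per-category scores before summing them can shift a final total by a fraction of a point:

```python
# Made-up per-category scores, purely for illustration.
exact = [12.349, 7.451, 20.249]
displayed = [round(s, 1) for s in exact]  # values as shown (rounded) in the sheet

print(sum(exact))      # ≈ 40.049 (true total)
print(sum(displayed))  # ≈ 40.0   (total recomputed from the rounded values)
```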
