- Date: 2024-05-29
- Change: Added MATH, GSM8K, and aqua-rat.
- Total Number of Tasks: 92
- Total Number of Instances: 45655

Index | Task Name | Version | Construction | Paper Link | License | Dataset Size (instances) |
---|---|---|---|---|---|---|
1 | aqua-rat | v1 | {'class': 'mcq', 'n_choices': 5} | https://arxiv.org/abs/1705.04146 | apache-2.0 | 254 |
2 | arc-challenge | v1 | {'class': 'mcq', 'n_choices': 'mixed'} | https://arxiv.org/abs/1803.05457 | cc-by-sa-4.0 | 1172 |
3 | arc-easy | v1 | {'class': 'mcq', 'n_choices': 'mixed'} | https://arxiv.org/abs/1803.05457 | cc-by-sa-4.0 | 2376 |
4 | commonsenseqa | v1 | {'class': 'mcq', 'n_choices': 5} | https://arxiv.org/abs/1811.00937 | mit | 1221 |
5 | gsm8k | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2110.14168 | mit | 1319 |
6 | hellaswag | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/1905.07830 | mit | 10042 |
7 | hhh-alignment-harmless | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2112.00861 | apache-2.0 | 58 |
8 | hhh-alignment-helpful | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2112.00861 | apache-2.0 | 59 |
9 | hhh-alignment-honest | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2112.00861 | apache-2.0 | 61 |
10 | hhh-alignment-other | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2112.00861 | apache-2.0 | 43 |
11 | htest-end-ly | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
12 | htest-end-punctuation | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
13 | htest-hyphenated-word | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
14 | htest-palindrome | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
15 | htest-repeated-word | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
16 | htest-rhyme | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
17 | htest-spelled-math | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
18 | htest-spelled-number | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
19 | htest-start-vowel | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
20 | htest-uppercase | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2402.11349 | mit | 200 |
21 | math-algebra | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 1187 |
22 | math-counting-and-probability | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 474 |
23 | math-geometry | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 479 |
24 | math-intermediate-algebra | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 903 |
25 | math-number-theory | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 540 |
26 | math-prealgebra | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 871 |
27 | math-precalculus | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2103.03874 | mit | 546 |
28 | medqa-usmle | v1 | {'class': 'mcq', 'n_choices': 5} | https://arxiv.org/abs/2009.13081 | mit | 1273 |
29 | mfq-30 | v1 | {'class': 'mcq-survey', 'n_choices': 6} | https://psycnet.apa.org/doiLanding?doi=10.1037%2Ft05651-000 | None | 32 |
30 | mmlu-abstract-algebra | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
31 | mmlu-anatomy | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 135 |
32 | mmlu-astronomy | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 152 |
33 | mmlu-business-ethics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
34 | mmlu-clinical-knowledge | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 265 |
35 | mmlu-college-biology | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 144 |
36 | mmlu-college-chemistry | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
37 | mmlu-college-computer-science | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
38 | mmlu-college-mathematics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
39 | mmlu-college-medicine | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 173 |
40 | mmlu-college-physics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 102 |
41 | mmlu-computer-security | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
42 | mmlu-conceptual-physics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 235 |
43 | mmlu-econometrics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 114 |
44 | mmlu-electrical-engineering | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 145 |
45 | mmlu-elementary-mathematics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 378 |
46 | mmlu-formal-logic | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 126 |
47 | mmlu-global-facts | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
48 | mmlu-high-school-biology | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 310 |
49 | mmlu-high-school-chemistry | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 203 |
50 | mmlu-high-school-computer-science | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
51 | mmlu-high-school-european-history | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 165 |
52 | mmlu-high-school-geography | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 198 |
53 | mmlu-high-school-government-and-politics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 193 |
54 | mmlu-high-school-macroeconomics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 390 |
55 | mmlu-high-school-mathematics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 270 |
56 | mmlu-high-school-microeconomics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 238 |
57 | mmlu-high-school-physics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 151 |
58 | mmlu-high-school-psychology | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 545 |
59 | mmlu-high-school-statistics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 216 |
60 | mmlu-high-school-us-history | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 204 |
61 | mmlu-high-school-world-history | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 237 |
62 | mmlu-human-aging | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 223 |
63 | mmlu-human-sexuality | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 131 |
64 | mmlu-international-law | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 121 |
65 | mmlu-jurisprudence | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 108 |
66 | mmlu-logical-fallacies | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 163 |
67 | mmlu-machine-learning | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 112 |
68 | mmlu-management | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 103 |
69 | mmlu-marketing | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 234 |
70 | mmlu-medical-genetics | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
71 | mmlu-miscellaneous | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 783 |
72 | mmlu-moral-disputes | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 346 |
73 | mmlu-moral-scenarios | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 895 |
74 | mmlu-nutrition | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 306 |
75 | mmlu-philosophy | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 311 |
76 | mmlu-prehistory | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 324 |
77 | mmlu-professional-accounting | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 282 |
78 | mmlu-professional-law | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 1534 |
79 | mmlu-professional-medicine | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 272 |
80 | mmlu-professional-psychology | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 612 |
81 | mmlu-public-relations | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 110 |
82 | mmlu-security-studies | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 245 |
83 | mmlu-sociology | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 201 |
84 | mmlu-us-foreign-policy | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 100 |
85 | mmlu-virology | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 166 |
86 | mmlu-world-religions | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/2009.03300 | mit | 171 |
87 | openbookqa | v1 | {'class': 'mcq', 'n_choices': 4} | https://arxiv.org/abs/1809.02789 | apache-2.0 | 500 |
88 | piqa | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/1911.11641 | afl-3.0 | 1838 |
89 | pvq-rr | v1 | {'class': 'mcq-survey', 'n_choices': 6} | https://scholarworks.gvsu.edu/orpc/vol2/iss2/9/ | cc-by-nc-nd-3.0 | 57 |
90 | socialiqa | v1 | {'class': 'mcq', 'n_choices': 3} | https://arxiv.org/abs/1904.09728 | cc-by-4.0 | 2224 |
91 | truthfulqa-mc1 | v1 | {'class': 'mcq', 'n_choices': 'mixed'} | https://arxiv.org/abs/2109.07958 | apache-2.0 | 817 |
92 | winogrande | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/1907.10641 | apache-2.0 | 1267 |
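
The task and instance totals listed above the table can be re-derived from the rows themselves. The following is a minimal sketch, assuming the rows keep the pipe-separated layout shown here and that each Construction cell is a valid Python dict literal. `parse_row` and `summarize` are hypothetical helper names used only for illustration, and `SAMPLE` holds three rows copied from the table rather than the full file.

```python
import ast
from collections import Counter

# Three rows copied from the table above; in practice, read the whole file.
SAMPLE = """\
1 | aqua-rat | v1 | {'class': 'mcq', 'n_choices': 5} | https://arxiv.org/abs/1705.04146 | apache-2.0 | 254 |
5 | gsm8k | v1 | {'class': 'frq', 'type': 'simple'} | https://arxiv.org/abs/2110.14168 | mit | 1319 |
7 | hhh-alignment-harmless | v1 | {'class': 'mcq', 'n_choices': 2} | https://arxiv.org/abs/2112.00861 | apache-2.0 | 58 |
"""


def parse_row(line):
    """Split one table row into a dict (hypothetical helper)."""
    # Drop the trailing pipe, then split the seven columns.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    index, name, version, construction, link, license_id, size = cells
    return {
        "index": int(index),
        "name": name,
        "version": version,
        # The Construction cell is written as a Python dict literal.
        "construction": ast.literal_eval(construction),
        "link": link,
        "license": license_id,
        "size": int(size),
    }


def summarize(text):
    """Count tasks and instances, skipping the header and separator rows."""
    rows = [
        parse_row(line)
        for line in text.splitlines()
        if line.strip() and line.strip()[0].isdigit()
    ]
    return {
        "n_tasks": len(rows),
        "n_instances": sum(r["size"] for r in rows),
        "by_class": Counter(r["construction"]["class"] for r in rows),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Run over all 92 rows instead of `SAMPLE`, the same function should give an instance sum that agrees with the 45655 total stated above; a mismatch would suggest a row was added or edited without updating the summary lines.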