# Tatoeba Challenge Data - v2023-09-26

This is the "lowest" sub-set of the Tatoeba Challenge data. Download the data files from the link in the table below. In total, this sub-set contains:

* 74 language pairs

| languages | lang-pair | test | dev | train |
|---|---|---|---|---|
| Arabic - Berber languages | ara-ber | 2498 | 1058 | 4 |
| Kotava - French | avk-fra | 1244 | 43 | 5 |
| Berber languages - German | ber-deu | 671 | | 5 |
| Berber languages - English | ber-eng | 26088 | 108422 | 32 |
| Berber languages - Esperanto | ber-epo | 1576 | 23 | 5 |
| Berber languages - French | ber-fra | 14666 | 34705 | 1 |
| Berber languages - Spanish | ber-spa | 10339 | 14044 | 5 |
| Buriat - Russian | bua-rus | 806 | | 1024 |
| Chavacano - English | cbk-eng | 1498 | 1026 | 16 |
| Mari (Russia) - Russian | chm-rus | 2750 | 1000 | 3699 |
| Cornish - German | cor-deu | 821 | | 896 |
| Cornish - English | cor-eng | 3198 | 1000 | 2434 |
| Cornish - Esperanto | cor-epo | 663 | | 1062 |
| Cornish - French | cor-fra | 555 | | 1064 |
| Cornish - Italian | cor-ita | 287 | | 1121 |
| Cornish - Russian | cor-rus | 218 | | 1304 |
| Cornish - Spanish | cor-spa | 206 | | 1155 |
| Crimean Tatar - Turkish | crh-tur | 208 | | 9111 |
| German - Ido | deu-ido | 1000 | 928 | 5210 |
| German - Interlingue | deu-ile | 2278 | 1372 | 3 |
| German - Lojban | deu-jbo | 1449 | 20 | 381 |
| German - Klingon | deu-tlh | 1099 | 1081 | 113 |
| German - Volapük | deu-vol | 214 | 7 | 1 |
| English - Gothic | eng-got | 207 | | 117 |
| English - Ancient Greek (to 1453) | eng-grc | 623 | 6 | 19 |
| English - Swiss German | eng-gsw | 218 | 3 | 106 |
| English - Interlingue | eng-ile | 1711 | 1143 | 242 |
| English - Lojban | eng-jbo | 4996 | 7050 | 451 |
| English - Ladino | eng-lad | 867 | 58 | 29 |
| English - Lingua Franca Nova | eng-lfn | 4594 | 3706 | 43 |
| English - Novial | eng-nov | 222 | 14 | 4 |
| English - Pampanga | eng-pam | 1000 | 494 | 30 |
| English - Piemontese | eng-pms | 270 | | 45 |
| English - Klingon | eng-tlh | 5000 | 8717 | 136 |
| English - Volapük | eng-vol | 1541 | 1356 | 70 |
| English - Kalmyk | eng-xal | 281 | | 28 |
| Esperanto - Esperanto | epo-epo | 10000 | 9874 | 266 |
| Esperanto - Ido | epo-ido | 1183 | 743 | 7173 |
| Esperanto - Interlingue | epo-ile | 334 | 23 | 7 |
| Esperanto - Lojban | epo-jbo | 1167 | 64 | 304 |
| Esperanto - Ladino | epo-lad | 475 | 52 | 12 |
| Esperanto - Lingua Franca Nova | epo-lfn | 997 | 604 | 6 |
| Esperanto - Klingon | epo-tlh | 1930 | 247 | 136 |
| French - Ido | fra-ido | 566 | 26 | 6885 |
| French - Interlingue | fra-ile | 401 | 8 | 2 |
| French - Lojban | fra-jbo | 1140 | 5 | 172 |
| French - Picard | fra-pcd | 268 | | 57 |
| French - Klingon | fra-tlh | 648 | 15 | 129 |
| Ido - Interlingua (International Auxiliary Language Association) | ido-ina | 414 | 52 | 455 |
| Ido - Italian | ido-ita | 1460 | 73 | 6558 |
| Ido - Latin | ido-lat | 214 | 32 | 3024 |
| Ido - Spanish | ido-spa | 584 | 15 | 6706 |
| Ido - Yiddish | ido-yid | 576 | 58 | 1418 |
| Interlingua (International Auxiliary Language Association) - Latin | ina-lat | 1017 | 70 | 162 |
| Interlingua (International Auxiliary Language Association) - Klingon | ina-tlh | 284 | 34 | 5 |
| Interlingua (International Auxiliary Language Association) - Yiddish | ina-yid | 997 | 328 | 311 |
| Italian - Ligurian | ita-lij | 216 | | 6120 |
| Italian - Piemontese | ita-pms | 233 | | 91 |
| Lojban - Japanese | jbo-jpn | 921 | | 304 |
| Lojban - Russian | jbo-rus | 1199 | 11 | 343 |
| Lojban - Spanish | jbo-spa | 1507 | 15 | 77 |
| Lojban - Swedish | jbo-swe | 243 | | 302 |
| Lojban - Chinese | jbo-zho | 518 | 23 | 507 |
| Japanese - Klingon | jpn-tlh | 676 | | 131 |
| Ladino - Spanish | lad-spa | 336 | 25 | 18 |
| Latin - Klingon | lat-tlh | 268 | 30 | 15 |
| Lingua Franca Nova - Spanish | lfn-spa | 389 | 46 | 7 |
| Russian - Yakut | rus-sah | 994 | | 5953 |
| Russian - Klingon | rus-tlh | 262 | 6 | 133 |
| Russian - Kalmyk | rus-xal | 209 | | 25 |
| Spanish - Klingon | spa-tlh | 348 | 10 | 145 |
| Klingon - Yiddish | tlh-yid | 408 | 49 | 3 |
| Klingon - Chinese | tlh-zho | 448 | 1 | 119 |
| Ukrainian - Ukrainian | ukr-ukr | 831 | | 1921 |
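
For readers who want to work with these statistics programmatically, the following is a minimal sketch of how one row of the table could be parsed. It assumes the table is stored as a pipe-delimited Markdown table with five cells per row (language names, pair code, test, dev, train) and that a pair without a dev set has an empty dev cell. The names `PairStats` and `parse_row` are hypothetical and not part of the Tatoeba Challenge tooling; the two sample rows are copied from the table above.

```python
from typing import NamedTuple, Optional


class PairStats(NamedTuple):
    """Sizes (in sentence pairs) of the splits for one language pair."""
    languages: str
    code: str
    test: int
    dev: Optional[int]    # None when the dev cell is empty (assumed layout)
    train: Optional[int]  # None when the train cell is empty


def parse_row(line: str) -> PairStats:
    """Parse one pipe-delimited table row (hypothetical helper).

    Assumes the row starts and ends with '|' and has exactly five cells.
    """
    cells = [cell.strip() for cell in line.strip()[1:-1].split("|")]
    languages, code, test, dev, train = cells
    return PairStats(
        languages,
        code,
        int(test),
        int(dev) if dev else None,
        int(train) if train else None,
    )


# Two rows taken from the table, one with and one without a dev set.
SAMPLE_ROWS = [
    "| Arabic - Berber languages | ara-ber | 2498 | 1058 | 4 |",
    "| Cornish - German | cor-deu | 821 | | 896 |",
]

stats = [parse_row(row) for row in SAMPLE_ROWS]
pairs_without_dev = [s.code for s in stats if s.dev is None]
print(pairs_without_dev)
```

Representing a missing dev set as `None` rather than `0` keeps "no dev split exists" distinct from an empty split, which matters when filtering pairs for evaluation.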