MaLA-500: Massive Language Adaptation of Large Language Models
Abstract
Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we employ vocabulary extension and continued pretraining on LLaMA 2 with Glot500-c. Our intrinsic evaluation demonstrates that MaLA-500 is better at predicting the given texts of low-resource languages than existing multilingual LLMs. Moreover, the extrinsic evaluation of in-context learning shows that MaLA-500 outperforms previous LLMs on SIB200 and Taxi1500 by a significant margin, i.e., 11.68% and 4.82% macro-average accuracy across languages. We release MaLA-500 at https://huggingface.co/MaLA-LM.
1 Introduction
Large Language Models (LLMs), e.g., LLaMA (Touvron et al., 2023a; b), Mistral (Jiang et al., 2023; 2024), and ChatGPT (https://openai.com/blog/chatgpt), have shown remarkable performance in natural language understanding and generation. Follow-up studies (Bang et al., 2023; Lai et al., 2023; Ahuja et al., 2023a; b) observe that these English-centric LLMs, such as LLaMA with mainly English as the training data, are capable of handling some high-resource non-English languages, benefiting from the inclusion of non-English language data during pretraining. However, their applicability to low-resource languages is still limited due to data scarcity.
Previous studies have released pretrained multilingual models with mostly encoder-only transformer architectures, e.g., multilingual BERT (Devlin et al., 2019) and XLM-R (Conneau et al., 2020), for around 100 languages. The paradigm shift from encoder-only to decoder-only achieves scalability for large language models with billions of model parameters, leading to the development of open multilingual models. Recently, several generative multilingual LLMs, such as XGLM (Lin et al., 2021), mGPT (Shliazhko et al., 2022), and BLOOM (Scao et al., 2022), have emerged. Notably, the current language coverage for these generative LLMs is limited to up to 60 languages, highlighting the remaining need for further work on massively multilingual LLMs for many natural languages.
ImaniGooghari et al. (2023) have achieved a significant milestone in the realm of massive language adaptation by extending the language coverage of a small-scale multilingual language model, XLM-R (Conneau et al., 2020) - an auto-encoding model with 278M parameters, from 100 languages to an impressive number of 534 languages, and introducing an extended model, Glot500-m with 395M parameters. ImaniGooghari et al. (2023) introduce the Glot500-c corpora spanning 534 languages from 47 language families, and then apply vocabulary extension and continued pretraining to create Glot500-m. The introduction of Glot500-c mitigates the challenge of data scarcity for low-resource languages. Moreover, the adaptation method is more favorable than training from scratch, as it requires fewer computational resources and emits a smaller carbon footprint. This success serves as a strong motivation for our exploration into the massive language adaptation of LLMs.
This work aims to extend the capabilities of LLMs to encompass a wider range of languages. Existing works on language adaptation of pretrained models, such as ImaniGooghari et al. (2023), provide extended coverage across a wide linguistic spectrum but are limited to relatively small models, mostly at the scale of hundreds of millions of parameters. Other works, such as Yong et al. (2022), extend generative LLMs but cover only a small number of languages. Our study pushes the boundaries by exploring language adaptation techniques for LLMs with model parameters scaling up to 10 billion for 534 languages. Our investigation delves into generative LLMs with a substantial increase in model parameters and their in-context learning capabilities in diverse languages, especially low-resource languages. This augmentation enables us to enhance contextual and linguistic relevance across a diverse range of languages.
We address the challenges of adapting LLMs to low-resource languages, such as data sparsity, domain-specific vocabulary, and linguistic diversity. Specifically, we study continued pretraining of an open LLM, i.e., LLaMA 2 (Touvron et al., 2023b), together with vocabulary extension and parameter-efficient adaptation via low-rank adaptation (LoRA; Hu et al., 2022). We deploy distributed training and release MaLA-500, which covers more than 500 languages in various domains. We evaluate MaLA-500 using intrinsic measures on the held-out Glot500-c test set and parallel data, and extrinsic metrics on two downstream benchmarks, SIB200 and Taxi1500. The results show that MaLA-500 outperforms existing open LLMs of similar or slightly larger model size. This work broadens the accessibility of LLMs, making them valuable for a more diverse set of language-specific use cases, especially low-resource ones, and addresses the equality issue by removing language barriers for speakers of many languages, especially languages underrepresented in existing LLMs.
2 Massive Language Adaptation
Massive language adaptation of large language models relies on three ingredients: a massively multilingual corpus (Section 2.1), a strong base LLM (Section 2.2), and effective language adaptation techniques, namely vocabulary extension (Section 2.3) and continued pretraining (Section 2.4).
2.1 Data
We use Glot500-c (ImaniGooghari et al., 2023), which covers 534 languages, as the training data of MaLA-500. We define languages using the ISO 639-3 code combined with the corresponding written script; for example, “eng_Latn” represents English written in the Latin script. See §A for the list of languages with their data amounts. The original number of sentences ranges from 10 thousand to 63 million. Note that Glot500-c does not put full effort into collecting data for high-resource languages but focuses on low-resource languages. We sample languages from the imbalanced dataset according to a multinomial distribution with a smoothing exponent α, for both vocabulary extension and continued pretraining. We use different scales for sampling data to be used in model training and vocabulary construction. After sampling, the number of sentences for training ranges from 600 thousand to 8 million per language, leading to 1 billion sentences in total. The number of sentences for vocabulary construction ranges from 30 thousand to 400 thousand, making a total of 50 million sentences.
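The effect of exponent-smoothed multinomial sampling can be illustrated with a minimal sketch. It assumes the common form in which the sampling probability of a language is proportional to its sentence count raised to a power; the exponent value 0.3 and the two toy counts below are illustrative assumptions, not settings taken from this section.

```python
def sampling_probs(sentence_counts, alpha):
    """Return per-language probabilities p_i proportional to n_i ** alpha."""
    weights = {lang: n ** alpha for lang, n in sentence_counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}

# Toy counts at the two ends of the reported range (10 thousand to 63 million).
counts = {"high_resource": 63_000_000, "low_resource": 10_000}

raw = sampling_probs(counts, alpha=1.0)       # proportional to corpus size
smoothed = sampling_probs(counts, alpha=0.3)  # illustrative exponent (assumption)

for lang in counts:
    print(f"{lang}: raw={raw[lang]:.5f} smoothed={smoothed[lang]:.5f}")
```

An exponent below 1 flattens the distribution, so low-resource languages are sampled more often than their raw share of the corpus would suggest.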
2.2 Model
We choose LLaMA 2 (Touvron et al., 2023b) to start continual training. LLaMA series models (Touvron et al., 2023a), with model weights released publicly, have gained popularity in the research community. Despite being English-centric compared to their multilingual counterparts, they have shown remarkable capacity for multiple languages (Ahuja et al., 2023b). We choose the latest LLaMA 2, trained on 2 trillion tokens, as our base model to benefit from its outstanding language capacity. Our study chooses the 7B model with 32 transformer layers, and leaves the extension of LLMs with larger sizes as a future work.
2.3 Vocabulary Extension
The original LLaMA 2 tokenizer, with a vocabulary of 32,000 tokens, covers English and a small fraction of other European languages using Latin or Cyrillic scripts. To enhance its capability and encoding efficiency for a broader range of languages, we extend the vocabulary with Glot500-c. Specifically, we first train a multilingual tokenizer with SentencePiece (Kudo & Richardson, 2018) on the sampled Glot500-c with a vocabulary of 250,000. Subsequently, we merge the trained tokenizer with the original LLaMA 2 tokenizer by taking the union of their vocabularies. As a result, we obtain MaLA-500’s tokenizer with a vocabulary size of 260,164. After vocabulary extension and resizing the embedding layer, the model size becomes 8.6B.
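The union step can be sketched in plain Python as follows. This is a simplified illustration on token lists rather than on SentencePiece model files; the function name and the toy tokens are hypothetical. The last line derives the overlap implied by the three reported sizes, assuming a plain set union of exactly 32,000 and 250,000 tokens.

```python
def merge_vocabularies(base_vocab, new_vocab):
    """Append tokens from new_vocab that base_vocab lacks, keeping base ids stable."""
    seen = set(base_vocab)
    merged = list(base_vocab)
    for token in new_vocab:
        if token not in seen:
            seen.add(token)
            merged.append(token)
    return merged

# Toy vocabularies (hypothetical tokens).
base = ["<s>", "</s>", "▁the", "▁de"]
new = ["▁de", "▁ಕ", "▁the", "▁ନ"]
merged = merge_vocabularies(base, new)
print(merged)

# Sizes reported in the text imply how many tokens the two vocabularies share.
implied_overlap = 32_000 + 250_000 - 260_164
print(implied_overlap)
```

Appending new tokens after the original ones keeps the ids of the base vocabulary unchanged, so the pretrained embedding rows remain valid and only the newly added rows need to be initialized.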
We measure the impact of vocabulary extension on the development set of Glot500-c by analyzing the reduction in segmentation length for each language. The results indicate that the effect of vocabulary extension varies, ranging from 8% (English, eng_Latn) to 88% (Oriya, ori_Orya). Unsurprisingly, vocabulary extension has a larger effect on languages written in non-Latin scripts than on those in the Latin script. However, for some low-resource languages written in the Latin script, e.g., Kabiyè (kbp_Latn) and Vietnamese (vie_Latn), the segmentation length is shortened by around 50%.
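A minimal sketch of the measurement, assuming the reduction is defined as the relative decrease in token count between the original and the extended tokenizer (the exact formula is not given in the text); the token counts are hypothetical.

```python
def segmentation_reduction(len_before, len_after):
    """Relative reduction in token count after vocabulary extension."""
    if len_before <= 0:
        raise ValueError("len_before must be positive")
    return 1.0 - len_after / len_before

# Hypothetical token counts for the same text under the old and new tokenizers.
reduction = segmentation_reduction(len_before=50, len_after=6)
print(f"{reduction:.0%}")
```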
2.4 Continued Pretraining
We employ continued pretraining for language adaptation with low-rank adaptation (LoRA, Hu et al., 2022) to enable parameter-efficient training, given the limitation of our computing resources. LoRA injects trainable rank decomposition matrices, which approximate the updates to the large weight matrices with a lower rank, into the pretrained model. It reduces the computational complexity and thus saves training cost while retaining high model quality (Hu et al., 2022). We continually train the causal language model to update the rank-decomposition matrices, embedding layer, and language modeling head while freezing the transformer weights of the pretrained model, allowing the continually trained language model to learn from new data in new languages without completely losing its previous language capacity. Continual training of large language models requires substantial computational resources. We adopt efficient distributed training setups on supercomputers to make the training process feasible.
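The LoRA update can be illustrated with a small pure-Python sketch of a single linear layer, following the formulation of Hu et al. (2022): the frozen weight W is augmented by a scaled low-rank product B A. The toy shapes and values are assumptions for illustration; the scaling ratio alpha / r = 4 matches the ratio of the reported alpha (32) and rank (8).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy shapes: W is 3x3, A is r x 3, B is 3 x r, with r = 2.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
A = [[0.5, -0.5, 1.0], [1.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0]

# B is initialized to zero, so the adapted layer starts identical to the base layer.
B_init = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
h_init = lora_forward(W, A, B_init, x, alpha=8, r=2)

# After training, B is non-zero and shifts the output.
B_trained = [[0.1, 0.0], [0.0, 0.0], [0.0, 0.2]]
h_trained = lora_forward(W, A, B_trained, x, alpha=8, r=2)
print(h_init, h_trained)
```

Because B starts at zero, training begins from the pretrained model's behavior, which is one reason the adapted model does not immediately lose its previous language capacity.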
2.5 Training
Hardware and Software
We train our model on a computing cluster with a theoretical peak performance of 2 petaflops on GPU nodes. We deploy distributed training on 24 Nvidia Ampere A100 GPUs. As for software, we utilize Hugging Face Transformers (Wolf et al., 2020), PEFT (Parameter-Efficient Fine-Tuning; https://huggingface.co/docs/peft/index), and DeepSpeed (Rasley et al., 2020). We use the ZeRO redundancy optimizer (Rajbhandari et al., 2020) and maximize the batch size that fits the memory of each GPU. We employ mixed-precision training using the bfloat16 format.
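A DeepSpeed configuration consistent with this setup might look like the sketch below. The 24 GPUs and bfloat16 come from this paragraph and the global batch size of 384 from the Hyperparameters paragraph; the per-GPU micro-batch size and the ZeRO stage are not stated in the text and are assumptions chosen only so that the batch-size identity holds.

```python
import json

world_size = 24           # GPUs reported in the text
global_batch_size = 384   # reported under Hyperparameters
micro_batch_per_gpu = 4   # assumption: not stated in the text

# DeepSpeed expects: global batch = micro batch x accumulation steps x GPUs.
grad_accum_steps = global_batch_size // (world_size * micro_batch_per_gpu)

ds_config = {
    "train_batch_size": global_batch_size,
    "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
    "gradient_accumulation_steps": grad_accum_steps,
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # assumption: the ZeRO stage is not stated
}
print(json.dumps(ds_config, indent=2))
```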
Hyperparameters
The learning rate is set at 3e-4. A weight decay of 0.01 is applied to penalize large weights and mitigate overfitting. The trainable LoRA module targets the query and value matrices. The language model head is not decomposed by a LoRA module but is trained in a full-parameter manner. In our setting, the final model has 10B parameters in total, in which 2B parameters are trainable. The LoRA module is incorporated with a rank of 8, an alpha value of 32, and a dropout rate of 0.05, contributing to the model’s adaptability and regularization during training. The context window is 4k. We maximize the batch size to fit the memory, making a global batch size of 384. The model undergoes three training epochs. Checkpoints are saved every 500 steps, and we employ early stopping to select the checkpoint that exhibits the most favorable average performance on downstream tasks.
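The reported figure of roughly 2B trainable parameters can be checked with back-of-the-envelope arithmetic. The sketch assumes the LLaMA 2-7B hidden size of 4096, square query and value projections, and untied input embeddings and language model head; it counts only the LoRA matrices, the embedding layer, and the head.

```python
hidden_size = 4096     # LLaMA 2-7B hidden dimension
num_layers = 32        # transformer layers, as stated in the Model section
vocab_size = 260_164   # extended vocabulary size
rank = 8               # LoRA rank

# Each adapted d x d projection adds A (rank x d) and B (d x rank).
lora_per_matrix = rank * hidden_size * 2
lora_params = lora_per_matrix * 2 * num_layers  # query and value in every layer

# Input embeddings and the LM head are trained in full (assumed untied).
embedding_params = vocab_size * hidden_size
lm_head_params = vocab_size * hidden_size

trainable = lora_params + embedding_params + lm_head_params
print(f"LoRA matrices:      {lora_params:,}")
print(f"embeddings + head:  {embedding_params + lm_head_params:,}")
print(f"trainable in total: {trainable / 1e9:.2f}B")
```

Under these assumptions almost all trainable parameters sit in the embedding layer and the language model head, a direct consequence of the large extended vocabulary; the LoRA matrices themselves contribute only a few million parameters.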
Environmental Impacts
We train our model in a carbon-neutral data center, with all electricity generated by renewable hydropower; the waste heat is utilized in district heating to further reduce the CO2 footprint (https://www.csc.fi/sustainable-development).
3 Evaluation
3.1 Benchmarks and Setup
We consider both intrinsic and extrinsic measures for evaluation. Evaluation dataset statistics are shown in Table 1.
Category | Dataset | Metric | Data | Lang | Domain |
Intrinsic | Glot500-c test (ImaniGooghari et al., 2023) | NLL | 1000 | 534 | Misc |
Intrinsic | PBC (Mayer & Cysouw, 2014) | NLL | 500 | 370 | Bible |
Extrinsic | SIB200 (Adelani et al., 2023) | ACC | 204 | 177 | Misc |
Extrinsic | Taxi1500 (Ma et al., 2023) | ACC | 111 | 351 | Bible |
For intrinsic evaluation, perplexity is not comparable across models and languages due to different text segmentations. Inspired by Xue et al. (2022); Yu et al. (2023), we instead measure the negative log-likelihood (NLL) of the text using the given LLMs.
We concatenate the dataset as the input text and adopt the sliding-window strategy (https://huggingface.co/docs/transformers/en/perplexity). The evaluation of different LLMs uses the same data with the concatenation of sentences per language, thus making NLL model-comparable. In addition, we consider language-comparable NLL by measuring NLL on parallel data, in which every sample in different languages contains the same semantic information. We report the model-comparable NLL on the Glot500-c test set covering all 534 considered languages (§3.2), and the language-comparable NLL on the Parallel Bible Corpus (PBC, Mayer & Cysouw, 2014), covering 370 languages (§3.3).
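The sliding-window strategy can be sketched as follows. The windowing logic mirrors the strided scheme described in the referenced Hugging Face guide; the model call is replaced by a hypothetical callback, `token_logprobs_fn`, so the sketch stays self-contained and is not the evaluation code used for the reported numbers.

```python
def sliding_windows(seq_len, max_length, stride):
    """Yield (begin, end, n_scored) triples.

    Tokens [begin, end) are fed to the model; only the last n_scored tokens
    of each window count towards the NLL, the rest serve as context.
    """
    if not 0 < stride <= max_length:
        raise ValueError("stride must satisfy 0 < stride <= max_length")
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        yield begin, end, end - prev_end
        prev_end = end
        if end == seq_len:
            break

def total_nll(token_logprobs_fn, seq_len, max_length, stride):
    """Sum negative log-likelihoods over all windows.

    token_logprobs_fn(begin, end) is a hypothetical callback returning one
    log-probability per token in [begin, end).
    """
    nll = 0.0
    for begin, end, n_scored in sliding_windows(seq_len, max_length, stride):
        logprobs = token_logprobs_fn(begin, end)
        nll -= sum(logprobs[-n_scored:])
    return nll

# Toy check: every token has log-probability -1.0, so NLL equals the token count.
windows = list(sliding_windows(seq_len=10, max_length=4, stride=2))
nll = total_nll(lambda b, e: [-1.0] * (e - b), seq_len=10, max_length=4, stride=2)
print(windows, nll)
```

Each token is scored exactly once, which keeps the summed NLL independent of how the text is cut into windows while still giving later tokens some preceding context.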
For extrinsic evaluation, we evaluate the few-shot learning capability of MaLA-500 and compare it with other LLMs on SIB200 (Adelani et al., 2023) and Taxi1500 (Ma et al., 2023).
SIB200 is a topic classification dataset. The classification task involves seven classes, namely science/technology, travel, politics, sports, health, entertainment, and geography. Our evaluation spans a diverse set of 177 languages, obtained by intersecting the language sets of SIB200 and Glot500-c. Note that the flores200-based SIB200 evaluation set is included in the training data since Glot500-c includes flores200, but the classification labels are not provided.
Taxi1500 is another text classification dataset spanning 351 languages. It involves six classes, namely, Recommendation, Faith, Description, Sin, Grace, and Violence. Our evaluation efforts aim to cover as many languages as possible. However, the evaluation of massively multilingual language models is a challenging task. Due to the lack of real-world multilingual evaluation benchmarks, we use this benchmark that contains religious content.
For in-context learning evaluation, the evaluated LLM receives a structured prompt, which is the concatenation of few-shot examples and the sample intended for prediction. The format for both a few-shot example and the sample to predict is defined as follows:
Template for SIB200:
The topic of the news [sent] is [label]
Template for Taxi1500:
The topic of the verse [sent] is [label]
where [sent] is the sentence for classification, and [label] is the ground truth. [label] is included when the sample serves as a few-shot example but is omitted when predicting the sample. The constructed prompt is then used as input to the LLM. Subsequently, the evaluated LLM is prompted to estimate the probability of the label over the label set based on the provided prompt.
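Prompt construction and label selection can be sketched as below. The newline separator between examples and the `score_fn` callback (standing in for the model's log-likelihood of a label given the prompt) are assumptions for illustration; the template string follows the SIB200 template above.

```python
SIB200_TEMPLATE = "The topic of the news {sent} is {label}"

def build_prompt(few_shot_examples, query_sentence, template=SIB200_TEMPLATE):
    """Concatenate labelled demonstrations and the unlabelled query."""
    lines = [template.format(sent=s, label=l) for s, l in few_shot_examples]
    # The query omits the label, so the prompt ends right after "is".
    lines.append(template.format(sent=query_sentence, label="").rstrip())
    return "\n".join(lines)

def predict_label(prompt, label_set, score_fn):
    """Pick the label whose continuation receives the highest score.

    score_fn(prompt, label) is a hypothetical callback returning the model's
    log-likelihood of label given prompt.
    """
    return max(label_set, key=lambda label: score_fn(prompt, label))

# Toy usage with a stand-in scorer that always prefers "health".
shots = [("Team wins cup", "sports")]
prompt = build_prompt(shots, "New vaccine approved")
prediction = predict_label(
    prompt, ["sports", "health"], lambda p, l: 1.0 if l == "health" else 0.0
)
print(prompt)
print(prediction)
```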
For SIB200, few-shot examples are randomly sampled from the in-language training sets. Since randomly selecting few-shot examples for in-context learning yields random results for both MaLA-500 and previous LLMs on Taxi1500, we consider retriever-based in-context learning (Liu et al., 2022). Specifically, we use average word embeddings in layer 8 of Glot500 (ImaniGooghari et al., 2023) to retrieve semantically similar samples, as suggested in previous work (Sabet et al., 2020), for all the compared models. The evaluation process is implemented using the lm-evaluation-harness (https://github.com/EleutherAI/lm-evaluation-harness), and we use accuracy (ACC) to measure classification performance.
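A minimal sketch of retrieval-based shot selection, assuming sentence vectors are obtained by averaging word embeddings and that similarity is measured with cosine similarity (the similarity function is not named in the text). The vectors below are toy values rather than Glot500 layer-8 embeddings.

```python
import math

def mean_pool(word_vectors):
    """Average word embeddings into a single sentence vector."""
    n = len(word_vectors)
    return [sum(col) / n for col in zip(*word_vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_shots(query_vec, train_pool, k):
    """Return the k training items most similar to the query.

    train_pool is a list of (sentence, label, vector) tuples.
    """
    ranked = sorted(train_pool, key=lambda item: cosine(query_vec, item[2]), reverse=True)
    return [(sent, label) for sent, label, _ in ranked[:k]]

# Toy pool with two-dimensional vectors (hypothetical values).
pool = [
    ("verse a", "Faith", [1.0, 0.0]),
    ("verse b", "Sin", [0.0, 1.0]),
    ("verse c", "Grace", [0.9, 0.1]),
]
query = mean_pool([[1.0, 0.0], [1.0, 0.0]])
shots = retrieve_shots(query, pool, k=2)
print(shots)
```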
3.2 Comparison across LLMs
We compare MaLA-500 with LLaMA 2-7B, mGPT-13B, BLOOM-7B1, and XGLM-7.5B on the Glot500-c test set, SIB200, and Taxi1500 by computing the averaged performance across languages; the results are given in Table 2. Among the compared baselines, LLaMA 2-7B achieves the best NLL on the Glot500-c test set and the best accuracy on Taxi1500, indicating that it has a strong multilingual capacity and that it is reasonable to select it as the base model. MaLA-500 outperforms all compared LLMs of similar or slightly larger model size across all the evaluated tasks. Notably, compared to LLaMA 2-7B, MaLA-500 achieves a lower NLL on the Glot500-c test set by 39.33, and has 14.94% and 4.82% improvements on SIB200 and Taxi1500, respectively. This highlights MaLA-500’s substantial contribution to enhancing the multilingual capacity of LLMs.
Model | Glot500-c test (NLL ↓) | SIB200 (ACC ↑) | Taxi1500 (ACC ↑) |
LLaMA 2-7B | 190.58 | 42.08 | 44.07 |
mGPT-13B | 282.46 | 45.34 | 40.98 |
BLOOM-7B1 | 202.95 | 44.63 | 43.98 |
XGLM-7.5B | 205.07 | 34.36 | 43.24 |
MaLA-500 | 151.25 | 57.02 | 48.89 |
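The margins quoted in the text can be recomputed directly from the table. The sketch below uses only the values in Table 2 and treats each gain as an absolute difference between averaged scores.

```python
scores = {
    "LLaMA 2-7B": {"nll": 190.58, "sib200": 42.08, "taxi1500": 44.07},
    "mGPT-13B":   {"nll": 282.46, "sib200": 45.34, "taxi1500": 40.98},
    "BLOOM-7B1":  {"nll": 202.95, "sib200": 44.63, "taxi1500": 43.98},
    "XGLM-7.5B":  {"nll": 205.07, "sib200": 34.36, "taxi1500": 43.24},
    "MaLA-500":   {"nll": 151.25, "sib200": 57.02, "taxi1500": 48.89},
}
ours = scores["MaLA-500"]
baselines = {name: s for name, s in scores.items() if name != "MaLA-500"}

# Gain over the strongest baseline on each accuracy benchmark.
gain_vs_best = {
    task: round(ours[task] - max(s[task] for s in baselines.values()), 2)
    for task in ("sib200", "taxi1500")
}

# Gain over the base model, LLaMA 2-7B (for NLL, lower is better).
base = scores["LLaMA 2-7B"]
gain_vs_base = {
    "nll": round(base["nll"] - ours["nll"], 2),
    "sib200": round(ours["sib200"] - base["sib200"], 2),
    "taxi1500": round(ours["taxi1500"] - base["taxi1500"], 2),
}
print(gain_vs_best)
print(gain_vs_base)
```

The gains over the strongest baseline match the margins quoted in the abstract, and the gains over LLaMA 2-7B match those quoted in this section.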
Figures 1, 2 and 3 provide detailed performance analysis across languages on Glot500-c test, SIB200, and Taxi1500. In those figures, we group scores into different performance bins and display them in different colors. For Glot500-c test, MaLA-500 has more languages achieving a better NLL, i.e., 61 languages with NLL less than 100 and 171 languages with NLL between 100 and 150. Besides, MaLA-500 has 54 (10%) languages with NLL larger than 200, which may indicate that these languages are not well covered by the measured LLM. Nevertheless, this number is much smaller than for other LLMs. For example, LLaMA 2-7B, the baseline with the best average NLL, has 231 (43%) languages with NLL larger than 200. For both SIB200 and Taxi1500, MaLA-500 surpasses previous LLMs in the sense that it obtains random results in fewer languages and achieves impressive performance in more languages than its counterparts.
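The binning used for this kind of analysis can be sketched as below. The thresholds at 100, 150, and 200 follow the text, while the handling of the 150 to 200 range and the per-language scores are assumptions for illustration. The final two lines recompute the reported shares out of 534 languages.

```python
from collections import Counter

def nll_bin(nll):
    """Map an NLL score to a coarse performance bin (lower is better)."""
    if nll < 100:
        return "<100"
    if nll < 150:
        return "100-150"
    if nll <= 200:
        return "150-200"
    return ">200"

def bin_counts(nll_by_language):
    """Count how many languages fall into each bin."""
    return Counter(nll_bin(v) for v in nll_by_language.values())

# Hypothetical per-language scores, for illustration only.
toy = {"lang_a": 92.0, "lang_b": 131.5, "lang_c": 148.0, "lang_d": 240.3}
counts = bin_counts(toy)
print(dict(counts))

# Shares reported in the text, out of 534 languages.
share_mala = round(100 * 54 / 534)
share_llama = round(100 * 231 / 534)
print(share_mala, share_llama)
```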
3.3 Comparison across Languages
To examine in detail how MaLA-500 performs across languages, we report the performance by language family in Table 3. We assign languages to families based on Glottolog (https://glottolog.org/glottolog/family). We observe that higher-resource language families, e.g., Indo-European (indo1319) and Dravidian (drav1251), achieve slightly better performance than lower-resource language families, e.g., Sino-Tibetan (sino1245).
Family | Sent | PBC (NLL ↓) | SIB200 (ACC ↑) | Taxi1500 (ACC ↑) |
indo1319 | 988M | 145.35 | 63.53 | 53.03 |
drav1251 | 135M | 131.29 | 56.25 | 54.65 |
aust1307 | 113M | 147.37 | 62.83 | 49.69 |
turk1311 | 109M | 161.71 | 57.08 | 52.55 |
afro1255 | 100M | 165.46 | 52.00 | 43.74 |
atla1278 | 57M | 141.92 | 42.90 | 45.52 |
ural1272 | 50M | 137.52 | 66.67 | 48.58 |
sino1245 | 29M | 155.64 | 49.30 | 49.31 |
other | 60M | 167.69 | 55.74 | 46.67 |
In Table 4, we present the top 5 performance improvements and declines across languages on SIB200 for MaLA-500 compared to LLaMA 2-7B. We observe that MaLA-500 achieves substantial improvements on languages written in low-resource scripts, e.g., Kannada (kan_Knda), while it performs worse on high-resource languages, e.g., Swedish (swe_Latn), which are already well covered by LLaMA 2-7B.
high end | | | | low end | | | |
Language | LLaMA 2-7B | MaLA-500 | Δ | Language | LLaMA 2-7B | MaLA-500 | Δ |
kan_Knda | 17.16 | 57.35 | 40.19 | swe_Latn | 71.08 | 60.29 | -10.79 |
ckb_Arab | 19.61 | 60.29 | 40.68 | rus_Cyrl | 71.57 | 65.20 | -6.37 |
asm_Beng | 17.16 | 58.82 | 41.66 | dan_Latn | 69.12 | 63.24 | -5.88 |
pan_Guru | 14.22 | 58.82 | 44.60 | pol_Latn | 74.51 | 68.63 | -5.88 |
sin_Sinh | 15.20 | 60.29 | 45.09 | ukr_Cyrl | 71.57 | 65.69 | -5.88 |
In our comprehensive analysis of contributing factors on SIB200, we note that the corpus size of a language exhibits a weak correlation of 0.13 with its performance gain. In contrast, the corpus size of the language family to which a language belongs demonstrates a moderate correlation of 0.40. A moderately high Pearson correlation of 0.53 is observed between the effect of vocabulary extension, i.e., the reduction in segmentation length, and the performance gain. This observation holds true for languages with both non-Latin scripts, such as Kannada (kan_Knda), Malayalam (mal_Mlym), and Tigrinya (tir_Ethi), as well as Latin scripts, such as Igbo (ibo_Latn) and Yoruba (yor_Latn). It demonstrates the effectiveness of vocabulary extension.
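For reference, the Pearson correlation used in this analysis can be computed as in the sketch below. The per-language values are hypothetical and serve only to show the computation; they are not the data behind the reported coefficients.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-language values: segmentation-length reduction (%) and
# SIB200 accuracy gain (points). Not taken from the paper.
reduction = [8.0, 25.0, 50.0, 70.0, 88.0]
gain = [-3.0, 5.0, 2.0, 30.0, 41.0]
r = pearson(reduction, gain)
print(f"r = {r:.2f}")
```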
3.4 Effect of Number of Shots
Figure 4 illustrates the relationship between accuracy and the number of in-context examples (i.e., shots) on SIB200. As the number of in-context shots increases, there is a corresponding rise in accuracy. Notably, with just 1 shot, accuracy exhibits randomness at 30.88%, indicating that a single shot provides limited information for task learning. Moving from 1 shot to 2 and 3 shots results in a notable improvement, with performance boosted by 19.83% and 26.14%, respectively. This highlights the effectiveness of increasing the number of shots. MaLA-500 achieves its peak performance at approximately 65% accuracy with 6-10 in-context shots. This may be attributed to the multi-class nature of the SIB200 dataset, which requires more shots for learning intricate input-label mappings.
In Figure 5, a more nuanced portrayal of results aligns with the observations made in Figure 4. In the realm of 1-shot in-context learning, approximately 50 languages exhibit erratic results. As the number of shots increases, there is a reduction in the number of languages achieving low accuracy (25-50%), coupled with a growing cohort achieving high accuracy (75-100%).
Further examination into individual language trends reveals that some low-resource languages require more shots to achieve better performance (e.g., pes_Arab for Persian) or even exhibit poor performance with 10 shots (e.g., dzo_Tibt for Dzongkha and ayr_Latn for Central Aymara). In contrast, high-resource languages, such as fra_Latn for French, demonstrate impressive performance even with fewer shots, and increasing the number of shots results in only marginal improvement.
4 Related Work
4.1 Multilingual Language Models
Language model development has endeavored to broaden the scope of pretraining languages to address multilingual scenarios. Pretrained multilingual models have been able to accommodate up to a hundred or more languages. Noteworthy examples include mBERT (Devlin et al., 2019), which supports 104 languages, XLM-R (Conneau et al., 2020) covering 100 languages, mBART (Liu et al., 2020) designed for 25 languages, mT5 (Xue et al., 2021) spanning 101 languages, XGLM (Lin et al., 2021) across 30 languages, GPT-3 covering 118 languages (93% English), mGPT (Shliazhko et al., 2022) accommodating 60 languages, and BLOOM (Scao et al., 2022) supporting 46 languages and 13 programming languages.
Surprisingly, several recent multilingual language models have surpassed the conventional limit by supporting more than 400 languages. Glot500-m (ImaniGooghari et al., 2023) spans 511 languages through vocabulary extension and continued training based on XLM-R. SERENGETI (Adebara et al., 2022) goes even further by supporting 517 African languages and language varieties, written in five different scripts, employing models inspired by both ELECTRA (Clark et al., 2020) and XLM-R. MADLAD (Kudugunta et al., 2023) covers 419 languages and trains an 8B language model from scratch with an adapted UL2 objective (Tay et al., 2022). Our work is concurrent with the MADLAD-400 language model. We distinguish our work in three respects: 1) Language coverage. Our work covers more than 500 languages, a number comparable to that of encoder-only models and surpassing MADLAD-400 by an additional 100 languages. 2) Training method. We consider continual training to benefit from the learned knowledge of the original models. 3) Model architecture. We adopt an open model architecture, i.e., LLaMA, while MADLAD uses a decoder-only T5 architecture, which was not supported by the HuggingFace ecosystem at the time of writing, thus leading to additional difficulty in usage.
4.2 Language Adaptation
Before the advent of LLMs, diverse approaches were employed to adapt small-scale multilingual language models to new languages. These methods include using adapters (Pfeiffer et al., 2020; Üstün et al., 2020; Pfeiffer et al., 2020; Nguyen et al., 2021; Faisal & Anastasopoulos, 2022; Yong et al., 2022), vocabulary extension and substitution (Chau et al., 2020; Wang et al., 2020; Müller et al., 2020; 2021; Pfeiffer et al., 2021; Chen et al., 2023; Downey et al., 2023), leveraging monolingual corpora (Ebrahimi & Kann, 2021; Alabi et al., 2022), and utilizing bilingual lexicons (Wang et al., 2022).
While language models have been scaled up notably, their coverage is limited to a specific set of languages. To address this constraint, various methods have been proposed to expand the applicability of these large language models across a broader range of languages, catering to both general-purpose tasks and specific applications like machine translation. These methods also involve vocabulary extension (Cui et al., 2023), continued pretraining and instruction-tuning (Yong et al., 2022; Cui et al., 2023; Chen et al., 2024; Zhao et al., 2024), and parallel corpora exploitation (Cahyawijaya et al., 2023; Yang et al., 2023; Zhu et al., 2023; Xu et al., 2023). Despite these efforts, massive language adaptation of LLMs for general-purpose tasks across diverse languages, e.g., covering many language families and more than one hundred languages, remains an area yet to be thoroughly explored.
5 Conclusion and Future Work
We present a pioneering effort in massive language adaptation of LLMs, focusing on extending LLaMA 2-7B to our model, MaLA-500. This adaptation involves vocabulary extension and continued pretraining with LoRA. Our approach leads to MaLA-500 achieving state-of-the-art in-context learning capabilities, as demonstrated on the SIB200 and Taxi1500 benchmarks. We release the training scripts and model weights publicly to facilitate future research. This work marks a substantial advancement in applying LLMs to a diverse range of languages.
Our future work will focus on further improving the model capacity, for example, on machine translation across many language pairs. Alves et al. (2023) showed that LLMs (LLaMA-7B and LLaMA-13B) exhibited poor performance even on English-centric high-resource language pairs in some cases. Translation with LLMs on low-resource languages is more challenging. The LLaMA-7B model performed poorly in our preliminary experiments. Besides, our pretraining corpus does not intentionally include bilingual texts, and our MaLA-500 model is not instruction-tuned with translation data. We leave the inclusion of bilingual text during continual pretraining, instruction fine-tuning with translation data, and the evaluation on machine translation as future work.
Ethical Statement
LLMs have been known to exhibit biases present in their training data. When extending LLMs to low-resource languages, there is a risk of propagating biases from high-resource languages to underrepresented ones. Careful attention must be paid to mitigate bias and ensure fairness in data collection and model training. The paper aims to make LLMs more accessible for underrepresented languages. Still, there is a risk of creating a digital language divide if certain communities are left out due to limited technological access. Future work would address biases by conducting bias audits on the training data, debiasing the models during generation, and continuously monitoring model outputs.
Reproducibility Statement
We make the following efforts to ensure reproducible research. We release the model weights (https://huggingface.co/MaLA-LM) and the code for training and evaluation (https://github.com/MaLA-LM/mala-500). We use publicly available evaluation benchmarks, which can be obtained freely or by request. The results are reproducible with our released model weights and evaluation scripts.
Acknowledgements
We thank José Pombal for constructive suggestions on training. This work is funded by The European Research Council (grants #740516, #771113 and #758969), EU’s Horizon Europe Research and Innovation Actions (UTTER, contract 101070631), and the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant #10052546]. The authors wish to acknowledge CSC – IT Center for Science, Finland, for generous computational resources on the Mahti supercomputer and LUMI supercomputer through the LUMI extreme scale access (MOOMIN and LumiNMT). Shaoxiong Ji and Peiqin Lin acknowledge travel support from ELISE (GA no 951847).
References
- Adebara et al. (2022) Ife Adebara, AbdelRahim Elmadany, Muhammad Abdul-Mageed, and Alcides Alcoba Inciarte. SERENGETI: Massively multilingual language models for Africa. arXiv preprint arXiv:2212.10785, 2022.
- Adelani et al. (2023) David Ifeoluwa Adelani, Hannah Liu, Xiaoyu Shen, Nikita Vassilyev, Jesujoba O. Alabi, Yanke Mao, Haonan Gao, and En-Shiun Annie Lee. SIB-200: A simple, inclusive, and big evaluation dataset for topic classification in 200+ languages and dialects. CoRR, abs/2309.07445, 2023. doi: 10.48550/arXiv.2309.07445. URL https://doi.org/10.48550/arXiv.2309.07445.
- Ahuja et al. (2023a) Kabir Ahuja, Rishav Hada, Millicent Ochieng, Prachi Jain, Harshita Diddee, Samuel Maina, Tanuja Ganu, Sameer Segal, Maxamed Axmed, Kalika Bali, and Sunayana Sitaram. MEGA: multilingual evaluation of generative AI. CoRR, abs/2303.12528, 2023a. doi: 10.48550/arXiv.2303.12528. URL https://doi.org/10.48550/arXiv.2303.12528.
- Ahuja et al. (2023b) Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, and Sunayana Sitaram. MEGAVERSE: benchmarking large language models across languages, modalities, models and tasks. CoRR, abs/2311.07463, 2023b. doi: 10.48550/ARXIV.2311.07463. URL https://doi.org/10.48550/arXiv.2311.07463.
- Alabi et al. (2022) Jesujoba O. Alabi, David Ifeoluwa Adelani, Marius Mosbach, and Dietrich Klakow. Adapting pre-trained language models to African languages via multilingual adaptive fine-tuning. In Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, and Seung-Hoon Na (eds.), Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, pp. 4336–4349. International Committee on Computational Linguistics, 2022. URL https://aclanthology.org/2022.coling-1.382.
- Alves et al. (2023) Duarte Alves, Nuno Guerreiro, João Alves, José Pombal, Ricardo Rei, José de Souza, Pierre Colombo, and André FT Martins. Steering large language models for machine translation with finetuning and in-context learning. In Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 11127–11148, 2023.
- Bang et al. (2023) Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, and Pascale Fung. A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity. CoRR, abs/2302.04023, 2023. doi: 10.48550/arXiv.2302.04023. URL https://doi.org/10.48550/arXiv.2302.04023.
- Cahyawijaya et al. (2023) Samuel Cahyawijaya, Holy Lovenia, Tiezheng Yu, Willy Chung, and Pascale Fung. Instruct-align: Teaching novel languages with to LLMs through alignment-based cross-lingual instruction. CoRR, abs/2305.13627, 2023. doi: 10.48550/arXiv.2305.13627. URL https://doi.org/10.48550/arXiv.2305.13627.
- Chau et al. (2020) Ethan C. Chau, Lucy H. Lin, and Noah A. Smith. Parsing with multilingual BERT, a small treebank, and a small corpus. In Trevor Cohn, Yulan He, and Yang Liu (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pp. 1324–1334. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.findings-emnlp.118. URL https://doi.org/10.18653/v1/2020.findings-emnlp.118.
- Chen et al. (2024) Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Barry Haddow, and Kenneth Heafield. Monolingual or multilingual instruction tuning: Which makes a better Alpaca. In Findings of the Association for Computational Linguistics: EACL, 2024. URL https://doi.org/10.48550/arXiv.2309.08958.
- Chen et al. (2023) Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, and Mikel Artetxe. Improving language plasticity via pretraining with active forgetting. CoRR, abs/2307.01163, 2023. doi: 10.48550/arXiv.2307.01163. URL https://doi.org/10.48550/arXiv.2307.01163.
- Clark et al. (2020) Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. ELECTRA: Pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=r1xMH1BtvB.
- Conneau et al. (2020) Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Édouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 8440–8451, 2020.
- Cui et al. (2023) Yiming Cui, Ziqing Yang, and Xin Yao. Efficient and effective text encoding for Chinese LLaMA and Alpaca. CoRR, abs/2304.08177, 2023. doi: 10.48550/arXiv.2304.08177. URL https://doi.org/10.48550/arXiv.2304.08177.
- Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019.
- Downey et al. (2023) C. M. Downey, Terra Blevins, Nora Goldfine, and Shane Steinert-Threlkeld. Embedding structure matters: Comparing methods to adapt multilingual vocabularies to new languages. CoRR, abs/2309.04679, 2023. doi: 10.48550/arXiv.2309.04679. URL https://doi.org/10.48550/arXiv.2309.04679.
- Ebrahimi & Kann (2021) Abteen Ebrahimi and Katharina Kann. How to adapt your pretrained multilingual model to 1600 languages. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021, (Volume 1: Long Papers), Virtual Event, August 1-6, 2021, pp. 4555–4567. Association for Computational Linguistics, 2021. doi: 10.18653/v1/2021.acl-long.351. URL https://doi.org/10.18653/v1/2021.acl-long.351.
- Faisal & Anastasopoulos (2022) Fahim Faisal and Antonios Anastasopoulos. Phylogeny-inspired adaptation of multilingual models to new languages. CoRR, abs/2205.09634, 2022. doi: 10.48550/arXiv.2205.09634. URL https://doi.org/10.48550/arXiv.2205.09634.
- Hu et al. (2022) Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=nZeVKeeFYf9.
- ImaniGooghari et al. (2023) Ayyoob ImaniGooghari, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André Martins, François Yvon, and Hinrich Schütze. Glot500: Scaling multilingual corpora and language models to 500 languages. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1082–1117, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.61. URL https://aclanthology.org/2023.acl-long.61.
- Jiang et al. (2023) Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. CoRR, abs/2310.06825, 2023. doi: 10.48550/arXiv.2310.06825. URL https://doi.org/10.48550/arXiv.2310.06825.
- Jiang et al. (2024) Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
- Kudo & Richardson (2018) Taku Kudo and John Richardson. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Eduardo Blanco and Wei Lu (eds.), Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018: System Demonstrations, Brussels, Belgium, October 31 - November 4, 2018, pp. 66–71. Association for Computational Linguistics, 2018. doi: 10.18653/v1/d18-2012. URL https://doi.org/10.18653/v1/d18-2012.
- Kudugunta et al. (2023) Sneha Kudugunta, Isaac Caswell, Biao Zhang, Xavier Garcia, Christopher A Choquette-Choo, Katherine Lee, Derrick Xin, Aditya Kusupati, Romi Stella, Ankur Bapna, et al. MADLAD-400: A multilingual and document-level large audited dataset. arXiv preprint arXiv:2309.04662, 2023.
- Lai et al. (2023) Viet Dac Lai, Nghia Trung Ngo, Amir Pouran Ben Veyseh, Hieu Man, Franck Dernoncourt, Trung Bui, and Thien Huu Nguyen. ChatGPT beyond English: Towards a comprehensive evaluation of large language models in multilingual learning. CoRR, abs/2304.05613, 2023. doi: 10.48550/arXiv.2304.05613. URL https://doi.org/10.48550/arXiv.2304.05613.
- Lin et al. (2021) Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona T. Diab, Veselin Stoyanov, and Xian Li. Few-shot learning with multilingual language models. CoRR, abs/2112.10668, 2021. URL https://arxiv.org/abs/2112.10668.
- Liu et al. (2022) Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for GPT-3? In Eneko Agirre, Marianna Apidianaki, and Ivan Vulic (eds.), Proceedings of Deep Learning Inside Out: The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, DeeLIO@ACL 2022, Dublin, Ireland and Online, May 27, 2022, pp. 100–114. Association for Computational Linguistics, 2022. doi: 10.18653/v1/2022.deelio-1.10. URL https://doi.org/10.18653/v1/2022.deelio-1.10.
- Liu et al. (2020) Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. Multilingual denoising pre-training for neural machine translation. Transactions of the Association for Computational Linguistics, 8:726–742, 2020.
- Ma et al. (2023) Chunlan Ma, Ayyoob ImaniGooghari, Haotian Ye, Ehsaneddin Asgari, and Hinrich Schütze. Taxi1500: A multilingual dataset for text classification in 1500 languages, 2023.
- Mayer & Cysouw (2014) Thomas Mayer and Michael Cysouw. Creating a massively parallel bible corpus. In Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asunción Moreno, Jan Odijk, and Stelios Piperidis (eds.), Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014, Reykjavik, Iceland, May 26-31, 2014, pp. 3158–3163. European Language Resources Association (ELRA), 2014. URL https://www.lrec-conf.org/proceedings/lrec2014/summaries/220.html.
- Müller et al. (2020) Benjamin Müller, Benoît Sagot, and Djamé Seddah. Can multilingual language models transfer to an unseen dialect? A case study on North African Arabizi. CoRR, abs/2005.00318, 2020. URL https://arxiv.org/abs/2005.00318.
- Müller et al. (2021) Benjamin Müller, Antonios Anastasopoulos, Benoît Sagot, and Djamé Seddah. When being unseen from mBERT is just the beginning: Handling new languages with multilingual language models. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tür, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou (eds.), Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, pp. 448–462. Association for Computational Linguistics, 2021. doi: 10.18653/v1/2021.naacl-main.38. URL https://doi.org/10.18653/v1/2021.naacl-main.38.
- Nguyen et al. (2021) Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, and Thien Huu Nguyen. Trankit: A light-weight transformer-based toolkit for multilingual natural language processing. In Dimitra Gkatzia and Djamé Seddah (eds.), Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, EACL 2021, Online, April 19-23, 2021, pp. 80–90. Association for Computational Linguistics, 2021. doi: 10.18653/v1/2021.eacl-demos.10. URL https://doi.org/10.18653/v1/2021.eacl-demos.10.
- Pfeiffer et al. (2020) Jonas Pfeiffer, Ivan Vulic, Iryna Gurevych, and Sebastian Ruder. MAD-X: an adapter-based framework for multi-task cross-lingual transfer. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pp. 7654–7673. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.emnlp-main.617. URL https://doi.org/10.18653/v1/2020.emnlp-main.617.
- Pfeiffer et al. (2021) Jonas Pfeiffer, Ivan Vulic, Iryna Gurevych, and Sebastian Ruder. UNKs everywhere: Adapting multilingual language models to new scripts. In Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pp. 10186–10203. Association for Computational Linguistics, 2021. doi: 10.18653/v1/2021.emnlp-main.800. URL https://doi.org/10.18653/v1/2021.emnlp-main.800.
- Rajbhandari et al. (2020) Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, and Yuxiong He. ZeRO: Memory optimizations toward training trillion parameter models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–16. IEEE, 2020.
- Rasley et al. (2020) Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, and Yuxiong He. DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3505–3506, 2020.
- Sabet et al. (2020) Masoud Jalili Sabet, Philipp Dufter, François Yvon, and Hinrich Schütze. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In Trevor Cohn, Yulan He, and Yang Liu (eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pp. 1627–1643. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.findings-emnlp.147. URL https://doi.org/10.18653/v1/2020.findings-emnlp.147.
- Scao et al. (2022) Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, et al. BLOOM: A 176b-parameter open-access multilingual language model. arXiv preprint arXiv:2211.05100, 2022.
- Shliazhko et al. (2022) Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Anastasia Kozlova, and Tatiana Shavrina. mGPT: Few-shot learners go multilingual. CoRR, abs/2204.07580, 2022. doi: 10.48550/arXiv.2204.07580. URL https://doi.org/10.48550/arXiv.2204.07580.
- Tay et al. (2022) Yi Tay, Mostafa Dehghani, Vinh Q Tran, Xavier Garcia, Jason Wei, Xuezhi Wang, Hyung Won Chung, Dara Bahri, Tal Schuster, Steven Zheng, et al. UL2: Unifying language learning paradigms. In The Eleventh International Conference on Learning Representations, 2022.
- Touvron et al. (2023a) Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. CoRR, abs/2302.13971, 2023a. doi: 10.48550/arXiv.2302.13971. URL https://doi.org/10.48550/arXiv.2302.13971.
- Touvron et al. (2023b) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton-Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurélien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023b. doi: 10.48550/arXiv.2307.09288. URL https://doi.org/10.48550/arXiv.2307.09288.
- Üstün et al. (2020) Ahmet Üstün, Arianna Bisazza, Gosse Bouma, and Gertjan van Noord. UDapter: Language adaptation for truly universal dependency parsing. In Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pp. 2302–2315. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.emnlp-main.180. URL https://doi.org/10.18653/v1/2020.emnlp-main.180.
- Wang et al. (2022) Xinyi Wang, Sebastian Ruder, and Graham Neubig. Expanding pretrained models to thousands more languages via lexicon-based adaptation. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (eds.), Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pp. 863–877. Association for Computational Linguistics, 2022. URL https://aclanthology.org/2022.acl-long.61.
- Wang et al. (2020) Zihan Wang, Karthikeyan K, Stephen Mayhew, and Dan Roth. Extending multilingual BERT to low-resource languages. In Trevor Cohn, Yulan He, and Yang Liu (eds.), Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, volume EMNLP 2020 of Findings of ACL, pp. 2649–2656. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.findings-emnlp.240. URL https://doi.org/10.18653/v1/2020.findings-emnlp.240.
- Wolf et al. (2020) Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45, 2020.
- Xu et al. (2023) Haoran Xu, Young Jin Kim, Amr Sharaf, and Hany Hassan Awadalla. A paradigm shift in machine translation: Boosting translation performance of large language models. CoRR, abs/2309.11674, 2023. doi: 10.48550/arXiv.2309.11674. URL https://doi.org/10.48550/arXiv.2309.11674.
- Xue et al. (2021) Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, and Colin Raffel. mT5: A massively multilingual pre-trained text-to-text transformer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 483–498, 2021.
- Xue et al. (2022) Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, and Colin Raffel. ByT5: Towards a token-free future with pre-trained byte-to-byte models. Trans. Assoc. Comput. Linguistics, 10:291–306, 2022. doi: 10.1162/tacl_a_00461. URL https://doi.org/10.1162/tacl_a_00461.
- Yang et al. (2023) Wen Yang, Chong Li, Jiajun Zhang, and Chengqing Zong. BigTrans: Augmenting large language models with multilingual translation capability over 100 languages. CoRR, abs/2305.18098, 2023. doi: 10.48550/arXiv.2305.18098. URL https://doi.org/10.48550/arXiv.2305.18098.
- Yong et al. (2022) Zheng Xin Yong, Hailey Schoelkopf, Niklas Muennighoff, Alham Fikri Aji, David Ifeoluwa Adelani, Khalid Almubarak, M. Saiful Bari, Lintang Sutawika, Jungo Kasai, Ahmed Baruwa, Genta Indra Winata, Stella Biderman, Dragomir Radev, and Vassilina Nikoulina. BLOOM+1: adding language support to BLOOM for zero-shot prompting. CoRR, abs/2212.09535, 2022. doi: 10.48550/arXiv.2212.09535. URL https://doi.org/10.48550/arXiv.2212.09535.
- Yu et al. (2023) Lili Yu, Daniel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, and Mike Lewis. MEGABYTE: predicting million-byte sequences with multiscale transformers. CoRR, abs/2305.07185, 2023. doi: 10.48550/arXiv.2305.07185. URL https://doi.org/10.48550/arXiv.2305.07185.
- Zhao et al. (2024) Jun Zhao, Zhihao Zhang, Qi Zhang, Tao Gui, and Xuanjing Huang. LLaMA beyond English: An empirical study on language capability transfer. arXiv preprint arXiv:2401.01055, 2024.
- Zhu et al. (2023) Wenhao Zhu, Yunzhe Lv, Qingxiu Dong, Fei Yuan, Jingjing Xu, Shujian Huang, Lingpeng Kong, Jiajun Chen, and Lei Li. Extrapolating large language models to non-English by aligning languages. CoRR, abs/2308.04948, 2023. doi: 10.48550/arXiv.2308.04948. URL https://doi.org/10.48550/arXiv.2308.04948.
Appendix A Languages
Tables 5, 6 and 7 list the languages of Glot500-c used to train MaLA-500, together with the number of available sentences and the language family of each language.
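As a reading aid, the entries in these tables can also be processed programmatically. The following is a minimal Python sketch, assuming that each identifier has the form language code, underscore, script code (this appears to follow ISO 639-3 and ISO 15924) and that the family column holds a Glottolog-style family code. The `LangEntry` type, the `parse_entry` helper, and the "unlisted" label are hypothetical names introduced here for illustration; they are not part of the MaLA-500 release. The sample rows are copied from the first table.

```python
from collections import Counter
from typing import NamedTuple, Optional


class LangEntry(NamedTuple):
    lang: str               # language code, e.g. "hbs" (assumed ISO 639-3)
    script: str             # script code, e.g. "Latn" (assumed ISO 15924)
    sentences: int          # number of available sentences
    family: Optional[str]   # family code as printed; None if the cell is empty


def parse_entry(code: str, sentences: str, family: str = "") -> LangEntry:
    """Split a 'lang_Script' identifier and convert the sentence count."""
    lang, script = code.split("_", 1)
    return LangEntry(lang, script, int(sentences), family.strip() or None)


# A few rows copied from the first table; aze_Latn has no family listed.
rows = [
    ("hbs_Latn", "63411156", "indo1319"),
    ("mal_Mlym", "48098273", "drav1251"),
    ("aze_Latn", "46300705", ""),
    ("hin_Deva", "7046700", "indo1319"),
]

entries = [parse_entry(*row) for row in rows]

# Aggregate sentence counts per family; empty cells are grouped as "unlisted".
sentences_per_family = Counter()
for entry in entries:
    sentences_per_family[entry.family or "unlisted"] += entry.sentences
```

Keeping the empty family cell as `None` rather than guessing a value preserves the distinction the tables themselves make between listed and unlisted families.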
Lang | Sentences | Family | Lang | Sentences | Family | Lang | Sentences | Family |
hbs_Latn | 63411156 | indo1319 | hin_Deva | 7046700 | indo1319 | ton_Latn | 1216118 | aust1307 |
mal_Mlym | 48098273 | drav1251 | kor_Hang | 6468444 | kore1284 | tah_Latn | 1190747 | aust1307 |
aze_Latn | 46300705 | | ory_Orya | 6266475 | indo1319 | lat_Latn | 1179913 | indo1319 |
guj_Gujr | 45738685 | indo1319 | urd_Arab | 6009594 | indo1319 | srn_Latn | 1172349 | indo1319 |
ben_Beng | 43514870 | indo1319 | swa_Latn | 5989369 | | ewe_Latn | 1161605 | atla1278 |
kan_Knda | 41836495 | drav1251 | sqi_Latn | 5526836 | indo1319 | bem_Latn | 1111969 | atla1278 |
tel_Telu | 41580525 | drav1251 | bel_Cyrl | 5319675 | indo1319 | efi_Latn | 1082621 | atla1278 |
mlt_Latn | 40654838 | afro1255 | afr_Latn | 5157787 | indo1319 | bis_Latn | 1070170 | indo1319 |
fra_Latn | 39197581 | indo1319 | nno_Latn | 4899103 | indo1319 | orm_Latn | 1067699 | |
spa_Latn | 37286756 | indo1319 | tat_Cyrl | 4708088 | turk1311 | haw_Latn | 1062491 | aust1307 |
eng_Latn | 36122761 | indo1319 | ast_Latn | 4683554 | indo1319 | hmo_Latn | 1033636 | pidg1258 |
fil_Latn | 33493255 | aust1307 | mon_Cyrl | 4616960 | mong1349 | kat_Geor | 1004297 | kart1248 |
nob_Latn | 32869205 | indo1319 | hbs_Cyrl | 4598073 | indo1319 | pag_Latn | 983637 | aust1307 |
rus_Cyrl | 31787973 | indo1319 | hau_Latn | 4368483 | afro1255 | loz_Latn | 964418 | atla1278 |
deu_Latn | 31015993 | indo1319 | sna_Latn | 4019596 | atla1278 | fry_Latn | 957422 | indo1319 |
tur_Latn | 29184662 | turk1311 | msa_Latn | 3929084 | | mya_Mymr | 945180 | sino1245 |
pan_Guru | 29052537 | indo1319 | som_Latn | 3916769 | afro1255 | nds_Latn | 944715 | indo1319 |
mar_Deva | 28748897 | indo1319 | srp_Cyrl | 3864091 | indo1319 | run_Latn | 943828 | atla1278 |
por_Latn | 27824391 | indo1319 | mlg_Latn | 3715802 | | pnb_Arab | 899895 | indo1319 |
nld_Latn | 25061426 | indo1319 | zul_Latn | 3580113 | atla1278 | rar_Latn | 894515 | aust1307 |
ara_Arab | 24524122 | | arz_Arab | 3488224 | afro1255 | fij_Latn | 887134 | aust1307 |
zho_Hani | 24143786 | | nya_Latn | 3409030 | atla1278 | wls_Latn | 882167 | aust1307 |
ita_Latn | 23539857 | indo1319 | tam_Taml | 3388255 | drav1251 | ckb_Arab | 874441 | indo1319 |
ind_Latn | 23018106 | aust1307 | hat_Latn | 3226932 | indo1319 | ven_Latn | 860249 | atla1278 |
ell_Grek | 22033282 | indo1319 | uzb_Latn | 3223485 | turk1311 | zsm_Latn | 859947 | aust1307 |
bul_Cyrl | 21823004 | indo1319 | sot_Latn | 3205510 | atla1278 | chv_Cyrl | 859863 | turk1311 |
swe_Latn | 20725883 | indo1319 | uzb_Cyrl | 3029947 | turk1311 | lua_Latn | 854359 | atla1278 |
ces_Latn | 20376340 | indo1319 | cos_Latn | 3015055 | indo1319 | que_Latn | 838486 | |
isl_Latn | 19547941 | indo1319 | als_Latn | 2954874 | indo1319 | sag_Latn | 771048 | atla1278 |
pol_Latn | 19339945 | indo1319 | amh_Ethi | 2862985 | afro1255 | guw_Latn | 767918 | atla1278 |
ron_Latn | 19190217 | indo1319 | sun_Latn | 2586011 | aust1307 | bre_Latn | 748954 | indo1319 |
dan_Latn | 19174573 | indo1319 | war_Latn | 2584810 | aust1307 | toi_Latn | 745385 | atla1278 |
hun_Latn | 18800025 | ural1272 | div_Thaa | 2418687 | indo1319 | pus_Arab | 731992 | indo1319 |
tgk_Cyrl | 18659517 | indo1319 | yor_Latn | 2392359 | atla1278 | che_Cyrl | 728201 | nakh1245 |
srp_Latn | 18371769 | indo1319 | fao_Latn | 2365271 | indo1319 | pis_Latn | 714783 | indo1319 |
fas_Arab | 18277593 | | uzn_Cyrl | 2293672 | turk1311 | kon_Latn | 685194 | |
ceb_Latn | 18149215 | aust1307 | smo_Latn | 2290439 | aust1307 | oss_Cyrl | 683517 | indo1319 |
heb_Hebr | 18128962 | afro1255 | bak_Cyrl | 2264196 | turk1311 | hyw_Armn | 679819 | indo1319 |
hrv_Latn | 17882932 | indo1319 | ilo_Latn | 2106531 | aust1307 | iso_Latn | 658789 | atla1278 |
glg_Latn | 17852274 | indo1319 | tso_Latn | 2100708 | atla1278 | nan_Latn | 656389 | sino1245 |
fin_Latn | 16730388 | ural1272 | mri_Latn | 2046850 | aust1307 | lub_Latn | 654390 | atla1278 |
slv_Latn | 15719210 | indo1319 | hmn_Latn | 1903898 | | lim_Latn | 652078 | indo1319 |
vie_Latn | 15697827 | aust1305 | asm_Beng | 1882353 | indo1319 | tuk_Latn | 649411 | turk1311 |
mkd_Cyrl | 14717004 | indo1319 | hil_Latn | 1798875 | aust1307 | tir_Ethi | 649117 | afro1255 |
slk_Latn | 14633631 | indo1319 | nso_Latn | 1619354 | atla1278 | tgk_Latn | 636541 | indo1319 |
nor_Latn | 14576191 | indo1319 | ibo_Latn | 1543820 | atla1278 | yua_Latn | 610052 | maya1287 |
est_Latn | 13600579 | | kin_Latn | 1521612 | atla1278 | min_Latn | 609065 | aust1307 |
ltz_Latn | 12997242 | indo1319 | hye_Armn | 1463123 | indo1319 | lue_Latn | 599429 | atla1278 |
eus_Latn | 12775959 | | oci_Latn | 1449128 | indo1319 | khm_Khmr | 590429 | aust1305 |
lit_Latn | 12479626 | indo1319 | lin_Latn | 1408460 | atla1278 | tum_Latn | 589857 | atla1278 |
kaz_Cyrl | 12378727 | turk1311 | tpi_Latn | 1401844 | indo1319 | tll_Latn | 586530 | atla1278 |
lav_Latn | 12143980 | indo1319 | twi_Latn | 1400979 | atla1278 | ekk_Latn | 582595 | ural1272 |
bos_Latn | 11014744 | indo1319 | kir_Cyrl | 1397566 | turk1311 | lug_Latn | 566948 | atla1278 |
epo_Latn | 8737198 | arti1236 | pap_Latn | 1360138 | indo1319 | niu_Latn | 566715 | aust1307 |
cat_Latn | 8648271 | indo1319 | nep_Deva | 1317291 | indo1319 | tzo_Latn | 540262 | maya1287 |
tha_Thai | 7735209 | taik1256 | azj_Latn | 1315834 | turk1311 | mah_Latn | 534614 | aust1307 |
ukr_Cyrl | 7462046 | indo1319 | bcl_Latn | 1284493 | aust1307 | tvl_Latn | 521556 | aust1307 |
tgl_Latn | 7411064 | aust1307 | xho_Latn | 1262364 | atla1278 | jav_Latn | 516833 | aust1307 |
sin_Sinh | 7293178 | indo1319 | cym_Latn | 1244783 | indo1319 | vec_Latn | 514240 | indo1319 |
gle_Latn | 7225513 | indo1319 | gaa_Latn | 1222307 | atla1278 | jpn_Jpan | 510722 | japo1237 |
Lang | Sentences | Family | Lang | Sentences | Family | Lang | Sentences | Family |
lus_Latn | 509250 | sino1245 | kmb_Latn | 296269 | atla1278 | ncx_Latn | 162558 | utoa1244 |
crs_Latn | 508755 | indo1319 | zai_Latn | 277632 | otom1299 | qug_Latn | 162500 | quec1387 |
kqn_Latn | 507913 | atla1278 | gym_Latn | 274512 | chib1249 | rmn_Latn | 162069 | indo1319 |
ndo_Latn | 496613 | atla1278 | bod_Tibt | 273489 | sino1245 | cjk_Latn | 160645 | atla1278 |
snd_Arab | 488730 | indo1319 | nde_Latn | 269931 | atla1278 | arb_Arab | 159884 | afro1255 |
yue_Hani | 484700 | sino1245 | fon_Latn | 268566 | atla1278 | kea_Latn | 158047 | indo1319 |
tiv_Latn | 483064 | atla1278 | ber_Latn | 264426 | | mck_Latn | 157521 | atla1278 |
kua_Latn | 473535 | atla1278 | nbl_Latn | 259158 | atla1278 | arn_Latn | 155882 | arau1255 |
kwy_Latn | 473274 | atla1278 | kmr_Latn | 256677 | indo1319 | pdt_Latn | 155485 | indo1319 |
hin_Latn | 466175 | indo1319 | guc_Latn | 249044 | araw1281 | her_Latn | 154827 | atla1278 |
iku_Cans | 465011 | | mam_Latn | 248348 | maya1287 | gla_Latn | 152563 | indo1319 |
kal_Latn | 462430 | eski1264 | nia_Latn | 247406 | aust1307 | kmr_Cyrl | 151728 | indo1319 |
tdt_Latn | 459818 | aust1307 | nyn_Latn | 241992 | atla1278 | mwl_Latn | 150054 | indo1319 |
gsw_Latn | 449240 | indo1319 | cab_Latn | 240101 | araw1281 | nav_Latn | 147702 | atha1245 |
mfe_Latn | 447435 | indo1319 | top_Latn | 239232 | toto1251 | ksw_Mymr | 147674 | sino1245 |
swc_Latn | 446378 | atla1278 | tog_Latn | 231969 | atla1278 | mxv_Latn | 147591 | otom1299 |
mon_Latn | 437950 | mong1349 | mco_Latn | 231209 | mixe1284 | hif_Latn | 147261 | indo1319 |
mos_Latn | 437666 | atla1278 | tzh_Latn | 230706 | maya1287 | wol_Latn | 146992 | atla1278 |
kik_Latn | 437228 | atla1278 | pms_Latn | 227748 | indo1319 | sme_Latn | 146803 | ural1272 |
cnh_Latn | 436667 | sino1245 | wuu_Hani | 224088 | sino1245 | gom_Latn | 143937 | indo1319 |
gil_Latn | 434529 | aust1307 | plt_Latn | 220413 | aust1307 | bum_Latn | 141673 | atla1278 |
pon_Latn | 434522 | aust1307 | yid_Hebr | 220214 | indo1319 | mgr_Latn | 138953 | atla1278 |
umb_Latn | 431589 | atla1278 | ada_Latn | 219427 | atla1278 | ahk_Latn | 135068 | sino1245 |
lvs_Latn | 422952 | indo1319 | iba_Latn | 213615 | aust1307 | kur_Arab | 134160 | indo1319 |
sco_Latn | 411591 | indo1319 | kek_Latn | 209932 | maya1287 | bas_Latn | 133436 | atla1278 |
ori_Orya | 410827 | | koo_Latn | 209375 | atla1278 | bin_Latn | 133256 | atla1278 |
arg_Latn | 410683 | indo1319 | sop_Latn | 206501 | atla1278 | tsz_Latn | 133251 | tara1323 |
kur_Latn | 407169 | indo1319 | kac_Latn | 205542 | sino1245 | sid_Latn | 130406 | afro1255 |
dhv_Latn | 405711 | aust1307 | qvi_Latn | 205447 | quec1387 | diq_Latn | 128908 | indo1319 |
luo_Latn | 398974 | nilo1247 | cak_Latn | 204472 | maya1287 | srd_Latn | 127064 | |
lun_Latn | 395764 | atla1278 | kbp_Latn | 202877 | atla1278 | tcf_Latn | 126050 | otom1299 |
nzi_Latn | 394247 | atla1278 | ctu_Latn | 201662 | maya1287 | bzj_Latn | 124958 | indo1319 |
gug_Latn | 392227 | tupi1275 | kri_Latn | 201087 | indo1319 | udm_Cyrl | 121705 | ural1272 |
bar_Latn | 387070 | indo1319 | mau_Latn | 199134 | otom1299 | cce_Latn | 120636 | atla1278 |
bci_Latn | 384059 | atla1278 | scn_Latn | 199068 | indo1319 | meu_Latn | 120273 | aust1307 |
chk_Latn | 380596 | aust1307 | tyv_Cyrl | 198649 | turk1311 | chw_Latn | 119751 | atla1278 |
roh_Latn | 377067 | indo1319 | ina_Latn | 197315 | arti1236 | cbk_Latn | 118789 | indo1319 |
aym_Latn | 373329 | ayma1253 | btx_Latn | 193701 | aust1307 | ibg_Latn | 118733 | aust1307 |
yap_Latn | 358929 | aust1307 | nch_Latn | 193129 | utoa1244 | bhw_Latn | 117381 | aust1307 |
ssw_Latn | 356561 | atla1278 | ncj_Latn | 192962 | utoa1244 | ngu_Latn | 116851 | utoa1244 |
quz_Latn | 354781 | quec1387 | pau_Latn | 190529 | aust1307 | nyy_Latn | 115914 | atla1278 |
sah_Cyrl | 352697 | turk1311 | toj_Latn | 189651 | maya1287 | szl_Latn | 112496 | indo1319 |
tsn_Latn | 350954 | atla1278 | pcm_Latn | 187594 | indo1319 | ish_Latn | 111814 | atla1278 |
lmo_Latn | 348135 | indo1319 | dyu_Latn | 186367 | mand1469 | naq_Latn | 109747 | khoe1240 |
ido_Latn | 331239 | arti1236 | kss_Latn | 185868 | atla1278 | toh_Latn | 107583 | atla1278 |
abk_Cyrl | 321578 | abkh1242 | afb_Arab | 183694 | afro1255 | ttj_Latn | 106925 | atla1278 |
zne_Latn | 318871 | atla1278 | urh_Latn | 182214 | atla1278 | nse_Latn | 105189 | atla1278 |
quy_Latn | 311040 | quec1387 | quc_Latn | 181559 | maya1287 | hsb_Latn | 104802 | indo1319 |
kam_Latn | 310659 | atla1278 | new_Deva | 181427 | sino1245 | ami_Latn | 104559 | aust1307 |
bbc_Latn | 310420 | aust1307 | yao_Latn | 179965 | atla1278 | alz_Latn | 104392 | nilo1247 |
vol_Latn | 310399 | arti1236 | ngl_Latn | 178498 | atla1278 | apc_Arab | 102392 | afro1255 |
wal_Latn | 309873 | gong1255 | nyu_Latn | 177483 | atla1278 | vls_Latn | 101900 | indo1319 |
uig_Arab | 307302 | turk1311 | kab_Latn | 176015 | afro1255 | mhr_Cyrl | 100474 | ural1272 |
vmw_Latn | 306899 | atla1278 | tuk_Cyrl | 175769 | turk1311 | djk_Latn | 99234 | indo1319 |
kwn_Latn | 305362 | atla1278 | xmf_Geor | 174994 | kart1248 | wes_Latn | 98492 | indo1319 |
pam_Latn | 303737 | aust1307 | ndc_Latn | 174305 | atla1278 | gkn_Latn | 97041 | atla1278 |
seh_Latn | 300243 | atla1278 | san_Deva | 165616 | indo1319 | grc_Grek | 96986 | indo1319 |
tsc_Latn | 298442 | atla1278 | nba_Latn | 163485 | atla1278 | hbo_Hebr | 96484 | afro1255 |
nyk_Latn | 297976 | atla1278 | bpy_Beng | 162838 | indo1319 | swh_Latn | 95776 | atla1278 |
Lang | Sentences | Family | Lang | Sentences | Family | Lang | Sentences | Family |
alt_Cyrl | 95148 | turk1311 | mny_Latn | 50581 | atla1278 | csy_Latn | 34126 | sino1245 |
rmn_Grek | 94533 | indo1319 | gkp_Latn | 50549 | mand1469 | azb_Arab | 33758 | turk1311 |
miq_Latn | 94343 | misu1242 | kat_Latn | 50424 | kart1248 | csb_Latn | 33743 | indo1319 |
kaa_Cyrl | 88815 | turk1311 | bjn_Latn | 49068 | aust1307 | tpm_Latn | 33517 | atla1278 |
kos_Latn | 88603 | aust1307 | acr_Latn | 48886 | maya1287 | quw_Latn | 33449 | quec1387 |
grn_Latn | 87568 | | dtp_Latn | 48468 | aust1307 | rmy_Cyrl | 33351 | indo1319 |
lhu_Latn | 87255 | sino1245 | lam_Latn | 46853 | atla1278 | ixl_Latn | 33289 | maya1287 |
lzh_Hani | 86035 | sino1245 | bik_Latn | 46561 | | mbb_Latn | 33240 | aust1307 |
ajp_Arab | 83297 | afro1255 | poh_Latn | 46454 | maya1287 | pfl_Latn | 33148 | indo1319 |
cmn_Hani | 80745 | sino1245 | phm_Latn | 45862 | atla1278 | pcd_Latn | 32867 | indo1319 |
gcf_Latn | 80737 | indo1319 | hrx_Latn | 45716 | indo1319 | tlh_Latn | 32863 | arti1236 |
rmn_Cyrl | 79925 | indo1319 | quh_Latn | 45566 | quec1387 | suz_Deva | 32811 | sino1245 |
kjh_Cyrl | 79262 | turk1311 | hyw_Cyrl | 45379 | indo1319 | gcr_Latn | 32676 | indo1319 |
rng_Latn | 78177 | atla1278 | rue_Cyrl | 45369 | indo1319 | jbo_Latn | 32619 | arti1236 |
mgh_Latn | 78117 | atla1278 | eml_Latn | 44630 | indo1319 | tbz_Latn | 32264 | atla1278 |
xmv_Latn | 77896 | aust1307 | acm_Arab | 44505 | afro1255 | bam_Latn | 32150 | mand1469 |
ige_Latn | 77114 | atla1278 | tob_Latn | 44473 | guai1249 | prk_Latn | 32085 | aust1305 |
rmy_Latn | 76991 | indo1319 | ach_Latn | 43974 | nilo1247 | jam_Latn | 32048 | indo1319 |
srm_Latn | 76884 | indo1319 | vep_Latn | 43076 | ural1272 | twx_Latn | 32028 | atla1278 |
bak_Latn | 76809 | turk1311 | npi_Deva | 43072 | indo1319 | nmf_Latn | 31997 | sino1245 |
gur_Latn | 76151 | atla1278 | tok_Latn | 42820 | arti1236 | caq_Latn | 31903 | aust1305 |
idu_Latn | 75106 | atla1278 | sgs_Latn | 42467 | indo1319 | rop_Latn | 31889 | indo1319 |
yom_Latn | 74818 | atla1278 | lij_Latn | 42447 | indo1319 | tca_Latn | 31852 | ticu1244 |
tdx_Latn | 74430 | aust1307 | myv_Cyrl | 42147 | ural1272 | yan_Latn | 31775 | misu1242 |
mzn_Arab | 73719 | indo1319 | tih_Latn | 41873 | aust1307 | xav_Latn | 31765 | nucl1710 |
cfm_Latn | 70227 | sino1245 | tat_Latn | 41640 | turk1311 | bih_Deva | 31658 | |
zpa_Latn | 69237 | otom1299 | lfn_Latn | 41632 | arti1236 | cuk_Latn | 31612 | chib1249 |
kbd_Cyrl | 67914 | abkh1242 | cgg_Latn | 41196 | atla1278 | kjb_Latn | 31471 | maya1287 |
lao_Laoo | 66966 | taik1256 | ful_Latn | 41188 | atla1278 | hne_Deva | 31465 | indo1319 |
nap_Latn | 65826 | indo1319 | gor_Latn | 41174 | aust1307 | wbm_Latn | 31394 | aust1305 |
qub_Latn | 64973 | quec1387 | ile_Latn | 40984 | arti1236 | zlm_Latn | 31345 | aust1307 |
oke_Latn | 64508 | atla1278 | ium_Latn | 40683 | hmon1336 | tui_Latn | 31161 | atla1278 |
ote_Latn | 64224 | otom1299 | teo_Latn | 40203 | nilo1247 | ifb_Latn | 30980 | aust1307 |
bsb_Latn | 63634 | aust1307 | kia_Latn | 40035 | atla1278 | izz_Latn | 30894 | atla1278 |
ogo_Latn | 61901 | atla1278 | crh_Cyrl | 39985 | turk1311 | rug_Latn | 30857 | aust1307 |
abn_Latn | 61830 | atla1278 | crh_Latn | 39896 | turk1311 | aka_Latn | 30704 | atla1278 |
ldi_Latn | 61827 | atla1278 | enm_Latn | 39809 | indo1319 | pxm_Latn | 30698 | book1242 |
ayr_Latn | 61570 | ayma1253 | sat_Olck | 39614 | aust1305 | kmm_Latn | 30671 | sino1245 |
gom_Deva | 61140 | indo1319 | mad_Latn | 38993 | aust1307 | mcn_Latn | 30666 | afro1255 |
bba_Latn | 61123 | atla1278 | cac_Latn | 38812 | maya1287 | ifa_Latn | 30621 | aust1307 |
aln_Latn | 60989 | indo1319 | hnj_Latn | 38611 | hmon1336 | dln_Latn | 30620 | sino1245 |
leh_Latn | 59944 | atla1278 | ksh_Latn | 38130 | indo1319 | ext_Latn | 30605 | indo1319 |
ban_Latn | 59805 | aust1307 | ikk_Latn | 38071 | atla1278 | ksd_Latn | 30550 | aust1307 |
ace_Latn | 59333 | aust1307 | sba_Latn | 38040 | cent2225 | mzh_Latn | 30517 | mata1289 |
pes_Arab | 57511 | indo1319 | zom_Latn | 37013 | sino1245 | llb_Latn | 30480 | atla1278 |
skg_Latn | 57228 | aust1307 | bqc_Latn | 36881 | mand1469 | hra_Latn | 30472 | sino1245 |
ary_Arab | 56933 | afro1255 | bim_Latn | 36835 | atla1278 | mwm_Latn | 30432 | cent2225 |
hus_Latn | 56176 | maya1287 | mdy_Ethi | 36370 | gong1255 | krc_Cyrl | 30353 | turk1311 |
glv_Latn | 55641 | indo1319 | bts_Latn | 36216 | aust1307 | tuc_Latn | 30349 | aust1307 |
fat_Latn | 55609 | atla1278 | gya_Latn | 35902 | atla1278 | mrw_Latn | 30304 | aust1307 |
frr_Latn | 55254 | indo1319 | ajg_Latn | 35631 | atla1278 | pls_Latn | 30136 | otom1299 |
mwn_Latn | 54805 | atla1278 | agw_Latn | 35585 | aust1307 | rap_Latn | 30102 | aust1307 |
mai_Deva | 54687 | indo1319 | kom_Cyrl | 35249 | ural1272 | fur_Latn | 30052 | indo1319 |
dua_Latn | 53392 | atla1278 | knv_Latn | 35196 | | kaa_Latn | 30031 | turk1311 |
dzo_Tibt | 52732 | sino1245 | giz_Latn | 35040 | afro1255 | prs_Arab | 26823 | indo1319 |
ctd_Latn | 52135 | sino1245 | hui_Latn | 34926 | nucl1709 | san_Latn | 25742 | indo1319 |
nnb_Latn | 52041 | atla1278 | kpg_Latn | 34900 | aust1307 | som_Arab | 14199 | afro1255 |
sxn_Latn | 51749 | aust1307 | zea_Latn | 34426 | indo1319 | uig_Latn | 9637 | turk1311 |
mps_Latn | 50645 | tebe1251 | aoj_Latn | 34349 | nucl1708 | hau_Arab | 9593 | afro1255 |
Appendix B Detailed Results
Detailed evaluation results are shown in Tables 8-15 (NLL on Glot500-c), Tables 16-21 (NLL on PBC), Tables 22-23 (ACC on SIB200), and Tables 24-29 (ACC on Taxi1500).
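As a rough illustration of how scores of the two kinds reported in these tables are commonly computed, the sketch below sums token-level negative log-probabilities into a sequence-level negative log-likelihood (NLL) and takes an unweighted (macro) average of per-language accuracy. It assumes the intrinsic metric is sequence-level NLL; the function names and toy numbers are hypothetical and are not taken from the released evaluation code.

```python
import math

def sequence_nll(token_probs):
    """NLL of one sequence: minus the sum of log p(token | prefix)."""
    return -sum(math.log(p) for p in token_probs)

def macro_average(per_language_scores):
    """Unweighted mean over languages, so every language counts equally."""
    return sum(per_language_scores.values()) / len(per_language_scores)

# Toy inputs (hypothetical, not values from the paper).
probs = [0.5, 0.25, 0.125]      # model probability of each gold token
nll = sequence_nll(probs)       # ln 2 + ln 4 + ln 8 = 6 * ln 2

acc = {"eng_Latn": 0.75, "fra_Latn": 0.5, "kos_Latn": 0.25}
avg_acc = macro_average(acc)    # each language weighted equally
```

Because the sum is not normalized by the number of tokens, a tokenizer that splits text into many more tokens does not automatically obtain a better score, which is one reason such a measure is convenient when the compared models use different vocabularies.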
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
abk_Cyrl | 234.09 | 249.16 | 258.26 | 231.44 | 164.61 |
abn_Latn | 140.01 | 197.81 | 153.58 | 152.90 | 111.86 |
ace_Latn | 235.15 | 332.18 | 244.00 | 259.64 | 168.79 |
ach_Latn | 179.03 | 227.84 | 194.55 | 197.05 | 161.01 |
acm_Arab | 119.15 | 153.09 | 106.29 | 101.35 | 135.82 |
acr_Latn | 301.73 | 399.80 | 321.79 | 316.49 | 194.71 |
ada_Latn | 132.76 | 168.56 | 150.19 | 137.99 | 103.17 |
afb_Arab | 134.03 | 169.73 | 112.55 | 110.59 | 152.58 |
afr_Latn | 52.43 | 84.47 | 73.24 | 75.60 | 64.25 |
agw_Latn | 228.22 | 318.95 | 246.48 | 242.04 | 152.59 |
ahk_Latn | 229.45 | 377.60 | 245.81 | 241.21 | 163.96 |
ajg_Latn | 146.48 | 185.41 | 170.89 | 155.21 | 113.83 |
ajp_Arab | 153.34 | 199.79 | 129.62 | 124.24 | 164.80 |
aka_Latn | 163.59 | 223.13 | 166.49 | 185.41 | 131.50 |
aln_Latn | 191.62 | 259.76 | 218.75 | 267.34 | 143.64 |
als_Latn | 191.60 | 271.51 | 219.17 | 260.14 | 155.23 |
alt_Cyrl | 199.25 | 220.77 | 200.70 | 215.71 | 139.18 |
alz_Latn | 167.89 | 214.64 | 185.35 | 171.34 | 155.03 |
amh_Ethi | 328.25 | 834.56 | 407.68 | 550.50 | 268.11 |
ami_Latn | 122.67 | 168.42 | 131.77 | 132.36 | 109.13 |
aoj_Latn | 318.62 | 495.44 | 340.07 | 316.36 | 196.64 |
apc_Arab | 131.19 | 153.97 | 106.78 | 109.24 | 145.81 |
ara_Arab | 111.05 | 155.64 | 80.72 | 84.86 | 140.73 |
arb_Arab | 166.93 | 318.76 | 135.76 | 137.80 | 173.03 |
arg_Latn | 173.62 | 306.23 | 171.32 | 178.40 | 160.08 |
arn_Latn | 202.09 | 292.40 | 204.32 | 216.04 | 163.87 |
ary_Arab | 198.80 | 309.90 | 184.82 | 176.58 | 173.37 |
arz_Arab | 122.74 | 248.72 | 95.61 | 100.43 | 131.75 |
asm_Beng | 264.49 | 409.59 | 172.35 | 311.81 | 184.77 |
ast_Latn | 208.41 | 325.35 | 184.93 | 192.86 | 178.77 |
aym_Latn | 143.36 | 183.42 | 149.06 | 154.45 | 117.28 |
ayr_Latn | 274.31 | 342.40 | 288.57 | 293.48 | 185.87 |
azb_Arab | 254.60 | 293.24 | 273.20 | 285.61 | 162.94 |
aze_Latn | 156.58 | 230.45 | 195.32 | 189.59 | 110.56 |
azj_Latn | 168.12 | 228.08 | 212.31 | 199.86 | 126.98 |
bak_Cyrl | 274.50 | 348.47 | 288.93 | 307.95 | 169.00 |
bak_Latn | 191.06 | 259.97 | 196.98 | 213.41 | 152.50 |
bam_Latn | 195.29 | 251.28 | 203.50 | 215.62 | 171.51 |
ban_Latn | 205.77 | 297.97 | 213.20 | 213.89 | 186.89 |
bar_Latn | 210.97 | 287.33 | 234.73 | 208.66 | 188.90 |
bas_Latn | 137.53 | 172.78 | 143.37 | 147.13 | 110.71 |
bba_Latn | 233.68 | 286.30 | 258.58 | 238.94 | 164.18 |
bbc_Latn | 172.78 | 216.78 | 181.59 | 170.06 | 148.89 |
bci_Latn | 176.81 | 223.93 | 190.52 | 189.46 | 171.00 |
bcl_Latn | 149.22 | 209.44 | 162.25 | 174.40 | 132.55 |
bel_Cyrl | 110.77 | 174.19 | 142.62 | 147.27 | 85.11 |
bem_Latn | 182.62 | 222.50 | 198.45 | 150.51 | 158.31 |
ben_Beng | 92.79 | 162.83 | 50.33 | 55.42 | 73.86 |
ber_Latn | 88.37 | 120.03 | 87.79 | 101.52 | 71.90 |
bhw_Latn | 186.42 | 245.14 | 194.41 | 188.81 | 155.12 |
bih_Deva | 248.12 | 422.46 | 176.37 | 204.17 | 180.31 |
bik_Latn | 151.63 | 218.03 | 173.42 | 187.11 | 137.28 |
bim_Latn | 229.29 | 284.29 | 244.21 | 245.34 | 166.16 |
bin_Latn | 137.28 | 175.41 | 152.32 | 152.02 | 109.51 |
bis_Latn | 165.83 | 250.17 | 179.61 | 190.13 | 130.32 |
bjn_Latn | 200.57 | 302.58 | 202.67 | 199.15 | 182.65 |
bod_Tibt | 437.54 | 1690.09 | 461.35 | 80.21 | 286.05 |
bos_Latn | 87.13 | 175.82 | 131.95 | 149.85 | 110.92 |
bpy_Beng | 251.20 | 471.67 | 154.31 | 172.17 | 155.64 |
bqc_Latn | 208.00 | 266.53 | 226.49 | 205.65 | 153.58 |
bre_Latn | 222.93 | 276.71 | 208.07 | 260.44 | 184.35 |
bsb_Latn | 236.62 | 358.90 | 275.10 | 306.64 | 204.50 |
bts_Latn | 214.80 | 292.93 | 232.31 | 217.74 | 156.31 |
btx_Latn | 169.13 | 227.44 | 181.86 | 174.25 | 148.25 |
bul_Cyrl | 47.01 | 90.81 | 77.70 | 42.90 | 57.12 |
bum_Latn | 183.88 | 237.35 | 194.64 | 195.91 | 156.33 |
bzj_Latn | 167.62 | 244.15 | 188.25 | 194.46 | 137.81 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
cab_Latn | 222.05 | 292.04 | 234.53 | 237.57 | 168.63 |
cac_Latn | 293.47 | 395.52 | 310.33 | 301.30 | 192.22 |
cak_Latn | 295.24 | 394.87 | 317.52 | 309.03 | 200.69 |
caq_Latn | 240.00 | 323.71 | 264.17 | 257.49 | 164.95 |
cat_Latn | 94.68 | 212.17 | 83.26 | 86.26 | 130.00 |
cbk_Latn | 143.05 | 221.60 | 145.69 | 159.41 | 137.96 |
cce_Latn | 178.45 | 226.07 | 190.01 | 192.54 | 152.70 |
ceb_Latn | 136.44 | 278.02 | 164.94 | 183.55 | 123.31 |
ces_Latn | 44.83 | 98.77 | 68.48 | 76.15 | 58.42 |
cfm_Latn | 240.20 | 305.25 | 252.92 | 256.79 | 185.94 |
cgg_Latn | 121.16 | 160.92 | 127.35 | 129.19 | 107.91 |
che_Cyrl | 199.15 | 272.63 | 203.57 | 197.17 | 158.57 |
chk_Latn | 189.52 | 258.69 | 201.19 | 200.61 | 145.98 |
chv_Cyrl | 246.19 | 292.36 | 252.81 | 229.56 | 157.91 |
chw_Latn | 139.07 | 174.73 | 142.88 | 121.98 | 121.16 |
cjk_Latn | 125.30 | 158.06 | 134.03 | 128.75 | 106.21 |
ckb_Arab | 372.24 | 437.95 | 370.20 | 521.30 | 243.30 |
cmn_Hani | 52.17 | 92.04 | 40.75 | 49.81 | 62.30 |
cnh_Latn | 185.01 | 242.39 | 198.20 | 198.57 | 147.90 |
cos_Latn | 192.02 | 323.30 | 210.38 | 211.96 | 185.03 |
crh_Cyrl | 236.43 | 282.79 | 239.67 | 260.03 | 141.08 |
crh_Latn | 149.67 | 240.28 | 168.79 | 157.01 | 131.91 |
crs_Latn | 153.11 | 202.53 | 153.34 | 87.81 | 129.39 |
csb_Latn | 238.86 | 336.99 | 261.46 | 294.41 | 166.29 |
csy_Latn | 226.53 | 299.52 | 249.53 | 245.14 | 172.03 |
ctd_Latn | 210.45 | 276.87 | 227.39 | 224.34 | 158.35 |
ctu_Latn | 216.90 | 310.89 | 226.68 | 220.32 | 157.27 |
cuk_Latn | 233.42 | 325.97 | 252.00 | 247.83 | 190.81 |
cym_Latn | 233.91 | 369.64 | 306.05 | 332.89 | 217.29 |
dan_Latn | 43.75 | 84.32 | 69.51 | 66.96 | 54.56 |
deu_Latn | 37.46 | 68.68 | 49.65 | 33.88 | 53.45 |
dhv_Latn | 121.21 | 170.85 | 126.68 | 128.57 | 95.81 |
diq_Latn | 174.75 | 265.78 | 180.00 | 190.78 | 147.56 |
div_Thaa | 314.55 | 565.83 | 314.34 | 17.32 | 153.76 |
djk_Latn | 188.44 | 249.39 | 201.50 | 207.16 | 163.39 |
dln_Latn | 217.51 | 288.73 | 231.93 | 238.10 | 165.40 |
dtp_Latn | 267.22 | 373.92 | 279.80 | 287.18 | 184.75 |
dua_Latn | 131.20 | 169.64 | 136.03 | 129.20 | 109.86 |
dyu_Latn | 186.37 | 237.65 | 193.19 | 205.47 | 157.89 |
dzo_Tibt | 238.61 | 842.40 | 244.70 | 47.40 | 154.48 |
efi_Latn | 178.91 | 251.07 | 205.96 | 203.93 | 134.40 |
ekk_Latn | 155.86 | 223.64 | 194.37 | 89.18 | 141.19 |
ell_Grek | 52.85 | 86.68 | 67.98 | 36.04 | 54.45 |
eml_Latn | 213.57 | 278.33 | 224.10 | 225.17 | 163.91 |
eng_Latn | 30.45 | 62.73 | 31.32 | 34.36 | 48.60 |
enm_Latn | 79.08 | 193.74 | 108.20 | 119.78 | 87.78 |
epo_Latn | 68.89 | 99.75 | 79.80 | 87.72 | 70.22 |
est_Latn | 70.18 | 100.28 | 88.33 | 40.53 | 67.38 |
eus_Latn | 79.07 | 87.15 | 48.33 | 45.59 | 70.49 |
ewe_Latn | 208.53 | 269.62 | 218.53 | 195.99 | 148.78 |
ext_Latn | 216.92 | 338.22 | 211.26 | 231.30 | 177.17 |
fao_Latn | 202.04 | 284.61 | 227.56 | 263.89 | 165.45 |
fas_Arab | 138.13 | 193.21 | 163.46 | 166.76 | 133.69 |
fat_Latn | 134.67 | 180.66 | 144.54 | 144.50 | 106.86 |
fij_Latn | 159.86 | 219.85 | 191.04 | 137.83 | 147.71 |
fil_Latn | 120.89 | 206.21 | 162.04 | 161.84 | 120.27 |
fin_Latn | 46.88 | 86.18 | 79.58 | 35.79 | 58.35 |
fon_Latn | 237.19 | 295.74 | 256.54 | 262.29 | 160.24 |
fra_Latn | 32.26 | 63.71 | 31.08 | 32.74 | 49.22 |
frr_Latn | 192.91 | 299.41 | 206.26 | 211.00 | 144.13 |
fry_Latn | 191.87 | 247.81 | 205.02 | 221.64 | 168.86 |
ful_Latn | 447.47 | 550.03 | 457.25 | 511.87 | 339.38 |
fur_Latn | 231.23 | 313.99 | 234.38 | 250.02 | 183.57 |
gaa_Latn | 188.66 | 232.67 | 222.71 | 158.83 | 146.37 |
gcf_Latn | 132.36 | 173.10 | 130.03 | 91.07 | 103.54 |
gcr_Latn | 113.22 | 157.83 | 115.02 | 79.46 | 94.40 |
gil_Latn | 175.92 | 237.54 | 187.79 | 181.71 | 154.60 |
giz_Latn | 244.47 | 332.32 | 268.61 | 266.29 | 168.09 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
gkn_Latn | 223.58 | 304.46 | 253.54 | 245.24 | 167.81 |
gkp_Latn | 261.56 | 358.97 | 280.80 | 270.48 | 186.41 |
gla_Latn | 220.92 | 382.20 | 293.89 | 315.23 | 210.51 |
gle_Latn | 203.10 | 345.45 | 276.11 | 299.80 | 206.52 |
glg_Latn | 120.88 | 204.76 | 108.43 | 122.45 | 132.58 |
glv_Latn | 232.86 | 326.69 | 247.79 | 265.04 | 182.93 |
gom_Deva | 328.82 | 462.17 | 324.77 | 358.50 | 233.15 |
gom_Latn | 244.57 | 318.36 | 259.71 | 257.90 | 209.13 |
gor_Latn | 217.70 | 326.26 | 232.98 | 239.37 | 168.23 |
grc_Grek | 126.86 | 277.73 | 181.00 | 127.62 | 141.80 |
grn_Latn | 293.70 | 382.11 | 298.10 | 316.62 | 204.94 |
gsw_Latn | 180.67 | 226.37 | 199.03 | 171.72 | 157.34 |
guc_Latn | 241.99 | 340.92 | 257.19 | 234.87 | 183.29 |
gug_Latn | 197.04 | 258.55 | 201.92 | 214.05 | 158.39 |
guj_Gujr | 118.82 | 291.38 | 74.12 | 194.71 | 90.02 |
gur_Latn | 222.48 | 311.22 | 243.52 | 233.99 | 173.11 |
guw_Latn | 210.29 | 215.37 | 235.91 | 246.28 | 146.55 |
gya_Latn | 242.48 | 350.56 | 274.82 | 258.26 | 170.00 |
gym_Latn | 231.32 | 324.92 | 249.32 | 191.13 | 178.06 |
hat_Latn | 237.00 | 341.48 | 251.07 | 150.39 | 201.88 |
hau_Arab | 173.08 | 330.75 | 130.96 | 129.69 | 230.02 |
hau_Latn | 228.21 | 300.72 | 257.22 | 265.65 | 191.68 |
haw_Latn | 190.18 | 300.25 | 217.54 | 213.20 | 174.30 |
hbo_Hebr | 140.73 | 315.06 | 194.19 | 200.98 | 155.08 |
hbs_Cyrl | 206.87 | 503.80 | 370.83 | 417.41 | 225.22 |
hbs_Latn | 209.02 | 451.95 | 333.95 | 375.92 | 223.11 |
heb_Hebr | 48.34 | 63.09 | 58.19 | 63.73 | 56.05 |
her_Latn | 140.31 | 172.32 | 146.72 | 136.86 | 109.29 |
hif_Latn | 396.80 | 613.23 | 471.65 | 465.81 | 371.92 |
hil_Latn | 145.89 | 207.79 | 161.39 | 182.01 | 126.09 |
hin_Deva | 142.07 | 289.53 | 105.86 | 106.38 | 166.12 |
hin_Latn | 150.11 | 247.31 | 166.34 | 164.94 | 176.00 |
hmn_Latn | 241.00 | 375.11 | 282.60 | 284.95 | 182.91 |
hmo_Latn | 165.38 | 236.46 | 178.12 | 142.53 | 133.19 |
hne_Deva | 201.66 | 298.38 | 171.37 | 184.06 | 161.30 |
hnj_Latn | 231.56 | 324.18 | 263.80 | 278.94 | 141.56 |
hra_Latn | 215.87 | 271.46 | 228.66 | 229.46 | 169.57 |
hrv_Latn | 43.03 | 82.09 | 63.02 | 69.82 | 54.08 |
hrx_Latn | 131.34 | 182.33 | 140.90 | 135.17 | 105.20 |
hsb_Latn | 182.90 | 293.15 | 211.79 | 235.34 | 127.71 |
hui_Latn | 297.34 | 388.25 | 319.23 | 318.32 | 197.57 |
hun_Latn | 45.03 | 79.27 | 75.21 | 79.06 | 59.30 |
hus_Latn | 247.96 | 352.19 | 260.90 | 258.32 | 180.85 |
hye_Armn | 286.18 | 602.02 | 372.75 | 454.38 | 202.92 |
hyw_Armn | 145.46 | 263.04 | 186.63 | 213.52 | 110.19 |
hyw_Cyrl | 162.17 | 231.84 | 171.73 | 165.61 | 117.61 |
iba_Latn | 150.03 | 192.62 | 157.75 | 151.54 | 133.67 |
ibg_Latn | 115.37 | 152.94 | 119.10 | 122.19 | 106.08 |
ibo_Latn | 232.57 | 333.59 | 223.37 | 296.17 | 184.97 |
ido_Latn | 140.94 | 273.88 | 153.94 | 164.61 | 121.37 |
idu_Latn | 153.00 | 209.20 | 162.53 | 157.46 | 106.21 |
ifa_Latn | 252.33 | 328.31 | 270.66 | 266.03 | 172.20 |
ifb_Latn | 257.92 | 340.56 | 278.23 | 272.79 | 183.83 |
ige_Latn | 148.85 | 199.02 | 173.80 | 176.02 | 111.50 |
ikk_Latn | 249.37 | 330.44 | 284.76 | 310.26 | 166.74 |
iku_Cans | 261.21 | 877.71 | 343.18 | 496.50 | 174.80 |
ile_Latn | 100.28 | 199.76 | 105.32 | 115.20 | 100.35 |
ilo_Latn | 172.24 | 227.41 | 186.11 | 208.36 | 146.96 |
ina_Latn | 209.38 | 408.99 | 230.14 | 236.01 | 201.92 |
ind_Latn | 42.59 | 69.80 | 35.50 | 36.82 | 56.03 |
ish_Latn | 126.54 | 178.71 | 144.92 | 146.15 | 101.29 |
isl_Latn | 103.40 | 156.83 | 127.49 | 139.76 | 83.51 |
iso_Latn | 148.38 | 175.75 | 168.85 | 167.42 | 104.67 |
ita_Latn | 39.35 | 79.02 | 49.94 | 40.47 | 53.36 |
ium_Latn | 247.28 | 361.48 | 264.84 | 266.46 | 167.10 |
ixl_Latn | 327.09 | 506.74 | 353.08 | 348.23 | 222.05 |
izz_Latn | 301.73 | 400.14 | 346.61 | 361.39 | 193.24 |
jam_Latn | 204.69 | 291.31 | 223.99 | 231.17 | 157.87 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
jav_Latn | 208.92 | 275.29 | 212.60 | 220.31 | 180.00 |
jbo_Latn | 103.91 | 200.82 | 109.86 | 112.61 | 117.25 |
jpn_Jpan | 136.26 | 301.32 | 197.23 | 149.70 | 150.43 |
kaa_Cyrl | 281.21 | 363.07 | 300.20 | 317.13 | 146.98 |
kaa_Latn | 284.60 | 354.51 | 292.04 | 309.67 | 192.43 |
kab_Latn | 192.58 | 264.51 | 185.56 | 216.46 | 161.31 |
kac_Latn | 210.47 | 267.38 | 223.77 | 249.95 | 166.44 |
kal_Latn | 240.15 | 262.90 | 259.85 | 155.45 | 182.71 |
kam_Latn | 153.84 | 194.60 | 156.10 | 186.00 | 115.78 |
kan_Knda | 216.22 | 556.40 | 146.43 | 355.75 | 175.17 |
kat_Geor | 302.53 | 413.90 | 435.47 | 483.85 | 239.51 |
kat_Latn | 184.94 | 308.06 | 217.07 | 208.25 | 184.20 |
kaz_Cyrl | 257.67 | 341.78 | 280.01 | 297.13 | 187.85 |
kbd_Cyrl | 212.12 | 229.63 | 198.20 | 202.85 | 146.86 |
kbp_Latn | 232.17 | 306.53 | 257.45 | 246.16 | 161.08 |
kea_Latn | 118.17 | 159.93 | 121.92 | 122.29 | 105.69 |
kek_Latn | 234.79 | 332.19 | 244.69 | 228.87 | 164.18 |
khm_Khmr | 257.14 | 815.56 | 317.46 | 437.56 | 167.88 |
kia_Latn | 222.01 | 298.21 | 245.77 | 236.71 | 164.39 |
kik_Latn | 208.26 | 277.92 | 213.92 | 237.26 | 159.49 |
kin_Latn | 206.40 | 237.66 | 174.18 | 234.91 | 168.37 |
kir_Cyrl | 265.65 | 308.15 | 277.34 | 313.50 | 175.71 |
kjb_Latn | 263.79 | 353.35 | 280.16 | 278.13 | 179.76 |
kjh_Cyrl | 200.11 | 251.59 | 211.84 | 217.34 | 147.81 |
kmb_Latn | 132.84 | 166.09 | 137.48 | 118.00 | 112.99 |
kmm_Latn | 246.57 | 330.77 | 263.79 | 266.44 | 180.90 |
kmr_Cyrl | 224.23 | 284.40 | 226.51 | 221.22 | 154.70 |
kmr_Latn | 183.95 | 220.51 | 194.67 | 215.02 | 142.36 |
knv_Latn | 430.56 | 581.45 | 456.13 | 427.27 | 232.18 |
kom_Cyrl | 224.18 | 302.71 | 249.08 | 213.41 | 134.88 |
kon_Latn | 112.77 | 131.61 | 116.89 | 119.41 | 96.00 |
koo_Latn | 132.73 | 167.13 | 144.33 | 134.74 | 111.26 |
kor_Hang | 129.20 | 224.06 | 180.21 | 95.71 | 151.37 |
kos_Latn | 146.15 | 191.23 | 153.05 | 154.26 | 123.85 |
kpg_Latn | 221.52 | 321.94 | 246.33 | 245.73 | 148.93 |
kqn_Latn | 125.33 | 149.57 | 128.12 | 109.60 | 106.08 |
krc_Cyrl | 247.13 | 292.86 | 248.83 | 267.39 | 167.05 |
kri_Latn | 166.50 | 240.92 | 193.15 | 192.19 | 140.20 |
ksd_Latn | 198.81 | 269.96 | 210.59 | 212.57 | 138.81 |
ksh_Latn | 204.72 | 261.51 | 220.93 | 218.50 | 161.62 |
kss_Latn | 310.35 | 477.02 | 335.25 | 300.31 | 226.38 |
ksw_Mymr | 210.34 | 266.24 | 226.59 | 154.55 | 124.78 |
kua_Latn | 179.05 | 206.09 | 187.92 | 151.87 | 140.72 |
kur_Arab | 402.78 | 464.44 | 400.97 | 550.57 | 253.61 |
kur_Latn | 633.22 | 779.47 | 678.30 | 748.20 | 424.98 |
kwn_Latn | 136.80 | 170.23 | 141.88 | 111.31 | 107.21 |
kwy_Latn | 131.93 | 160.78 | 137.77 | 134.01 | 110.55 |
lam_Latn | 209.07 | 276.89 | 228.12 | 203.17 | 176.61 |
lao_Laoo | 405.48 | 978.35 | 435.37 | 583.11 | 225.06 |
lat_Latn | 167.49 | 274.19 | 186.97 | 210.22 | 183.32 |
lav_Latn | 193.22 | 257.06 | 227.60 | 252.31 | 162.80 |
ldi_Latn | 178.84 | 230.26 | 185.58 | 191.19 | 160.61 |
leh_Latn | 216.80 | 273.56 | 230.25 | 201.57 | 172.92 |
lfn_Latn | 232.59 | 368.62 | 246.45 | 258.76 | 187.82 |
lhu_Latn | 209.10 | 365.95 | 220.56 | 219.50 | 142.74 |
lij_Latn | 328.66 | 483.81 | 345.62 | 348.28 | 249.64 |
lim_Latn | 199.01 | 290.80 | 236.94 | 239.44 | 180.52 |
lin_Latn | 161.88 | 173.63 | 158.33 | 180.17 | 135.66 |
lit_Latn | 163.71 | 220.62 | 195.08 | 225.98 | 147.53 |
llb_Latn | 135.01 | 180.06 | 146.51 | 135.39 | 120.02 |
lmo_Latn | 222.22 | 378.21 | 247.54 | 242.80 | 182.01 |
loz_Latn | 179.54 | 194.46 | 185.77 | 142.19 | 147.86 |
ltz_Latn | 190.70 | 303.65 | 202.02 | 174.00 | 169.36 |
lua_Latn | 126.47 | 147.86 | 131.94 | 102.71 | 102.36 |
lub_Latn | 136.45 | 143.64 | 140.96 | 99.41 | 111.01 |
lue_Latn | 128.48 | 158.40 | 135.27 | 129.94 | 103.72 |
lug_Latn | 225.72 | 318.09 | 221.56 | 272.90 | 196.21 |
lun_Latn | 135.96 | 170.81 | 142.71 | 136.26 | 113.31 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
luo_Latn | 177.43 | 224.04 | 194.07 | 187.72 | 156.23 |
lus_Latn | 192.97 | 251.35 | 203.37 | 212.72 | 163.95 |
lvs_Latn | 154.85 | 211.40 | 185.87 | 198.99 | 138.63 |
lzh_Hani | 149.57 | 215.19 | 130.32 | 153.38 | 151.15 |
mad_Latn | 232.71 | 325.38 | 245.39 | 249.29 | 176.81 |
mah_Latn | 178.50 | 246.26 | 188.98 | 183.17 | 145.35 |
mai_Deva | 245.94 | 389.84 | 189.93 | 223.00 | 185.84 |
mal_Mlym | 96.92 | 171.55 | 57.45 | 129.61 | 72.46 |
mam_Latn | 232.38 | 315.16 | 247.28 | 244.43 | 189.28 |
mar_Deva | 85.13 | 143.31 | 55.38 | 103.23 | 70.08 |
mau_Latn | 186.46 | 333.61 | 204.91 | 193.09 | 161.45 |
mbb_Latn | 282.70 | 410.99 | 309.56 | 307.47 | 175.50 |
mck_Latn | 191.94 | 244.49 | 202.17 | 191.28 | 152.16 |
mcn_Latn | 207.28 | 276.32 | 220.35 | 230.56 | 158.99 |
mco_Latn | 271.45 | 368.23 | 281.55 | 260.70 | 206.54 |
mdy_Ethi | 306.26 | 529.46 | 293.68 | 369.22 | 166.26 |
meu_Latn | 177.74 | 235.19 | 188.09 | 168.43 | 143.62 |
mfe_Latn | 147.50 | 194.41 | 143.47 | 92.23 | 129.23 |
mgh_Latn | 193.72 | 257.45 | 207.05 | 200.68 | 166.17 |
mgr_Latn | 183.96 | 226.09 | 194.25 | 149.77 | 160.18 |
mhr_Cyrl | 230.20 | 298.73 | 235.59 | 236.71 | 167.55 |
min_Latn | 161.40 | 266.18 | 164.13 | 170.30 | 166.91 |
miq_Latn | 207.63 | 276.27 | 228.42 | 223.78 | 160.37 |
mkd_Cyrl | 81.62 | 144.52 | 112.99 | 98.33 | 74.40 |
mlg_Latn | 185.23 | 250.78 | 189.32 | 226.85 | 148.82 |
mlt_Latn | 109.60 | 184.08 | 139.75 | 146.69 | 85.14 |
mny_Latn | 133.04 | 170.16 | 135.30 | 126.14 | 112.38 |
mon_Cyrl | 397.63 | 535.59 | 446.51 | 555.16 | 249.95 |
mon_Latn | 354.75 | 411.54 | 383.60 | 383.02 | 282.85 |
mos_Latn | 197.23 | 229.14 | 206.05 | 212.55 | 159.69 |
mps_Latn | 347.99 | 496.26 | 378.75 | 366.78 | 213.10 |
mri_Latn | 154.38 | 247.38 | 181.49 | 179.85 | 134.55 |
mrw_Latn | 235.11 | 306.78 | 250.41 | 253.18 | 169.69 |
msa_Latn | 164.05 | 261.28 | 155.14 | 151.77 | 190.44 |
mwl_Latn | 275.26 | 410.83 | 270.47 | 280.98 | 202.89 |
mwm_Latn | 293.40 | 430.46 | 315.11 | 294.17 | 162.95 |
mwn_Latn | 131.84 | 162.91 | 138.48 | 111.20 | 123.37 |
mxv_Latn | 206.13 | 324.92 | 222.48 | 222.86 | 171.82 |
mya_Mymr | 383.74 | 576.49 | 472.04 | 277.91 | 252.84 |
myv_Cyrl | 267.24 | 357.29 | 263.68 | 276.10 | 188.74 |
mzh_Latn | 257.70 | 370.86 | 285.03 | 276.60 | 169.96 |
mzn_Arab | 192.75 | 263.60 | 200.51 | 204.50 | 136.03 |
nan_Latn | 172.36 | 311.98 | 186.78 | 200.62 | 153.96 |
nap_Latn | 159.24 | 246.36 | 179.36 | 167.94 | 151.29 |
naq_Latn | 195.43 | 261.60 | 207.68 | 207.27 | 150.47 |
nav_Latn | 258.40 | 380.88 | 284.18 | 286.04 | 181.13 |
nba_Latn | 123.68 | 154.25 | 130.25 | 126.08 | 99.29 |
nbl_Latn | 175.10 | 238.64 | 194.74 | 211.98 | 154.90 |
nch_Latn | 206.55 | 287.53 | 220.86 | 221.43 | 183.56 |
ncj_Latn | 185.32 | 260.91 | 201.13 | 196.79 | 173.80 |
ncx_Latn | 115.71 | 168.08 | 121.23 | 122.70 | 98.71 |
ndc_Latn | 167.38 | 222.72 | 176.18 | 184.45 | 158.24 |
nde_Latn | 169.75 | 235.54 | 185.98 | 211.96 | 151.45 |
ndo_Latn | 192.10 | 227.02 | 204.45 | 150.28 | 149.69 |
nds_Latn | 195.44 | 272.44 | 213.17 | 204.47 | 184.93 |
nep_Deva | 232.93 | 425.83 | 167.54 | 291.83 | 210.52 |
new_Deva | 169.64 | 330.40 | 128.26 | 135.07 | 103.54 |
ngl_Latn | 134.87 | 177.05 | 140.92 | 115.46 | 104.59 |
ngu_Latn | 205.16 | 282.39 | 215.65 | 213.78 | 167.56 |
nia_Latn | 202.30 | 269.59 | 214.87 | 196.19 | 167.95 |
niu_Latn | 105.04 | 142.53 | 111.71 | 113.36 | 88.11 |
nld_Latn | 37.77 | 65.47 | 55.52 | 51.54 | 51.45 |
nmf_Latn | 222.98 | 290.53 | 242.04 | 246.36 | 167.31 |
nnb_Latn | 200.64 | 248.60 | 210.13 | 212.23 | 161.20 |
nno_Latn | 138.72 | 234.11 | 192.13 | 199.51 | 146.16 |
nob_Latn | 50.27 | 96.43 | 78.24 | 73.64 | 59.05 |
nor_Latn | 78.04 | 146.26 | 126.19 | 123.99 | 99.50 |
npi_Deva | 212.50 | 399.24 | 143.71 | 290.95 | 166.12 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
nse_Latn | 176.20 | 234.62 | 184.77 | 174.57 | 161.52 |
nso_Latn | 170.49 | 227.97 | 170.96 | 201.29 | 142.72 |
nya_Latn | 203.45 | 299.12 | 222.51 | 224.89 | 175.69 |
nyk_Latn | 131.13 | 166.47 | 142.89 | 138.50 | 105.26 |
nyn_Latn | 174.51 | 229.51 | 189.05 | 194.14 | 149.17 |
nyu_Latn | 126.29 | 172.40 | 132.49 | 127.67 | 99.26 |
nyy_Latn | 215.07 | 271.23 | 234.37 | 220.80 | 168.74 |
nzi_Latn | 191.21 | 256.55 | 219.47 | 209.42 | 152.30 |
oci_Latn | 202.93 | 343.11 | 207.95 | 210.24 | 185.26 |
ogo_Latn | 134.14 | 185.86 | 149.15 | 143.22 | 118.08 |
oke_Latn | 131.90 | 166.07 | 146.98 | 149.55 | 102.72 |
ori_Orya | 323.51 | 839.80 | 179.33 | 665.17 | 203.94 |
orm_Latn | 225.00 | 334.29 | 288.51 | 313.08 | 201.60 |
ory_Orya | 232.83 | 572.34 | 134.20 | 474.35 | 164.65 |
oss_Cyrl | 229.49 | 279.89 | 229.34 | 227.24 | 151.79 |
ote_Latn | 237.06 | 362.46 | 254.31 | 241.61 | 176.73 |
pag_Latn | 173.32 | 223.39 | 184.05 | 184.76 | 157.30 |
pam_Latn | 259.01 | 373.98 | 274.16 | 280.47 | 237.10 |
pan_Guru | 242.88 | 510.70 | 153.50 | 395.85 | 180.54 |
pap_Latn | 162.79 | 213.88 | 174.63 | 173.16 | 138.20 |
pau_Latn | 176.42 | 243.03 | 188.84 | 187.10 | 150.24 |
pcd_Latn | 144.96 | 228.79 | 143.18 | 150.08 | 140.39 |
pcm_Latn | 159.00 | 346.35 | 182.00 | 179.53 | 147.50 |
pdt_Latn | 192.69 | 252.34 | 199.07 | 199.80 | 144.40 |
pes_Arab | 153.46 | 199.83 | 175.97 | 179.97 | 139.01 |
pfl_Latn | 220.11 | 315.47 | 241.84 | 225.74 | 176.25 |
phm_Latn | 117.81 | 162.32 | 128.28 | 125.57 | 100.73 |
pis_Latn | 153.04 | 237.95 | 173.91 | 179.21 | 130.98 |
pls_Latn | 237.55 | 350.88 | 251.43 | 251.35 | 175.28 |
plt_Latn | 159.36 | 220.84 | 158.06 | 193.44 | 131.96 |
pms_Latn | 132.94 | 257.06 | 137.39 | 146.18 | 106.52 |
pnb_Arab | 345.25 | 418.35 | 279.85 | 240.35 | 237.22 |
poh_Latn | 389.80 | 589.86 | 417.71 | 416.42 | 230.35 |
pol_Latn | 44.19 | 82.29 | 66.66 | 71.91 | 60.02 |
pon_Latn | 177.92 | 236.38 | 190.47 | 189.62 | 149.49 |
por_Latn | 37.00 | 66.01 | 35.14 | 33.91 | 48.72 |
prk_Latn | 220.51 | 301.85 | 230.42 | 238.15 | 148.46 |
prs_Arab | 163.01 | 218.38 | 191.40 | 195.64 | 141.99 |
pus_Arab | 259.45 | 327.43 | 277.81 | 340.38 | 203.38 |
pxm_Latn | 299.37 | 391.48 | 317.99 | 307.01 | 180.85 |
qub_Latn | 210.38 | 265.76 | 222.82 | 172.89 | 152.70 |
quc_Latn | 248.16 | 320.50 | 271.51 | 258.06 | 187.13 |
que_Latn | 144.31 | 170.69 | 154.62 | 96.53 | 121.19 |
qug_Latn | 176.78 | 225.11 | 187.16 | 136.85 | 143.62 |
quh_Latn | 257.89 | 293.32 | 275.35 | 187.44 | 175.55 |
quw_Latn | 154.10 | 205.67 | 162.83 | 142.63 | 142.35 |
quy_Latn | 177.21 | 202.67 | 190.48 | 125.92 | 139.15 |
quz_Latn | 180.20 | 211.40 | 192.52 | 123.67 | 142.21 |
qvi_Latn | 178.08 | 234.53 | 188.58 | 156.22 | 145.79 |
rap_Latn | 204.53 | 354.21 | 219.29 | 226.89 | 158.90 |
rar_Latn | 169.22 | 249.96 | 191.91 | 189.56 | 168.88 |
rmn_Cyrl | 129.46 | 181.44 | 143.84 | 137.02 | 102.76 |
rmn_Grek | 135.82 | 190.47 | 141.78 | 125.21 | 92.56 |
rmn_Latn | 133.75 | 175.58 | 146.05 | 143.75 | 112.55 |
rmy_Cyrl | 135.65 | 184.00 | 147.87 | 137.05 | 109.18 |
rmy_Latn | 189.65 | 244.12 | 198.92 | 205.44 | 168.77 |
rng_Latn | 122.59 | 150.36 | 125.06 | 129.81 | 104.16 |
roh_Latn | 235.38 | 312.78 | 242.57 | 253.77 | 161.16 |
ron_Latn | 44.70 | 84.55 | 68.14 | 74.76 | 54.82 |
rop_Latn | 233.05 | 351.35 | 257.34 | 275.70 | 155.36 |
rue_Cyrl | 223.89 | 402.99 | 299.90 | 265.32 | 179.38 |
rug_Latn | 257.50 | 348.10 | 277.13 | 275.47 | 169.94 |
run_Latn | 184.59 | 218.12 | 161.96 | 207.06 | 157.49 |
rus_Cyrl | 65.34 | 155.39 | 116.17 | 67.59 | 84.56 |
sag_Latn | 162.87 | 194.78 | 175.45 | 155.14 | 149.65 |
sah_Cyrl | 383.55 | 455.30 | 382.36 | 423.03 | 218.84 |
san_Deva | 182.35 | 287.49 | 189.83 | 201.00 | 186.46 |
san_Latn | 242.46 | 324.45 | 278.75 | 282.93 | 199.18 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
sat_Olck | 654.37 | 3377.97 | 667.66 | 40.17 | 311.96 |
sba_Latn | 272.45 | 372.47 | 303.48 | 293.62 | 167.13 |
scn_Latn | 236.20 | 355.24 | 263.10 | 270.02 | 191.69 |
sco_Latn | 147.94 | 341.79 | 193.24 | 193.20 | 170.39 |
seh_Latn | 173.46 | 231.41 | 177.38 | 174.40 | 138.70 |
sgs_Latn | 248.33 | 313.78 | 251.35 | 277.73 | 182.16 |
sid_Latn | 135.53 | 180.29 | 147.72 | 139.17 | 114.24 |
sin_Sinh | 82.29 | 173.16 | 114.00 | 137.98 | 70.77 |
skg_Latn | 128.02 | 172.67 | 131.34 | 145.32 | 116.16 |
slk_Latn | 62.89 | 116.82 | 86.67 | 103.39 | 63.93 |
slv_Latn | 42.18 | 85.28 | 64.75 | 73.49 | 55.26 |
sme_Latn | 288.98 | 357.31 | 301.64 | 295.23 | 205.46 |
smo_Latn | 220.26 | 338.16 | 250.74 | 252.76 | 190.00 |
sna_Latn | 221.02 | 311.60 | 221.92 | 258.38 | 189.74 |
snd_Arab | 209.83 | 264.61 | 217.96 | 260.53 | 163.07 |
som_Arab | 230.91 | 410.59 | 192.88 | 175.01 | 265.88 |
som_Latn | 235.21 | 346.36 | 286.69 | 312.99 | 212.51 |
sop_Latn | 176.17 | 207.78 | 188.41 | 157.21 | 167.90 |
sot_Latn | 200.82 | 271.71 | 205.51 | 235.65 | 157.18 |
spa_Latn | 37.28 | 70.48 | 34.26 | 38.65 | 53.39 |
sqi_Latn | 207.58 | 295.58 | 241.22 | 296.90 | 172.78 |
srd_Latn | 228.12 | 341.00 | 242.74 | 251.01 | 179.87 |
srm_Latn | 229.46 | 318.77 | 250.79 | 246.75 | 173.83 |
srn_Latn | 161.18 | 183.34 | 171.30 | 179.59 | 132.77 |
srp_Cyrl | 45.22 | 100.88 | 77.59 | 81.95 | 57.85 |
srp_Latn | 33.66 | 57.89 | 43.91 | 46.74 | 42.31 |
ssw_Latn | 194.10 | 264.22 | 212.99 | 230.20 | 165.70 |
sun_Latn | 220.72 | 314.99 | 228.18 | 237.07 | 203.32 |
suz_Deva | 255.00 | 400.13 | 262.34 | 257.16 | 157.30 |
swa_Latn | 156.02 | 208.21 | 125.78 | 94.55 | 151.68 |
swc_Latn | 103.75 | 133.69 | 98.32 | 71.66 | 102.14 |
swe_Latn | 42.72 | 82.20 | 68.89 | 60.92 | 56.18 |
swh_Latn | 178.28 | 223.65 | 151.05 | 97.98 | 161.49 |
sxn_Latn | 243.81 | 346.98 | 263.76 | 260.44 | 183.47 |
szl_Latn | 132.77 | 348.45 | 156.37 | 177.33 | 111.32 |
tah_Latn | 114.41 | 158.18 | 124.60 | 121.22 | 101.40 |
tam_Taml | 231.12 | 444.83 | 152.34 | 146.94 | 205.53 |
tat_Cyrl | 251.96 | 301.03 | 256.66 | 276.29 | 159.44 |
tat_Latn | 248.71 | 338.00 | 261.10 | 278.92 | 186.84 |
tbz_Latn | 273.90 | 352.17 | 299.25 | 281.11 | 164.62 |
tca_Latn | 306.13 | 452.15 | 328.77 | 316.81 | 174.51 |
tcf_Latn | 133.72 | 193.63 | 138.67 | 133.58 | 102.94 |
tdt_Latn | 158.16 | 217.96 | 172.56 | 182.04 | 130.27 |
tdx_Latn | 125.88 | 167.70 | 130.54 | 135.72 | 113.29 |
tel_Telu | 94.93 | 152.33 | 54.92 | 47.52 | 72.02 |
teo_Latn | 193.42 | 250.17 | 206.10 | 193.68 | 159.90 |
tgk_Cyrl | 313.76 | 369.08 | 333.83 | 342.42 | 196.57 |
tgk_Latn | 296.86 | 412.46 | 342.18 | 352.59 | 248.69 |
tgl_Latn | 56.44 | 94.00 | 76.30 | 77.15 | 64.98 |
tha_Thai | 192.70 | 331.25 | 242.28 | 116.12 | 175.60 |
tih_Latn | 233.30 | 329.24 | 255.15 | 254.70 | 158.13 |
tir_Ethi | 267.84 | 579.39 | 319.29 | 424.77 | 189.73 |
tiv_Latn | 133.38 | 168.19 | 140.43 | 126.42 | 116.08 |
tlh_Latn | 163.23 | 258.64 | 183.94 | 184.72 | 111.43 |
tll_Latn | 138.57 | 167.75 | 152.44 | 126.10 | 105.23 |
tob_Latn | 299.95 | 450.25 | 316.77 | 324.19 | 182.95 |
tog_Latn | 127.47 | 165.93 | 133.37 | 115.35 | 102.88 |
toh_Latn | 181.85 | 238.80 | 196.33 | 194.76 | 146.50 |
toi_Latn | 185.04 | 233.23 | 194.93 | 164.94 | 165.33 |
toj_Latn | 232.66 | 311.24 | 239.53 | 236.11 | 198.17 |
tok_Latn | 46.19 | 61.55 | 50.56 | 43.88 | 47.57 |
ton_Latn | 172.88 | 243.40 | 178.94 | 190.23 | 141.17 |
top_Latn | 221.27 | 303.90 | 232.90 | 223.12 | 212.88 |
tpi_Latn | 139.90 | 209.92 | 155.89 | 170.67 | 120.65 |
tpm_Latn | 214.33 | 280.83 | 241.97 | 231.70 | 154.99 |
tsc_Latn | 131.42 | 150.62 | 130.29 | 132.42 | 104.44 |
tsn_Latn | 209.69 | 291.69 | 203.77 | 245.18 | 169.85 |
tso_Latn | 182.87 | 208.89 | 176.69 | 194.90 | 142.27 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
tsz_Latn | 183.82 | 253.97 | 200.16 | 176.35 | 153.98 |
ttj_Latn | 133.35 | 174.08 | 142.10 | 146.60 | 112.98 |
tuc_Latn | 325.34 | 444.23 | 346.52 | 291.84 | 180.84 |
tui_Latn | 247.40 | 330.20 | 266.54 | 265.71 | 181.39 |
tuk_Cyrl | 196.40 | 248.01 | 210.45 | 219.39 | 143.19 |
tuk_Latn | 217.31 | 235.95 | 217.78 | 238.66 | 155.71 |
tum_Latn | 184.51 | 236.91 | 190.41 | 153.36 | 153.44 |
tur_Latn | 48.52 | 66.76 | 60.61 | 34.71 | 63.33 |
tvl_Latn | 114.81 | 156.00 | 123.30 | 121.62 | 97.96 |
twi_Latn | 169.99 | 229.39 | 171.42 | 190.50 | 139.81 |
twx_Latn | 123.52 | 172.96 | 130.82 | 135.08 | 106.56 |
tyv_Cyrl | 270.89 | 314.09 | 275.97 | 304.11 | 174.60 |
tzh_Latn | 195.49 | 274.47 | 208.05 | 202.10 | 162.63 |
tzo_Latn | 223.14 | 324.35 | 237.54 | 228.92 | 173.78 |
udm_Cyrl | 222.45 | 277.14 | 231.98 | 219.71 | 160.30 |
uig_Arab | 336.01 | 432.84 | 320.43 | 463.38 | 207.25 |
uig_Latn | 254.59 | 292.36 | 270.85 | 285.12 | 203.29 |
ukr_Cyrl | 101.99 | 240.89 | 173.79 | 160.03 | 136.57 |
umb_Latn | 129.60 | 165.59 | 135.06 | 139.75 | 100.07 |
urd_Arab | 77.96 | 105.77 | 53.61 | 51.92 | 81.62 |
urh_Latn | 145.52 | 153.19 | 164.21 | 161.54 | 108.55 |
uzb_Cyrl | 307.70 | 353.00 | 332.86 | 314.77 | 178.07 |
uzb_Latn | 307.44 | 363.61 | 357.04 | 383.26 | 220.74 |
uzn_Cyrl | 233.89 | 270.06 | 254.96 | 247.92 | 145.01 |
vec_Latn | 163.22 | 261.93 | 181.25 | 168.76 | 170.03 |
ven_Latn | 190.45 | 233.75 | 198.94 | 198.18 | 151.65 |
vep_Latn | 316.12 | 456.77 | 326.76 | 243.40 | 192.08 |
vie_Latn | 108.65 | 169.92 | 86.74 | 91.41 | 138.89 |
vls_Latn | 200.17 | 292.89 | 242.66 | 253.44 | 171.13 |
vmw_Latn | 141.25 | 176.25 | 143.12 | 107.10 | 102.92 |
vol_Latn | 94.00 | 260.01 | 85.47 | 87.18 | 83.77 |
wal_Latn | 190.62 | 261.79 | 201.98 | 177.73 | 158.07 |
war_Latn | 127.41 | 249.86 | 146.46 | 166.29 | 153.84 |
wbm_Latn | 222.06 | 311.86 | 234.78 | 240.27 | 150.33 |
wes_Latn | 64.78 | 106.54 | 73.37 | 73.73 | 86.61 |
wls_Latn | 114.80 | 157.93 | 125.63 | 124.58 | 99.38 |
wol_Latn | 197.17 | 251.63 | 171.70 | 208.78 | 173.01 |
wuu_Hani | 152.90 | 283.11 | 127.83 | 152.82 | 145.05 |
xav_Latn | 350.22 | 619.11 | 379.76 | 371.80 | 201.63 |
xho_Latn | 224.10 | 315.12 | 219.57 | 265.57 | 187.35 |
xmf_Geor | 260.61 | 315.58 | 316.49 | 376.33 | 170.15 |
xmv_Latn | 125.37 | 168.73 | 129.48 | 139.97 | 111.94 |
yan_Latn | 228.46 | 314.62 | 248.18 | 243.68 | 165.66 |
yao_Latn | 196.25 | 253.72 | 209.77 | 198.91 | 166.06 |
yap_Latn | 197.98 | 274.54 | 212.39 | 209.00 | 169.09 |
yid_Hebr | 437.75 | 571.08 | 480.37 | 590.32 | 295.70 |
yom_Latn | 176.11 | 220.86 | 184.62 | 189.29 | 150.95 |
yor_Latn | 233.75 | 283.33 | 193.55 | 286.20 | 185.60 |
yua_Latn | 195.86 | 284.05 | 208.08 | 205.70 | 161.16 |
yue_Hani | 74.79 | 131.83 | 62.91 | 83.80 | 74.28 |
zai_Latn | 170.49 | 223.03 | 179.18 | 188.38 | 148.03 |
zea_Latn | 174.18 | 271.42 | 212.95 | 222.74 | 155.52 |
zho_Hani | 57.89 | 99.40 | 48.19 | 55.24 | 70.80 |
zlm_Latn | 106.37 | 176.09 | 92.63 | 93.81 | 118.56 |
zne_Latn | 127.57 | 167.13 | 134.43 | 115.53 | 104.95 |
zom_Latn | 214.60 | 277.57 | 233.64 | 228.48 | 170.06 |
zpa_Latn | 127.29 | 180.39 | 129.07 | 132.30 | 107.04 |
zsm_Latn | 102.42 | 171.64 | 92.39 | 94.59 | 123.31 |
zul_Latn | 208.94 | 340.58 | 235.91 | 257.18 | 192.84 |
all | 190.58 | 282.46 | 202.95 | 205.07 | 151.25 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
ace_Latn | 137.43 | 196.93 | 144.50 | 152.49 | 97.89 |
ach_Latn | 113.66 | 152.08 | 123.29 | 125.31 | 102.45 |
acr_Latn | 177.86 | 233.22 | 188.27 | 182.33 | 114.27 |
afr_Latn | 80.43 | 132.25 | 116.34 | 129.21 | 95.33 |
agw_Latn | 130.32 | 186.58 | 136.17 | 138.27 | 95.93 |
ahk_Latn | 175.75 | 291.31 | 187.63 | 179.54 | 116.76 |
aka_Latn | 98.41 | 135.74 | 99.46 | 108.01 | 78.20 |
aln_Latn | 101.54 | 147.77 | 115.11 | 139.73 | 82.71 |
als_Latn | 93.47 | 134.99 | 106.68 | 127.57 | 78.53 |
alt_Cyrl | 122.23 | 146.47 | 125.04 | 134.88 | 90.21 |
alz_Latn | 107.41 | 139.39 | 116.48 | 109.62 | 102.48 |
amh_Ethi | 100.60 | 255.43 | 121.36 | 161.34 | 98.24 |
aoj_Latn | 175.25 | 270.87 | 185.66 | 171.29 | 114.19 |
arb_Arab | 94.03 | 186.47 | 77.67 | 78.12 | 104.22 |
arn_Latn | 141.08 | 205.53 | 143.63 | 154.61 | 113.59 |
ary_Arab | 128.97 | 212.49 | 125.35 | 118.25 | 104.58 |
arz_Arab | 80.59 | 185.56 | 64.91 | 66.52 | 92.22 |
asm_Beng | 123.16 | 196.03 | 79.80 | 147.49 | 101.89 |
ayr_Latn | 149.96 | 188.09 | 154.45 | 157.13 | 106.66 |
azb_Arab | 134.00 | 160.29 | 139.49 | 144.68 | 93.09 |
aze_Latn | 97.68 | 131.80 | 113.03 | 106.38 | 90.96 |
bak_Cyrl | 134.49 | 169.96 | 133.79 | 150.46 | 93.49 |
bam_Latn | 109.68 | 147.72 | 110.24 | 118.13 | 91.88 |
ban_Latn | 138.98 | 195.92 | 147.93 | 149.04 | 111.51 |
bar_Latn | 114.49 | 154.37 | 121.74 | 113.26 | 108.65 |
bba_Latn | 132.00 | 166.51 | 146.31 | 131.24 | 96.87 |
bbc_Latn | 110.66 | 143.87 | 117.12 | 107.02 | 100.17 |
bci_Latn | 117.42 | 156.47 | 125.70 | 124.26 | 126.80 |
bcl_Latn | 101.39 | 146.46 | 109.03 | 116.78 | 88.46 |
bel_Cyrl | 92.30 | 137.26 | 110.12 | 118.30 | 88.89 |
bem_Latn | 125.52 | 158.97 | 135.60 | 104.16 | 107.57 |
ben_Beng | 111.68 | 194.50 | 68.00 | 77.83 | 105.61 |
bhw_Latn | 124.94 | 169.40 | 130.40 | 123.48 | 101.65 |
bim_Latn | 124.64 | 162.78 | 132.77 | 130.01 | 96.33 |
bis_Latn | 126.46 | 196.19 | 136.72 | 148.29 | 95.85 |
bod_Tibt | 138.16 | 525.70 | 144.33 | 30.40 | 105.99 |
bqc_Latn | 113.18 | 149.13 | 122.07 | 112.76 | 91.11 |
bre_Latn | 120.49 | 151.33 | 111.97 | 139.13 | 105.99 |
bts_Latn | 111.90 | 154.57 | 120.61 | 110.17 | 89.16 |
btx_Latn | 118.13 | 163.25 | 128.43 | 125.76 | 103.19 |
bul_Cyrl | 66.25 | 124.78 | 104.01 | 42.33 | 85.30 |
bum_Latn | 116.16 | 153.66 | 121.83 | 121.74 | 101.82 |
bzj_Latn | 115.75 | 175.63 | 128.70 | 135.59 | 93.15 |
cab_Latn | 164.07 | 215.31 | 172.22 | 174.35 | 123.20 |
cac_Latn | 169.42 | 231.73 | 176.29 | 175.63 | 116.03 |
cak_Latn | 185.42 | 246.76 | 193.62 | 191.54 | 123.65 |
caq_Latn | 128.13 | 174.12 | 141.21 | 138.17 | 95.54 |
cat_Latn | 54.93 | 118.69 | 44.29 | 45.98 | 76.47 |
cbk_Latn | 103.50 | 154.23 | 105.08 | 108.15 | 91.19 |
cce_Latn | 124.20 | 159.68 | 133.40 | 132.89 | 106.02 |
ceb_Latn | 99.37 | 146.70 | 113.69 | 132.72 | 94.43 |
ces_Latn | 62.40 | 133.26 | 101.82 | 114.59 | 86.91 |
cfm_Latn | 138.07 | 179.26 | 142.58 | 143.43 | 107.20 |
che_Cyrl | 152.68 | 188.52 | 146.76 | 148.42 | 126.87 |
chk_Latn | 128.34 | 180.14 | 133.81 | 134.01 | 97.76 |
chv_Cyrl | 132.89 | 166.12 | 138.37 | 128.58 | 91.96 |
ckb_Arab | 126.47 | 155.90 | 125.59 | 164.22 | 100.65 |
cmn_Hani | 63.67 | 121.22 | 51.49 | 60.91 | 76.95 |
cnh_Latn | 129.26 | 175.83 | 134.65 | 139.53 | 104.21 |
crh_Cyrl | 128.56 | 166.14 | 128.91 | 139.13 | 82.61 |
crs_Latn | 100.72 | 139.95 | 101.88 | 57.70 | 80.86 |
csy_Latn | 125.81 | 172.44 | 138.22 | 132.16 | 100.90 |
ctd_Latn | 120.85 | 163.07 | 128.99 | 125.52 | 92.79 |
ctu_Latn | 156.04 | 220.78 | 162.45 | 157.63 | 112.41 |
cuk_Latn | 151.95 | 213.08 | 159.59 | 156.10 | 119.01 |
cym_Latn | 110.34 | 165.10 | 135.91 | 147.72 | 103.89 |
dan_Latn | 63.65 | 114.43 | 97.95 | 101.13 | 86.68 |
deu_Latn | 57.09 | 109.69 | 84.08 | 54.00 | 80.90 |
djk_Latn | 143.19 | 192.80 | 147.13 | 153.74 | 120.66 |
dln_Latn | 113.43 | 155.19 | 118.82 | 125.73 | 92.37 |
dtp_Latn | 158.44 | 222.01 | 165.46 | 169.63 | 111.77 |
dyu_Latn | 122.24 | 161.61 | 126.53 | 132.73 | 103.04 |
dzo_Tibt | 157.37 | 550.44 | 162.42 | 36.35 | 99.22 |
efi_Latn | 121.73 | 173.61 | 139.58 | 136.78 | 90.15 |
ell_Grek | 80.65 | 169.16 | 109.07 | 57.11 | 105.74 |
eng_Latn | 28.40 | 93.81 | 40.01 | 42.56 | 46.91 |
enm_Latn | 45.43 | 113.74 | 62.99 | 66.87 | 55.22 |
epo_Latn | 79.83 | 125.27 | 88.81 | 100.79 | 85.24 |
est_Latn | 93.49 | 128.66 | 109.45 | 45.04 | 99.10 |
eus_Latn | 133.89 | 145.19 | 101.06 | 78.92 | 150.43 |
ewe_Latn | 140.69 | 190.49 | 147.85 | 133.36 | 103.15 |
fao_Latn | 101.92 | 150.02 | 113.16 | 134.84 | 93.21 |
fas_Arab | 87.19 | 121.18 | 99.32 | 104.15 | 77.85 |
fij_Latn | 110.29 | 158.90 | 130.18 | 97.89 | 97.65 |
fil_Latn | 74.66 | 130.09 | 106.51 | 109.03 | 84.32 |
fin_Latn | 68.42 | 125.52 | 116.35 | 38.52 | 91.75 |
fon_Latn | 160.80 | 210.76 | 176.23 | 178.40 | 107.16 |
fra_Latn | 46.01 | 105.73 | 38.57 | 44.16 | 73.33 |
fry_Latn | 111.69 | 146.88 | 111.45 | 123.28 | 100.32 |
gaa_Latn | 128.54 | 165.53 | 145.88 | 107.90 | 100.68 |
gil_Latn | 125.22 | 171.28 | 131.71 | 130.68 | 106.24 |
giz_Latn | 131.75 | 183.07 | 145.21 | 143.35 | 97.84 |
gkn_Latn | 151.99 | 210.57 | 167.40 | 166.78 | 116.75 |
gkp_Latn | 159.33 | 219.00 | 168.31 | 166.05 | 110.30 |
gla_Latn | 102.90 | 174.06 | 129.10 | 138.42 | 100.51 |
gle_Latn | 102.09 | 161.80 | 132.57 | 146.14 | 116.86 |
glv_Latn | 122.94 | 172.06 | 126.34 | 134.37 | 98.35 |
gom_Latn | 149.35 | 199.59 | 155.54 | 159.44 | 129.81 |
gor_Latn | 156.67 | 215.11 | 170.13 | 167.89 | 115.02 |
grc_Grek | 64.91 | 153.70 | 93.39 | 68.67 | 81.49 |
guc_Latn | 193.75 | 271.60 | 202.31 | 190.66 | 138.75 |
gug_Latn | 139.06 | 183.84 | 146.45 | 151.28 | 114.14 |
guj_Gujr | 121.18 | 329.05 | 86.23 | 202.19 | 107.88 |
gur_Latn | 143.42 | 208.51 | 152.80 | 148.18 | 106.41 |
guw_Latn | 142.60 | 155.00 | 158.22 | 166.16 | 98.92 |
gya_Latn | 130.25 | 197.61 | 146.23 | 137.31 | 99.85 |
gym_Latn | 180.93 | 262.58 | 196.74 | 161.03 | 135.73 |
hat_Latn | 112.20 | 159.68 | 116.00 | 48.45 | 90.71 |
hau_Latn | 105.95 | 146.45 | 117.21 | 127.18 | 96.63 |
haw_Latn | 91.42 | 140.04 | 102.87 | 102.50 | 91.03 |
heb_Hebr | 86.85 | 197.96 | 113.81 | 125.21 | 143.56 |
hif_Latn | 104.78 | 161.10 | 114.69 | 116.63 | 107.93 |
hil_Latn | 103.93 | 151.84 | 112.82 | 130.28 | 90.13 |
hin_Deva | 87.35 | 175.19 | 62.49 | 63.21 | 103.09 |
hin_Latn | 102.01 | 144.04 | 112.84 | 112.96 | 109.68 |
hmo_Latn | 119.64 | 179.32 | 128.46 | 103.09 | 91.86 |
hne_Deva | 124.72 | 183.69 | 106.59 | 120.10 | 94.27 |
hnj_Latn | 126.88 | 186.09 | 144.08 | 149.87 | 89.64 |
hra_Latn | 116.66 | 151.27 | 122.49 | 122.14 | 96.72 |
hrv_Latn | 62.52 | 125.68 | 96.82 | 107.18 | 73.96 |
hui_Latn | 151.46 | 203.46 | 161.05 | 161.36 | 108.54 |
hun_Latn | 69.17 | 118.92 | 117.60 | 125.55 | 94.04 |
hus_Latn | 170.91 | 241.76 | 179.70 | 177.42 | 120.81 |
hye_Armn | 111.94 | 219.94 | 141.97 | 171.24 | 89.75 |
iba_Latn | 102.40 | 135.32 | 109.00 | 102.90 | 87.43 |
ibo_Latn | 131.16 | 189.15 | 130.12 | 172.79 | 112.01 |
ifa_Latn | 140.53 | 194.86 | 151.53 | 148.37 | 102.38 |
ifb_Latn | 149.93 | 198.42 | 157.12 | 156.49 | 107.60 |
ikk_Latn | 132.84 | 186.95 | 150.31 | 163.16 | 95.14 |
ilo_Latn | 119.72 | 162.55 | 127.85 | 146.58 | 102.18 |
ind_Latn | 66.39 | 121.78 | 58.14 | 58.36 | 80.77 |
isl_Latn | 92.39 | 137.42 | 113.54 | 123.83 | 94.12 |
ita_Latn | 54.57 | 116.50 | 73.53 | 52.57 | 78.23 |
ium_Latn | 150.62 | 222.39 | 155.20 | 157.52 | 99.54 |
ixl_Latn | 190.07 | 299.20 | 206.08 | 202.92 | 127.52 |
izz_Latn | 167.28 | 228.45 | 195.19 | 198.57 | 118.78 |
jam_Latn | 119.85 | 181.93 | 134.52 | 139.07 | 96.42 |
jav_Latn | 134.11 | 171.16 | 136.06 | 140.01 | 109.34 |
jpn_Jpan | 67.67 | 114.11 | 84.64 | 61.57 | 88.53 |
kaa_Cyrl | 136.14 | 179.48 | 138.63 | 153.33 | 84.79 |
kaa_Latn | 134.02 | 172.76 | 135.14 | 145.18 | 99.15 |
kab_Latn | 137.81 | 193.54 | 129.87 | 159.45 | 117.96 |
kac_Latn | 141.33 | 187.59 | 150.24 | 163.68 | 110.99 |
kal_Latn | 120.90 | 143.71 | 134.44 | 90.38 | 109.58 |
kan_Knda | 128.60 | 336.06 | 93.77 | 210.99 | 110.09 |
kat_Geor | 103.81 | 132.04 | 144.43 | 155.32 | 93.39 |
kaz_Cyrl | 129.49 | 166.60 | 137.43 | 150.12 | 108.56 |
kbp_Latn | 151.83 | 205.24 | 166.76 | 156.21 | 105.09 |
kek_Latn | 161.79 | 230.77 | 168.62 | 155.46 | 110.43 |
khm_Khmr | 141.48 | 453.97 | 161.21 | 233.38 | 100.53 |
kia_Latn | 122.81 | 171.17 | 136.22 | 131.73 | 95.76 |
kik_Latn | 141.34 | 189.92 | 143.91 | 155.69 | 106.53 |
kin_Latn | 110.75 | 137.92 | 101.14 | 123.88 | 99.96 |
kir_Cyrl | 125.74 | 148.16 | 127.29 | 148.79 | 94.02 |
kjb_Latn | 152.31 | 205.47 | 156.49 | 160.88 | 109.02 |
kjh_Cyrl | 133.84 | 168.82 | 142.31 | 145.53 | 97.43 |
kmm_Latn | 137.88 | 185.46 | 149.16 | 145.91 | 107.81 |
kmr_Cyrl | 139.23 | 182.66 | 137.99 | 142.19 | 103.56 |
kmr_Latn | 120.54 | 149.31 | 124.74 | 136.78 | 96.93 |
knv_Latn | 249.77 | 346.55 | 266.68 | 245.87 | 135.66 |
kor_Hang | 66.58 | 119.14 | 92.53 | 42.28 | 82.45 |
kpg_Latn | 128.18 | 190.92 | 139.68 | 135.05 | 90.65 |
krc_Cyrl | 123.42 | 149.60 | 119.82 | 130.97 | 89.22 |
kri_Latn | 118.15 | 172.62 | 134.67 | 131.33 | 96.69 |
ksd_Latn | 108.75 | 155.22 | 117.82 | 116.91 | 84.44 |
kss_Latn | 248.46 | 385.70 | 269.58 | 224.81 | 174.70 |
ksw_Mymr | 145.34 | 187.44 | 155.38 | 107.97 | 94.71 |
kua_Latn | 118.00 | 142.31 | 125.97 | 104.16 | 99.83 |
lam_Latn | 145.51 | 199.78 | 154.91 | 139.07 | 115.48 |
lao_Laoo | 163.17 | 414.25 | 172.28 | 234.46 | 116.39 |
lat_Latn | 56.98 | 102.85 | 65.60 | 73.77 | 73.15 |
lav_Latn | 90.61 | 119.37 | 103.75 | 114.03 | 94.92 |
ldi_Latn | 118.61 | 161.07 | 122.03 | 124.26 | 112.27 |
leh_Latn | 131.72 | 169.51 | 140.67 | 124.57 | 104.47 |
lhu_Latn | 147.32 | 262.73 | 152.94 | 153.96 | 100.83 |
lin_Latn | 113.81 | 128.30 | 110.20 | 123.56 | 92.52 |
lit_Latn | 92.16 | 120.15 | 107.63 | 123.52 | 97.69 |
loz_Latn | 119.93 | 140.69 | 125.00 | 98.71 | 99.46 |
ltz_Latn | 114.62 | 156.92 | 114.49 | 104.79 | 96.89 |
lug_Latn | 117.59 | 174.83 | 115.12 | 143.99 | 107.58 |
luo_Latn | 118.37 | 158.54 | 129.88 | 126.61 | 108.96 |
lus_Latn | 122.17 | 159.02 | 125.37 | 133.65 | 103.21 |
lzh_Hani | 62.06 | 88.07 | 54.92 | 60.19 | 66.36 |
mad_Latn | 136.26 | 192.90 | 146.43 | 145.94 | 103.63 |
mah_Latn | 113.96 | 159.42 | 120.91 | 110.27 | 97.45 |
mai_Deva | 136.92 | 209.91 | 108.91 | 126.23 | 100.39 |
mal_Mlym | 111.12 | 210.81 | 72.27 | 126.62 | 105.00 |
mam_Latn | 173.35 | 227.62 | 181.33 | 179.63 | 138.57 |
mar_Deva | 105.80 | 184.52 | 83.30 | 141.12 | 106.37 |
mau_Latn | 139.06 | 259.48 | 153.06 | 140.49 | 148.96 |
mbb_Latn | 160.77 | 237.84 | 174.36 | 171.35 | 101.96 |
mck_Latn | 124.95 | 161.95 | 131.37 | 123.87 | 99.72 |
mcn_Latn | 110.95 | 153.55 | 120.44 | 123.48 | 96.39 |
mco_Latn | 203.59 | 285.23 | 205.92 | 192.68 | 159.16 |
mdy_Ethi | 164.72 | 284.41 | 157.66 | 188.38 | 92.89 |
meu_Latn | 111.26 | 152.92 | 120.09 | 103.47 | 91.50 |
mfe_Latn | 99.68 | 136.00 | 98.60 | 55.39 | 80.86 |
mgh_Latn | 131.75 | 181.11 | 140.22 | 136.00 | 118.72 |
mgr_Latn | 126.60 | 154.99 | 129.40 | 106.55 | 108.42 |
mhr_Cyrl | 122.42 | 160.48 | 119.36 | 127.42 | 100.09 |
min_Latn | 139.41 | 194.79 | 136.22 | 138.87 | 133.30 |
miq_Latn | 129.28 | 182.98 | 144.36 | 141.12 | 104.92 |
mkd_Cyrl | 85.29 | 151.22 | 112.67 | 89.21 | 89.46 |
mlg_Latn | 107.66 | 135.73 | 106.88 | 128.75 | 86.60 |
mlt_Latn | 108.58 | 168.92 | 134.24 | 137.26 | 107.12 |
mos_Latn | 129.97 | 161.07 | 135.30 | 138.61 | 112.98 |
mps_Latn | 196.29 | 283.21 | 212.92 | 204.15 | 126.56 |
mri_Latn | 87.56 | 138.21 | 103.82 | 111.33 | 88.68 |
mrw_Latn | 127.39 | 174.88 | 134.59 | 133.06 | 99.21 |
msa_Latn | 104.71 | 152.69 | 97.60 | 93.04 | 113.32 |
mwm_Latn | 159.30 | 238.34 | 171.46 | 159.27 | 99.80 |
mxv_Latn | 146.98 | 235.76 | 162.84 | 164.53 | 126.52 |
mya_Mymr | 162.62 | 248.51 | 185.69 | 84.78 | 107.92 |
myv_Cyrl | 148.95 | 192.16 | 140.65 | 152.76 | 110.76 |
mzh_Latn | 146.28 | 217.81 | 160.03 | 153.09 | 101.97 |
nan_Latn | 130.85 | 204.44 | 144.08 | 138.29 | 118.30 |
naq_Latn | 126.47 | 179.33 | 139.25 | 135.80 | 100.90 |
nav_Latn | 167.01 | 233.91 | 176.25 | 183.89 | 119.97 |
nbl_Latn | 109.14 | 148.07 | 114.06 | 127.75 | 96.55 |
nch_Latn | 155.09 | 212.74 | 165.40 | 171.74 | 144.74 |
ncj_Latn | 131.14 | 184.55 | 137.38 | 140.86 | 129.63 |
ndc_Latn | 106.50 | 151.16 | 111.70 | 117.94 | 107.15 |
nde_Latn | 106.83 | 152.97 | 114.79 | 133.97 | 100.83 |
ndo_Latn | 132.12 | 162.83 | 138.17 | 107.11 | 105.82 |
nds_Latn | 123.29 | 166.55 | 124.21 | 125.62 | 123.87 |
nep_Deva | 109.47 | 199.05 | 81.70 | 141.10 | 103.11 |
ngu_Latn | 148.78 | 204.17 | 156.73 | 156.92 | 120.15 |
nia_Latn | 135.60 | 192.37 | 143.95 | 130.11 | 111.86 |
nld_Latn | 58.81 | 114.31 | 96.47 | 97.28 | 82.78 |
nmf_Latn | 122.39 | 165.17 | 130.43 | 134.30 | 98.07 |
nnb_Latn | 122.26 | 163.28 | 127.40 | 131.83 | 98.36 |
nno_Latn | 80.33 | 133.43 | 102.92 | 112.53 | 86.17 |
nob_Latn | 61.45 | 126.38 | 100.25 | 98.89 | 80.02 |
nor_Latn | 56.27 | 104.11 | 87.94 | 86.18 | 71.86 |
npi_Deva | 115.63 | 219.43 | 78.62 | 159.24 | 96.97 |
nse_Latn | 116.86 | 157.47 | 124.34 | 116.64 | 109.55 |
nso_Latn | 116.55 | 160.63 | 114.34 | 132.40 | 97.49 |
nya_Latn | 112.30 | 160.76 | 116.85 | 124.27 | 101.20 |
nyn_Latn | 120.67 | 159.71 | 127.46 | 131.05 | 106.34 |
nyy_Latn | 153.10 | 189.04 | 164.69 | 160.66 | 121.06 |
nzi_Latn | 130.01 | 179.62 | 150.60 | 141.28 | 101.29 |
ori_Orya | 148.25 | 392.96 | 91.33 | 296.43 | 98.06 |
ory_Orya | 143.02 | 352.28 | 95.95 | 282.70 | 106.99 |
oss_Cyrl | 140.22 | 182.75 | 141.83 | 139.80 | 97.09 |
ote_Latn | 160.20 | 247.13 | 175.42 | 168.28 | 119.40 |
pag_Latn | 123.49 | 163.26 | 131.05 | 133.56 | 109.90 |
pam_Latn | 117.54 | 163.02 | 121.69 | 130.95 | 103.78 |
pan_Guru | 130.07 | 286.36 | 90.80 | 208.67 | 106.44 |
pap_Latn | 110.09 | 149.82 | 118.80 | 114.73 | 92.08 |
pau_Latn | 125.22 | 178.67 | 132.62 | 131.85 | 104.80 |
pcm_Latn | 76.80 | 127.05 | 89.92 | 91.28 | 79.44 |
pdt_Latn | 124.83 | 175.03 | 129.63 | 126.61 | 97.29 |
pes_Arab | 91.68 | 129.42 | 105.39 | 105.84 | 84.63 |
pis_Latn | 118.50 | 180.76 | 130.26 | 133.14 | 95.70 |
pls_Latn | 147.97 | 217.00 | 152.42 | 153.90 | 104.29 |
plt_Latn | 113.52 | 139.96 | 112.22 | 139.93 | 89.18 |
poh_Latn | 240.95 | 363.61 | 257.65 | 256.21 | 140.88 |
pol_Latn | 61.88 | 111.24 | 97.90 | 107.87 | 85.46 |
pon_Latn | 123.40 | 164.68 | 131.48 | 125.12 | 105.87 |
por_Latn | 53.69 | 106.83 | 42.29 | 45.85 | 75.88 |
prk_Latn | 118.66 | 167.68 | 121.37 | 128.55 | 94.48 |
prs_Arab | 88.26 | 123.81 | 99.63 | 105.09 | 80.28 |
pxm_Latn | 154.30 | 207.27 | 160.81 | 160.27 | 102.81 |
qub_Latn | 133.85 | 172.77 | 139.98 | 107.69 | 93.49 |
quc_Latn | 176.66 | 222.18 | 191.60 | 178.35 | 124.30 |
qug_Latn | 124.11 | 158.66 | 131.62 | 95.23 | 95.07 |
quh_Latn | 148.83 | 174.81 | 154.34 | 107.04 | 106.24 |
quw_Latn | 104.78 | 139.63 | 109.91 | 95.69 | 92.58 |
quy_Latn | 119.84 | 140.49 | 127.93 | 85.16 | 94.14 |
quz_Latn | 126.18 | 149.08 | 134.60 | 85.68 | 96.32 |
qvi_Latn | 134.03 | 177.51 | 139.73 | 114.64 | 100.81 |
rap_Latn | 139.27 | 239.23 | 152.39 | 157.18 | 100.81 |
rar_Latn | 136.30 | 205.48 | 152.87 | 149.88 | 123.36 |
rmy_Latn | 124.05 | 164.59 | 129.35 | 132.97 | 108.84 |
ron_Latn | 71.75 | 145.55 | 113.22 | 136.42 | 92.16 |
rop_Latn | 141.24 | 218.11 | 152.37 | 163.59 | 93.46 |
rug_Latn | 144.21 | 200.64 | 155.21 | 151.73 | 99.72 |
run_Latn | 111.11 | 140.11 | 101.95 | 120.78 | 99.61 |
rus_Cyrl | 57.09 | 115.06 | 85.44 | 48.93 | 78.66 |
sag_Latn | 118.57 | 144.91 | 123.32 | 113.66 | 101.74 |
sah_Cyrl | 140.83 | 175.34 | 139.86 | 155.36 | 99.78 |
san_Deva | 120.11 | 183.77 | 128.71 | 131.38 | 123.45 |
san_Latn | 133.78 | 188.35 | 151.17 | 152.84 | 112.82 |
sba_Latn | 147.44 | 205.90 | 167.36 | 154.66 | 98.05 |
seh_Latn | 116.71 | 159.73 | 123.08 | 121.65 | 100.51 |
sin_Sinh | 133.79 | 283.43 | 166.13 | 228.72 | 113.72 |
slk_Latn | 75.89 | 141.01 | 105.13 | 123.74 | 89.45 |
slv_Latn | 75.67 | 140.40 | 111.88 | 127.31 | 95.15 |
sme_Latn | 134.17 | 166.51 | 131.28 | 132.85 | 103.75 |
smo_Latn | 113.64 | 165.04 | 126.68 | 127.57 | 96.65 |
sna_Latn | 107.03 | 157.69 | 112.48 | 124.30 | 99.14 |
snd_Arab | 141.08 | 183.48 | 144.96 | 173.65 | 107.47 |
som_Latn | 114.80 | 163.60 | 131.06 | 149.83 | 110.34 |
sop_Latn | 120.92 | 148.69 | 129.81 | 113.62 | 117.37 |
sot_Latn | 112.14 | 155.55 | 113.59 | 127.04 | 95.35 |
spa_Latn | 49.64 | 107.41 | 43.22 | 48.95 | 69.30 |
sqi_Latn | 106.17 | 145.44 | 116.56 | 140.41 | 92.13 |
srm_Latn | 172.30 | 242.13 | 187.65 | 185.42 | 124.54 |
srn_Latn | 112.06 | 137.43 | 113.65 | 121.74 | 91.24 |
srp_Cyrl | 57.16 | 129.53 | 97.17 | 99.06 | 71.36 |
srp_Latn | 61.53 | 124.70 | 95.02 | 105.00 | 71.54 |
ssw_Latn | 120.48 | 172.73 | 132.69 | 140.25 | 104.17 |
sun_Latn | 123.92 | 165.15 | 124.81 | 129.90 | 111.93 |
suz_Deva | 141.06 | 222.01 | 143.66 | 139.17 | 93.18 |
swe_Latn | 60.78 | 124.59 | 105.53 | 99.90 | 86.99 |
swh_Latn | 97.92 | 131.52 | 87.87 | 54.27 | 90.87 |
sxn_Latn | 173.04 | 249.27 | 188.33 | 183.25 | 124.96 |
tam_Taml | 109.64 | 213.05 | 70.91 | 64.81 | 100.45 |
tat_Cyrl | 136.52 | 167.63 | 136.15 | 147.39 | 94.42 |
tbz_Latn | 135.12 | 176.55 | 145.64 | 137.50 | 88.42 |
tca_Latn | 202.44 | 294.68 | 215.39 | 207.66 | 112.33 |
tdt_Latn | 114.70 | 164.66 | 123.50 | 129.26 | 93.86 |
tel_Telu | 122.18 | 196.41 | 91.03 | 65.40 | 115.17 |
teo_Latn | 115.64 | 157.71 | 122.24 | 117.99 | 99.95 |
tgk_Cyrl | 128.86 | 144.47 | 127.10 | 140.25 | 101.04 |
tgl_Latn | 74.71 | 130.27 | 109.14 | 110.56 | 85.29 |
tha_Thai | 107.69 | 187.16 | 134.02 | 58.96 | 101.09 |
tih_Latn | 129.95 | 188.97 | 139.91 | 137.24 | 89.82 |
tir_Ethi | 122.75 | 258.00 | 143.76 | 190.93 | 99.03 |
tlh_Latn | 87.59 | 142.90 | 97.02 | 97.38 | 62.55 |
tob_Latn | 179.90 | 269.36 | 189.99 | 191.60 | 107.12 |
toh_Latn | 127.60 | 171.62 | 136.60 | 136.06 | 104.79 |
toi_Latn | 124.75 | 166.04 | 133.32 | 114.25 | 114.54 |
toj_Latn | 175.52 | 237.75 | 181.56 | 177.72 | 148.72 |
ton_Latn | 120.61 | 179.89 | 125.92 | 137.67 | 98.37 |
top_Latn | 165.19 | 223.46 | 174.82 | 173.24 | 164.19 |
tpi_Latn | 105.47 | 161.38 | 117.02 | 128.35 | 84.22 |
tpm_Latn | 120.52 | 166.93 | 131.95 | 129.07 | 89.91 |
tsn_Latn | 112.13 | 163.36 | 113.76 | 129.32 | 96.63 |
tso_Latn | 125.25 | 155.16 | 120.37 | 134.66 | 103.51 |
tsz_Latn | 129.96 | 184.88 | 142.29 | 126.39 | 110.66 |
tuc_Latn | 187.91 | 261.93 | 196.20 | 166.43 | 106.46 |
tui_Latn | 135.41 | 187.71 | 146.21 | 146.43 | 107.57 |
tuk_Cyrl | 127.20 | 168.02 | 136.69 | 145.86 | 94.72 |
tuk_Latn | 123.72 | 144.22 | 124.67 | 135.21 | 97.42 |
tum_Latn | 127.49 | 165.45 | 130.05 | 109.19 | 102.35 |
tur_Latn | 76.97 | 118.11 | 102.96 | 57.95 | 99.22 |
twi_Latn | 110.21 | 159.12 | 110.54 | 122.43 | 93.81 |
tyv_Cyrl | 165.82 | 197.37 | 164.25 | 181.87 | 107.33 |
tzh_Latn | 147.06 | 205.37 | 157.00 | 148.16 | 118.46 |
tzo_Latn | 166.45 | 248.81 | 178.03 | 173.52 | 122.42 |
udm_Cyrl | 138.00 | 176.90 | 140.39 | 137.21 | 102.56 |
uig_Arab | 166.57 | 226.61 | 157.03 | 229.43 | 114.04 |
uig_Latn | 145.11 | 165.68 | 156.02 | 157.78 | 121.77 |
ukr_Cyrl | 68.45 | 134.95 | 101.40 | 93.65 | 92.95 |
urd_Arab | 99.74 | 141.13 | 74.20 | 63.53 | 110.49 |
uzb_Cyrl | 128.48 | 149.94 | 136.85 | 135.60 | 88.90 |
uzb_Latn | 118.83 | 138.99 | 132.96 | 145.00 | 95.19 |
uzn_Cyrl | 136.04 | 160.00 | 145.95 | 142.34 | 94.12 |
ven_Latn | 131.18 | 172.05 | 138.68 | 137.88 | 104.82 |
vie_Latn | 74.42 | 115.85 | 56.37 | 59.30 | 91.10 |
wal_Latn | 129.99 | 180.12 | 134.43 | 122.12 | 105.68 |
war_Latn | 111.26 | 159.23 | 118.85 | 131.06 | 113.74 |
wbm_Latn | 120.96 | 174.48 | 126.11 | 128.99 | 94.78 |
wol_Latn | 115.67 | 154.99 | 101.97 | 127.09 | 102.93 |
xav_Latn | 243.35 | 430.83 | 263.49 | 257.10 | 137.76 |
xho_Latn | 112.96 | 155.63 | 109.11 | 135.39 | 107.43 |
yan_Latn | 125.81 | 179.37 | 136.34 | 131.31 | 98.47 |
yao_Latn | 143.68 | 187.96 | 148.92 | 143.16 | 114.54 |
yap_Latn | 150.28 | 207.86 | 157.67 | 157.91 | 123.07 |
yom_Latn | 118.61 | 155.58 | 121.64 | 126.20 | 100.18 |
yor_Latn | 129.77 | 166.68 | 100.87 | 155.88 | 105.79 |
yua_Latn | 148.34 | 218.12 | 155.65 | 156.40 | 118.30 |
yue_Hani | 64.57 | 122.47 | 54.42 | 62.71 | 87.78 |
zai_Latn | 121.90 | 161.31 | 121.61 | 129.12 | 108.99 |
zho_Hani | 64.02 | 115.19 | 51.79 | 63.22 | 69.53 |
zlm_Latn | 57.83 | 101.11 | 48.96 | 51.87 | 64.76 |
zom_Latn | 119.86 | 159.31 | 128.06 | 125.99 | 98.96 |
zsm_Latn | 60.40 | 110.60 | 51.75 | 52.51 | 70.43 |
zul_Latn | 103.20 | 157.55 | 113.26 | 130.35 | 98.06 |
all | 122.10 | 180.54 | 129.55 | 131.31 | 101.67 |
Lang | LLaMA 2 7B | mGPT 13B | BLOOM 7B1 | XGLM 7.5B | MaLA-500 1-shot | MaLA-500 2-shot | MaLA-500 3-shot | MaLA-500 4-shot | MaLA-500 5-shot | MaLA-500 6-shot | MaLA-500 7-shot | MaLA-500 8-shot | MaLA-500 9-shot | MaLA-500 10-shot |
ace_Latn | 44.12 | 47.55 | 50.00 | 36.76 | 34.31 | 52.94 | 60.29 | 60.78 | 65.69 | 67.65 | 64.22 | 65.20 | 68.63 | 71.57 |
acm_Arab | 52.45 | 65.69 | 69.12 | 58.33 | 32.35 | 53.43 | 59.31 | 63.73 | 63.73 | 67.16 | 66.67 | 69.12 | 66.67 | 66.67 |
afr_Latn | 68.14 | 55.39 | 53.92 | 40.20 | 41.18 | 62.25 | 65.69 | 69.12 | 71.08 | 74.02 | 73.53 | 74.51 | 76.96 | 78.92 |
ajp_Arab | 47.55 | 64.22 | 68.63 | 53.43 | 33.33 | 56.86 | 59.80 | 59.80 | 65.20 | 63.73 | 63.24 | 69.12 | 68.14 | 66.67 |
als_Latn | 41.67 | 46.57 | 45.59 | 28.43 | 27.94 | 51.96 | 63.73 | 62.25 | 69.12 | 71.08 | 69.61 | 72.06 | 75.98 | 77.45 |
amh_Ethi | 15.69 | 18.63 | 16.67 | 13.24 | 25.00 | 36.76 | 45.59 | 51.47 | 51.96 | 53.92 | 51.96 | 53.43 | 54.90 | 53.92 |
apc_Arab | 46.57 | 65.69 | 68.14 | 53.43 | 31.37 | 55.88 | 60.29 | 65.69 | 65.69 | 67.16 | 65.20 | 68.63 | 67.65 | 72.06 |
arb_Arab | 53.43 | 63.24 | 68.14 | 57.35 | 32.35 | 54.90 | 60.29 | 63.73 | 65.20 | 68.14 | 67.16 | 69.12 | 68.63 | 70.59 |
ary_Arab | 45.10 | 57.84 | 69.12 | 50.49 | 26.47 | 52.45 | 55.39 | 56.37 | 60.29 | 59.80 | 64.22 | 63.73 | 59.31 | 64.71 |
arz_Arab | 50.98 | 64.22 | 68.14 | 56.86 | 30.88 | 52.45 | 59.31 | 60.78 | 64.22 | 66.18 | 68.14 | 69.12 | 66.67 | 69.12 |
asm_Beng | 17.16 | 49.02 | 61.27 | 37.25 | 31.37 | 53.43 | 58.82 | 65.20 | 67.65 | 67.65 | 68.14 | 67.65 | 67.65 | 67.65 |
ast_Latn | 69.12 | 60.78 | 69.12 | 55.39 | 34.31 | 65.69 | 70.10 | 70.59 | 74.02 | 75.00 | 75.98 | 77.94 | 79.90 | 79.90 |
ayr_Latn | 25.00 | 26.96 | 32.35 | 19.61 | 20.10 | 29.41 | 38.24 | 38.73 | 38.24 | 43.14 | 40.20 | 43.14 | 44.61 | 42.16 |
azb_Arab | 25.49 | 41.67 | 32.84 | 24.51 | 25.98 | 41.18 | 45.59 | 45.10 | 46.57 | 49.51 | 50.00 | 49.02 | 50.49 | 49.51 |
azj_Latn | 34.80 | 64.22 | 37.25 | 32.84 | 30.88 | 57.84 | 64.71 | 68.63 | 64.22 | 72.55 | 69.61 | 70.59 | 74.02 | 72.55 |
bak_Cyrl | 38.73 | 61.27 | 32.35 | 32.35 | 34.80 | 51.47 | 60.29 | 61.27 | 69.12 | 68.63 | 68.14 | 68.63 | 73.53 | 70.10 |
bam_Latn | 25.49 | 24.51 | 29.41 | 20.10 | 22.55 | 25.98 | 34.80 | 42.16 | 43.14 | 44.12 | 45.10 | 42.16 | 46.08 | 44.12 |
ban_Latn | 58.82 | 51.47 | 58.82 | 43.14 | 28.92 | 55.39 | 63.24 | 65.69 | 66.67 | 72.06 | 72.06 | 72.55 | 72.06 | 71.57 |
bel_Cyrl | 47.55 | 59.80 | 28.92 | 30.39 | 40.69 | 60.29 | 63.24 | 66.18 | 67.65 | 70.10 | 72.55 | 72.06 | 72.55 | 73.04 |
bem_Latn | 31.37 | 28.92 | 38.24 | 25.49 | 21.08 | 34.80 | 43.14 | 48.04 | 50.49 | 50.49 | 53.43 | 53.43 | 52.45 | 53.92 |
ben_Beng | 25.49 | 61.27 | 64.22 | 52.45 | 31.37 | 54.90 | 63.24 | 62.25 | 67.65 | 70.10 | 70.10 | 69.12 | 66.18 | 68.63 |
bjn_Latn | 48.53 | 51.96 | 61.76 | 42.65 | 32.35 | 62.75 | 66.18 | 68.14 | 71.57 | 75.98 | 73.04 | 72.55 | 75.00 | 77.45 |
bod_Tibt | 15.20 | 12.75 | 15.69 | 15.69 | 22.06 | 34.80 | 37.75 | 37.75 | 38.73 | 39.71 | 39.71 | 39.71 | 41.67 | 44.12 |
bos_Latn | 65.20 | 64.71 | 45.59 | 33.82 | 37.75 | 65.20 | 72.06 | 70.10 | 71.57 | 75.98 | 75.00 | 76.47 | 76.96 | 77.45 |
bul_Cyrl | 66.18 | 63.24 | 38.73 | 52.94 | 45.10 | 62.25 | 67.65 | 67.16 | 68.14 | 75.00 | 71.57 | 72.06 | 73.53 | 75.00 |
cat_Latn | 71.08 | 60.78 | 66.67 | 60.29 | 34.31 | 59.31 | 68.14 | 68.63 | 71.57 | 72.55 | 69.12 | 73.04 | 76.96 | 76.47 |
ceb_Latn | 50.49 | 50.98 | 49.02 | 39.71 | 39.22 | 60.29 | 66.67 | 66.67 | 68.63 | 73.04 | 72.06 | 71.57 | 74.51 | 74.02 |
ces_Latn | 69.12 | 62.75 | 47.55 | 40.69 | 39.22 | 62.25 | 69.61 | 70.59 | 72.55 | 76.47 | 72.55 | 74.02 | 80.88 | 76.96 |
cjk_Latn | 27.94 | 30.39 | 34.31 | 26.47 | 22.55 | 30.88 | 31.86 | 32.84 | 38.24 | 38.24 | 38.24 | 35.78 | 39.22 | 42.65 |
ckb_Arab | 19.61 | 22.55 | 23.04 | 12.25 | 28.92 | 53.43 | 60.29 | 57.35 | 65.20 | 65.20 | 62.75 | 65.69 | 65.69 | 70.59 |
cmn_Hani | 73.04 | 65.20 | 67.65 | 54.90 | 39.71 | 69.12 | 74.02 | 72.06 | 76.47 | 77.45 | 76.47 | 75.98 | 79.41 | 76.47 |
crh_Latn | 38.24 | 56.37 | 40.20 | 36.76 | 29.41 | 51.96 | 62.25 | 61.76 | 64.22 | 69.12 | 64.22 | 70.10 | 69.12 | 71.08 |
cym_Latn | 39.22 | 28.43 | 34.80 | 21.57 | 28.43 | 55.39 | 62.75 | 63.73 | 66.18 | 72.06 | 74.02 | 72.06 | 75.00 | 77.45 |
dan_Latn | 69.12 | 64.22 | 55.39 | 44.12 | 38.24 | 54.41 | 63.24 | 65.20 | 70.10 | 71.08 | 72.06 | 71.57 | 73.53 | 74.51 |
deu_Latn | 74.02 | 60.29 | 61.27 | 55.39 | 41.18 | 63.73 | 68.63 | 71.57 | 69.12 | 75.00 | 75.49 | 76.47 | 77.45 | 77.45 |
dyu_Latn | 28.43 | 28.92 | 32.35 | 20.10 | 21.08 | 29.90 | 38.73 | 39.71 | 46.57 | 44.12 | 41.67 | 47.06 | 44.61 | 43.63 |
dzo_Tibt | 14.71 | 10.29 | 13.73 | 12.75 | 21.57 | 30.39 | 36.76 | 37.25 | 39.71 | 36.76 | 39.22 | 37.75 | 43.14 | 39.22 |
ell_Grek | 47.55 | 63.73 | 28.43 | 60.29 | 43.63 | 62.75 | 69.61 | 66.67 | 69.12 | 69.61 | 70.59 | 73.04 | 72.06 | 72.06 |
eng_Latn | 71.57 | 59.80 | 71.08 | 67.65 | 48.04 | 63.24 | 70.59 | 69.12 | 69.12 | 74.02 | 73.04 | 74.51 | 76.96 | 75.98 |
epo_Latn | 52.94 | 50.49 | 52.94 | 43.63 | 27.94 | 49.51 | 66.18 | 66.67 | 68.63 | 72.55 | 73.53 | 71.57 | 74.51 | 75.98 |
est_Latn | 48.04 | 54.90 | 41.18 | 57.35 | 29.41 | 55.88 | 62.75 | 66.67 | 70.10 | 72.06 | 70.10 | 69.12 | 71.57 | 73.04 |
eus_Latn | 36.27 | 59.80 | 64.22 | 55.88 | 27.94 | 52.45 | 61.76 | 66.18 | 66.18 | 73.53 | 73.53 | 72.55 | 73.53 | 75.49 |
ewe_Latn | 23.53 | 23.53 | 29.90 | 17.16 | 23.53 | 28.43 | 38.24 | 35.29 | 41.18 | 43.63 | 38.73 | 44.61 | 39.71 | 43.63 |
fao_Latn | 41.18 | 43.63 | 38.73 | 29.90 | 35.29 | 52.94 | 56.86 | 57.84 | 62.25 | 61.27 | 62.25 | 61.76 | 64.22 | 69.12 |
fij_Latn | 27.45 | 27.45 | 36.27 | 24.02 | 24.51 | 37.75 | 48.53 | 47.55 | 53.92 | 48.53 | 50.00 | 51.47 | 51.47 | 53.92 |
fin_Latn | 67.65 | 63.24 | 38.73 | 56.86 | 36.76 | 61.76 | 70.10 | 70.10 | 72.55 | 74.51 | 72.06 | 73.53 | 75.00 | 75.49 |
fon_Latn | 25.49 | 22.06 | 31.37 | 19.12 | 23.53 | 29.90 | 30.88 | 37.25 | 35.78 | 38.24 | 39.71 | 38.24 | 37.25 | 46.57 |
fra_Latn | 72.06 | 64.71 | 66.18 | 59.80 | 36.76 | 58.82 | 71.08 | 67.65 | 71.08 | 74.51 | 71.57 | 74.02 | 77.45 | 77.45 |
ful_Latn | 27.45 | 31.37 | 32.84 | 24.02 | 21.57 | 31.86 | 36.76 | 40.69 | 43.14 | 45.10 | 44.12 | 47.06 | 47.06 | 46.08 |
fur_Latn | 58.82 | 50.00 | 55.88 | 38.73 | 35.78 | 53.92 | 55.88 | 62.25 | 66.67 | 68.63 | 68.14 | 67.16 | 75.00 | 72.55 |
gla_Latn | 37.75 | 24.51 | 27.45 | 17.16 | 24.51 | 50.98 | 55.88 | 57.84 | 57.35 | 59.80 | 63.24 | 61.27 | 62.75 | 65.20 |
gle_Latn | 39.71 | 25.49 | 25.00 | 17.16 | 27.94 | 53.43 | 57.84 | 63.24 | 64.22 | 67.65 | 64.71 | 63.24 | 62.75 | 73.53 |
glg_Latn | 68.63 | 62.25 | 64.22 | 55.39 | 29.41 | 66.67 | 70.10 | 71.08 | 74.02 | 76.96 | 73.53 | 76.47 | 77.45 | 80.39 |
grn_Latn | 42.16 | 47.06 | 48.53 | 33.82 | 25.98 | 52.45 | 62.75 | 64.71 | 62.25 | 67.16 | 64.22 | 65.69 | 61.76 | 69.61 |
guj_Gujr | 15.20 | 09.31 | 62.25 | 12.75 | 28.43 | 50.98 | 54.90 | 60.29 | 63.73 | 63.73 | 64.22 | 65.69 | 62.75 | 67.16 |
hat_Latn | 41.67 | 38.73 | 42.16 | 45.10 | 34.80 | 57.35 | 63.73 | 62.25 | 65.20 | 72.06 | 69.61 | 70.10 | 73.53 | 73.04 |
hau_Latn | 25.49 | 28.43 | 29.90 | 20.59 | 28.43 | 47.06 | 55.39 | 57.84 | 61.27 | 65.69 | 62.75 | 65.69 | 66.18 | 65.20 |
heb_Hebr | 37.75 | 63.24 | 20.59 | 11.76 | 26.47 | 41.67 | 47.06 | 51.47 | 54.90 | 51.96 | 50.98 | 54.90 | 53.92 | 54.41 |
hin_Deva | 44.61 | 62.75 | 62.75 | 51.96 | 33.82 | 55.39 | 60.78 | 66.67 | 65.20 | 69.61 | 70.59 | 74.02 | 73.04 | 72.06 |
hne_Deva | 37.75 | 58.82 | 59.80 | 49.02 | 28.43 | 55.39 | 55.88 | 65.20 | 62.75 | 68.63 | 66.18 | 65.69 | 68.14 | 71.08 |
hrv_Latn | 66.18 | 65.20 | 44.12 | 36.27 | 40.69 | 63.73 | 73.04 | 71.08 | 73.53 | 76.47 | 72.06 | 74.51 | 78.43 | 78.43 |
hun_Latn | 71.08 | 63.24 | 41.67 | 27.94 | 30.88 | 60.78 | 67.16 | 70.59 | 68.63 | 75.49 | 73.53 | 74.02 | 73.04 | 76.47 |
hye_Armn | 20.59 | 17.16 | 13.73 | 12.75 | 32.84 | 58.82 | 59.80 | 67.16 | 65.20 | 69.61 | 68.14 | 69.12 | 69.12 | 72.55 |
ibo_Latn | 24.02 | 26.47 | 38.24 | 19.12 | 30.39 | 51.47 | 57.35 | 63.73 | 67.16 | 69.12 | 68.14 | 69.12 | 68.63 | 72.06 |
ilo_Latn | 45.10 | 45.59 | 48.04 | 32.35 | 27.45 | 54.90 | 61.76 | 61.76 | 68.14 | 68.14 | 70.10 | 73.04 | 73.53 | 70.10 |
ind_Latn | 74.02 | 62.75 | 70.10 | 54.90 | 40.69 | 62.75 | 68.63 | 71.57 | 70.59 | 75.49 | 75.49 | 76.96 | 80.39 | 77.94 |
isl_Latn | 35.29 | 36.76 | 28.92 | 24.51 | 38.73 | 55.88 | 60.78 | 58.33 | 60.29 | 64.71 | 63.73 | 63.24 | 62.75 | 65.20 |
ita_Latn | 69.61 | 62.25 | 62.75 | 57.84 | 40.20 | 64.22 | 70.59 | 70.59 | 74.51 | 77.94 | 75.98 | 76.96 | 80.39 | 76.96 |
jav_Latn | 50.49 | 52.94 | 55.39 | 38.24 | 31.86 | 53.43 | 60.78 | 64.22 | 65.20 | 72.55 | 69.12 | 73.04 | 68.63 | 73.04 |
jpn_Jpan | 73.53 | 60.29 | 63.24 | 55.88 | 38.73 | 67.16 | 72.06 | 75.49 | 78.92 | 79.41 | 80.39 | 78.92 | 81.86 | 81.37 |
kab_Latn | 16.18 | 16.67 | 20.10 | 12.25 | 20.59 | 24.02 | 22.55 | 30.39 | 31.86 | 34.80 | 28.43 | 33.33 | 32.35 | 34.31 |
kac_Latn | 25.98 | 24.51 | 28.43 | 20.59 | 20.10 | 24.02 | 29.90 | 35.78 | 35.78 | 43.14 | 37.75 | 37.25 | 43.63 | 39.71 |
kam_Latn | 26.96 | 34.31 | 34.80 | 26.47 | 22.06 | 36.76 | 38.73 | 37.75 | 40.69 | 41.67 | 46.57 | 41.67 | 42.16 | 42.16 |
kan_Knda | 17.16 | 11.27 | 61.27 | 11.27 | 25.49 | 50.98 | 57.35 | 60.29 | 61.27 | 65.20 | 63.24 | 64.22 | 65.69 | 67.16 |
kat_Geor | 29.41 | 61.27 | 18.14 | 14.71 | 32.84 | 56.86 | 60.78 | 62.25 | 65.20 | 67.65 | 70.59 | 70.59 | 70.10 | 74.51 |
kaz_Cyrl | 37.75 | 62.75 | 29.90 | 28.43 | 34.31 | 53.43 | 57.35 | 62.25 | 65.69 | 67.16 | 65.20 | 65.69 | 69.12 | 67.65 |
kbp_Latn | 24.51 | 22.06 | 30.39 | 16.18 | 21.08 | 28.43 | 36.76 | 38.73 | 40.69 | 39.22 | 40.69 | 39.71 | 38.73 | 40.20 |
kea_Latn | 53.43 | 51.96 | 56.86 | 39.71 | 32.84 | 56.86 | 63.73 | 65.20 | 67.16 | 69.61 | 69.12 | 71.57 | 71.57 | 72.06 |
khm_Khmr | 27.45 | 11.27 | 25.49 | 15.20 | 39.22 | 61.76 | 67.16 | 67.65 | 68.63 | 72.06 | 73.53 | 75.00 | 76.47 | 76.47 |
kik_Latn | 29.41 | 32.84 | 38.73 | 26.96 | 21.57 | 37.75 | 49.51 | 50.49 | 50.49 | 56.86 | 56.37 | 56.37 | 52.94 | 56.86 |
kin_Latn | 26.47 | 32.35 | 50.49 | 24.51 | 27.45 | 40.69 | 49.02 | 52.94 | 57.84 | 58.33 | 60.78 | 56.86 | 59.31 | 59.80 |
kir_Cyrl | 35.78 | 60.78 | 34.80 | 27.45 | 29.90 | 45.59 | 58.33 | 60.78 | 60.29 | 65.20 | 60.29 | 64.71 | 66.18 | 66.18 |
kmb_Latn | 26.47 | 28.43 | 33.82 | 25.00 | 21.08 | 31.86 | 35.29 | 39.71 | 38.24 | 41.18 | 41.67 | 37.25 | 41.67 | 44.61 |
kmr_Latn | 29.41 | 33.82 | 33.33 | 21.57 | 25.98 | 37.75 | 47.06 | 47.55 | 52.45 | 52.45 | 54.90 | 54.41 | 58.82 | 61.76 |
kon_Latn | 33.33 | 33.82 | 40.69 | 32.35 | 22.06 | 39.71 | 46.57 | 51.96 | 53.92 | 64.71 | 64.22 | 60.78 | 64.22 | 65.20 |
kor_Hang | 67.65 | 63.24 | 43.14 | 56.37 | 45.10 | 63.24 | 67.65 | 69.12 | 71.57 | 73.04 | 70.59 | 75.98 | 76.96 | 76.47 |
lao_Laoo | 24.02 | 14.22 | 26.47 | 16.67 | 39.71 | 55.39 | 62.25 | 63.73 | 68.63 | 70.59 | 70.10 | 68.63 | 70.59 | 70.59 |
lij_Latn | 55.88 | 53.43 | 56.37 | 44.61 | 37.25 | 58.82 | 67.65 | 66.67 | 69.61 | 71.57 | 71.08 | 74.02 | 76.47 | 74.02 |
lim_Latn | 60.78 | 50.00 | 50.00 | 33.82 | 32.84 | 56.37 | 60.78 | 63.24 | 66.67 | 67.16 | 69.12 | 72.55 | 70.59 | 72.06 |
lin_Latn | 36.76 | 40.20 | 43.14 | 34.31 | 23.04 | 38.24 | 47.06 | 53.92 | 57.84 | 61.27 | 56.86 | 60.29 | 62.75 | 65.69 |
lit_Latn | 40.20 | 60.29 | 41.18 | 30.39 | 32.35 | 55.39 | 62.75 | 65.20 | 64.71 | 70.59 | 68.14 | 66.67 | 69.61 | 73.04 |
lmo_Latn | 57.84 | 50.98 | 55.88 | 41.67 | 34.80 | 59.31 | 65.69 | 66.18 | 70.10 | 71.57 | 70.59 | 70.59 | 75.00 | 75.49 |
ltz_Latn | 55.88 | 47.06 | 52.94 | 39.22 | 39.22 | 56.37 | 65.20 | 61.27 | 70.59 | 68.14 | 71.08 | 70.59 | 71.08 | 74.02 |
lua_Latn | 32.35 | 33.33 | 39.22 | 28.43 | 20.10 | 33.82 | 40.20 | 42.65 | 49.02 | 51.96 | 50.00 | 50.00 | 49.51 | 50.00 |
lug_Latn | 27.94 | 25.00 | 33.82 | 19.61 | 22.06 | 35.78 | 40.20 | 43.63 | 48.04 | 51.47 | 47.06 | 43.63 | 49.02 | 49.02 |
luo_Latn | 28.43 | 28.43 | 32.84 | 25.49 | 21.57 | 31.37 | 37.25 | 42.65 | 48.04 | 49.51 | 46.57 | 51.47 | 49.51 | 51.47 |
lus_Latn | 43.63 | 42.16 | 49.02 | 31.37 | 25.49 | 45.59 | 51.96 | 53.92 | 52.45 | 54.90 | 57.84 | 60.29 | 58.33 | 60.29 |
lvs_Latn | 43.14 | 67.16 | 43.63 | 29.41 | 31.37 | 57.84 | 65.20 | 63.24 | 68.14 | 72.55 | 67.65 | 69.61 | 71.08 | 72.55 |
mai_Deva | 40.69 | 59.31 | 60.29 | 51.47 | 33.33 | 57.84 | 61.76 | 66.67 | 67.65 | 69.12 | 69.12 | 70.10 | 71.57 | 69.12 |
mal_Mlym | 20.10 | 60.29 | 64.71 | 13.24 | 25.98 | 52.45 | 59.31 | 60.29 | 62.75 | 62.25 | 65.69 | 63.24 | 63.73 | 68.14 |
mar_Deva | 29.90 | 56.86 | 63.73 | 37.75 | 36.27 | 51.96 | 57.35 | 64.22 | 63.73 | 63.73 | 63.73 | 66.67 | 66.18 | 68.14 |
min_Latn | 48.04 | 55.39 | 59.80 | 39.71 | 31.37 | 57.35 | 69.12 | 68.14 | 68.63 | 77.94 | 72.06 | 75.98 | 75.49 | 77.45 |
mkd_Cyrl | 60.78 | 52.45 | 32.84 | 44.12 | 44.12 | 66.18 | 68.63 | 69.12 | 68.63 | 73.04 | 73.04 | 72.55 | 73.04 | 76.96 |
mlt_Latn | 49.51 | 45.10 | 46.08 | 29.90 | 35.78 | 64.71 | 67.16 | 68.14 | 67.16 | 77.45 | 75.49 | 76.96 | 77.45 | 76.96 |
mon_Cyrl | 23.53 | 54.90 | 20.10 | 18.63 | 38.24 | 50.00 | 56.86 | 55.88 | 63.24 | 64.22 | 63.24 | 63.24 | 65.20 | 67.16 |
mos_Latn | 25.49 | 23.53 | 29.90 | 20.59 | 20.59 | 27.94 | 36.76 | 37.75 | 37.75 | 40.69 | 41.67 | 45.10 | 41.67 | 45.10 |
mri_Latn | 30.39 | 24.02 | 30.88 | 17.65 | 28.43 | 44.12 | 49.02 | 51.47 | 51.47 | 57.84 | 55.88 | 58.82 | 56.37 | 58.33 |
mya_Mymr | 19.12 | 60.29 | 19.61 | 60.29 | 23.53 | 38.73 | 43.14 | 53.43 | 53.43 | 50.98 | 52.45 | 54.90 | 51.96 | 54.90 |
nld_Latn | 70.10 | 59.80 | 55.88 | 46.08 | 45.59 | 64.71 | 69.12 | 68.63 | 73.04 | 73.53 | 75.49 | 74.02 | 79.41 | 80.88 |
nno_Latn | 64.71 | 61.76 | 52.45 | 45.59 | 35.29 | 52.94 | 64.22 | 62.75 | 66.18 | 68.63 | 68.63 | 70.10 | 69.12 | 73.04 |
npi_Deva | 39.22 | 51.96 | 64.71 | 40.69 | 33.82 | 57.84 | 61.76 | 68.14 | 67.65 | 67.65 | 68.63 | 70.10 | 68.63 | 75.49 |
nso_Latn | 27.94 | 30.88 | 33.33 | 22.55 | 21.08 | 33.82 | 43.14 | 46.08 | 49.02 | 52.94 | 51.96 | 53.92 | 54.90 | 53.92 |
nya_Latn | 32.35 | 34.31 | 40.69 | 27.94 | 23.04 | 35.29 | 45.59 | 49.02 | 50.98 | 51.47 | 52.94 | 53.92 | 52.94 | 58.33 |
oci_Latn | 68.63 | 56.37 | 65.69 | 48.53 | 34.31 | 60.29 | 69.12 | 65.20 | 67.65 | 73.04 | 73.53 | 71.57 | 75.49 | 76.47 |
orm_Latn | 17.16 | 18.14 | 22.06 | 16.67 | 20.10 | 30.39 | 35.29 | 41.18 | 41.67 | 47.55 | 41.67 | 44.12 | 43.14 | 51.47 |
ory_Orya | 13.24 | 13.73 | 64.22 | 11.76 | 24.51 | 45.10 | 52.45 | 57.84 | 53.92 | 61.27 | 57.84 | 56.86 | 60.78 | 60.78 |
pag_Latn | 52.45 | 49.51 | 53.92 | 40.20 | 31.86 | 54.90 | 62.75 | 60.78 | 67.65 | 64.71 | 70.10 | 70.10 | 69.12 | 69.61 |
pan_Guru | 14.22 | 11.27 | 62.25 | 11.76 | 33.82 | 54.90 | 58.82 | 63.73 | 64.22 | 67.65 | 67.16 | 66.67 | 68.63 | 67.16 |
pap_Latn | 55.39 | 50.00 | 52.94 | 38.24 | 30.39 | 56.86 | 64.71 | 66.67 | 69.61 | 74.51 | 69.12 | 73.53 | 70.59 | 75.49 |
pes_Arab | 47.06 | 58.82 | 52.94 | 32.84 | 39.22 | 61.27 | 71.08 | 63.73 | 70.59 | 72.55 | 72.55 | 73.53 | 76.47 | 76.47 |
plt_Latn | 28.43 | 32.84 | 37.25 | 21.57 | 29.41 | 51.96 | 58.82 | 57.84 | 59.31 | 60.78 | 60.29 | 60.29 | 64.22 | 60.29 |
pol_Latn | 74.51 | 60.78 | 47.06 | 32.84 | 36.76 | 61.76 | 68.63 | 69.12 | 71.08 | 75.00 | 74.02 | 74.02 | 77.45 | 75.98 |
por_Latn | 70.10 | 61.76 | 65.20 | 59.31 | 36.76 | 64.71 | 72.06 | 70.10 | 74.51 | 75.00 | 76.96 | 75.49 | 78.43 | 82.84 |
prs_Arab | 50.49 | 55.39 | 49.51 | 33.33 | 37.25 | 60.78 | 64.22 | 67.16 | 69.12 | 72.55 | 72.55 | 73.53 | 72.55 | 75.49 |
pus_Arab | 30.39 | 34.80 | 38.73 | 21.08 | 30.39 | 47.06 | 50.98 | 52.45 | 54.41 | 53.92 | 53.92 | 55.88 | 55.88 | 57.84 |
quy_Latn | 32.84 | 35.29 | 40.69 | 35.29 | 22.06 | 36.27 | 44.12 | 45.59 | 49.02 | 52.45 | 49.51 | 49.02 | 50.98 | 50.98 |
ron_Latn | 69.12 | 61.76 | 57.84 | 42.65 | 41.18 | 61.27 | 70.10 | 65.20 | 70.10 | 74.51 | 73.53 | 75.00 | 78.92 | 78.43 |
run_Latn | 25.49 | 27.94 | 44.12 | 25.49 | 23.53 | 37.25 | 46.57 | 50.49 | 51.96 | 59.31 | 51.96 | 56.37 | 57.84 | 60.29 |
rus_Cyrl | 71.57 | 63.24 | 53.43 | 60.29 | 38.73 | 64.22 | 65.20 | 69.12 | 72.06 | 75.98 | 75.00 | 76.47 | 75.49 | 78.92 |
sag_Latn | 29.90 | 27.94 | 31.37 | 21.08 | 20.59 | 30.88 | 43.63 | 47.06 | 48.53 | 55.88 | 52.45 | 54.41 | 55.88 | 58.82 |
san_Deva | 27.94 | 47.55 | 54.90 | 42.65 | 24.51 | 48.04 | 60.29 | 57.84 | 62.25 | 66.67 | 65.20 | 61.76 | 66.18 | 65.20 |
scn_Latn | 51.96 | 50.00 | 53.43 | 40.69 | 37.25 | 63.73 | 73.04 | 70.59 | 74.02 | 77.45 | 75.49 | 75.00 | 80.39 | 76.47 |
sin_Sinh | 15.20 | 10.78 | 20.10 | 12.75 | 29.90 | 56.37 | 60.29 | 65.20 | 66.18 | 68.14 | 64.71 | 66.67 | 63.73 | 67.16 |
slk_Latn | 68.14 | 60.29 | 47.55 | 39.71 | 34.31 | 58.33 | 68.63 | 66.67 | 70.59 | 75.00 | 70.59 | 71.57 | 74.51 | 75.00 |
slv_Latn | 68.14 | 60.78 | 44.12 | 32.84 | 38.73 | 63.24 | 68.14 | 68.14 | 70.59 | 73.53 | 73.53 | 74.51 | 78.43 | 76.47 |
smo_Latn | 30.39 | 25.00 | 31.86 | 18.14 | 29.41 | 52.45 | 60.29 | 62.25 | 62.25 | 65.69 | 67.16 | 65.20 | 66.18 | 69.61 |
sna_Latn | 28.43 | 29.41 | 36.27 | 23.53 | 24.51 | 39.71 | 44.61 | 45.59 | 44.61 | 49.51 | 45.59 | 47.55 | 47.06 | 50.00 |
snd_Arab | 27.94 | 37.25 | 39.22 | 23.53 | 27.94 | 42.65 | 47.06 | 50.49 | 52.45 | 54.41 | 54.41 | 52.94 | 55.88 | 56.86 |
som_Latn | 23.53 | 25.49 | 27.94 | 17.16 | 22.06 | 36.27 | 44.61 | 47.55 | 51.47 | 52.94 | 52.94 | 53.92 | 54.41 | 55.39 |
sot_Latn | 29.41 | 28.43 | 33.82 | 18.63 | 22.55 | 36.76 | 43.14 | 47.06 | 50.49 | 51.96 | 52.45 | 55.39 | 54.41 | 56.86 |
spa_Latn | 72.55 | 58.33 | 67.65 | 56.37 | 35.29 | 64.22 | 69.61 | 72.06 | 74.51 | 74.02 | 72.06 | 76.47 | 78.43 | 78.43 |
srd_Latn | 53.92 | 52.45 | 50.98 | 37.25 | 31.37 | 60.29 | 66.18 | 68.63 | 75.98 | 74.02 | 77.94 | 77.45 | 79.41 | 79.41 |
srp_Cyrl | 63.73 | 55.39 | 33.33 | 39.22 | 45.59 | 65.20 | 70.59 | 69.61 | 73.04 | 76.47 | 74.02 | 75.00 | 77.94 | 79.41 |
ssw_Latn | 29.41 | 25.00 | 31.37 | 21.57 | 24.02 | 44.12 | 46.57 | 50.00 | 52.94 | 51.96 | 53.92 | 56.86 | 53.92 | 60.78 |
sun_Latn | 55.39 | 59.31 | 63.73 | 44.61 | 37.25 | 60.29 | 68.63 | 70.10 | 71.08 | 73.53 | 72.55 | 73.53 | 75.00 | 75.98 |
swe_Latn | 71.08 | 61.27 | 52.94 | 48.04 | 33.82 | 53.43 | 60.29 | 64.71 | 64.71 | 69.12 | 70.10 | 69.61 | 72.55 | 70.59 |
swh_Latn | 32.35 | 63.24 | 61.27 | 56.86 | 29.41 | 50.49 | 59.31 | 58.82 | 62.75 | 60.29 | 62.25 | 68.63 | 66.67 | 66.67 |
szl_Latn | 56.86 | 50.49 | 45.59 | 29.41 | 30.88 | 51.47 | 59.80 | 63.73 | 64.71 | 67.16 | 69.61 | 68.63 | 71.08 | 69.12 |
tam_Taml | 20.59 | 63.24 | 67.16 | 58.82 | 30.88 | 50.49 | 55.88 | 62.75 | 63.24 | 63.73 | 65.20 | 69.61 | 66.67 | 68.63 |
tat_Cyrl | 37.75 | 60.29 | 35.29 | 28.92 | 33.33 | 54.90 | 64.22 | 64.71 | 65.69 | 74.51 | 70.10 | 71.57 | 71.57 | 73.53 |
tel_Telu | 18.14 | 60.78 | 61.27 | 59.80 | 25.98 | 50.00 | 52.45 | 59.80 | 58.82 | 63.73 | 60.78 | 60.29 | 63.73 | 61.76 |
tgk_Cyrl | 26.96 | 57.84 | 23.53 | 17.16 | 36.76 | 54.90 | 60.29 | 60.78 | 61.27 | 69.12 | 64.71 | 66.18 | 68.14 | 70.59 |
tgl_Latn | 55.88 | 58.33 | 49.02 | 40.20 | 43.14 | 64.22 | 69.61 | 64.71 | 70.10 | 75.49 | 74.51 | 77.45 | 78.43 | 77.45 |
tha_Thai | 44.61 | 60.78 | 23.53 | 57.35 | 41.18 | 63.24 | 67.16 | 68.63 | 70.10 | 72.06 | 70.59 | 70.10 | 72.06 | 75.49 |
tir_Ethi | 13.24 | 16.18 | 16.18 | 13.73 | 21.57 | 34.80 | 39.22 | 41.67 | 47.06 | 46.08 | 45.59 | 47.06 | 45.59 | 47.55 |
tpi_Latn | 63.24 | 46.57 | 56.86 | 33.33 | 31.86 | 58.82 | 65.69 | 68.14 | 70.10 | 72.06 | 74.51 | 73.53 | 75.98 | 74.51 |
tsn_Latn | 28.92 | 29.90 | 32.35 | 24.51 | 24.51 | 40.20 | 44.12 | 47.06 | 47.06 | 53.43 | 51.96 | 50.00 | 50.49 | 54.41 |
tso_Latn | 30.88 | 31.37 | 36.76 | 28.92 | 22.55 | 35.29 | 41.18 | 45.10 | 46.08 | 48.04 | 43.14 | 43.14 | 45.59 | 49.02 |
tuk_Latn | 34.31 | 46.08 | 39.22 | 27.45 | 24.02 | 45.59 | 53.43 | 58.82 | 57.84 | 63.73 | 63.73 | 66.18 | 65.69 | 66.18 |
tum_Latn | 26.96 | 34.31 | 33.82 | 27.94 | 21.57 | 39.22 | 43.14 | 45.59 | 46.08 | 47.55 | 44.61 | 49.02 | 49.51 | 49.51 |
tur_Latn | 52.94 | 62.75 | 40.20 | 52.94 | 36.76 | 60.78 | 68.63 | 70.10 | 72.06 | 74.02 | 75.00 | 76.47 | 75.98 | 76.96 |
uig_Arab | 18.63 | 18.14 | 20.10 | 11.27 | 21.08 | 33.33 | 36.76 | 39.71 | 43.14 | 44.61 | 43.14 | 48.53 | 47.55 | 48.04 |
ukr_Cyrl | 71.57 | 63.73 | 41.18 | 43.63 | 39.71 | 60.29 | 65.69 | 66.18 | 69.12 | 71.08 | 75.00 | 73.53 | 72.55 | 75.00 |
umb_Latn | 25.00 | 26.47 | 29.90 | 23.04 | 21.57 | 30.88 | 32.84 | 36.76 | 35.78 | 40.20 | 38.24 | 34.80 | 36.76 | 35.29 |
urd_Arab | 38.73 | 53.43 | 63.24 | 54.41 | 36.27 | 55.39 | 62.75 | 64.22 | 64.22 | 68.63 | 65.20 | 68.63 | 67.16 | 67.16 |
uzb_Latn | 30.39 | 62.75 | 35.78 | 23.53 | 22.06 | 50.49 | 56.37 | 57.84 | 63.24 | 72.06 | 63.73 | 69.12 | 72.55 | 71.57 |
vec_Latn | 65.69 | 59.80 | 56.86 | 52.45 | 39.22 | 62.25 | 66.18 | 69.61 | 70.10 | 69.12 | 74.51 | 75.98 | 75.00 | 76.47 |
vie_Latn | 67.65 | 63.24 | 67.16 | 60.78 | 39.71 | 60.78 | 67.16 | 68.63 | 74.51 | 75.98 | 76.47 | 75.00 | 78.43 | 79.90 |
war_Latn | 51.47 | 51.47 | 51.47 | 37.25 | 37.75 | 61.27 | 65.69 | 65.20 | 69.61 | 73.04 | 71.57 | 71.57 | 74.02 | 74.02 |
wol_Latn | 32.35 | 34.80 | 43.14 | 25.49 | 23.53 | 36.76 | 42.16 | 45.59 | 48.53 | 53.43 | 47.55 | 52.94 | 54.90 | 53.43 |
xho_Latn | 30.39 | 29.90 | 38.24 | 22.55 | 25.98 | 46.57 | 51.96 | 56.37 | 58.33 | 60.78 | 61.76 | 60.29 | 62.75 | 64.22 |
yid_Hebr | 23.04 | 22.06 | 16.18 | 12.25 | 24.02 | 34.80 | 39.22 | 39.71 | 40.20 | 46.57 | 41.18 | 40.69 | 44.61 | 45.10 |
yor_Latn | 21.57 | 29.41 | 47.55 | 21.57 | 26.47 | 32.35 | 41.18 | 39.22 | 41.67 | 48.04 | 42.16 | 43.14 | 44.61 | 43.14 |
yue_Hani | 75.00 | 64.71 | 67.16 | 55.88 | 40.69 | 69.12 | 71.57 | 76.47 | 76.96 | 81.37 | 77.94 | 79.41 | 81.86 | 79.41 |
zsm_Latn | 65.69 | 61.27 | 64.71 | 50.00 | 36.76 | 60.29 | 68.14 | 69.12 | 67.65 | 73.53 | 73.53 | 76.96 | 77.45 | 75.00 |
zul_Latn | 25.00 | 25.49 | 35.29 | 15.20 | 25.49 | 51.47 | 49.02 | 54.41 | 56.37 | 57.84 | 60.29 | 61.76 | 59.31 | 62.75 |
all | 42.08 | 45.34 | 44.63 | 34.36 | 30.88 | 50.71 | 57.02 | 58.95 | 61.20 | 64.04 | 63.15 | 64.13 | 65.19 | 66.32 |
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
ace_Latn | 46.85 | 47.75 | 49.55 | 41.44 | 48.65 |
ach_Latn | 45.05 | 37.84 | 41.44 | 40.54 | 36.04 |
acr_Latn | 47.75 | 51.35 | 50.45 | 47.75 | 50.45 |
afr_Latn | 54.05 | 38.74 | 51.35 | 49.55 | 55.86 |
agw_Latn | 48.65 | 45.05 | 41.44 | 42.34 | 49.55 |
ahk_Latn | 43.24 | 36.04 | 36.04 | 35.14 | 45.05 |
aka_Latn | 42.34 | 32.43 | 38.74 | 42.34 | 54.95 |
aln_Latn | 34.23 | 35.14 | 36.94 | 35.14 | 44.14 |
als_Latn | 38.74 | 36.94 | 42.34 | 42.34 | 47.75 |
alt_Cyrl | 44.14 | 44.14 | 45.05 | 51.35 | 51.35 |
alz_Latn | 36.94 | 35.14 | 31.53 | 28.83 | 37.84 |
aoj_Latn | 50.93 | 37.96 | 45.37 | 46.30 | 49.07 |
arb_Arab | 43.24 | 45.05 | 49.55 | 44.14 | 50.45 |
arn_Latn | 38.74 | 42.34 | 34.23 | 36.04 | 43.24 |
ary_Arab | 32.43 | 33.33 | 38.74 | 32.43 | 44.14 |
arz_Arab | 31.53 | 39.64 | 45.05 | 36.94 | 46.85 |
asm_Beng | 45.95 | 42.34 | 54.95 | 40.54 | 54.05 |
ayr_Latn | 47.75 | 37.84 | 44.14 | 44.14 | 54.05 |
azb_Arab | 39.64 | 40.54 | 47.75 | 45.05 | 47.75 |
aze_Latn | 45.05 | 46.85 | 45.05 | 43.24 | 49.55 |
bak_Cyrl | 45.05 | 52.25 | 49.55 | 56.76 | 56.76 |
bam_Latn | 42.34 | 37.84 | 49.55 | 39.64 | 47.75 |
ban_Latn | 36.04 | 41.44 | 34.23 | 34.23 | 42.34 |
bar_Latn | 49.55 | 46.85 | 44.14 | 48.65 | 53.15 |
bba_Latn | 45.05 | 32.43 | 45.95 | 46.85 | 46.85 |
bci_Latn | 36.94 | 35.14 | 36.94 | 33.33 | 44.14 |
bcl_Latn | 42.34 | 48.65 | 39.64 | 39.64 | 54.95 |
bel_Cyrl | 47.75 | 45.95 | 48.65 | 43.24 | 57.66 |
bem_Latn | 47.75 | 37.84 | 42.34 | 41.44 | 51.35 |
ben_Beng | 40.54 | 41.44 | 52.25 | 51.35 | 47.75 |
bhw_Latn | 37.84 | 43.24 | 41.44 | 46.85 | 47.75 |
bim_Latn | 38.74 | 39.64 | 33.33 | 36.94 | 45.05 |
bis_Latn | 44.14 | 49.55 | 44.14 | 39.64 | 48.65 |
bqc_Latn | 39.64 | 36.04 | 34.23 | 33.33 | 40.54 |
bre_Latn | 39.64 | 36.04 | 35.14 | 36.04 | 40.54 |
btx_Latn | 49.55 | 36.94 | 42.34 | 41.44 | 43.24 |
bul_Cyrl | 45.05 | 42.34 | 48.65 | 45.05 | 54.95 |
bum_Latn | 42.34 | 39.64 | 37.84 | 37.84 | 44.14 |
bzj_Latn | 53.15 | 46.85 | 47.75 | 50.45 | 52.25 |
cab_Latn | 39.64 | 38.74 | 37.84 | 36.94 | 36.04 |
cac_Latn | 43.24 | 37.84 | 40.54 | 38.74 | 45.05 |
cak_Latn | 45.95 | 35.14 | 44.14 | 40.54 | 50.45 |
caq_Latn | 39.64 | 38.74 | 38.74 | 44.14 | 37.84 |
cat_Latn | 52.25 | 45.05 | 46.85 | 48.65 | 52.25 |
cbk_Latn | 54.05 | 40.54 | 56.76 | 54.05 | 55.86 |
cce_Latn | 49.55 | 45.05 | 50.45 | 48.65 | 48.65 |
ceb_Latn | 44.14 | 42.34 | 48.65 | 45.05 | 51.35 |
ces_Latn | 44.14 | 43.24 | 45.05 | 46.85 | 51.35 |
cfm_Latn | 49.55 | 41.44 | 49.55 | 53.15 | 48.65 |
che_Cyrl | 37.84 | 33.33 | 36.94 | 37.84 | 38.74 |
chk_Latn | 45.05 | 41.44 | 41.44 | 36.04 | 45.95 |
chv_Cyrl | 43.24 | 45.05 | 45.05 | 49.55 | 58.56 |
ckb_Arab | 44.14 | 36.94 | 45.05 | 42.34 | 51.35 |
cmn_Hani | 48.65 | 45.05 | 53.15 | 48.65 | 53.15 |
cnh_Latn | 46.85 | 46.85 | 46.85 | 49.55 | 46.85 |
crh_Cyrl | 49.55 | 40.54 | 47.75 | 54.95 | 54.05 |
crs_Latn | 52.25 | 44.14 | 49.55 | 55.86 | 59.46 |
csy_Latn | 47.75 | 41.44 | 54.95 | 53.15 | 45.95 |
ctd_Latn | 50.45 | 48.65 | 56.76 | 53.15 | 56.76 |
ctu_Latn | 41.44 | 35.14 | 38.74 | 40.54 | 43.24 |
cuk_Latn | 42.34 | 42.34 | 38.74 | 39.64 | 37.84 |
cym_Latn | 39.64 | 38.74 | 39.64 | 43.24 | 41.44 |
dan_Latn | 53.15 | 41.44 | 39.64 | 38.74 | 54.95 |
deu_Latn | 45.05 | 36.04 | 37.84 | 38.74 | 43.24 |
djk_Latn | 42.34 | 35.14 | 42.34 | 46.85 | 40.54 |
dln_Latn | 48.65 | 40.54 | 51.35 | 54.05 | 47.75 |
dtp_Latn | 39.64 | 35.14 | 42.34 | 46.85 | 50.45 |
dyu_Latn | 41.44 | 39.64 | 42.34 | 38.74 | 46.85 |
dzo_Tibt | 45.05 | 40.54 | 41.44 | 45.05 | 45.05 |
efi_Latn | 39.64 | 36.04 | 38.74 | 41.44 | 45.95 |
ell_Grek | 49.55 | 45.95 | 49.55 | 48.65 | 51.35 |
eng_Latn | 55.86 | 42.34 | 58.56 | 54.05 | 59.46 |
enm_Latn | 50.45 | 41.44 | 56.76 | 50.45 | 54.95 |
epo_Latn | 49.55 | 40.54 | 47.75 | 42.34 | 49.55 |
est_Latn | 46.85 | 42.34 | 38.74 | 52.25 | 46.85 |
eus_Latn | 38.74 | 36.04 | 36.94 | 39.64 | 39.64 |
ewe_Latn | 51.35 | 43.24 | 50.45 | 46.85 | 45.05 |
fao_Latn | 53.15 | 44.14 | 52.25 | 53.15 | 58.56 |
fas_Arab | 49.55 | 50.45 | 57.66 | 51.35 | 55.86 |
fij_Latn | 48.65 | 43.24 | 41.44 | 43.24 | 53.15 |
fil_Latn | 48.65 | 41.44 | 46.85 | 51.35 | 51.35 |
fin_Latn | 47.75 | 45.05 | 41.44 | 45.95 | 54.95 |
fon_Latn | 38.74 | 35.14 | 37.84 | 40.54 | 45.05 |
fra_Latn | 60.36 | 51.35 | 62.16 | 52.25 | 59.46 |
fry_Latn | 37.84 | 33.33 | 36.04 | 27.03 | 46.85 |
gaa_Latn | 41.44 | 33.33 | 37.84 | 35.14 | 40.54 |
gil_Latn | 36.70 | 31.19 | 41.28 | 32.11 | 41.28 |
giz_Latn | 46.85 | 44.14 | 43.24 | 38.74 | 45.05 |
gkn_Latn | 38.74 | 34.23 | 34.23 | 36.94 | 41.44 |
gkp_Latn | 30.63 | 33.33 | 41.44 | 29.73 | 48.65 |
gla_Latn | 33.33 | 39.64 | 44.14 | 45.05 | 49.55 |
gle_Latn | 33.33 | 35.14 | 36.04 | 34.23 | 39.64 |
glv_Latn | 43.24 | 41.44 | 37.84 | 38.74 | 42.34 |
gom_Latn | 34.23 | 31.53 | 33.33 | 40.54 | 42.34 |
gor_Latn | 43.24 | 34.23 | 43.24 | 40.54 | 46.85 |
guc_Latn | 44.14 | 36.04 | 37.84 | 41.44 | 45.05 |
gug_Latn | 45.05 | 44.14 | 42.34 | 41.44 | 50.45 |
guj_Gujr | 45.95 | 37.84 | 52.25 | 44.14 | 56.76 |
gur_Latn | 45.95 | 45.95 | 44.14 | 47.75 | 48.65 |
guw_Latn | 45.05 | 37.84 | 47.75 | 46.85 | 48.65 |
gya_Latn | 37.84 | 37.84 | 41.44 | 34.23 | 42.34 |
gym_Latn | 41.44 | 39.64 | 39.64 | 43.24 | 50.45 |
hat_Latn | 50.45 | 43.24 | 44.14 | 41.44 | 56.76 |
hau_Latn | 44.14 | 37.84 | 41.44 | 44.14 | 48.65 |
haw_Latn | 45.95 | 39.64 | 38.74 | 34.23 | 49.55 |
heb_Hebr | 38.74 | 35.14 | 34.23 | 36.94 | 44.14 |
hif_Latn | 42.34 | 43.24 | 49.55 | 47.75 | 48.65 |
hil_Latn | 49.55 | 41.44 | 40.54 | 36.94 | 54.95 |
hin_Deva | 51.35 | 50.45 | 49.55 | 46.85 | 56.76 |
hmo_Latn | 46.85 | 45.05 | 46.85 | 45.05 | 53.15 |
hne_Deva | 55.86 | 54.05 | 54.05 | 58.56 | 58.56 |
hnj_Latn | 48.65 | 45.05 | 53.15 | 51.35 | 60.36 |
hra_Latn | 49.55 | 41.44 | 43.24 | 46.85 | 45.95 |
hrv_Latn | 55.86 | 51.35 | 52.25 | 54.95 | 61.26 |
hui_Latn | 51.35 | 40.54 | 41.44 | 45.05 | 46.85 |
hun_Latn | 46.85 | 44.14 | 41.44 | 43.24 | 48.65 |
hus_Latn | 32.43 | 32.43 | 34.23 | 37.84 | 42.34 |
hye_Armn | 45.95 | 40.54 | 45.95 | 49.55 | 61.26 |
iba_Latn | 49.55 | 46.85 | 51.35 | 48.65 | 55.86 |
ibo_Latn | 38.74 | 33.33 | 43.24 | 38.74 | 44.14 |
ifa_Latn | 36.04 | 30.63 | 35.14 | 38.74 | 43.24 |
ifb_Latn | 34.23 | 35.14 | 39.64 | 34.23 | 53.15 |
ikk_Latn | 43.24 | 36.94 | 39.64 | 39.64 | 43.24 |
ilo_Latn | 39.64 | 36.04 | 41.44 | 37.84 | 43.24 |
ind_Latn | 49.55 | 50.45 | 53.15 | 53.15 | 54.95 |
isl_Latn | 48.65 | 44.14 | 43.24 | 48.65 | 54.05 |
ita_Latn | 50.45 | 49.55 | 56.76 | 58.56 | 54.95 |
ium_Latn | 45.95 | 44.14 | 49.55 | 50.45 | 45.05 |
ixl_Latn | 42.34 | 39.64 | 41.44 | 43.24 | 40.54 |
izz_Latn | 38.74 | 47.75 | 39.64 | 42.34 | 54.95 |
jam_Latn | 41.44 | 43.24 | 53.15 | 50.45 | 61.26 |
jav_Latn | 41.44 | 47.75 | 44.14 | 37.84 | 45.95 |
jpn_Jpan | 46.85 | 46.85 | 47.75 | 50.45 | 51.35 |
kaa_Latn | 43.24 | 53.15 | 47.75 | 51.35 | 54.05 |
kab_Latn | 27.93 | 36.04 | 30.63 | 34.23 | 35.14 |
kac_Latn | 44.14 | 34.23 | 43.24 | 42.34 | 52.25 |
kal_Latn | 41.44 | 37.84 | 36.04 | 35.14 | 40.54 |
kan_Knda | 48.65 | 37.84 | 52.25 | 45.95 | 54.05 |
kat_Geor | 41.44 | 41.44 | 42.34 | 46.85 | 48.65 |
kaz_Cyrl | 49.55 | 45.05 | 51.35 | 53.15 | 55.86 |
kbp_Latn | 40.54 | 35.14 | 36.94 | 31.53 | 47.75 |
kek_Latn | 45.95 | 42.34 | 45.05 | 44.14 | 51.35 |
khm_Khmr | 52.25 | 38.74 | 48.65 | 49.55 | 64.86 |
kia_Latn | 36.94 | 36.04 | 40.54 | 41.44 | 48.65 |
kik_Latn | 45.05 | 43.24 | 45.05 | 44.14 | 50.45 |
kin_Latn | 42.34 | 37.84 | 41.44 | 38.74 | 50.45 |
kir_Cyrl | 51.35 | 46.85 | 47.75 | 63.06 | 64.86 |
kjb_Latn | 48.65 | 46.85 | 44.14 | 44.14 | 48.65 |
kjh_Cyrl | 44.14 | 41.44 | 45.05 | 41.44 | 45.95 |
kmm_Latn | 45.95 | 45.05 | 47.75 | 51.35 | 45.95 |
kmr_Cyrl | 39.64 | 35.14 | 45.05 | 42.34 | 43.24 |
knv_Latn | 44.55 | 44.55 | 45.45 | 42.73 | 44.55 |
kor_Hang | 48.65 | 48.65 | 49.55 | 51.35 | 62.16 |
kpg_Latn | 44.14 | 52.25 | 51.35 | 42.34 | 54.95 |
krc_Cyrl | 45.95 | 36.04 | 48.65 | 48.65 | 53.15 |
kri_Latn | 49.55 | 48.65 | 49.55 | 51.35 | 54.95 |
ksd_Latn | 36.94 | 33.33 | 40.54 | 33.33 | 49.55 |
kss_Latn | 32.43 | 28.83 | 34.23 | 29.73 | 47.75 |
ksw_Mymr | 44.14 | 45.95 | 42.34 | 37.84 | 52.25 |
kua_Latn | 41.44 | 42.34 | 36.94 | 35.14 | 40.54 |
lam_Latn | 43.24 | 36.94 | 45.95 | 43.24 | 40.54 |
lao_Laoo | 45.05 | 39.64 | 46.85 | 50.45 | 50.45 |
lat_Latn | 53.15 | 41.44 | 53.15 | 56.76 | 57.66 |
lav_Latn | 39.64 | 33.33 | 36.04 | 39.64 | 45.05 |
ldi_Latn | 35.14 | 32.43 | 36.94 | 34.23 | 36.04 |
leh_Latn | 47.75 | 37.84 | 33.33 | 32.43 | 41.44 |
lhu_Latn | 27.93 | 34.23 | 34.23 | 37.84 | 42.34 |
lin_Latn | 47.75 | 37.84 | 39.64 | 39.64 | 48.65 |
lit_Latn | 42.34 | 40.54 | 44.14 | 48.65 | 49.55 |
loz_Latn | 45.95 | 42.34 | 36.04 | 44.14 | 40.54 |
ltz_Latn | 46.85 | 45.95 | 47.75 | 41.44 | 49.55 |
lug_Latn | 40.54 | 32.43 | 39.64 | 38.74 | 45.95 |
luo_Latn | 40.54 | 36.94 | 34.23 | 38.74 | 40.54 |
lus_Latn | 39.64 | 40.54 | 42.34 | 41.44 | 50.45 |
lzh_Hani | 54.95 | 48.65 | 54.05 | 43.24 | 56.76 |
mad_Latn | 47.75 | 52.25 | 47.75 | 47.75 | 53.15 |
mah_Latn | 43.24 | 36.04 | 42.34 | 45.95 | 45.05 |
mai_Deva | 45.05 | 41.44 | 49.55 | 54.05 | 51.35 |
mam_Latn | 43.24 | 33.33 | 41.44 | 45.05 | 45.95 |
mar_Deva | 49.55 | 44.14 | 53.15 | 45.95 | 56.76 |
mau_Latn | 29.73 | 29.73 | 36.94 | 37.84 | 32.43 |
mbb_Latn | 44.14 | 42.34 | 38.74 | 39.64 | 49.55 |
mck_Latn | 40.54 | 34.23 | 36.04 | 39.64 | 49.55 |
mcn_Latn | 35.14 | 27.93 | 33.33 | 33.33 | 38.74 |
mco_Latn | 41.44 | 33.33 | 43.24 | 33.33 | 43.24 |
mdy_Ethi | 39.64 | 46.85 | 43.24 | 43.24 | 51.35 |
meu_Latn | 53.15 | 38.74 | 45.05 | 48.65 | 52.25 |
mfe_Latn | 51.35 | 48.65 | 52.25 | 50.45 | 56.76 |
mgh_Latn | 42.34 | 33.33 | 41.44 | 35.14 | 38.74 |
mgr_Latn | 39.64 | 34.23 | 33.33 | 41.44 | 38.74 |
mhr_Cyrl | 47.27 | 42.73 | 45.45 | 42.73 | 48.18 |
min_Latn | 37.84 | 45.95 | 53.15 | 45.05 | 53.15 |
miq_Latn | 51.35 | 46.85 | 43.24 | 54.95 | 49.55 |
mkd_Cyrl | 52.25 | 48.65 | 56.76 | 57.66 | 66.67 |
mlg_Latn | 35.14 | 36.04 | 36.94 | 37.84 | 45.95 |
mlt_Latn | 37.84 | 33.33 | 42.34 | 43.24 | 46.85 |
mos_Latn | 39.64 | 42.34 | 39.64 | 36.04 | 36.04 |
mps_Latn | 47.75 | 45.05 | 42.34 | 45.05 | 51.35 |
mri_Latn | 45.05 | 42.34 | 38.74 | 42.34 | 44.14 |
mrw_Latn | 40.54 | 39.64 | 41.44 | 37.84 | 49.55 |
msa_Latn | 44.14 | 41.44 | 45.95 | 37.84 | 46.85 |
mwm_Latn | 36.94 | 31.53 | 39.64 | 38.74 | 47.75 |
mxv_Latn | 33.33 | 35.14 | 39.64 | 38.74 | 40.54 |
mya_Mymr | 45.05 | 48.65 | 44.14 | 44.14 | 46.85 |
myv_Cyrl | 39.64 | 43.24 | 40.54 | 41.44 | 45.05 |
mzh_Latn | 45.05 | 45.95 | 42.34 | 40.54 | 44.14 |
nan_Latn | 32.43 | 35.14 | 48.65 | 49.55 | 44.14 |
naq_Latn | 36.94 | 36.94 | 37.84 | 39.64 | 41.44 |
nav_Latn | 27.03 | 28.83 | 30.63 | 33.33 | 38.74 |
nbl_Latn | 21.62 | 18.02 | 21.62 | 25.23 | 27.93 |
nch_Latn | 37.84 | 34.23 | 33.33 | 40.54 | 40.54 |
ncj_Latn | 46.85 | 45.95 | 42.34 | 41.44 | 42.34 |
ndc_Latn | 44.14 | 36.04 | 43.24 | 36.94 | 49.55 |
nde_Latn | 33.33 | 29.73 | 33.33 | 36.04 | 41.44 |
ndo_Latn | 41.28 | 34.86 | 37.61 | 33.94 | 46.79 |
nds_Latn | 41.44 | 38.74 | 37.84 | 34.23 | 43.24 |
nep_Deva | 45.05 | 49.55 | 63.06 | 51.35 | 60.36 |
ngu_Latn | 47.75 | 39.64 | 43.24 | 42.34 | 49.55 |
nld_Latn | 47.75 | 39.64 | 47.75 | 43.24 | 56.76 |
nmf_Latn | 44.14 | 40.54 | 42.34 | 41.44 | 44.14 |
nnb_Latn | 45.05 | 42.34 | 36.94 | 44.14 | 40.54 |
nno_Latn | 56.76 | 46.85 | 45.95 | 52.25 | 54.95 |
nob_Latn | 52.25 | 41.44 | 44.14 | 45.95 | 56.76 |
nor_Latn | 50.45 | 35.14 | 46.85 | 47.75 | 53.15 |
npi_Deva | 51.35 | 54.95 | 55.86 | 45.95 | 54.95 |
nse_Latn | 38.74 | 28.83 | 39.64 | 38.74 | 42.34 |
nso_Latn | 45.05 | 43.24 | 45.05 | 45.05 | 50.45 |
nya_Latn | 48.65 | 39.64 | 44.14 | 42.34 | 54.95 |
nyn_Latn | 39.64 | 33.33 | 37.84 | 36.94 | 45.05 |
nyy_Latn | 43.24 | 42.34 | 43.24 | 40.54 | 47.75 |
nzi_Latn | 36.94 | 32.43 | 33.33 | 32.43 | 35.14 |
ori_Orya | 43.24 | 34.23 | 51.35 | 46.85 | 45.95 |
ory_Orya | 44.14 | 44.14 | 49.55 | 46.85 | 55.86 |
oss_Cyrl | 49.55 | 49.55 | 49.55 | 44.14 | 54.05 |
ote_Latn | 34.23 | 31.53 | 34.23 | 36.04 | 49.55 |
pag_Latn | 44.14 | 48.65 | 48.65 | 42.34 | 50.45 |
pam_Latn | 45.95 | 36.04 | 44.14 | 47.75 | 45.05 |
pan_Guru | 41.44 | 33.33 | 46.85 | 40.54 | 47.75 |
pap_Latn | 50.45 | 44.14 | 52.25 | 49.55 | 53.15 |
pau_Latn | 38.74 | 45.05 | 37.84 | 36.94 | 46.85 |
pcm_Latn | 58.56 | 47.75 | 56.76 | 53.15 | 57.66 |
pdt_Latn | 53.15 | 45.95 | 45.95 | 48.65 | 54.05 |
pes_Arab | 50.91 | 46.36 | 59.09 | 48.18 | 53.64 |
pis_Latn | 57.66 | 47.75 | 50.45 | 45.95 | 55.86 |
pls_Latn | 43.24 | 43.24 | 43.24 | 39.64 | 45.95 |
plt_Latn | 36.94 | 35.14 | 37.84 | 43.24 | 47.75 |
poh_Latn | 42.34 | 42.34 | 45.05 | 39.64 | 48.65 |
pol_Latn | 41.44 | 43.24 | 46.85 | 55.86 | 56.76 |
pon_Latn | 45.95 | 39.64 | 43.24 | 39.64 | 42.34 |
por_Latn | 56.76 | 54.95 | 56.76 | 54.05 | 58.56 |
prk_Latn | 44.14 | 43.24 | 49.55 | 40.54 | 46.85 |
prs_Arab | 50.45 | 51.35 | 55.86 | 56.76 | 57.66 |
pxm_Latn | 48.65 | 44.14 | 41.44 | 41.44 | 47.75 |
qub_Latn | 46.85 | 44.14 | 43.24 | 48.65 | 45.05 |
quc_Latn | 45.05 | 41.44 | 43.24 | 38.74 | 50.45 |
qug_Latn | 45.95 | 46.85 | 50.45 | 45.05 | 56.76 |
quh_Latn | 49.55 | 49.55 | 46.85 | 42.34 | 51.35 |
quw_Latn | 43.24 | 36.94 | 45.05 | 44.14 | 53.15 |
quy_Latn | 58.56 | 48.65 | 54.95 | 50.45 | 57.66 |
quz_Latn | 51.35 | 38.74 | 60.36 | 54.95 | 59.46 |
qvi_Latn | 46.79 | 46.79 | 49.54 | 45.87 | 47.71 |
rap_Latn | 43.24 | 35.14 | 41.44 | 39.64 | 46.85 |
rar_Latn | 40.54 | 32.43 | 31.53 | 29.73 | 45.95 |
rmy_Latn | 37.84 | 37.84 | 38.74 | 40.54 | 43.24 |
ron_Latn | 45.05 | 51.35 | 44.14 | 47.75 | 57.66 |
rop_Latn | 45.95 | 45.05 | 42.34 | 42.34 | 55.86 |
rug_Latn | 43.24 | 38.74 | 46.85 | 44.14 | 45.05 |
run_Latn | 46.85 | 40.54 | 45.05 | 40.54 | 52.25 |
rus_Cyrl | 49.55 | 41.44 | 50.45 | 47.75 | 53.15 |
sag_Latn | 43.24 | 43.24 | 41.44 | 40.54 | 47.75 |
sah_Cyrl | 40.54 | 35.14 | 44.14 | 44.14 | 54.95 |
sba_Latn | 42.34 | 43.24 | 45.05 | 40.54 | 49.55 |
seh_Latn | 45.05 | 35.14 | 40.54 | 42.34 | 45.95 |
sin_Sinh | 39.64 | 38.74 | 39.64 | 42.34 | 45.95 |
slk_Latn | 53.15 | 50.45 | 44.14 | 47.75 | 53.15 |
slv_Latn | 47.75 | 45.05 | 55.86 | 51.35 | 49.55 |
sme_Latn | 45.95 | 45.05 | 42.34 | 41.44 | 48.65 |
smo_Latn | 38.74 | 40.54 | 43.24 | 44.14 | 53.15 |
sna_Latn | 50.45 | 30.63 | 43.24 | 45.95 | 60.36 |
snd_Arab | 44.14 | 45.05 | 56.76 | 51.35 | 56.76 |
som_Latn | 33.33 | 36.94 | 35.14 | 34.23 | 39.64 |
sop_Latn | 40.54 | 34.23 | 40.54 | 35.14 | 35.14 |
sot_Latn | 47.75 | 41.44 | 40.54 | 43.24 | 49.55 |
spa_Latn | 51.35 | 49.55 | 51.35 | 51.35 | 56.76 |
sqi_Latn | 42.34 | 43.24 | 52.25 | 52.25 | 57.66 |
srm_Latn | 35.14 | 41.44 | 39.64 | 37.84 | 45.05 |
srn_Latn | 45.95 | 53.15 | 54.05 | 48.65 | 51.35 |
srp_Latn | 59.46 | 48.65 | 58.56 | 54.05 | 58.56 |
ssw_Latn | 38.74 | 45.05 | 36.94 | 40.54 | 48.65 |
sun_Latn | 43.24 | 40.54 | 45.05 | 44.14 | 48.65 |
suz_Deva | 46.85 | 42.34 | 42.34 | 43.24 | 49.55 |
swe_Latn | 58.56 | 48.65 | 53.15 | 54.95 | 61.26 |
swh_Latn | 46.85 | 49.55 | 49.55 | 48.65 | 56.76 |
sxn_Latn | 42.34 | 36.94 | 44.14 | 44.14 | 46.85 |
tam_Taml | 44.14 | 53.15 | 59.46 | 48.65 | 60.36 |
tat_Cyrl | 47.75 | 47.75 | 45.95 | 48.65 | 54.05 |
tbz_Latn | 36.04 | 35.14 | 34.23 | 35.14 | 42.34 |
tca_Latn | 39.64 | 40.54 | 43.24 | 41.44 | 45.05 |
tdt_Latn | 40.54 | 38.74 | 48.65 | 45.05 | 52.25 |
tel_Telu | 33.33 | 45.95 | 50.45 | 45.95 | 49.55 |
teo_Latn | 33.33 | 37.84 | 26.13 | 31.53 | 41.44 |
tgk_Cyrl | 42.34 | 44.14 | 48.65 | 49.55 | 57.66 |
tgl_Latn | 48.65 | 41.44 | 46.85 | 51.35 | 51.35 |
tha_Thai | 43.24 | 42.34 | 43.24 | 37.84 | 47.75 |
tih_Latn | 43.24 | 37.84 | 40.54 | 36.04 | 54.05 |
tir_Ethi | 29.73 | 36.94 | 27.93 | 34.23 | 41.44 |
tlh_Latn | 51.35 | 45.95 | 45.95 | 41.44 | 53.15 |
tob_Latn | 44.55 | 43.64 | 41.82 | 38.18 | 50.00 |
toh_Latn | 42.34 | 39.64 | 40.54 | 40.54 | 42.34 |
toi_Latn | 44.14 | 45.05 | 34.23 | 36.04 | 45.05 |
toj_Latn | 43.24 | 40.54 | 36.94 | 43.24 | 42.34 |
ton_Latn | 42.34 | 42.34 | 42.34 | 44.14 | 52.25 |
top_Latn | 46.85 | 34.23 | 37.84 | 38.74 | 36.94 |
tpi_Latn | 48.65 | 44.14 | 52.25 | 48.65 | 49.55 |
tpm_Latn | 37.84 | 41.44 | 38.74 | 32.43 | 42.34 |
tsn_Latn | 40.54 | 36.04 | 38.74 | 34.23 | 37.84 |
tsz_Latn | 37.84 | 32.43 | 37.84 | 38.74 | 46.85 |
tuc_Latn | 45.95 | 44.14 | 47.75 | 44.14 | 48.65 |
tui_Latn | 42.34 | 38.74 | 38.74 | 37.84 | 50.45 |
tuk_Latn | 36.04 | 42.34 | 45.05 | 43.24 | 50.45 |
tum_Latn | 47.75 | 39.64 | 46.85 | 52.25 | 50.45 |
tur_Latn | 46.79 | 44.04 | 40.37 | 43.12 | 45.87 |
twi_Latn | 41.44 | 43.24 | 41.44 | 37.84 | 46.85 |
tyv_Cyrl | 38.74 | 38.74 | 43.24 | 44.14 | 45.05 |
tzh_Latn | 41.82 | 36.36 | 41.82 | 41.82 | 38.18 |
tzo_Latn | 39.64 | 43.24 | 34.23 | 29.73 | 41.44 |
udm_Cyrl | 36.94 | 38.74 | 42.34 | 44.14 | 47.75 |
ukr_Cyrl | 52.25 | 48.65 | 51.35 | 55.86 | 53.15 |
uzb_Latn | 45.05 | 49.55 | 37.84 | 46.85 | 54.05 |
uzn_Cyrl | 45.95 | 40.54 | 45.05 | 45.05 | 49.55 |
ven_Latn | 45.05 | 44.14 | 42.34 | 41.44 | 54.05 |
vie_Latn | 53.15 | 45.95 | 62.16 | 45.95 | 54.95 |
wal_Latn | 35.14 | 33.33 | 35.14 | 35.14 | 39.64 |
war_Latn | 48.65 | 39.64 | 37.84 | 45.05 | 54.95 |
wbm_Latn | 48.65 | 39.64 | 46.85 | 46.85 | 48.65 |
wol_Latn | 36.04 | 34.23 | 32.43 | 34.23 | 36.94 |
xav_Latn | 50.45 | 33.33 | 46.85 | 44.14 | 45.95 |
xho_Latn | 43.24 | 37.84 | 40.54 | 39.64 | 46.85 |
yan_Latn | 45.05 | 46.85 | 52.25 | 41.44 | 53.15 |
yao_Latn | 42.34 | 41.44 | 43.24 | 44.14 | 48.65 |
yap_Latn | 38.74 | 40.54 | 35.14 | 32.43 | 41.44 |
yom_Latn | 35.14 | 31.53 | 33.33 | 25.23 | 36.94 |
yor_Latn | 41.44 | 38.74 | 39.64 | 44.14 | 47.75 |
yua_Latn | 41.44 | 32.43 | 43.24 | 41.44 | 36.04 |
yue_Hani | 43.24 | 48.65 | 53.15 | 38.74 | 57.66 |
zai_Latn | 45.05 | 35.14 | 40.54 | 43.24 | 44.14 |
zho_Hani | 47.75 | 51.35 | 51.35 | 44.14 | 58.56 |
zlm_Latn | 54.05 | 49.55 | 57.66 | 56.76 | 64.86 |
zom_Latn | 50.45 | 42.34 | 44.14 | 43.24 | 48.65 |
zsm_Latn | 58.56 | 59.46 | 63.96 | 55.86 | 66.67 |
zul_Latn | 46.85 | 42.34 | 46.85 | 46.85 | 51.35 |
all | 44.07 | 40.98 | 43.98 | 43.24 | 48.89 |
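The "all" row aggregates the per-language accuracies in each column; the abstract describes this aggregate as a macro-average across languages. As a minimal sketch of that computation, assuming the rows are available in the pipe-delimited form shown above, the following Python snippet parses a few rows copied from the table, averages each column with equal weight per language, and computes the gap between MaLA-500 and the strongest baseline. The helper names (`parse_rows`, `macro_average`, `gap_over_best_baseline`) are hypothetical and are not part of any released evaluation code.

```python
from typing import Dict, List

# Three language rows and the "all" row copied from the table above.
# Columns: LLaMA 2-7B, mGPT-13B, BLOOM-7B1, XGLM-7.5B, MaLA-500.
SAMPLE = """
Lang | LLaMA 2-7B | mGPT-13B | BLOOM-7B1 | XGLM-7.5B | MaLA-500 |
ace_Latn | 46.85 | 47.75 | 49.55 | 41.44 | 48.65 |
ach_Latn | 45.05 | 37.84 | 41.44 | 40.54 | 36.04 |
acr_Latn | 47.75 | 51.35 | 50.45 | 47.75 | 50.45 |
all | 44.07 | 40.98 | 43.98 | 43.24 | 48.89 |
"""


def parse_rows(text: str) -> Dict[str, List[float]]:
    """Parse pipe-delimited rows into {row label: [accuracy, ...]}."""
    rows: Dict[str, List[float]] = {}
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) < 2:
            continue
        try:
            rows[cells[0]] = [float(c) for c in cells[1:]]
        except ValueError:
            continue  # header rows contain model names, not numbers
    return rows


def macro_average(rows: Dict[str, List[float]]) -> List[float]:
    """Average each column over languages, giving every language equal weight."""
    langs = [name for name in rows if name != "all"]  # skip the summary row
    n_cols = len(rows[langs[0]])
    return [
        round(sum(rows[lang][j] for lang in langs) / len(langs), 2)
        for j in range(n_cols)
    ]


def gap_over_best_baseline(values: List[float]) -> float:
    """Last column (MaLA-500) minus the best of the preceding baseline columns."""
    return round(values[-1] - max(values[:-1]), 2)


rows = parse_rows(SAMPLE)
print(macro_average(rows))                  # averages over the three sample rows only
print(gap_over_best_baseline(rows["all"]))  # gap computed from the reported "all" row
```

Applied to the reported "all" row, the gap is 48.89 - 44.07 = 4.82 points, which agrees with the 4.82% improvement quoted in the abstract. The averages printed for the three sample rows are illustrative only and will differ from the full-table values.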