FLUE: French Language Understanding Evaluation



FLUE is an evaluation setup for French NLP systems, similar to the popular GLUE benchmark. Its goal is to enable reproducible experiments and to share models and progress on the French language. The tasks and data are taken from existing work; please refer to our FlauBERT paper for a complete list of references.

On this page we describe the tasks and provide examples of usage.

A leaderboard will be updated frequently here.

Table of Contents

1. Text Classification
2. Paraphrasing
3. Natural Language Inference
4. Parsing
    4.1. Constituency Parsing
    4.2. Dependency Parsing
5. Word Sense Disambiguation
    5.1. Verb Sense Disambiguation
    5.2. Noun Sense Disambiguation

In the following, replace $DATA_DIR with a location on your computer, e.g. ~/data/cls, ~/data/pawsx, or ~/data/xnli, depending on the task. Raw data is downloaded and saved to $DATA_DIR/raw by running the following command:

bash get-data-${task}.sh $DATA_DIR

where ${task} is one of cls, pawsx, or xnli.
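If you drive the downloads from Python rather than the shell, a minimal sketch could build the same command and reject unknown task names. It assumes you run it from the directory containing the get-data-${task}.sh scripts; the function name get_data_command is hypothetical.

```python
import os
import shlex

# Tasks that have a get-data-${task}.sh script in this README.
SUPPORTED_TASKS = ("cls", "pawsx", "xnli")


def get_data_command(task: str, data_dir: str) -> list[str]:
    """Build the argument list equivalent to `bash get-data-${task}.sh $DATA_DIR`."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {SUPPORTED_TASKS}")
    # Expand "~" here: when the command is passed as a list (no shell), it is not expanded.
    return ["bash", f"get-data-{task}.sh", os.path.expanduser(data_dir)]


if __name__ == "__main__":
    # Only print the command; pass it to subprocess.run(..., check=True) to execute it.
    print(shlex.join(get_data_command("pawsx", "~/data/pawsx")))
```

Passing a list to `subprocess.run` avoids shell quoting issues, which is why the tilde is expanded explicitly.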

$MODEL_DIR denotes the directory where you saved the pretrained FlauBERT model. It contains 3 files:

  • *.pth: FlauBERT's pretrained model.
  • codes: BPE codes learned on the training data.
  • vocab: BPE vocabulary file.
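Before preprocessing, it can help to check that $MODEL_DIR really holds the three files listed above. The sketch below is a hypothetical helper (check_model_dir is not part of the FLUE scripts); when several *.pth files are present it simply returns the first one in sorted order, which is an arbitrary choice.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> Path:
    """Return a *.pth checkpoint from model_dir after checking that `codes` and `vocab` exist.

    Raises FileNotFoundError if the checkpoint or either BPE file is missing.
    """
    root = Path(model_dir).expanduser()
    checkpoints = sorted(root.glob("*.pth"))
    if not checkpoints:
        raise FileNotFoundError(f"no *.pth checkpoint in {root}")
    for name in ("codes", "vocab"):
        if not (root / name).is_file():
            raise FileNotFoundError(f"missing BPE file {root / name}")
    # Arbitrary choice if several checkpoints exist: the first in sorted order.
    return checkpoints[0]
```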

You can download these pretrained models from here.

1. Text Classification (CLS)

Task description

This is a binary classification task. It consists of classifying Amazon reviews in three product categories: books, DVD, and music. Each sample contains a review text and its rating from 1 to 5 stars. Reviews rated above 3 are labeled as positive, and those rated below 3 are labeled as negative.
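The labeling rule above can be written as a small function. This is a sketch: the 1/0 encoding is an assumption, and because 3-star reviews fit neither rule, the function returns None for them (i.e. we assume they are left out of the binary task).

```python
from typing import Optional


def rating_to_label(stars: int) -> Optional[int]:
    """Map a 1-5 star rating to a binary label: 1 = positive (> 3), 0 = negative (< 3).

    3-star reviews match neither rule, so None is returned (assumed to be discarded).
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"rating out of range: {stars}")
    if stars > 3:
        return 1
    if stars < 3:
        return 0
    return None
```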


The train and test sets are balanced, including around 1k positive and 1k negative reviews for a total of 2k reviews in each dataset. We take the French portion to create the binary text classification task in FLUE and report the accuracy on the test set.


bash flue/ $DATA_DIR

The output files (train and test sets) produced by the above script are $DATA_DIR/raw/cls-acl10-unprocessed/${lang}/${category}/${split}.review, where:

  • ${lang} includes de, en, fr, jp
  • ${category} includes books, dvd, music
  • ${split} includes train, test
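The path pattern above can be enumerated for the French files with a short sketch; cls_review_files is a hypothetical helper name, and only the directory layout comes from this README.

```python
from itertools import product
from pathlib import Path

CATEGORIES = ("books", "dvd", "music")
SPLITS = ("train", "test")


def cls_review_files(data_dir: str, lang: str = "fr") -> dict[tuple[str, str], Path]:
    """Map (category, split) to the raw review file written by the download script."""
    base = Path(data_dir).expanduser() / "raw" / "cls-acl10-unprocessed" / lang
    return {
        (category, split): base / category / f"{split}.review"
        for category, split in product(CATEGORIES, SPLITS)
    }
```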

In this task, we use the French (fr) datasets.


a. Finetuning FlauBERT with Facebook's XLM library

In this example, we describe how to finetune FlauBERT using the XLM library.

We split the train set into train and valid sets for training and validation. The default validation ratio is 0.2; you can change it in the flue/ script.
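A held-out split with a 0.2 validation ratio can be sketched as below. This is an illustration under our own assumptions (a seeded random shuffle), not necessarily how the FLUE script performs the split; train_valid_split is a hypothetical name.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def train_valid_split(
    examples: Sequence[T], valid_ratio: float = 0.2, seed: int = 0
) -> tuple[list[T], list[T]]:
    """Shuffle a copy of `examples` and hold out `valid_ratio` of them for validation."""
    if not 0.0 < valid_ratio < 1.0:
        raise ValueError("valid_ratio must be strictly between 0 and 1")
    shuffled = list(examples)  # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_valid = int(round(len(shuffled) * valid_ratio))
    return shuffled[n_valid:], shuffled[:n_valid]
```

Since the CLS sets are balanced, a stratified split (splitting positives and negatives separately) would be a reasonable alternative.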

Preprocess data

To finetune FlauBERT on this task using the XLM library, we first preprocess the data as follows:

  • (1) Clean and tokenize text using Moses tokenizer
  • (2) Apply BPE codes and vocabulary learned in pre-training (we used fastBPE)
  • (3) Binarize data

Run the following command to split the training set and perform the above preprocessing steps:

bash flue/ $DATA_DIR $MODEL_DIR $do_lower

where $do_lower should be set to True (or true, on, 1) if you use the uncased (lower-case) pretrained model, otherwise it should be set to False (or false, off, 0).
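The accepted values for $do_lower can be mirrored in Python as follows. This sketch is our own (parse_do_lower is hypothetical); it compares case-insensitively, so it is slightly more permissive than the exact spellings listed above, and the actual script may parse the value differently.

```python
def parse_do_lower(value: str) -> bool:
    """Interpret $do_lower: True/true/on/1 -> True, False/false/off/0 -> False."""
    normalized = value.strip().lower()
    if normalized in {"true", "on", "1"}:
        return True
    if normalized in {"false", "off", "0"}:
        return False
    raise ValueError(f"invalid do_lower value: {value!r}")
```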


Run the command below to finetune FlauBERT with the parameters from a configuration file. Here, $config is the path to that file, e.g. flue/examples/cls_books_lr5e6_xlm_base_cased.cfg.

source $config

python flue/ --exp_name $exp_name \
                        --exp_id $exp_id \
                        --dump_path $dump_path  \
                        --model_path $model_path  \
                        --data_path $data_path  \
                        --dropout $dropout \
                        --transfer_tasks $transfer_tasks \
                        --optimizer_e adam,lr=$lre \
                        --optimizer_p adam,lr=$lrp \
                        --finetune_layers $finetune_layer \
                        --batch_size $batch_size \
                        --n_epochs $num_epochs \
                        --epoch_size $epoch_size \
                        --max_len $max_len \
                        --max_vocab $max_vocab
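The configuration file is loaded with `source`, which suggests it consists of shell variable assignments. If you prefer to build the argument list from Python, the sketch below assumes simple `name=value` lines (no `export`, no variable expansion) and maps each variable to the flag used in the command above. read_cfg and build_flags are hypothetical helpers.

```python
import shlex

# Flag -> shell variable, following the command above.
# --optimizer_e / --optimizer_p are built separately from $lre / $lrp.
FLAG_TO_VAR = {
    "--exp_name": "exp_name",
    "--exp_id": "exp_id",
    "--dump_path": "dump_path",
    "--model_path": "model_path",
    "--data_path": "data_path",
    "--dropout": "dropout",
    "--transfer_tasks": "transfer_tasks",
    "--finetune_layers": "finetune_layer",
    "--batch_size": "batch_size",
    "--n_epochs": "num_epochs",
    "--epoch_size": "epoch_size",
    "--max_len": "max_len",
    "--max_vocab": "max_vocab",
}


def read_cfg(text: str) -> dict[str, str]:
    """Parse `name=value` lines (assumed format); blank lines and comments are skipped."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        # shlex removes shell quoting and trailing "# ..." comments.
        parts = shlex.split(raw, comments=True)
        values[name.strip()] = parts[0] if parts else ""
    return values


def build_flags(cfg: dict[str, str]) -> list[str]:
    """Turn parsed variables into the flag list; raises KeyError if a variable is missing."""
    args: list[str] = []
    for flag, var in FLAG_TO_VAR.items():
        args += [flag, cfg[var]]
    args += ["--optimizer_e", f"adam,lr={cfg['lre']}", "--optimizer_p", f"adam,lr={cfg['lrp']}"]
    return args
```

The resulting list can be appended to the fine-tuning script invocation and run with `subprocess.run`; flag order does not matter to an argparse-style parser.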

b. Finetuning FlauBERT with Hugging Face's Transformers library

Preprocess data

Run the command below to prepare the data for finetuning. Tokenization (Moses and BPE) is handled later by the FlaubertTokenizer class in the fine-tuning script.

python flue/ --indir $DATA_DIR/raw/cls-acl10-unprocessed \
                                 --outdir $DATA_DIR/processed \
                                 --do_lower true \
                                 --use_hugging_face true


Run the below command to finetune Flaubert using Hugging Face's Transformers library.

source $config

python ~/transformers/examples/ \
                                        --data_dir $data_dir \
                                        --model_type flaubert \
                                        --model_name_or_path $model_name_or_path \
                                        --task_name $task_name \
                                        --output_dir $output_dir \
                                        --max_seq_length 512 \
                                        --do_train \
                                        --do_eval \
                                        --learning_rate $lr \
                                        --num_train_epochs $epochs \
                                        --save_steps $save_steps \
                                        --fp16 \
                                        --fp16_opt_level O1 \
                                        |& tee output.log

To run the fine-tuning on 2 GPUs, for example, add CUDA_VISIBLE_DEVICES=0,1 at the beginning of the above command.
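When launching the run from Python instead of a shell, the same effect comes from passing a modified environment to the child process. A minimal sketch (gpu_env is a hypothetical helper); it copies the environment so the current process is not affected.

```python
import os


def gpu_env(device_ids: list[int]) -> dict[str, str]:
    """Copy the current environment and restrict the visible GPUs for a child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env
```

Usage: `subprocess.run(cmd, env=gpu_env([0, 1]), check=True)`, where `cmd` is the fine-tuning command as a list of arguments.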

2. Paraphrasing (PAWS-X)

Task description

The task consists of identifying whether the two sentences in a pair are semantically equivalent.


The train set includes 49.4k examples, and the dev and test sets each comprise nearly 2k examples. We use the French datasets for the paraphrasing task and report the accuracy on the test set.


bash flue/ $DATA_DIR

The output files obtained from the above script are in $DATA_DIR/raw/x-final/${lang}, where ${lang} includes de, en, es, fr, ja, ko, zh. Each folder contains 3 files: translated_train.tsv, dev_2k.tsv, and test_2k.tsv.
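To inspect these files, a small reader can be sketched with the standard csv module. It assumes each TSV has a header row with columns id, sentence1, sentence2, label, and that label 1 marks a paraphrase; rows with an empty sentence or an unexpected label (as can happen in machine-translated data) are skipped. read_pawsx and PawsExample are hypothetical names.

```python
import csv
from typing import Iterator, NamedTuple, TextIO


class PawsExample(NamedTuple):
    sentence1: str
    sentence2: str
    label: int  # assumed encoding: 1 = paraphrase, 0 = not a paraphrase


def read_pawsx(handle: TextIO) -> Iterator[PawsExample]:
    """Yield sentence pairs from a PAWS-X TSV file, skipping incomplete rows."""
    # QUOTE_NONE: sentences may contain raw quote characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        s1, s2, label = row.get("sentence1"), row.get("sentence2"), row.get("label")
        if not s1 or not s2 or label not in ("0", "1"):
            continue  # e.g. a row with a missing translation
        yield PawsExample(s1, s2, int(label))
```

Usage: `with open(path, encoding="utf-8", newline="") as f: examples = list(read_pawsx(f))`.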

In this task, we use the French (fr) datasets.


a. Finetuning FlauBERT with Facebook's XLM library

Preprocess data

The preprocessing consists of the same 3 steps described in the text classification example.

bash flue/ $DATA_DIR $MODEL_DIR $do_lower


Run the below command to finetune Flaubert with the parameters from an input configuration file.

source $config

python flue/ --exp_name $exp_name \
                        --exp_id $exp_id \
                        --dump_path $dump_path  \
                        --model_path $model_path  \
                        --data_path $data_path  \
                        --dropout $dropout \
                        --transfer_tasks $transfer_tasks \
                        --optimizer_e adam,lr=$lre \
                        --optimizer_p adam,lr=$lrp \
                        --finetune_layers $finetune_layer \
                        --batch_size $batch_size \
                        --n_epochs $num_epochs \
                        --epoch_size $epoch_size \
                        --max_len $max_len \
                        --max_vocab $max_vocab

b. Finetuning FlauBERT with Hugging Face's Transformers library

Preprocess data

Run the command below to prepare the data for finetuning. Tokenization (Moses and BPE) is handled later by the FlaubertTokenizer class in the fine-tuning script.

python flue/ --indir $DATA_DIR/raw/x-final \
                             --outdir $DATA_DIR/processed \
                             --use_hugging_face True


Run the command below to finetune FlauBERT on the PAWS-X dataset using Hugging Face's Transformers library.

source $config

python ~/transformers/examples/ \
                                        --data_dir $data_dir \
                                        --model_type flaubert \
                                        --model_name_or_path $model_name_or_path \
                                        --task_name $task_name \
                                        --output_dir $output_dir \
                                        --max_seq_length 512 \
                                        --do_train \
                                        --do_eval \
                                        --learning_rate $lr \
                                        --num_train_epochs $epochs \
                                        --save_steps $save_steps \
                                        --fp16 \
                                        --fp16_opt_level O1 \
                                        |& tee output.log

3. Natural Language Inference (XNLI)

Task description

The Natural Language Inference (NLI) task, also known as recognizing textual entailment (RTE), is to determine whether a premise entails, contradicts or neither entails nor contradicts a hypothesis. We take the French part of the XNLI corpus to form the development and test sets for the NLI task in FLUE.


The train set includes 392.7k examples, and the dev and test sets comprise 2.5k and 5k examples, respectively. We use the French datasets for the NLI task and report the accuracy on the test set.
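The reported metric is plain accuracy over the three NLI classes. A minimal sketch, assuming string labels; "neutral" is the conventional name for the "neither entails nor contradicts" class, and the label spelling used by the actual data files may differ.

```python
from typing import Sequence

# The three NLI classes; "neutral" = neither entails nor contradicts.
NLI_LABELS = frozenset({"entailment", "neutral", "contradiction"})


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of examples whose predicted label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold labels differ in length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    unknown = (set(predictions) | set(gold)) - NLI_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```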


bash flue/ $DATA_DIR

The output files from the above script are: $DATA_DIR/processed/fr.raw.${split}, where ${split} includes train, valid, test.


a. Finetuning FlauBERT with Facebook's XLM library

Preprocess data

The preprocessing consists of the same 3 steps described in the text classification example.

bash flue/ $DATA_DIR $MODEL_DIR $do_lower


Run the below command to finetune Flaubert with the parameters from an input configuration file.

source $config

python flue/ --exp_name $exp_name \
                        --exp_id $exp_id \
                        --dump_path $dump_path  \
                        --model_path $model_path  \
                        --data_path $data_path  \
                        --dropout $dropout \
                        --transfer_tasks $transfer_tasks \
                        --optimizer_e adam,lr=$lre \
                        --optimizer_p adam,lr=$lrp \
                        --finetune_layers $finetune_layer \
                        --batch_size $batch_size \
                        --n_epochs $num_epochs \
                        --epoch_size $epoch_size \
                        --max_len $max_len \
                        --max_vocab $max_vocab

b. Finetuning FlauBERT with Hugging Face's Transformers library

Coming soon.

4. Parsing

4.1. Constituency Parsing

The French Treebank collection is freely available for research purposes. See here to download the latest version of the corpus and sign the license, and here to obtain the version of the corpus used for the experiments described in the paper.

To fine-tune FlauBERT on constituency parsing on the French Treebank, see instructions here.

Pretrained parsing models for both FlauBERT and CamemBERT are now available!

4.2. Dependency Parsing

To fine-tune FlauBERT on dependency parsing, see instructions here.

Pretrained models for both FlauBERT and CamemBERT are available!

5. Word Sense Disambiguation

5.1. Verb Sense Disambiguation

To evaluate Flaubert on the French Verb Sense Disambiguation task:

1. Download the FrenchSemEval (FSE) dataset available here (its location is referred to as $FSE_DIR hereafter)

2. Prepare the data

python --data $FSE_DIR --output $DATA_DIR

3. Run the model and evaluate with

python --exp_name myexp --model flaubert-base-cased --data $DATA_DIR --padding 80 --batchsize 32 --device 0 --output $OUTPUT_DIR

You can use this script to evaluate either a pretrained model or your own model (from a checkpoint). However, the model must belong to one of the Flaubert, CamemBERT, or BERT classes of the Hugging Face API.

See further options in the flue/wsd/verbs/ directory.

5.2. Noun Sense Disambiguation

To fine-tune Flaubert for French WSD with WordNet as sense inventory, you can follow the scripts located in the directory wsd/nouns, which allow you to:

  • Automatically download our publicly available dataset from this address
    → See the script
  • Download the disambiguate toolkit from this repository
    → See the script
  • Prepare the training/development data from the French SemCor and French WordNet Gloss Corpus
    → See the script
  • Train the neural model (assumes that $FLAUBERT_PATH is the path to a Flaubert model)
    → See the script
  • Evaluate the model on the French SemEval 2013 task 12 corpus
    → See the script

Once the model is trained, you can disambiguate any text using the script


If you use FlauBERT or the FLUE Benchmark for your scientific publication, or if you find the resources in this repository useful, please refer to our paper:

    title={FlauBERT: Unsupervised Language Model Pre-training for French},
    author={Hang Le and Loïc Vial and Jibril Frej and Vincent Segonne and Maximin Coavoux and Benjamin Lecouteux and Alexandre Allauzen and Benoît Crabbé and Laurent Besacier and Didier Schwab},