How good is BERT? Comparing BERT to other state-of-the-art approaches on a French sentiment analysis dataset

French sentiment analysis with BERT

How good is BERT? Comparing BERT to other state-of-the-art approaches on a large-scale French sentiment analysis dataset 📚

The contribution of this repository is threefold.

  • Firstly, I introduce a new dataset for sentiment analysis, scraped from Allociné.fr user reviews. It contains 100k positive and 100k negative reviews divided into 3 balanced splits: train (160k reviews), val (20k) and test (20k). To my knowledge, no French dataset of this size is available on the internet.

  • Secondly, I share my code for French sentiment analysis with BERT, based on CamemBERT, and the 🤗Transformers library.

  • Lastly, I compare BERT results with other state-of-the-art approaches, such as TF-IDF and fastText, as well as other methods based on non-contextual word embeddings.

Installation

If you want to experiment with the training code, follow these steps:

# Download repo and its dependencies 
git clone https://github.com/TheophileBlard/french-sentiment-analysis-with-bert/
cd french-sentiment-analysis-with-bert
pipenv install

# Extract dataset
cd allocine_dataset && tar xvjf data.tar.bz2 && cd ..

# Activate virtualenv and open-up BERT notebook
pipenv shell
jupyter notebook 03_bert.ipynb 

But if you only need the model for inference, the following code should do the trick:

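# AutoTokenizer and TFAutoModelForSequenceClassification must be imported too
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification, pipeline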

tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

print(nlp("Alad'2 est clairement le meilleur film de l'année 2018.")) # POSITIVE
print(nlp("Juste whoaaahouuu !")) # POSITIVE
print(nlp("NUL...A...CHIER ! FIN DE TRANSMISSION.")) # NEGATIVE
print(nlp("Je m'attendais à mieux de la part de Franck Dubosc !")) # NEGATIVE

Dataset

The dataset is made available as .jsonl files, as well as a .pickle file. Some examples from the training set are presented in the following table:

| Review | Polarity |
|---|---|
| Magnifique épopée, une belle histoire, touchante avec des acteurs qui interprètent très bien leur rôles (Mel Gibson, Heath Ledger, Jason Isaacs...), le genre de film qui se savoure en famille! | Positive |
| N'étant pas fan de SF, j'ai du mal à commenter ce film. Au moins, dirons nous, il n'y a pas d'effets spéciaux et le thème de ces 3 derniers survivants, un blanc, un maori, une blanche est assez bien traité. Mais c'est quand même bien longuet ! | Negative |
| Les scènes s'enchaînent de manière saccadée, les dialogues sont théâtraux, le jeu des acteurs ne transcende pas franchement le film. Seule la musique de Vivaldi sauve le tout. Belle déception. | Negative |

For more information, please refer to the dedicated page.
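As a minimal sketch of reading the .jsonl files, the helper below parses one JSON object per line and counts the labels. The field name `polarity` and the file path in the usage comment are assumptions for illustration; check the dedicated page for the actual schema and layout.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read one JSON object per line, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def polarity_counts(records, label_key="polarity"):
    """Count records per label. `label_key` is an assumed field name."""
    return Counter(record[label_key] for record in records)


# Hypothetical usage (the path is an assumption, not taken from the repository):
# train = load_jsonl("allocine_dataset/data/train.jsonl")
# print(polarity_counts(train))
```

On a balanced split, the two label counts should be equal.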

Results

Full dataset

| Model | Validation Accuracy (%) | Validation F1-Score (%) | Test Accuracy (%) | Test F1-Score (%) |
|---|---|---|---|---|
| CamemBERT | 97.39 | 97.36 | 97.44 | 97.34 |
| RNN | 94.39 | 94.34 | 94.58 | 94.39 |
| TF-IDF + LogReg | 94.35 | 94.29 | 94.38 | 94.19 |
| CNN | 93.69 | 93.72 | 94.10 | 93.98 |
| fastText (unigrams) | 92.88 | 92.75 | 92.90 | 92.57 |

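CamemBERT outperforms all other models by a large margin: about 3 points of accuracy and F1-score over the RNN, the next-best model, on both the validation and test sets.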
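For reference, here is a minimal sketch of how accuracy and F1-score can be computed for binary labels. It assumes labels encoded as 0/1 with the positive class as the F1 target; the notebooks may compute these metrics differently (for example with a library function or another averaging mode).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Multiplying either value by 100 gives percentages comparable to those in the table.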

Learning curves

Test accuracy as a function of training dataset size.

With only 500 training examples, CamemBERT already shows better results than any other model trained on the full dataset. This is the power of modern language models and self-supervised pre-training.

For this kind of task, RNNs need a lot of data (>100k) to perform well. The same result (for English language) is empirically observed by Alec Radford in these slides.
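A learning curve like this one requires training subsets of increasing size. The sketch below shows one way to draw a class-balanced subset; it is an assumption about the procedure, not necessarily how the subsets were built here, and the `polarity` field name is hypothetical.

```python
import random


def balanced_subsample(records, size, label_key="polarity", seed=0):
    """Draw `size` records with the same number of examples per label."""
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)
    if not by_label:
        raise ValueError("records is empty")
    per_label, remainder = divmod(size, len(by_label))
    if remainder:
        raise ValueError("size must be divisible by the number of labels")
    rng = random.Random(seed)  # fixed seed so subsets are reproducible
    sample = []
    for label in sorted(by_label):
        if len(by_label[label]) < per_label:
            raise ValueError(f"not enough examples for label {label!r}")
        sample.extend(rng.sample(by_label[label], per_label))
    rng.shuffle(sample)  # avoid grouping the subset by label
    return sample
```

Calling it with sizes such as 500, 1,000 and so on, then training and scoring one model per subset, gives the points of the curve.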

Inference time

Time taken by a model to perform a single prediction (averaged over 1,000 predictions).

As one would expect, the slowest model is CamemBERT, followed by TF-IDF.

On the other hand, fastText performs the ... fastest, but is actually slow compared to the original implementation, because of the overhead of Python and Keras.
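A per-prediction latency of this kind can be measured roughly as follows. This is a sketch under assumptions: `predict` stands for any single-example prediction callable (hypothetical name), and the warm-up step is my addition to keep one-off initialization costs out of the average.

```python
import time


def mean_latency(predict, inputs, warmup=1):
    """Average wall-clock seconds per single-example call to `predict`."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    for text in inputs[:warmup]:
        predict(text)  # warm-up calls are not timed
    start = time.perf_counter()
    for text in inputs:
        predict(text)  # one example at a time, no batching
    elapsed = time.perf_counter() - start
    return elapsed / len(inputs)
```

Passing 1,000 reviews as `inputs` would match the averaging described above.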

Generalizability

I considered the text classification task from FLUE (French Language Understanding Evaluation) to evaluate the cross-domain generalization capabilities of the models. This is also a binary classification task, but on Amazon product reviews.

There is one train set and one test set for each product category (books, DVD and music). Each set is balanced, with around 1000 positive and 1000 negative reviews, for a total of about 2000 reviews per set.

I didn't do any additional training, only inference on the test sets. The resulting accuracies are reported in the following table:

| Model | Books | DVD | Music |
|---|---|---|---|
| CamemBERT | 94.10 | 93.25 | 94.55 |
| TF-IDF + LogReg | 87.10 | 88.10 | 87.45 |
| CNN | 85.80 | 88.75 | 87.25 |
| RNN | 85.30 | 87.55 | 87.50 |
| fastText (unigrams) | 85.25 | 87.10 | 86.65 |

Without additional training on domain-specific data, the CamemBERT model outperforms the fine-tuned CamemBERT & FlauBERT models reported in (Le et al., 2020). Update: FlauBERT (Large), released 03/20, gets better results, but it is a much heavier model.

TF-IDF + LogReg also performs better than specifically trained mBERT (Eisenschlos et al., 2019).
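The inference-only evaluation can be sketched as below. It assumes the classifier returns output shaped like a 🤗Transformers pipeline result (a list containing a dict with a `label` of `POSITIVE` or `NEGATIVE`) and that the test sets are already loaded as `(text, label)` pairs with labels in {0, 1}; both function names are hypothetical.

```python
def label_to_int(prediction):
    """Map a pipeline-style result, e.g. [{"label": "POSITIVE", ...}], to 1 or 0."""
    return 1 if prediction[0]["label"] == "POSITIVE" else 0


def cross_domain_accuracy(classify, test_sets):
    """Accuracy (in %) per category, with no additional training.

    `test_sets` maps a category name to a list of (text, label) pairs.
    """
    scores = {}
    for category, examples in test_sets.items():
        correct = sum(
            label_to_int(classify(text)) == label for text, label in examples
        )
        scores[category] = 100.0 * correct / len(examples)
    return scores
```

With the `nlp` pipeline from the Installation section passed as `classify`, this would return one accuracy per product category.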

Online Demo

Open the online demo on Google Colab:

Colab Demo

Release History

  • 0.4.0
  • 0.3.0
    • Added Google Colab online demo
  • 0.2.0
    • Added inference time + generalizability
  • 0.1.0
    • First proper release
    • Learning curves & results for all models
  • 0.0.1
    • Work in progress

Task List

  • Dataset available
  • Models available
  • Results on full dataset
  • Learning curves
  • Inference time
  • Generalizability
  • Online demo
  • Predicting usefulness

Author

Théophile Blard

If you use this work (code or dataset), please cite as:

Théophile Blard, French sentiment analysis with BERT, (2020), GitHub repository, https://github.com/TheophileBlard/french-sentiment-analysis-with-bert
