Assignment 4 for the course "Language Analytics" at Aarhus University. Solution by Anton Drasbæk Schiønning.


Assignment 4: Text Classification using Finetuned Transformers

Repository Overview

  1. Description
  2. Repository Tree
  3. Usage
  4. Modified Usage
  5. Results
  6. Discussion


Description

This repository includes the solution by Anton Drasbæk Schiønning (202008161) to assignment 4 in the course "Language Analytics" at Aarhus University.

It provides a framework for classifying the emotions of headlines from the Fake News Dataset using a Hugging Face pipeline. The dataset consists of over 7,000 news headlines, texts, and corresponding labels (real/fake). The Hugging Face model used for the classification is j-hartmann/emotion-english-distilroberta-base, a version of the distilled RoBERTa model fine-tuned for emotion classification.
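As a rough illustration of the classification step described above, the sketch below wraps a Hugging Face text-classification pipeline. The helper names build_classifier and classify_headlines are hypothetical and are not taken from this repository's scripts. It assumes the transformers package is installed; the model is downloaded on first use.

```python
from typing import Callable, Dict, List

DEFAULT_MODEL = "j-hartmann/emotion-english-distilroberta-base"


def build_classifier(model_name: str = DEFAULT_MODEL) -> Callable:
    """Create a Hugging Face text-classification pipeline (hypothetical helper)."""
    # Imported lazily so the rest of this sketch works without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_name)


def classify_headlines(headlines: List[str], classifier: Callable) -> List[Dict]:
    """Attach the top predicted emotion and its score to each headline."""
    # With default settings, the pipeline returns one {"label", "score"} dict per input.
    predictions = classifier(headlines)
    return [
        {"title": title, "emotion": pred["label"], "score": pred["score"]}
        for title, pred in zip(headlines, predictions)
    ]
```

A call such as classify_headlines(list_of_titles, build_classifier()) would then yield one record per headline, which could be written to a CSV file alongside the original real/fake label.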

Visualizations are also made to provide an overview of the classifications.

Repository Tree

├── data
│   ├── classified_titles_emotion-english-distilroberta-base.csv   <---- headlines with classifications and score
│   └── fake_or_real_news.csv                                      <---- original dataset for real/fake news
├── out
│   └── results_emotion-english-distilroberta-base    <---- example results
│       ├── classification_overview.csv                   <---- overview of all classifications         
│       ├── emotion_distribution.png                      <---- distribution of all emotions
│       └── emotions_by_label.png                         <---- share of emotions by headline type
├── requirements.txt
└── src
    ├──                                   <---- script for running classifications                                         
    └──                                  <---- script for creating visualizations/outputs


Usage

The analysis assumes only that you have Python 3 installed and have cloned this GitHub repository. Once that is done, you can run the full analysis with the shell script:


This will achieve the following:

  • Create and activate a virtual environment
  • Install requirements to that environment
  • Classify emotions in all headlines using j-hartmann/emotion-english-distilroberta-base
  • Create and save visualizations of the classifications
  • Deactivate the environment

The results are saved to the out directory in a subfolder named after the model used. This subfolder contains three files:

  • classification_overview.csv: CSV file with an overview of how many headlines were classified as each emotion. It also splits the classifications by real and fake headlines.
  • emotion_distribution.png: Bar chart showing the distribution of emotions identified across all headlines.
  • emotions_by_label.png: Pie charts showing the distribution of emotions for real and fake headlines, presented side by side for easy comparison.
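To make the structure of the overview file concrete, here is a minimal sketch of how such a table could be computed with the standard library. The input column names "emotion" and "label", the label values REAL/FAKE, and the function names are assumptions for illustration; the repository's own script may work differently.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

FIELDS = ["Predicted Emotion", "All Headlines", "Real Only", "Fake Only"]


def build_overview(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Count predicted emotions overall and split by REAL/FAKE label."""
    counts = {"all": Counter(), "REAL": Counter(), "FAKE": Counter()}
    for row in rows:
        emotion = row["emotion"]      # assumed column name
        label = row["label"].upper()  # assumed values: REAL or FAKE
        counts["all"][emotion] += 1
        if label in ("REAL", "FAKE"):
            counts[label][emotion] += 1
    # One output row per emotion, in alphabetical order.
    return [
        {
            "Predicted Emotion": emotion,
            "All Headlines": counts["all"][emotion],
            "Real Only": counts["REAL"][emotion],
            "Fake Only": counts["FAKE"][emotion],
        }
        for emotion in sorted(counts["all"])
    ]


def write_overview(overview: List[Dict[str, object]], path: str) -> None:
    """Write the overview rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(overview)
```

The rows could come from csv.DictReader over the classified headlines file, so no third-party dependency is needed for this step.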

Examples of these three files are shown under Results.

Modified Usage

If you wish to use a different model for the emotion classifications, the repository also allows running a modified analysis. First, run the setup Bash script to create an environment and install the requirements:


Run classifications

By default, the script uses j-hartmann/emotion-english-distilroberta-base for classifications. However, any other pretrained text classification model from Hugging Face can be used. Note that you should select a model specific to emotion classification if you wish to stay within the scope of the analysis.

Once you have selected a model, run the classifications as follows:

# uses distilbert-base-uncased-go-emotions-student for classification
python src/ -m "joeddav/distilbert-base-uncased-go-emotions-student"

You can find the data with classifications in data/classified_titles_{SELECTED_MODEL_NAME}.csv.
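The naming scheme can be sketched as follows, assuming that {SELECTED_MODEL_NAME} is the part of the Hugging Face model ID after the slash (which matches the default file name in the repository tree). The long option --model and all function names here are hypothetical; only the -m flag appears in the commands in this README.

```python
import argparse
from pathlib import Path

DEFAULT_MODEL = "j-hartmann/emotion-english-distilroberta-base"


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the model flag; '--model' is an assumed long form of '-m'."""
    parser = argparse.ArgumentParser(description="Emotion classification of headlines")
    parser.add_argument("-m", "--model", default=DEFAULT_MODEL,
                        help="Hugging Face model ID used for classification")
    return parser.parse_args(argv)


def short_model_name(model_id: str) -> str:
    """Drop the user/organization prefix from a model ID."""
    return model_id.split("/")[-1]


def classified_titles_path(model_id: str, data_dir: str = "data") -> Path:
    """Path of the CSV file holding the classified headlines."""
    return Path(data_dir) / f"classified_titles_{short_model_name(model_id)}.csv"


def results_dir(model_id: str, out_dir: str = "out") -> Path:
    """Folder where the overview table and plots for a model are stored."""
    return Path(out_dir) / f"results_{short_model_name(model_id)}"
```

Deriving both paths from the same model ID is what lets a later step locate the right classification file when it is given the same -m value.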

Create Visualizations

Visualizations for a classification file can be created by running the visualization script. Again, you must specify which model was used to classify the data so that the visualizations are built from the right data file:

python src/ -m "joeddav/distilbert-base-uncased-go-emotions-student"

This produces a folder named out/results_{SELECTED_MODEL_NAME}, which contains the three files described earlier.

PLEASE NOTE: The visualizations are designed for classifications that use 7 emotions. If your model uses more or fewer, the visualizations may not look as neat. Regardless, classification_overview.csv for your classifications should still provide the needed overview.


Results

Below are the results of running the classification with the default model, which can be found in the directory out/results_emotion-english-distilroberta-base.

Table: Classification Overview

Predicted Emotion   All Headlines   Real Only   Fake Only
Anger               795             383         412
Disgust             434             186         248
Fear                1076            555         521
Joy                 155             63          92
Neutral             3180            1649        1531
Sadness             487             245         242

Plot: Emotion Distribution

[Image: emotion_distribution.png, bar chart of the emotions identified across all headlines]

Plot: Emotions by Label

[Image: emotions_by_label.png, side-by-side pie charts of emotion shares for real and fake headlines]


Discussion

Overall, the pie charts above show that the distribution of emotions in headlines is strikingly similar across real and fake news. For both label types, the most common emotion by far is neutral, with a 52% share for real headlines and 48% for fake ones. Joy is the rarest emotion in both real and fake headlines. Perhaps the most noticeable discrepancy is that 7.8% of fake headlines are classified as disgust, compared with just 5.9% of real ones.

The main takeaway is that, according to this analysis, fake headlines are extremely similar to real ones in the primary emotion they display. This suggests that the emotion in a headline is not a good indicator of whether the headline is real. Still, it should be emphasized that these results are based solely on the classifications by j-hartmann/emotion-english-distilroberta-base, and a different classification model might produce different results. This can easily be explored by following Modified Usage.

