This project is an optional challenge from the mandatory course "Machine Learning for Natural Language Understanding" of my NLP master's degree program at Trier University in the winter semester 2022/23.
The task was to train a clickbait filter that classifies articles as clickbait based on their headlines. I was free to decide how to prepare the data and which ML model to use for classification.
The challenge was considered passed if the model performed better than the professor's baseline (a simple classifier; F1 ≈ 0.89).
The data consists of two files: a text file with clickbait headlines and one with headlines from regular news sources. The held-out dataset is organized the same way.
I am not allowed to publish the training and validation datasets, since they are the property of the Computerlinguistik und Digital Humanities Department of the University of Trier.
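Since the datasets themselves cannot be published, here is a minimal sketch of how two such one-headline-per-line files can be turned into a labeled dataset. The file names are placeholders, not the actual file names used in the project:

```python
from pathlib import Path

def load_headlines(clickbait_path: str, news_path: str):
    """Read one headline per line from each file and attach binary labels:
    1 = clickbait, 0 = regular news."""
    clickbait = Path(clickbait_path).read_text(encoding="utf-8").splitlines()
    news = Path(news_path).read_text(encoding="utf-8").splitlines()
    texts = clickbait + news
    labels = [1] * len(clickbait) + [0] * len(news)
    return texts, labels
```

The texts and labels can then be tokenized and wrapped in a PyTorch `Dataset` for training.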
I implemented an LSTM model (Raschka, 2022, p. 499) with dropout layers using the PyTorch library (`./utils/models.py`). It achieved a solid result on the validation set: F1-score = 96.2% (`./notebooks/validation_and_examples.ipynb`), which could, however, easily be surpassed by a Transformer architecture.
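The general shape of such a model can be sketched as follows. This is an illustrative architecture, not the exact one in `./utils/models.py`; all hyperparameters here are assumed values:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Binary headline classifier: embedding -> LSTM -> dropout -> linear.
    Hyperparameters are illustrative, not the project's actual values."""
    def __init__(self, vocab_size: int, embed_dim: int = 64,
                 hidden_dim: int = 128, dropout: float = 0.5):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)        # (batch, seq, embed)
        _, (hidden, _) = self.lstm(embedded)        # hidden: (1, batch, hidden)
        logits = self.fc(self.dropout(hidden[-1]))  # (batch, 1)
        return logits.squeeze(1)                    # raw logits per headline
```

The raw logits would be passed through a sigmoid (or trained with `nn.BCEWithLogitsLoss`) to obtain clickbait probabilities.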
```shell
git clone https://github.com/bourgeois-radical/clickbait-detection.git
```
The `ClickbaitClassifier` class (`./utils/showing_results.py`) provides a dunder method that classifies any English sentence you pass to it.
Feel free to try the classifier out in the "Showing model results" section (`./notebooks/validation_and_examples.ipynb`). But don't forget to move `vocab.pkl` (click to download) to the `./data` folder and `model_with_dropouts` (click to download) to the `./notebooks` folder beforehand.
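The callable-classifier pattern behind this can be illustrated with a self-contained toy. This stand-in uses hand-picked cue phrases purely for demonstration; the real `ClickbaitClassifier` loads `vocab.pkl` and the trained LSTM instead:

```python
class ToyClickbaitClassifier:
    """Toy stand-in showing the callable-object pattern: instances are
    invoked like functions via the __call__ dunder method."""
    CLICKBAIT_CUES = ("you won't believe", "top 10", "this one trick")

    def __call__(self, sentence: str) -> str:
        text = sentence.lower()
        is_clickbait = any(cue in text for cue in self.CLICKBAIT_CUES)
        return "clickbait" if is_clickbait else "news"
```

Usage mirrors the notebook: instantiate once, then call the instance directly on each headline string.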
Aggarwal, C. (2022). *Machine Learning for Text* (2nd ed.). Springer.
Raschka, S., Liu, Y., & Mirjalili, V. (2022). *Machine Learning with PyTorch and Scikit-Learn*. Packt.