News to emoji

This project demonstrates how to fine-tune OpenAI GPT models for news classification tasks. It also includes the source code for the Streamlit application, available online at news2emoji.

Scripts

The scripts have a numerical prefix that indicates their execution order. The Makefile also includes a few commands that use the OpenAI CLI tool.

Dataset

The dataset consists of short news messages extracted from the Telegram channel @varlamov_news. Each message has reactions drawn from a set of at most 10 mutually exclusive emoji, meaning that a reader can select only one emoji per news post. Each emoji is mapped to a text label:

completion_mapping = {
    "❤️": "heart",
    "👍": "positive",
    "👎": "negative",
    "🤔": "thinking",
    "😢": "cry",
    "🤣": "laughing",
    "😱": "scream",
    "🤬": "symbols",
    "🤡": "clown",
    "💩": "shit",
}
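
The mapping above can be used to turn a scraped message into a training record. The following is a minimal sketch, not the project's actual preparation script. It assumes that the label is the most frequent reaction on a message, that records use a prompt/completion JSONL layout, and that the prompt ends with a fixed separator while the completion starts with a space. The function name `to_record`, the separator, and the example message are hypothetical.

```python
import json

# Label names copied from the mapping above.
completion_mapping = {
    "❤️": "heart",
    "👍": "positive",
    "👎": "negative",
    "🤔": "thinking",
    "😢": "cry",
    "🤣": "laughing",
    "😱": "scream",
    "🤬": "symbols",
    "🤡": "clown",
    "💩": "shit",
}

# Assumed convention: a fixed marker that tells the model where the prompt ends.
PROMPT_SEPARATOR = "\n\n###\n\n"


def to_record(text, reactions):
    """Build one prompt/completion record from a message and its reaction counts.

    `reactions` maps emoji to counts. Returns None when no known emoji is present.
    """
    known = {emoji: count for emoji, count in reactions.items() if emoji in completion_mapping}
    if not known:
        return None
    # Assumption: the training label is the most frequent reaction.
    top_emoji = max(known, key=known.get)
    return {
        "prompt": text.strip() + PROMPT_SEPARATOR,
        "completion": " " + completion_mapping[top_emoji],
    }


# Hypothetical message and reaction counts, for illustration only.
example = to_record("City council approves new tram line", {"👍": 120, "🤔": 15, "🤡": 40})
print(json.dumps(example, ensure_ascii=False))
```

Writing one such JSON object per line would produce a JSONL file; whether the real scripts pick the label this way is not stated here.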

Classification models

In this task the model predicts a single token.

model	trained tokens	costs
ada	2,615,288	$1.05
curie	2,615,288	$7.85
davinci	653,822	$19.61
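
To compare the models on equal terms, the cost can be normalized by the number of trained tokens. The sketch below only divides the figures reported in the tables of this README (including the generation table further down); the variable and function names are illustrative. Note that the davinci run trained on a quarter as many tokens as the other two, so its total cost understates the price difference.

```python
# (trained tokens, cost in USD) copied from the tables in this README.
classification_runs = {
    "ada": (2_615_288, 1.05),
    "curie": (2_615_288, 7.85),
    "davinci": (653_822, 19.61),
}
generation_runs = {
    "ada": (1_721_474, 0.69),
    "curie": (1_721_474, 5.16),
}


def usd_per_1k_tokens(trained_tokens, cost_usd):
    """Return the implied training price per 1,000 tokens."""
    return cost_usd / trained_tokens * 1000


classification_rates = {
    model: usd_per_1k_tokens(tokens, cost)
    for model, (tokens, cost) in classification_runs.items()
}
generation_rates = {
    model: usd_per_1k_tokens(tokens, cost)
    for model, (tokens, cost) in generation_runs.items()
}

for task, rates in (("classification", classification_rates), ("generation", generation_rates)):
    for model, rate in rates.items():
        print(f"{task:15s} {model:8s} ${rate:.4f} per 1K trained tokens")
```

The implied rates come out at roughly $0.0004, $0.003, and $0.03 per 1K trained tokens for ada, curie, and davinci respectively.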

This chart shows the validation loss.

This chart shows the validation token accuracy.

Generation models

In this task the model predicts a sequence of tokens.

model	trained tokens	costs
ada	1,721,474	$0.69
curie	1,721,474	$5.16

This chart shows the validation loss.

spaCy baseline

The simple spaCy baseline model achieved only 36% accuracy.
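
An accuracy figure is easier to interpret next to a trivial reference point. With 10 labels, uniform guessing would score about 10%, but if the labels are imbalanced, always predicting the most common one can score higher. The sketch below shows how both numbers could be computed; the helper names and the toy labels are hypothetical and are not taken from the project's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_accuracy(y_train, y_true):
    """Accuracy of always predicting the most frequent training label."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_true, [majority_label] * len(y_true))


# Toy labels for illustration only.
train_labels = ["positive", "positive", "clown", "cry", "positive"]
gold_labels = ["positive", "clown", "positive", "cry"]
predicted_labels = ["positive", "positive", "positive", "cry"]

model_acc = accuracy(gold_labels, predicted_labels)
baseline_acc = majority_class_accuracy(train_labels, gold_labels)
print(f"model: {model_acc:.2f}, majority class: {baseline_acc:.2f}")
```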

spaCy tutorial

TODO

Fine-tune and evaluate more generation models
Build a strong baseline model

