HebAugment: Enhancing Hebrew Sentiment Analysis through Data Augmentation

We hope to build a API around this project in the future, currently, it is publicly available for research purposes.

HebAugment: Enhancing Hebrew Sentiment Analysis through Data Augmentation

Overview

HebAugment is dedicated to advancing the field of Natural Language Processing (NLP) in Hebrew, particularly focusing on sentiment analysis. This project addresses the critical challenge of data scarcity in Hebrew by utilizing innovative data augmentation techniques. Through the strategic use of text translation and generation, HebAugment enriches the training dataset, paving the way for more accurate sentiment analysis models.

Paper

In the future, we aim to publish our paper Beyond the Data Scarcity Barrier: Unlocking Sentiment Analysis in Hebrew with Data Augmentation.

Structure

The repository is organized as follows to facilitate easy navigation and comprehension:

/data: Datasets and preprocessing scripts.
- test.csv: test data.
- train.csv: train data.
- /translated: English data and translations.
- /generated: generated data.
/models: Contains both the baseline and augmented model files.
- /baseline: The initial sentiment analysis models.
- /improved: Models trained on the augmented datasets.
/notebooks: Jupyter notebooks for exploratory data analysis, model training, and evaluation.
/src: Core source code.
- /augmentation: Data augmentation scripts.
- /classification: Sentiment classification algorithms.
- /evaluation: Scripts for model performance evaluation.
/experiments: Experimental configurations and results.
/scripts: Utility scripts for automation and data manipulation.

Future Work

Integrate a in-house LLM with a better hebrew tokenizer than OPEN-AI chatGPT 4
test on a wide varation of datasets
modul code

Name		Name	Last commit message	Last commit date
Latest commit History 46 Commits
data		data
experiments		experiments
models		models
notebooks		notebooks
scripts		scripts
src		src
.gitignore		.gitignore
HebAugment.pdf		HebAugment.pdf
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HebAugment: Enhancing Hebrew Sentiment Analysis through Data Augmentation

Overview

Paper

Structure

Future Work

About

Releases

Packages

Contributors 5

Languages

License

amaruy/HebAugment

Folders and files

Latest commit

History

Repository files navigation

HebAugment: Enhancing Hebrew Sentiment Analysis through Data Augmentation

Overview

Paper

Structure

Future Work

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 5

Languages

Packages