Awesome NLP Projects for Persian and English

Welcome to the "awesome-nlp" repository! It collects Natural Language Processing (NLP) and Information Retrieval projects that address problems in Persian and English. Below is a brief overview of each project, along with links to their respective repositories.

Projects

1. QA (Persian)

The QA (Question Answering) project classifies news stories and articles into thematic categories: given a document, a model predicts its subject category. The dataset covers seven thematic categories. The task is tackled with Hidden Markov Models (HMMs) and a transformer model. The fine-tuned transformer model is available on HuggingFace.

2. Bias Detection in Language Models (English and Persian)

This project detects and corrects bias in language models for English and Persian. Bias in machine learning models can skew their decisions; this project addresses bias related to race and gender. You can choose a language model, such as BERT, for this task.

3. Filter Illegal Word (Persian)

This project recognizes illegal Persian words that have been modified to evade detection, for example by inserting non-Persian characters such as English letters, numbers, and special characters. It aims to improve upon existing bots, which may fail to detect illegal words disguised with unrelated characters.
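The idea of seeing through inserted characters can be sketched as follows. This is a minimal illustration, not the project's actual method: the blocklist entry is a harmless placeholder, the function names are hypothetical, and the set of Persian letters is a simplifying assumption.

```python
import re

# Hypothetical blocklist; a harmless placeholder word stands in for real entries.
BLOCKED_WORDS = {"سلام"}

# Assumption: treat only these code points as Persian letters. Digits,
# tatweel, Latin letters, and punctuation all count as noise.
PERSIAN_LETTERS = "\u0621-\u063A\u0641-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC"
NOISE = re.compile(f"[^{PERSIAN_LETTERS}]")


def normalize(token: str) -> str:
    """Remove every character that is not a Persian letter."""
    return NOISE.sub("", token)


def is_blocked(token: str) -> bool:
    """Return True if the token equals a blocked word once noise is removed."""
    return normalize(token) in BLOCKED_WORDS
```

With this sketch, a token built by inserting `x1` between the letters of the placeholder word is still flagged, while an unrelated Latin word is not. Exact matching after normalization is the simplest choice; it does not handle substituted look-alike letters.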

4. NLP_Godfather_Trilogy (English)

This project compiles and analyzes user reviews of "The Godfather" trilogy from IMDB. After preprocessing the data, it conducts sentiment analysis and compares the sentiment of each movie within the trilogy.
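A per-movie sentiment comparison might look like the sketch below. The tiny word lists and the lexicon-counting score are assumptions made for illustration; the project's actual sentiment model is not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, deliberately tiny sentiment lexicon.
POSITIVE = {"great", "masterpiece", "brilliant"}
NEGATIVE = {"boring", "weak", "disappointing"}


def score(review: str) -> int:
    """Count positive words minus negative words in a whitespace-tokenized review."""
    words = review.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mean_sentiment(reviews):
    """Average the review scores per movie.

    `reviews` is an iterable of (movie_title, review_text) pairs.
    """
    by_movie = defaultdict(list)
    for title, text in reviews:
        by_movie[title].append(score(text))
    return {title: mean(scores) for title, scores in by_movie.items()}
```

Calling `mean_sentiment` on a list of `(title, text)` pairs returns one average score per title, which can then be compared across the three films.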

5. NLP_Bio_Data_Analysis (Persian)

This project preprocesses medical data and provides several information retrieval methods: Boolean retrieval, TF-IDF, transformer-based models, and embedding-based retrieval with FastText. Given a topic or illness, it retrieves relevant posts and articles.
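As an illustration of the TF-IDF option, here is a small self-contained sketch that ranks documents by cosine similarity to a query. Whitespace tokenization, the `log(N/df) + 1` weighting, and all function names are assumptions; the project may well rely on a library implementation and a Persian-aware tokenizer instead.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in df}
    vectors = [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return indices of matching documents, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    counts = Counter(query.lower().split())
    # Query terms that never occur in the collection are ignored.
    q = {term: c * idf[term] for term, c in counts.items() if term in idf}
    scored = [(cosine(q, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for s, i in scored if s > 0]
```

Documents that share no term with the query get a score of zero and are dropped from the result.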

6. SocialMediaHealth (Persian)

The "SocialMediaHealth" project is the final project of Modern Information Retrieval. It implements an information retrieval system for social network posts and health articles. It offers four retrieval methods: Boolean, TF-IDF, FastText, and transformer-based retrieval. It uses Elasticsearch for search and supports query expansion, classification, and clustering.
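Boolean retrieval, the simplest of the four methods, can be sketched with an in-memory inverted index. This is only an illustration of the concept under the assumption of whitespace tokenization; it does not use Elasticsearch, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, terms):
    """Documents that contain every query term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, terms):
    """Documents that contain at least one query term."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))
```

An OR query over a term and its synonyms is one simple way to think about query expansion: adding related terms widens the set of matching documents.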

7. NLP_The_Office_Transcription_Analysis (English)

This project preprocesses and analyzes scripts from the TV series "The Office." It conducts frequency analysis, lemmatization, keyword extraction, and sentiment analysis for different characters.
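Per-character frequency analysis can be sketched with the standard library alone. The input format (a list of `(speaker, line)` pairs), the stopword list, and the function name are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical minimal stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "is", "and", "to", "of"}


def word_counts_by_character(lines):
    """Count non-stopword tokens separately for each speaker.

    `lines` is an iterable of (speaker, text) pairs.
    """
    counts = defaultdict(Counter)
    for speaker, text in lines:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[speaker].update(t for t in tokens if t not in STOPWORDS)
    return counts
```

`counts[speaker].most_common(10)` then gives the ten most frequent words for that speaker, which is a common starting point for keyword extraction.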

8. NLP_Finding_Recipe (Persian)

The "NLP_Finding_Recipe" project analyzes recipes for various foods from Wikipedia. It extracts recipe details, including ingredients and quantities.
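Extracting an ingredient and its quantity from a line of text can be sketched with a regular expression. The `<quantity> <unit> <name>` line format assumed here is hypothetical, as are the pattern and function name; real recipe text from Wikipedia is far less regular.

```python
import re

# Assumed line shape: "<number> <unit> <ingredient name>", e.g. "2 cups rice".
INGREDIENT_LINE = re.compile(
    r"^\s*(?P<qty>\d+(?:\.\d+)?)\s+(?P<unit>\w+)\s+(?P<name>.+?)\s*$"
)


def parse_ingredient(line):
    """Return a dict with quantity, unit, and name, or None if the line does not match."""
    match = INGREDIENT_LINE.match(line)
    if match is None:
        return None
    return {
        "quantity": float(match["qty"]),
        "unit": match["unit"],
        "name": match["name"],
    }
```

Lines without a leading number, such as "salt to taste", return `None` and would need separate handling.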

Usage

Each project in this repository is self-contained with its own documentation and code. You can explore individual project directories for more details on usage and implementation.

Contributing

Feel free to fork this repository and contribute by submitting pull requests. Any feedback, bug reports, or suggestions for improvement are highly welcome.

License

This repository is open-source, and each project may have its own license. Please check the license files within individual project directories for specific licensing information.
