In this repo you can find a variety of NLP projects for Persian and English.

mmdrez4/awesome-nlp

Awesome NLP Projects for Persian and English

Welcome to the "awesome-nlp" repository! It collects Natural Language Processing (NLP) and Information Retrieval projects that address problems in both Persian and English. Below is a brief overview of each project, along with links to their respective repositories.

Projects

The QA (Question Answering) project classifies news items and articles into thematic categories: given a document, a model predicts its subject category. The dataset covers seven thematic categories. The task is tackled with Hidden Markov Models (HMM) and a transformer model. The fine-tuned transformer model is available on Hugging Face.

This project detects and corrects bias in language models for English and Persian. Bias in a machine learning model can skew its decisions; this project addresses bias related to race and gender. Any language model, such as BERT, can be chosen for the task.

This project recognizes illegal Persian words that have been disguised by inserting non-Persian characters such as English letters, digits, and special characters. It aims to improve on existing bots, which can miss an illegal word once unrelated characters are mixed into it.
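One way to picture the idea is to strip every character that is not a Persian letter before comparing a token against a word list. The sketch below is only an illustration under that assumption, not the project's actual method; `BLOCKED_WORDS`, `normalize`, and `is_blocked` are hypothetical names, and the single blocklist entry is a placeholder.

```python
import re

# Hypothetical blocklist with one placeholder entry ("mamnu", meaning "forbidden").
# The project's real word list is not reproduced here.
BLOCKED_WORDS = {"\u0645\u0645\u0646\u0648\u0639"}

# Match anything that is NOT a basic Arabic-script letter or one of the
# Persian-specific letters (pe, che, zhe, keheh, gaf, Farsi yeh).
NON_PERSIAN = re.compile(
    r"[^\u0621-\u063A\u0641-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC]"
)

def normalize(token: str) -> str:
    """Drop Latin letters, digits, punctuation, and other non-Persian characters."""
    return NON_PERSIAN.sub("", token)

def is_blocked(token: str) -> bool:
    """Check the normalized token against the (hypothetical) blocklist."""
    return normalize(token) in BLOCKED_WORDS

# The same placeholder word with '.', '_', 'x', and '*' mixed in.
disguised = "\u0645.\u0645_\u0646x\u0648*\u0639"
print(is_blocked(disguised))
```

A plain string comparison would miss the disguised token; normalizing first lets the same list entry match both forms.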

This project collects and analyzes user reviews of "The Godfather" trilogy from IMDb. After preprocessing the data, it runs sentiment analysis and compares the sentiment across the three movies.

This project preprocesses medical data and provides several information retrieval methods over it: Boolean retrieval, TF-IDF, transformer-based models, and vector-based retrieval such as FastText. Given a topic or illness, it retrieves relevant posts and articles.
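As a rough illustration of the TF-IDF option, the following standard-library sketch ranks documents by cosine similarity between TF-IDF vectors. It assumes whitespace tokenization and a smoothed `log(N / df) + 1` weighting; the function names and the toy documents are hypothetical and do not come from the project.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()

def build_index(docs):
    """Return (idf, vectors): per-term IDF weights and one TF-IDF dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, docs, top_k=3):
    """Return up to top_k documents with a non-zero similarity to the query."""
    idf, vectors = build_index(docs)
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [docs[i] for score, i in scored[:top_k] if score > 0]

# Toy documents for illustration only.
docs = [
    "flu symptoms include fever and cough",
    "diabetes is managed with insulin",
    "a healthy diet helps heart health",
]
print(search("fever cough", docs))
```

Terms that appear in few documents get a higher weight, so a query for an illness name tends to surface the posts that mention it specifically.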

The "SocialMediaHealth" project is the final project for Modern Information Retrieval. It implements an information retrieval system for social networks and health articles, with four retrieval methods: Boolean, TF-IDF, FastText, and transformer-based. It uses Elasticsearch for search and supports query expansion, classification, and clustering.
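Boolean retrieval, the simplest of the four methods, can be sketched with an in-memory inverted index. This is a stand-in for illustration and does not use Elasticsearch; `build_inverted_index` and `boolean_and` are hypothetical helpers, and the documents are toy examples.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, terms):
    """Return sorted ids of documents that contain every query term."""
    postings = [index.get(t.lower(), set()) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Toy documents for illustration only.
docs = [
    "migraine triggers include stress",
    "stress and sleep affect health",
    "sleep hygiene tips",
]
index = build_inverted_index(docs)
print(boolean_and(index, ["stress", "sleep"]))
```

An AND query intersects the posting sets of its terms, so a document is returned only if it contains all of them; an OR query would take the union instead.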

This project preprocesses and analyzes scripts from the TV series "The Office." It conducts frequency analysis, lemmatization, keyword extraction, and sentiment analysis for different characters.
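The per-character frequency analysis can be pictured with a small sketch like the one below. It assumes the script has already been parsed into `(speaker, line)` pairs, which is a hypothetical input format; the sample lines are invented placeholders, not quotes from the show, and lemmatization is left out.

```python
import re
from collections import Counter, defaultdict

def word_frequencies(lines):
    """Count words per speaker from an iterable of (speaker, text) pairs."""
    freqs = defaultdict(Counter)
    for speaker, text in lines:
        # Assumption: words are runs of ASCII letters and apostrophes.
        freqs[speaker].update(re.findall(r"[a-z']+", text.lower()))
    return freqs

# Invented placeholder lines for illustration only.
lines = [
    ("Jim", "paper paper sales"),
    ("Dwight", "beets bears beets"),
    ("Jim", "more paper"),
]
freqs = word_frequencies(lines)
for speaker, counter in freqs.items():
    print(speaker, counter.most_common(2))
```

Keeping one `Counter` per speaker makes it straightforward to compare the most frequent words of different characters.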

The "NLP_Finding_Recipe" project analyzes recipes for various foods from Wikipedia. It extracts recipe details, including ingredients and quantities.

Usage

Each project in this repository is self-contained with its own documentation and code. You can explore individual project directories for more details on usage and implementation.

Contributing

Feel free to fork this repository and contribute by submitting pull requests. Any feedback, bug reports, or suggestions for improvement are highly welcome.

License

This repository is open-source, and each project may have its own license. Please check the license files within individual project directories for specific licensing information.
