Basic preprocessing for NLP datasets in Pandas dataframe.

brucewlee/nlpPandas

cc-by-nc-sa-4.0 spaCy PyPI textreader Dev Status

nlpPandas

nlpPandas performs basic preprocessing on an NLP dataset: pass in a Pandas dataframe and get a preprocessed dataframe back.

Usage

>>> import nlp_pandas  # module name is an assumption; check the installed package

>>> nlpPandas = nlp_pandas.pass_data(data=some_df, target_column=some_column)


"""
- preprocessor "strong": remove NaN, lowercase, remove special characters, remove numbers, remove website links, remove emails, remove newlines (\n), collapse repeated whitespace
- preprocessor "base": remove NaN, remove website links, remove emails, remove newlines (\n), collapse repeated whitespace, separate numbers from adjoining words ("3boys" -> "3 boys")
- preprocessor "weak": remove NaN, remove website links, remove emails, remove newlines (\n), collapse repeated whitespace, separate numbers from adjoining words ("3boys" -> "3 boys")
- preprocessor "custom": under development
"""
>>> nlpPandas.use_preprocessor(preprocessor = "base")
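
To make the "base" steps concrete, here is a minimal sketch of what such cleaning could look like for a single string, using only the standard library. It is not nlpPandas' own code: the function name `clean_text` and all regular expressions are assumptions chosen for illustration, and the library's actual patterns may differ.

```python
import re

# All patterns below are illustrative assumptions, not nlpPandas internals.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # website links
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")           # email addresses
DIGIT_LETTER_RE = re.compile(r"(\d)([A-Za-z])")  # a digit directly followed by a letter
WHITESPACE_RE = re.compile(r"\s+")               # any run of whitespace


def clean_text(text: str) -> str:
    """Hypothetical stand-in for the "base" preprocessor, applied to one string."""
    text = URL_RE.sub(" ", text)                # remove website links
    text = EMAIL_RE.sub(" ", text)              # remove emails
    text = text.replace("\n", " ")              # remove newlines
    text = DIGIT_LETTER_RE.sub(r"\1 \2", text)  # "3boys" -> "3 boys"
    return WHITESPACE_RE.sub(" ", text).strip() # collapse repeated whitespace


print(clean_text("Visit https://example.com now\n3boys  ran"))
# prints: Visit now 3 boys ran
```

On a dataframe, the same idea could be applied with `df.dropna(subset=[column])` to drop NaN rows, followed by `df[column].map(clean_text)`.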


"""
- analyzer (under dev) "base": count the occurrences of each word (returns a dictionary)
"""
>>> nlpPandas.use_analyzer(analyzer = "base")
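
Since the analyzer is still under development and its exact behavior is not documented here, the following is only a sketch of one plausible reading: a word count over a whole text column, returned as a dictionary. The function name `count_words` is hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter


def count_words(texts):
    """Hypothetical word counter: whitespace-tokenize each text and tally the words."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())  # split on whitespace, add one per occurrence
    return dict(counts)


print(count_words(["3 boys ran", "boys ran home"]))
# prints: {'3': 1, 'boys': 2, 'ran': 2, 'home': 1}
```

Any iterable of strings works as input, so a preprocessed dataframe column such as `df[column]` could be passed directly.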

Install

Install using pip

pip install nlpPandas
