Merge pull request #95 from meghdadFar/meghdadFar-docs-and-package-fixes
Fix docs and package fixes
meghdadFar committed Aug 7, 2023
2 parents d72aa51 + f1ee818 commit 3ba91e8
Showing 10 changed files with 969,577 additions and 71,050 deletions.
6 changes: 6 additions & 0 deletions CHANGES.rst
@@ -1,3 +1,9 @@
Version 1.1.1
-------------
- Fix minor bugs in bias analysis.
- Improve fonts and minor details in bias analysis plots.


Version 1.1.0
-------------
- Add bias detection and analysis feature (based on sentiment analysis)
31 changes: 17 additions & 14 deletions README.rst
@@ -11,7 +11,7 @@ Wordview (Work In Progress)
.. image:: https://img.shields.io/pypi/dm/wordview
:alt: PyPI - Downloads

Wordview is a Python package for Exploratory Data Analysis (EDA) and Feature Extraction for text.
Wordview is a Python package for Exploratory Data Analysis (EDA) of text.
Wordview's Python API is open-source and available under the `MIT
license <https://en.wikipedia.org/wiki/MIT_License>`__.

@@ -51,24 +51,25 @@ Wordview calculates several statistics for labels in labeled datasets whether th
See `label analysis documentation pages <./docs/source/labels.rst>`__ for usage and examples.


Feature Extraction
###################

Wordview has various functionalities for feature extraction from text, including Multiword Expressions (MWEs), clusters, anomalies and
outliers, and more. See the following sections as well as the linked documentation page in each section for details.

Multiword Expressions
*********************

Extraction & Analysis of Multiword Expressions
**********************************************
Multiword Expressions (MWEs) are phrases that can be treated as a single
semantic unit, e.g. *swimming pool* and *climate change*. MWEs have
applications in areas including parsing, language models,
language generation, terminology extraction, and topic models. Wordview can extract different types of MWEs from text.
See `MWEs documentation page <./docs/source/mwes.rst>`__ for usage and examples.

Anomalies and Outliers
**********************

Bias Analysis
**************
In the rapidly evolving realm of Natural Language Processing (NLP), downstream models are only as unbiased and fair as the data on which they are trained.
Wordview's Bias Analysis module is designed to assist in the rigorous task of ensuring that underlying training datasets are devoid of explicit negative biases related to categories such as gender, race, and religion.
By identifying and rectifying these biases, Wordview attempts to pave the way for the creation of more inclusive, fair, and unbiased NLP applications, leading to better user experiences and more equitable technology.
See the `bias analysis documentation page <./docs/source/bias.rst>`__ for usage and examples.


Analysis of Anomalies and Outliers
**********************************
Anomalies and outliers have wide applications in Machine Learning. While in
some cases, you can capture them and remove them from the data to improve the
performance of a downstream ML model, in other cases, they become the data points
@@ -78,8 +79,10 @@ Wordview offers several anomaly and outlier detection functions.
See `anomalies documentation page <./docs/source/anomalies.rst>`__ for usage and examples.


Clusters
*********


Cluster Analysis
****************
Clustering can be used to identify different groups of documents with similar information, in an unsupervised fashion.
It can provide valuable insights into your data without requiring labeled data. See
`wordview`'s `clustering documentation page <./docs/source/clustering.rst>`__ for usage and examples.
2,001 changes: 0 additions & 2,001 deletions data/IMDB_Dataset_sample.csv

This file was deleted.

64 changes: 64 additions & 0 deletions data/mwes.json
@@ -0,0 +1,64 @@
{
"LVC": {
"SHOOT the binding": 26.024772726811,
"achieve this elusive": 24.700867756741808,
"manipulate the wildlife": 24.439810226089847,
"offset the darker": 24.024772726811,
"remove the bindings": 24.024772726811,
"Wish that Anthony": 23.898535506871717,
"Add some French": 23.501618538160958,
"grab a beer": 22.824678372319397,
"steal the 42": 22.50121077075399,
"invoke the spirit": 22.11788213120248
},
"NC2": {
"gordon willis": 20.73998574816305,
"Smoking Barrels": 20.73998574816305,
"sadahiv amrapurkar": 20.73998574816305,
"nihilism nothingness": 20.73998574816305,
"tomato sauce": 20.73998574816305,
"Picket Fences": 20.73998574816305,
"deja vu": 19.73998574816305,
"cargo bay": 19.73998574816305,
"zoo souvenir": 19.155023247441893,
"cake frosting": 19.155023247441893
},
"NC3": {},
"ANC2": {
"bite-sized chunks": 20.73998574816305,
"lizardly snouts": 20.73998574816305,
"behind-the-scenes featurette": 20.73998574816305,
"hidebound conservatives": 20.73998574816305,
"judicious pruning": 20.73998574816305,
"substantial gauge": 19.73998574816305,
"haggish airheads": 19.73998574816305,
"global warming": 19.73998574816305,
"Ukrainian flags": 19.155023247441893,
"well-lit sights": 19.155023247441893
},
"ANC3": {},
"VPC": {
"upside down": 12.673896557705278,
"Stay away": 12.489687330256716,
"put together.": 11.615864436333862,
"sit through": 10.932923610488164,
"ratchet up": 10.82859376031959,
"shoot'em up": 10.82859376031959,
"rip off": 10.719204186026548,
"hunt down": 10.673896557705278,
"screw up": 10.413556261040748,
"scorch out": 10.403479188352796
},
"NP": {
"every penny": 12.779983816094969,
"THE END": 12.067560406191555,
"A JOKE": 11.785789437776176,
"A LOT": 11.048823843609968,
"Either way": 11.033489730101863,
"An absolute": 10.717617935134596,
"half hour": 10.647669057572031,
"no qualms": 10.468522720258676,
"every cliche": 10.458055721207607,
"another user": 10.368209103825127
}
}
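
The file above maps each top-level key (apparently an MWE type) to a dictionary of expressions and numeric scores, with some types left empty. As a minimal sketch of how such a file could be inspected, the snippet below parses a short excerpt of it and lists the highest-scoring expression per type. The helper name ``top_mwes`` is hypothetical and is not part of Wordview; the sketch also assumes that a higher score means a stronger candidate, which the commit does not state.

```python
import json

# Excerpt of data/mwes.json as shown in this commit (two entries per type kept).
MWES_JSON = """
{
  "LVC": {"SHOOT the binding": 26.024772726811, "grab a beer": 22.824678372319397},
  "NC2": {"tomato sauce": 20.73998574816305, "deja vu": 19.73998574816305},
  "NC3": {},
  "VPC": {"upside down": 12.673896557705278, "rip off": 10.719204186026548}
}
"""


def top_mwes(mwes, n=1):
    """Return the n highest-scoring expressions for each non-empty type.

    Hypothetical helper for illustration only; assumes higher score = stronger.
    """
    return {
        mwe_type: sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for mwe_type, scores in mwes.items()
        if scores  # skip types with no extracted expressions, e.g. "NC3"
    }


mwes = json.loads(MWES_JSON)
top = top_mwes(mwes)
for mwe_type, entries in top.items():
    for phrase, score in entries:
        print(f"{mwe_type}: {phrase} ({score:.2f})")
```

To run this against the full file instead of the excerpt, replace ``json.loads(MWES_JSON)`` with ``json.load`` on an open handle to ``data/mwes.json``.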