
Text Extensions for Pandas

Natural language processing support for Pandas dataframes.

Text Extensions for Pandas turns Pandas DataFrames into a universal data structure for representing intermediate data in all phases of your NLP application development workflow.

Features

SpanArray: A Pandas extension type for spans of text

Connect features with regions of a document
Visualize the internal data of your NLP application
Analyze the accuracy of your models
Combine the results of multiple models

TensorArray: A Pandas extension type for tensors

Represent BERT embeddings in a Pandas series
Store logits and other feature vectors in a Pandas series
Store an entire time series in each cell of a Pandas series

Pandas front-ends for popular NLP toolkits

Installation

This library requires Python 3.7+, Pandas, and NumPy.

To install the latest release, just run:

pip install text-extensions-for-pandas

Depending on your use case, you may also need the following additional packages:

spacy (for SpaCy support)
transformers (for
ibm_watson (for IBM Watson support)
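Before relying on any of the optional integrations, it can help to check what is available in the current environment. The following is a minimal sketch that uses only the Python standard library. The function name `check_environment` is hypothetical and not part of this library, and the import names are assumed to match the package names listed above.

```python
import importlib.util
import sys

# Import names taken from the list above (assumed to match the package names).
OPTIONAL_PACKAGES = ("spacy", "transformers", "ibm_watson")


def check_environment(optional=OPTIONAL_PACKAGES):
    """Report whether the Python version and each optional package are usable."""
    # The library states that it requires Python 3.7 or later.
    report = {"python_ok": sys.version_info >= (3, 7)}
    for name in optional:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, available in check_environment().items():
        print(f"{key}: {'found' if available else 'missing'}")
```

A package reported as missing only matters if you plan to use the corresponding integration.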

Installation from Source

If you'd like to try out the very latest version of our code, you can install directly from the head of the master branch:

pip install git+https://github.com/CODAIT/text-extensions-for-pandas

You can also directly import our package from your local copy of the text_extensions_for_pandas source tree. Just add the root of your local copy of this repository to the front of sys.path.
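As a rough sketch of the `sys.path` approach, the snippet below puts a local checkout at the front of the module search path. The checkout location `REPO_ROOT` and the helper name `prepend_to_sys_path` are assumptions made for illustration; substitute the actual path of your copy.

```python
import sys
from pathlib import Path

# Hypothetical location of your local checkout; replace with the real path.
REPO_ROOT = Path.home() / "text-extensions-for-pandas"


def prepend_to_sys_path(repo_root):
    """Put repo_root at the front of sys.path without duplicating it."""
    root = str(Path(repo_root).resolve())
    if root in sys.path:
        sys.path.remove(root)
    sys.path.insert(0, root)
    return root


prepend_to_sys_path(REPO_ROOT)
# From here on, `import text_extensions_for_pandas` resolves to the local copy,
# provided REPO_ROOT really contains that package directory.
```

Placing the path at the front, rather than appending it, means the local copy takes precedence over any release already installed with pip.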

Documentation

For examples of how to use the library, take a look at the notebooks in the notebooks directory.

API documentation can be found at https://text-extensions-for-pandas.readthedocs.io/en/latest/

Contents of this repository

text_extensions_for_pandas: Source code for the text_extensions_for_pandas module.
env.sh: Script to create a conda environment named pd that can run the notebooks and test cases in this project.
generate_docs.sh: Script to build the API documentation (https://readthedocs.org/projects/text-extensions-for-pandas/).
api_docs: Configuration files for generate_docs.sh
config: Configuration files for env.sh.
docs: Project web site
notebooks: Example notebooks.
resources: Various input files used by our example notebooks.
test_data: Data files for regression tests. The tests themselves are located adjacent to the library code files.
tutorials: Detailed tutorials on using Text Extensions for Pandas to cover complex end-to-end NLP use cases (work in progress).

Instructions to run a demo notebook

Check out a copy of this repository.
(optional) Use the script env.sh to set up an Anaconda environment for running the code in this repository.
Type jupyter lab from the root of your local source tree to start a JupyterLab environment.
Navigate to the notebooks directory and choose any of the notebooks there.

Contributing

This project is an IBM open source project. We are developing the code in the open under the Apache License, and we welcome contributions from both inside and outside IBM.

To contribute, just open a GitHub issue or submit a pull request. Be sure to include a copy of the Developer's Certificate of Origin 1.1 along with your pull request.

Building and Running Tests

Before building the code in this repository, we recommend that you use the provided script env.sh to set up a consistent build environment:

$ ./env.sh myenv
$ conda activate myenv

(replace myenv with your choice of environment name).

To run tests, navigate to the root of your local copy and run:

pytest

To build pip and source code packages:

python setup.py sdist bdist_wheel

(outputs go into ./dist).

To build API documentation, run:

./generate_docs.sh
