Skip to content

Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.

License

Notifications You must be signed in to change notification settings

frreiss/Identifying-Incorrect-Labels-In-CoNLL-2003

 
 

Repository files navigation

Identifying-Incorrect-Labels-In-CoNLL-2003

Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.

To download the CoNLL-2003 corpus and apply label corrections to produce a corrected version of the corpus, run the commands below. The CoNLL-2003 corpus is licensed for research use only. Be sure to adhere to the terms of the license when using this data set!

pip3 install -r requirements.txt
python3 scripts/download_and_correct_corpus.py

This will download the CoNLL-2003 corpus to original_corpus/, apply corrections and save the corrected corpus in corrected_corpus/.

NOTE: Text Extensions for Pandas must be installed to run the script. It provides utilities to download and work with the CoNLL-2003 corpus and assist with NLP analysis on Pandas. The pip3 install command described above should install Text Extensions for Pandas for you.

About

Research into identifying and correcting incorrect labels in the CoNLL-2003 corpus.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Jupyter Notebook 98.9%
  • Python 1.1%