
szeke/etk

 
 

etk

This repository will contain our toolkit for extracting information from web pages. It will be built in stages to contain the following capabilities:

  • Several structure extractors to identify a page's main content and its tables
  • A host of data extractors for common entities, including people, places, phone numbers, email addresses, and dates
  • A trainable algorithm to rank extractions
  • Automated experimentation to measure precision and recall of extractions
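
To make the last capability concrete, here is a minimal sketch of measuring precision and recall for a single data extractor. It does not use etk's own API: the regex, the function names `extract_emails` and `precision_recall`, and the sample text and gold set are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately simple pattern (not etk's extractor);
# real-world email matching needs more care than this.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return the set of lowercased email-like strings found in text."""
    return {match.lower() for match in EMAIL_PATTERN.findall(text)}


def precision_recall(extracted, expected):
    """Compute precision and recall of an extracted set against a gold set."""
    true_positives = len(extracted & expected)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Made-up page text and gold annotations, for illustration only.
page_text = "Contact alice@example.com or Bob@Example.org for details."
gold = {"alice@example.com", "bob@example.org", "carol@example.net"}

found = extract_emails(page_text)
precision, recall = precision_recall(found, gold)
print(sorted(found))
print(precision, recall)
```

In this toy case both addresses in the text are found and both are in the gold set, so precision is 1.0, while recall is 2/3 because the third gold address never appears on the page.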

Setup

conda-env create .
source activate etk_env
python -m spacy download en

Run Tests

python -m unittest discover
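
By default, unittest discovery starts in the current directory and loads files whose names match test*.py, running every method whose name begins with test. The sketch below shows what such a file might look like. The file name test_normalize.py and the helper `normalize_phone` are hypothetical and are not part of etk.

```python
# Hypothetical file: test_normalize.py (the name matches the default test*.py pattern)
import unittest


def normalize_phone(raw):
    """Hypothetical helper, not part of etk: keep only the digits of a phone-like string."""
    return "".join(ch for ch in raw if ch.isdigit())


class TestNormalizePhone(unittest.TestCase):
    def test_strips_punctuation(self):
        # Parentheses, spaces, and hyphens are dropped; digits are kept in order.
        self.assertEqual(normalize_phone("(213) 555-0100"), "2135550100")

    def test_empty_input(self):
        # An empty string has no digits, so the result is also empty.
        self.assertEqual(normalize_phone(""), "")
```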

Launch Jupyter Notebook

jupyter notebook etk_examples.ipynb
or
jupyter notebook etk_extraction_using_config.ipynb

Before running the code in the notebook, change the kernel to Python [conda env:etk_env].
