Auto Extractor

An intelligent extractor library that learns the structure of the input web pages and then derives a strategy for scraping their structured content.

Links

Developers:

Citation:

If you use this work, please cite: https://ieeexplore.ieee.org/abstract/document/7785739

@inproceedings{gowda2016clustering,
  title={Clustering Web Pages Based on Structure and Style Similarity (Application Paper)},
  author={Gowda, Thamme and Mattmann, Chris A},
  booktitle={Information Reuse and Integration (IRI), 2016 IEEE 17th International Conference on},
  pages={175--180},
  year={2016},
  organization={IEEE}
}

References:

K. Zhang and D. Shasha, "Simple fast algorithms for the editing distance between trees and related problems," SIAM Journal on Computing, vol. 18, no. 6, pp. 1245-1262, Dec. 1989.
R. A. Jarvis and E. A. Patrick, "Clustering using a similarity measure based on shared near neighbors," IEEE Transactions on Computers, vol. C-22, no. 11, pp. 1025-1034, Nov. 1973.
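
The Jarvis and Patrick reference describes clustering by shared near neighbors. The following is a minimal Java sketch of that general idea, not code from this library: the class name `SharedNeighborSketch`, its methods, and the parameters `k` and `minShared` are hypothetical, and the toy one-dimensional distances stand in for whatever page-similarity measure is actually used. It is a simplified variant in which two items join the same cluster when each is among the other's `k` nearest neighbors and they share at least `minShared` neighbors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the library's API.
class SharedNeighborSketch {

    /** For each item, returns the indices of its k nearest other items. */
    static List<Set<Integer>> nearestNeighbors(double[][] dist, int k) {
        int n = dist.length;
        List<Set<Integer>> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            final int row = i;
            List<Integer> others = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    others.add(j);
                }
            }
            // Sort the other items by their distance to item i.
            others.sort(Comparator.comparingDouble(j -> dist[row][j]));
            result.add(new HashSet<>(others.subList(0, Math.min(k, others.size()))));
        }
        return result;
    }

    /** Returns one label per item; items with equal labels share a cluster. */
    static int[] cluster(double[][] dist, int k, int minShared) {
        int n = dist.length;
        List<Set<Integer>> nn = nearestNeighbors(dist, k);
        int[] label = new int[n];
        for (int i = 0; i < n; i++) {
            label[i] = i; // every item starts in its own cluster
        }
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Require i and j to be in each other's neighbor lists.
                if (!nn.get(i).contains(j) || !nn.get(j).contains(i)) {
                    continue;
                }
                Set<Integer> shared = new HashSet<>(nn.get(i));
                shared.retainAll(nn.get(j));
                if (shared.size() >= minShared) {
                    // Merge the cluster of j into the cluster of i.
                    int from = label[j];
                    int to = label[i];
                    for (int m = 0; m < n; m++) {
                        if (label[m] == from) {
                            label[m] = to;
                        }
                    }
                }
            }
        }
        return label;
    }

    /** Builds a pairwise distance matrix from points on a line (toy data). */
    static double[][] distancesFromPositions(double[] pos) {
        double[][] d = new double[pos.length][pos.length];
        for (int i = 0; i < pos.length; i++) {
            for (int j = 0; j < pos.length; j++) {
                d[i][j] = Math.abs(pos[i] - pos[j]);
            }
        }
        return d;
    }

    public static void main(String[] args) {
        double[][] d = distancesFromPositions(new double[] {0, 1, 2, 10, 11, 12});
        System.out.println(Arrays.toString(cluster(d, 2, 1)));
    }
}
```

In a page-clustering setting, the distance matrix would presumably come from a structural measure such as a tree edit distance between DOM trees, as in the first reference; that connection is an assumption here, and the choice of `k` and `minShared` would need tuning on real data.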
