HEPcrawl is a harvesting library based on Scrapy (http://scrapy.org) for INSPIRE-HEP (http://inspirehep.net). It focuses on automatic and semi-automatic retrieval of new content from all the sources the site aggregates, in particular content from major and minor publishers in the field of High-Energy Physics.
The project is currently in an early stage of development.
See the full documentation at http://pythonhosted.org/hepcrawl