ChenghaoMou/edgar-crawler

EDGAR-CRAWLER: Unlock the Power of Financial Documents 🚀

This is a modified version of EDGAR-CRAWLER that uses Scrapy to crawl specific material contracts.

Usage

# Update the settings in edgar_crawler/settings.py to match your needs
scrapy crawl exhibit -a start_year=2023 -a end_year=2023
# Deploy the spider to a cloud service
shub deploy
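
Scrapy passes each `-a name=value` argument to the spider as a string keyword argument, so year bounds like the ones above need converting before use. The following is a minimal sketch of that step, not the repository's actual code: `year_range` is a hypothetical helper, and treating `end_year` as inclusive is an assumption.

```python
from typing import List


def year_range(start_year: str, end_year: str) -> List[int]:
    """Hypothetical helper: expand string year bounds into a list of years.

    Assumes both bounds are inclusive, so 2023..2023 yields one year.
    """
    # Spider arguments arrive as strings, so convert them first.
    start, end = int(start_year), int(end_year)
    if start > end:
        raise ValueError(f"start_year ({start}) must not exceed end_year ({end})")
    return list(range(start, end + 1))


print(year_range("2023", "2023"))  # [2023]
```

A spider could call such a helper in its `__init__` to decide which years to request. Check the real argument handling in the `exhibit` spider before relying on the inclusive behavior.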

License

GNU General Public License v3.0

Citation

An EDGAR-CRAWLER paper is on its way. Until then, please cite the relevant EDGAR-CORPUS paper published at the 3rd Economics and Natural Language Processing (ECONLP) workshop at EMNLP 2021 (Punta Cana, Dominican Republic):

@inproceedings{loukas-etal-2021-edgar,
    title = "{EDGAR}-{CORPUS}: Billions of Tokens Make The World Go Round",
    author = "Loukas, Lefteris  and
      Fergadiotis, Manos  and
      Androutsopoulos, Ion  and
      Malakasiotis, Prodromos",
    booktitle = "Proceedings of the Third Workshop on Economics and Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.econlp-1.2",
    pages = "13--18",
}

Read the paper here: https://aclanthology.org/2021.econlp-1.2/
