Web Crawling

Libraries that analyze the content of websites.

  • anemone - Ruby library and CLI for crawling websites.
  • LinkThumbnailer - Ruby gem that generates thumbnail images and videos from a given URL, much like the link previews on popular social websites.
  • Mechanize - Ruby library that makes automated web interaction easy.
  • MetaInspector - Ruby gem for web scraping purposes.
  • Upton - A batteries-included framework for easy web-scraping.
  • Wombat - Web scraper with an elegant DSL that parses structured data from web pages.
  • Apache Nutch - Highly extensible, highly scalable web crawler for production environments.
  • Crawler4j - Simple and lightweight web crawler.
  • JSoup - Scrapes, parses, manipulates and cleans HTML.