ripgrep but for gzip-compressed files over http

signalhunter/juicer

Juicer

It's ripgrep but for Gzip-compressed files over HTTP!

This tool was primarily designed to scan through the Common Crawl dataset for URLs without spending a fortune on AWS.

Features:

  • Extremely fast regex engine (Intel Hyperscan)
  • Scan through terabytes of data without writing it to disk
  • Concurrent scanning of multiple files
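
To illustrate the idea behind the features above, here is a minimal Python sketch of streaming a gzip file over HTTP, decompressing it in memory, and matching each line against a regex, with several files scanned concurrently. This is not Juicer's actual code: it uses Python's standard `re` module in place of Hyperscan, and the names `iter_decompressed`, `scan_chunks`, `read_url`, and `scan_urls` are made up for this example. It assumes the input is plain (possibly multi-member) gzip, as WARC files typically are.

```python
import re
import urllib.request
import zlib
from concurrent.futures import ThreadPoolExecutor


def iter_decompressed(chunks):
    """Yield decompressed bytes from an iterable of gzip-compressed chunks.

    Supports multi-member gzip (several gzip streams concatenated).
    """
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS = gzip framing
    for chunk in chunks:
        while chunk:
            out = decomp.decompress(chunk)
            if out:
                yield out
            if decomp.eof:
                # One gzip member ended; leftover bytes begin the next member.
                chunk = decomp.unused_data
                decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
            else:
                chunk = b""


def scan_chunks(chunks, pattern):
    """Yield each decompressed line (bytes) that matches the bytes regex."""
    regex = re.compile(pattern)
    tail = b""
    for data in iter_decompressed(chunks):
        lines = (tail + data).split(b"\n")
        tail = lines.pop()  # the last piece may be an incomplete line
        for line in lines:
            if regex.search(line):
                yield line
    if tail and regex.search(tail):
        yield tail


def read_url(url, chunk_size=1 << 16):
    """Yield the raw response body in chunks; nothing is written to disk."""
    with urllib.request.urlopen(url) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            yield chunk


def scan_urls(urls, pattern, workers=4):
    """Scan several URLs concurrently; return {url: [matching lines]}."""
    def scan_one(url):
        return url, list(scan_chunks(read_url(url), pattern))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(scan_one, urls))
```

A call might look like `scan_urls(list_of_gz_urls, rb"https?://\S+")`, where `list_of_gz_urls` is a placeholder for whatever gzip-compressed files you want to scan. Threads are a reasonable fit here because most of the time is spent waiting on the network, and `zlib` releases the GIL while decompressing.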

TODO:

  • Client/server for handing out scanning tasks
  • Zstandard support? (for Internet Archive WARCs)
