Skip to content

spk/maman

Repository files navigation

Maman

Maman is a Rust Web Crawler saving pages on Redis.

Pages are send to list <MAMAN_ENV>:queue:maman using Sidekiq job format

{
"class": "Maman",
"jid": "b4a577edbccf1d805744efa9",
"retry": true,
"created_at": 1461789979, "enqueued_at": 1461789979,
"args": {
    "document":"<html><body><a href='#' /><a href='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/new' /></html>",
    "urls": ["https://example.net/new"],
    "headers": {"content-type": "text/html"},
    "url": "https://example.net/"
    }
}

Dependencies

Installation

With cargo

cargo install maman

With just

PREFIX=~/.local just install

Usage

maman URL [LIMIT] [MIME_TYPES]

LIMIT must be an integer or 0 is the default, meaning no limit.

Environment variables

Defaults

  • MAMAN_ENV=development
  • REDIS_URL="redis:https://127.0.0.1/"

Others

  • RUST_LOG=maman=info

LICENSE

The MIT License

Copyright (c) 2016-2021 Laurent Arnoud [email protected]


Build Version Documentation License Dependency status