🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Updated Sep 27, 2024 · Python
Wayback Machine API interface & a command-line tool
🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump
Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.
Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)
JavaScript/Node.js wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.
Home of the official docker image for ArchiveBox
😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...
Navigator for Web Archive
🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.
Home of the official apt/deb package for Ubuntu/Debian-based systems.
Homebrew formula for the ArchiveBox self-hosted internet archiving solution.
Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.
Official Python package for ArchiveBox, the self-hosted internet archiving solution.
Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC.
Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻
Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.
Upload files to the Internet Archive using a shell script.
Download and archive RSS feeds to the Wayback Machine. Saves a list of archived feeds in a local DB.