nestarz/infrastructure-crawler

infrastructure

Dark Crawler

Most hidden wikis are just scam directories. The sights that do not work are most likely just forgotten about sites that have went down. If you want to actually explore onions, find Daniel's directory link. I would give it to you, but I don't have it. -- reddit, ewna843

The crawler gathers onion addresses from Daniel's directory and puts them in a stack. While the stack is not empty, the crawler pops a website from the stack and visits some of its pages. From these pages it gathers links to other websites and adds them to the end of the stack, but only if they have not been visited yet.

The crawler also takes a screenshot of each visited website and replaces all NSFW images using a classifier, to prevent harmful material from being shown. Be aware that this classifier is not perfect; it uses nsfw.js under the hood. The crawler only accepts HTML, stylesheets, images and fonts. Requests for other resources, such as scripts, are intercepted and aborted to prevent unwanted exposure.
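The two filtering rules described above can be sketched as pure functions. This is a minimal illustration in Python, not the crawler's actual code: the function names are hypothetical, the resource type strings follow the values Puppeteer reports for a request's resource type, the class names follow the nsfw.js prediction output, and both the choice of unsafe classes and the 0.5 threshold are assumptions.

```python
# Resource types the crawler lets through; everything else is aborted.
ALLOWED_RESOURCE_TYPES = frozenset({"document", "stylesheet", "image", "font"})

# Assumption: which nsfw.js classes count as unsafe, and at what probability.
UNSAFE_CLASSES = frozenset({"Porn", "Hentai", "Sexy"})
UNSAFE_THRESHOLD = 0.5


def should_allow(resource_type: str) -> bool:
    """Return True if a request of this resource type may continue."""
    return resource_type.lower() in ALLOWED_RESOURCE_TYPES


def is_unsafe(predictions: list[dict]) -> bool:
    """Return True if any unsafe class reaches the threshold.

    `predictions` mirrors the nsfw.js output shape:
    [{"className": "Neutral", "probability": 0.93}, ...]
    """
    return any(
        p["className"] in UNSAFE_CLASSES and p["probability"] >= UNSAFE_THRESHOLD
        for p in predictions
    )
```

In a request interception handler, a request would be continued when `should_allow` returns True and aborted otherwise; an image would be replaced when `is_unsafe` returns True for its predictions.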

Here is the pseudo-code:

use_proxy tor
stack = daniel_directory

while stack not empty
  website = pop stack
  visit website
  remove_nsfw_images website
  screenshot website
  stack = stack + unvisited (extractlinks website)
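The pseudo-code can be turned into a small runnable sketch. The version below is an illustration in Python under stated assumptions: `crawl`, `extract_links` and the `LINKS` graph are hypothetical stand-ins, the visit step is reduced to recording the site, and a `seen` set implements the rule that a website is only added if it has not been queued or visited before.

```python
from typing import Callable, Iterable


def crawl(
    seeds: Iterable[str],
    extract_links: Callable[[str], Iterable[str]],
    max_sites: int = 100,
) -> list[str]:
    """Visit sites reachable from the seeds, each at most once."""
    stack: list[str] = []
    seen: set[str] = set()  # every site ever pushed, so nothing is pushed twice
    visited: list[str] = []

    def push(site: str) -> None:
        if site not in seen:
            seen.add(site)
            stack.append(site)

    for seed in seeds:
        push(seed)

    while stack and len(visited) < max_sites:
        site = stack.pop()    # take the most recently added site
        visited.append(site)  # the real crawler would load, filter and screenshot here
        for link in extract_links(site):
            push(link)
    return visited


# Hypothetical in-memory link graph standing in for real pages.
LINKS = {
    "a.onion": ["b.onion", "c.onion"],
    "b.onion": ["a.onion", "c.onion"],
    "c.onion": ["d.onion"],
    "d.onion": [],
}

order = crawl(["a.onion"], lambda site: LINKS.get(site, []))
print(order)
```

The `max_sites` limit is an added safeguard for the sketch, since a real crawl over the link graph has no natural end.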

To speed up crawling, multiple instances of the crawler can be launched. This is done with a single browser and multiple pages.
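The idea of several crawler instances sharing one browser can be sketched with cooperative workers that pull from a shared stack. This is an illustration in Python using `asyncio`, not the project's Puppeteer code: each worker stands in for one browser page, and `crawl_concurrently`, `extract_links` and `LINKS` are hypothetical names.

```python
import asyncio
from typing import Callable, Iterable


async def crawl_concurrently(
    seeds: Iterable[str],
    extract_links: Callable[[str], Iterable[str]],
    workers: int = 4,
) -> list[str]:
    """Crawl with several workers sharing one stack and one seen set."""
    stack: asyncio.LifoQueue = asyncio.LifoQueue()
    seen: set[str] = set()
    visited: list[str] = []

    def push(site: str) -> None:
        if site not in seen:
            seen.add(site)
            stack.put_nowait(site)

    async def worker() -> None:
        while True:
            site = await stack.get()
            try:
                await asyncio.sleep(0)  # stands in for loading the page
                visited.append(site)
                for link in extract_links(site):
                    push(link)
            finally:
                stack.task_done()

    for seed in seeds:
        push(seed)

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await stack.join()  # returns once every pushed site has been processed
    for task in tasks:
        task.cancel()   # workers are idle at this point, waiting on an empty stack
    await asyncio.gather(*tasks, return_exceptions=True)
    return visited


# Hypothetical in-memory link graph standing in for real pages.
LINKS = {
    "a.onion": ["b.onion", "c.onion"],
    "b.onion": ["a.onion", "c.onion"],
    "c.onion": ["d.onion"],
    "d.onion": [],
}

visited_sites = asyncio.run(
    crawl_concurrently(["a.onion"], lambda site: LINKS.get(site, []), workers=3)
)
print(sorted(visited_sites))
```

Because the workers share the `seen` set inside a single event loop, no site is processed twice even though the visit order depends on scheduling.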

Technical details

The crawler is split into 4 services, orchestrated with docker-compose.

  • Tor SOCKS5 proxy: Provides a Tor proxy for the other services to use
  • NSFW Classifier: API that classifies an image URL as safe or not safe for work, using nsfw.js
  • Chrome Browser (Puppeteer): Crawls the web, starting from Daniel's directory
  • Autoheal: Restarts any unhealthy service, especially the Tor proxy when the circuit seems down

Usage

After installing Docker, go to the dark-crawler folder and run this command:

docker-compose up -d

Then, serve the website and you will see the dark crawler in action.

About

Navigating in the Dark (net)
