This repository has been archived by the owner on Jun 5, 2023. It is now read-only.
sbarakat/crude-seo-spider

Crude SEO Spider

Provides a simple way to spider a website and report basic URL information to assist with Search Engine Optimisation.

Features

  • Detects duplicate content using MD5 hashes
  • Shows HTTP status codes for each URL
  • Displays the response time and page size
  • Follows redirects
  • Exports results to CSV format
  • Supports the Robots Exclusion Protocol (robots.txt)
  • Supports the rel="" link attribute

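Robots Exclusion Protocol support means checking each candidate URL against the site's robots.txt rules before fetching it. Python's standard library illustrates the kind of check involved (a sketch of the technique only, not the Perl script's actual implementation; the rules and URLs are examples):

```python
from urllib.robotparser import RobotFileParser

# Parse an example robots.txt that blocks one directory for all agents.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# Allowed and disallowed URLs according to those rules.
assert rp.can_fetch("*", "http://example.com/index.html")
assert not rp.can_fetch("*", "http://example.com/private/page.html")
```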
Usage

For usage parameters, run:

./spider.pl -h

  1. Open the spider.pl script and set the full path to the lib directory at the top.

  2. Modify the options in the spider.conf file; each option is commented, so it should be self-explanatory.

  3. Run the spider either by executing the script directly:

    ./spider.pl

    or by running it through perl:

    perl spider.pl

  4. While the script is running, it displays information on the currently tracked URLs and writes the results to the results.txt file.

Options

To output to a CSV file, provide the --csv=FILE parameter.
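The features above suggest the per-URL data such an export would carry: status code, response time, page size, and content hash. A sketch of that kind of CSV output, in Python for illustration (the column names and sample row are hypothetical, not the actual output of spider.pl):

```python
import csv
import io

# Hypothetical per-URL records of the kind the spider tracks.
rows = [
    {"url": "http://example.com/", "status": 200, "time_ms": 120,
     "size": 5120, "md5": "d41d8cd98f00b204e9800998ecf8427e"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["url", "status", "time_ms", "size", "md5"])
writer.writeheader()
writer.writerows(rows)

print(buf.getvalue())
```

One row per crawled URL with a fixed header line keeps the file easy to load into a spreadsheet for sorting by status code or response time.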
