riccardo-forina/status-jquery-crawler

status-jquery-crawler

Check for broken links in your website with jQuery

Version 0.2.0

(c) 2012 Riccardo Forina
Status.js may be freely distributed under the MIT license.
For details and documentation: https://www.codingnot.es

Status.js, a jQuery crawler

Preview of Status.js

Demo

Demo: See it in action!

How does it work?

Status.js will scan the website it's hosted on from the root (/) for links.
Internal links will be followed (fetched through Ajax) and scanned again. Yes, it's recursive.
External links will be recorded and used for cross-references. You can't check whether an external link is broken with Status.js because of the cross-domain limitation of Ajax calls.
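
The crawl described above can be sketched roughly as follows. This is only an illustration, not the actual Status.js source: `crawl` and `getLinks` are hypothetical names, the Ajax fetch is replaced by a callback so the sketch runs without a browser, and the standard `URL` class stands in for jsUri.

```javascript
// Hypothetical sketch of the recursive crawl; not the actual Status.js code.
// getLinks(url) stands in for the Ajax fetch and returns the hrefs found on a page.
function crawl(rootUrl, getLinks) {
  const origin = new URL(rootUrl).origin;
  const internal = new Set(); // pages that were followed
  const external = new Set(); // links that were only recorded

  function visit(url) {
    if (internal.has(url)) return; // already crawled, avoid endless loops
    internal.add(url);
    for (const href of getLinks(url)) {
      const target = new URL(href, url); // resolve relative links
      target.hash = ""; // drop the hash to avoid duplicates
      if (target.origin === origin) {
        visit(target.href); // internal: follow and scan again
      } else {
        external.add(target.href); // external: remember, never fetch
      }
    }
  }

  visit(new URL("/", rootUrl).href); // start from the root (/)
  return { internal: [...internal].sort(), external: [...external].sort() };
}

// Usage with an in-memory "site" instead of real requests.
const site = {
  "https://example.test/": ["/about", "https://other.test/page"],
  "https://example.test/about": ["/", "/contact#form"],
  "https://example.test/contact": [],
};
const result = crawl("https://example.test/", (url) => site[url] || []);
```

The visited set is what keeps the recursion from looping when two pages link to each other, as `/` and `/about` do here.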

A table with some nice data will be populated in real time while Status.js is working.

Sometimes a picture works better, so I integrated the JavaScript InfoVis Toolkit (Jit) into Status.js to plot the website as a graph you can interact with.

Last but not least, there is a sitemap.xml generator that reuses the crawler's results. Nothing fancy, but it can be useful if you have no better way to generate a sitemap.
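
To give an idea of what such a generator has to do, here is a minimal sketch that turns a list of crawled URLs into a sitemap. `buildSitemap` and `escapeXml` are hypothetical names, and I'm assuming the simplest possible output (only `<loc>` entries); the real generator may emit more.

```javascript
// Hypothetical helper: escapes the characters that are special in XML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds a minimal sitemap.xml string from a list of URLs.
function buildSitemap(urls) {
  const entries = urls.map(
    (url) => "  <url><loc>" + escapeXml(url) + "</loc></url>"
  );
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

const sitemap = buildSitemap([
  "https://example.test/",
  "https://example.test/search?q=a&b=1",
]);
```

Escaping matters because a raw `&` in a query string would make the XML invalid.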

How is it done?

Status.js is a Backbone application.
For Ajax and DOM manipulation, there is jQuery.
URL manipulation is powered by jsUri. Plotting is done with the JavaScript InfoVis Toolkit (Jit). The GUI is built with Twitter Bootstrap.

What can I check with Status.js?

Url

The URL of the page. To avoid duplicates, hashes are removed.
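
A minimal sketch of the hash removal, assuming the standard `URL` class instead of jsUri (which is what Status.js actually uses); `normalizeUrl` is a hypothetical name.

```javascript
// Hypothetical helper: resolves href against base and drops the hash,
// so "/docs" and "/docs#install" count as the same page.
function normalizeUrl(href, base) {
  const url = new URL(href, base);
  url.hash = "";
  return url.href;
}

const plain = normalizeUrl("/docs", "https://example.test/");
const hashed = normalizeUrl("/docs#install", "https://example.test/");
// Both values are "https://example.test/docs".
```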

Title

Available for internal pages only, the title tag is fetched. If not present, you'll get a {No title} placeholder.

Description

Available for internal pages only, the meta name="description" tag is fetched. If not present, you'll get a {No description} placeholder.
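
The title and description lookups, with their placeholders, could look roughly like this. In the browser this is a job for jQuery selectors; the sketch below uses regular expressions only so that it runs anywhere, and it handles just straightforward markup. `extractMeta` and `attr` are hypothetical names.

```javascript
// Hypothetical helper: reads a quoted attribute value from a single tag.
function attr(tag, name) {
  const re = new RegExp("\\s" + name + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')", "i");
  const m = re.exec(tag);
  if (!m) return null;
  return m[1] !== undefined ? m[1] : m[2];
}

// Hypothetical sketch: a simplification, not a full HTML parser.
function extractMeta(html) {
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = titleMatch ? titleMatch[1].trim() : "";

  let description = "";
  for (const tag of html.match(/<meta\b[^>]*>/gi) || []) {
    const name = attr(tag, "name");
    const content = attr(tag, "content");
    if (name && content && name.toLowerCase() === "description") {
      description = content.trim();
      break;
    }
  }

  // Fall back to the placeholders when nothing was found.
  return {
    title: title || "{No title}",
    description: description || "{No description}",
  };
}

const found = extractMeta(
  '<html><head><title> Home </title>' +
  '<meta name="description" content="It\'s a demo"></head></html>'
);
const missing = extractMeta("<p>No head at all</p>");
```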

Status

Because of the limitations of JavaScript running in a browser, only these statuses can be detected:

  • Success: available for internal pages only; the page was fetched correctly.
  • External: indicates an external link.
  • Redirect: indicates that there is another page for the same URL but with a trailing slash. It's a hack to work around the browser, which does not return any 30x HTTP code.
  • Error: _Broken link!_
  • Unfetched: the page has been recorded but is still waiting to be crawled.
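
As a rough sketch, the statuses could be assigned like this. Everything here is an assumption made for illustration: the name `pageStatus`, the `fetchState` values ("pending", "ok", "failed") and the order of the checks are not taken from the Status.js source.

```javascript
// Hypothetical sketch: the check order and the fetchState values are assumptions.
function pageStatus(url, siteOrigin, fetchState, knownUrls) {
  // Links to another origin are never fetched.
  if (new URL(url).origin !== siteOrigin) return "External";
  // Recorded but still waiting in the queue.
  if (fetchState === "pending") return "Unfetched";
  // The trailing-slash hack: the same URL is also known with a final "/".
  if (!url.endsWith("/") && knownUrls.has(url + "/")) return "Redirect";
  return fetchState === "ok" ? "Success" : "Error";
}

const known = new Set([
  "https://example.test/",
  "https://example.test/blog",
  "https://example.test/blog/",
]);
const origin = "https://example.test";
const statuses = [
  pageStatus("https://example.test/", origin, "ok", known),
  pageStatus("https://example.test/blog", origin, "ok", known),
  pageStatus("https://example.test/gone", origin, "failed", known),
  pageStatus("https://example.test/later", origin, "pending", known),
  pageStatus("https://other.test/", origin, "pending", known),
];
```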

Out links

The number of internal and external links found in the page. Click the number to get the full list.

In links

The number of pages that link to the URL. Click the number to get the full list.
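
In links are simply out links read in the opposite direction. A minimal sketch, assuming the out links are kept as a plain object mapping each page to the list of URLs it links to (`buildInLinks` is a hypothetical name, not part of Status.js):

```javascript
// Hypothetical helper: inverts a page -> out links mapping
// into a Map of page -> pages that link to it.
function buildInLinks(outLinks) {
  const inLinks = new Map();
  // Every crawled page gets an entry, even if nothing links to it.
  for (const page of Object.keys(outLinks)) inLinks.set(page, []);
  for (const [page, targets] of Object.entries(outLinks)) {
    // A Set so that a page linking twice to the same URL counts once.
    for (const target of new Set(targets)) {
      if (!inLinks.has(target)) inLinks.set(target, []);
      inLinks.get(target).push(page);
    }
  }
  return inLinks;
}

const outLinks = {
  "/": ["/about", "/contact", "https://other.test/"],
  "/about": ["/", "/contact", "/contact"],
  "/contact": [],
};
const inLinks = buildInLinks(outLinks);
const contactInCount = inLinks.get("/contact").length; // linked from "/" and "/about"
```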

TODO list

This is a list of some of the things I still have to work on. Please feel free to contribute suggestions!

  • Warnings about duplicate, too long or missing titles and descriptions.
  • Verify the presence of Google Analytics.
  • Check for broken images.
  • Warnings about missing or bad alt attributes for images.
  • Pagination.
  • Performance tests.
  • Let's be honest... write tests!
  • Code cleanup, comments, etc.