Project containing spiders used to gather MOOC websites data

egrinstein/crawl-edx

MOOC Crawler -- Crawl EDx

A crawler built with Scrapy that collects data about EDx's courses.

For each course it gets the name, university, review count and value, duration, weekly effort, etc.
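As a rough illustration of the record shape, here is a minimal sketch of one course as a dataclass. The class name `CourseRecord` and every field name are assumptions made for this example, not the names used in the spiders; recent Scrapy versions accept dataclass objects as items, but the project may define its items differently.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class CourseRecord:
    """Hypothetical container for one scraped course (field names are assumed)."""
    name: str
    university: str
    review_count: Optional[int] = None    # number of reviews, if shown
    review_value: Optional[float] = None  # review score, if shown
    duration: Optional[str] = None        # kept as the raw text from the page
    weekly_effort: Optional[str] = None   # kept as the raw text from the page


# Placeholder values, not real crawl output.
record = CourseRecord(name="Example Course", university="Example University")
row = asdict(record)  # plain dict, convenient for CSV export
```

Fields that a course page does not show simply stay `None`.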

When exporting to .csv with Scrapy, the "&" character seems to break some lines. I plan to build a pipeline to fix that when I have the time.
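A possible shape for that pipeline, as a sketch only: it assumes the broken lines come from HTML entities such as `&amp;` or from stray line breaks inside text fields, which has not been confirmed as the actual cause. The class name `CleanTextPipeline` is hypothetical. Scrapy item pipelines are plain classes with a `process_item(item, spider)` method, so the sketch needs no Scrapy import.

```python
import html
import re

# Matches any run of whitespace, including newlines and tabs.
_WHITESPACE = re.compile(r"\s+")


class CleanTextPipeline:
    """Hypothetical pipeline: normalizes every string field of an item."""

    def process_item(self, item, spider):
        # Works for dict items and scrapy.Item, which both expose items().
        for key, value in list(item.items()):
            if isinstance(value, str):
                text = html.unescape(value)  # "&amp;" becomes "&"
                item[key] = _WHITESPACE.sub(" ", text).strip()
        return item
```

To try it, the class would be registered under `ITEM_PIPELINES` in the project's `settings.py`, using whatever module path the project actually has.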

This crawl was made on September 24th, 2016. If EDx's site changes, the crawl will probably stop working. If that happens, feel free to send me a message and I might adapt it.

You can find a tutorial on how to crawl AJAX-dependent pages at this link.
