egrinstein/crawl-edx: Project containing spiders used to gather data from MOOC websites

MOOC Crawler -- Crawl EDx

A crawler built with Scrapy that collects data from EDx's courses.

It gathers course names, universities, review count and value, duration, weekly effort, etc.

When exporting to .csv using Scrapy, a problem with the "&" character seems to break the affected line. I plan to build a pipeline to fix that when I have the time.
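A rough sketch of what such a pipeline could look like. This is not part of the repository: the class name `AmpersandCleanupPipeline` is hypothetical, and it assumes the "&" trouble comes from HTML entities such as `&amp;` left in the scraped strings, which may not be the actual cause.

```python
import html


class AmpersandCleanupPipeline:
    """Hypothetical item pipeline that decodes HTML entities in string fields.

    Assumption: the broken CSV lines are caused by entities like "&amp;"
    surviving in the scraped text. Adjust if the real cause is different.
    """

    def process_item(self, item, spider):
        # Copy the pairs into a list so values can be reassigned while looping.
        for key, value in list(item.items()):
            if isinstance(value, str):
                # "Science &amp; Cooking" becomes "Science & Cooking".
                item[key] = html.unescape(value).strip()
        return item
```

To try it, the class would be registered under `ITEM_PIPELINES` in the project's Scrapy settings, for example as `{"mooc_crawler.pipelines.AmpersandCleanupPipeline": 300}`. That module path is a guess based on the project folder name.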

This crawl was made on September 24th, 2016. If EDx's site changes, the crawl will probably stop working. If that happens, feel free to send me a message and I might adapt it.

You can find a tutorial on how to crawl AJAX-dependent pages at this link.

Repository contents:

mooc_crawler
.gitignore
README.md
out.json
scrapy.cfg
