Screddit: Reddit Scraper

Scrapes subreddit posts for the ID, author, title, content, upvotes, and reply-to fields.
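For reference, here is a rough sketch of what one output row could look like, using the fields listed above; the column order, example values, and fputcsv usage are assumptions rather than the repository's actual format.

<?php
// Hypothetical shape of one scraped row, using the fields listed
// above. Column order and example values are assumptions.
$row = [
    'id'       => 't3_abc123',
    'author'   => 'some_user',
    'title'    => 'Post title',
    'content'  => 'Self-text or link URL',
    'upvotes'  => 42,
    'reply_to' => '',   // parent ID when the row is a comment reply
];

$out = fopen('php://stdout', 'w');
fputcsv($out, array_keys($row));   // header row
fputcsv($out, $row);               // data row
fclose($out);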

Usage

Command Line

php cl.php -mMAX_RESULTS -sSUBREDDIT -rREPLIES_BOOLEAN

Examples:

php cl.php -m1000 -sgaming -rtrue will scrape 1000 posts from the r/gaming subreddit and fetch the comments for each post.

php cl.php -m500 -sgaming -rfalse will scrape 500 posts from the r/gaming subreddit without comments.
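For illustration, these flags could be parsed with PHP's built-in getopt() along the following lines; the defaults shown are assumptions, not the script's actual values.

<?php
// Sketch of CLI flag parsing for cl.php using getopt().
// Default values here are illustrative assumptions.
$opts = getopt('m:s:r:');

$maxResults  = isset($opts['m']) ? (int)$opts['m'] : 1000;    // -mMAX_RESULTS
$subreddit   = $opts['s'] ?? 'minecraft';                     // -sSUBREDDIT
$withReplies = isset($opts['r'])
    ? filter_var($opts['r'], FILTER_VALIDATE_BOOLEAN)         // -rtrue / -rfalse
    : false;

printf("Scraping %d posts from r/%s (replies: %s)\n",
    $maxResults, $subreddit, $withReplies ? 'yes' : 'no');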

Web

Navigating to https://YOURSERVER.COM/screddit/index.php without URL parameters will generate CSV files for up to 100k posts from the /r/minecraft subreddit. Hitting the same URL while a job is in progress will display the message: "Work is in progress on subreddit minecraft, check back here for the zip file when it has completed."

Once it is done, that same page will have a link to the completed zip file.
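One way this check-then-link flow could work is sketched below; the lock-file convention and zip naming are assumptions for the sketch, not the repository's actual implementation.

<?php
// Illustrative job-status check for index.php. A lock file marks
// a run in progress; the zip appears once the run completes.
// File names and paths are assumptions, not from the repository.
$sub  = $_GET['sub'] ?? 'minecraft';
$lock = __DIR__ . "/$sub.lock";
$zip  = __DIR__ . "/$sub.zip";

if (file_exists($lock)) {
    echo "Work is in progress on subreddit $sub, "
       . "check back here for the zip file when it has completed.";
} elseif (file_exists($zip)) {
    echo "<a href=\"$sub.zip\">Download $sub.zip</a>";
} else {
    touch($lock);   // claim the job, then start scraping;
    // ... scrape, write the zip, and unlink($lock) when done
}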

You can also pass in the following URL parameters (a sketch of how they might be dispatched follows the list below):

sub: Changes the subreddit you will get posts from, so

https://YOURSERVER.COM/screddit/?sub=news

Will create up to 100k post files (or as many as the subreddit has) from the /r/news subreddit.

max: Pass this in if you want to change the maximum number of results. This is handy for testing, or when you know you want a smaller result set and a faster run.

https://YOURSERVER.COM/screddit/?max=100

Will create 100 post files from the default /r/minecraft subreddit.

delete: Pass this to delete an existing zip so you can re-run a job after it has completed. I don't have any hooks to stop a running process yet, so if you delete something mid-run it'll keep going; you'd just lose whatever progress had already been made.

https://YOURSERVER.COM/screddit/?delete=true

Will delete any existing files for the default /r/minecraft subreddit.

stop: Pass this to stop all current work.

These are best used in combination, so if you wanted to delete your run of /r/news you could pass

https://YOURSERVER.COM/screddit/?delete=true&sub=news

Or if you wanted to do a quick run of a different subreddit

https://YOURSERVER.COM/screddit/?max=100&sub=funny

Only one job can be run at a time to avoid running into Reddit's API limits.
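Putting the parameters together, a hypothetical dispatcher for them might look like the sketch below; the parameter names come from this README, while the defaults, file layout, and the stop signal are assumptions.

<?php
// Sketch of dispatching the documented query parameters.
// Defaults, file names, and the stop mechanism are assumptions.
$sub = $_GET['sub'] ?? 'minecraft';                      // default subreddit
$max = isset($_GET['max']) ? (int)$_GET['max'] : 100000; // post cap

if (!empty($_GET['delete'])) {
    @unlink(__DIR__ . "/$sub.zip");  // remove the finished archive;
                                     // a run already in progress keeps going
}

if (!empty($_GET['stop'])) {
    @unlink(__DIR__ . "/$sub.lock"); // one plausible stop signal: drop the
                                     // lock so the worker notices and exits
}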

License

This software is licensed under the MIT license.
