Skip to content

kavorite/mutual-relevance-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 

Repository files navigation

mutual-relevance-scraper

Simple PRAW script for scraping concatenated posts, both relevant and irrelevant to one another, depth-first from random Reddit comment pools. To use, first set the environment variables REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET, then run like so:

$ python data.py --length 0.1G --encoding utf-8 -o annotations.txt

note: while it is possible to redirect stdout, using --opath allows data.py to remember where it left off and account for progress accordingly.

  • data.py — print supervised fastText annotations to stdout

TODO: measure toxicity v. supportiveness: Some replies are constructive, and some aren't; we should be able to measure either a lack of attacks or presence of positive features in responses, though fastText may not represent the most accurate means to accomplish this

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages