4Chan-Scraper

Scrapes a given board catalog on 4Chan for all comments, files, and associated metadata with the help of the BASC-py4chan Python library.

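UPDATE for Python 3.9+ / Script Failure

On Python 3.9+, the script fails until you apply a small patch to basc_py4chan/util.py in your site-packages directory.

Windows path: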

C:\Users\USER\AppData\Local\Programs\Python\Python3x\Lib\site-packages\basc_py4chan\util.py

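Linux path (varies by distribution; pip may instead install to /usr/local/lib/python3.x/dist-packages or ~/.local/lib/python3.x/site-packages):

/usr/lib/python3.x/site-packages/basc_py4chan/util.py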
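Because the exact location depends on your OS, Python version, and whether you use a virtual environment, it can be easier to ask Python itself. The helper below is a hypothetical sketch, not part of this repo. It assumes only that the package is named basc_py4chan, as in the paths above, and it locates the package through importlib without importing it. Run it with the same interpreter you used for pip.

```python
import importlib.util
import os
from typing import Optional


def find_util_py(package: str = "basc_py4chan") -> Optional[str]:
    """Return the expected path of <package>/util.py, or None if not installed.

    Hypothetical helper: it locates the package via importlib without importing it.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package is not installed for this interpreter
    # spec.origin points at the package's __init__.py; util.py sits next to it
    return os.path.join(os.path.dirname(spec.origin), "util.py")


if __name__ == "__main__":
    print(find_util_py() or "basc_py4chan is not installed for this Python")
```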

Replace the original HTMLParser-based code, which breaks on Python 3.9+ because HTMLParser.unescape() was removed in that release:

# HTML parser was renamed in python 3.x
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser
_parser = HTMLParser()

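with the standard-library html module, whose html.unescape() function does the same job:

# HTMLParser.unescape() was removed in Python 3.9; html.unescape() replaces it
import html
_parser = html  # existing _parser.unescape(...) calls now resolve to html.unescape(...)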

Installation:

Download the .zip from the GitHub repo, or clone it with:

git clone https://github.com/malavmodi/4Chan-Scraper.git

Then install the required dependencies with pip:

pip3 install -r requirements.txt

Commands:

--board_name:
- Board to scrape, e.g. pol (Required)
--num_threads:
- Number of threads to scrape (Required)
--debug:
- Enables additional log output; takes True or False, case-insensitive (Optional)
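For illustration only, the three options could be parsed with argparse roughly as follows. This is a hypothetical sketch, not the code in 4chan_scraper.py. In particular, the str_to_bool helper is an assumption about how the case-insensitive True/False value might be handled.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False' in any letter case (hypothetical helper)."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Scrape threads from a 4chan board catalog.")
    parser.add_argument("--board_name", required=True, help="Board to scrape, e.g. pol")
    parser.add_argument("--num_threads", type=int, required=True,
                        help="Number of threads to scrape")
    parser.add_argument("--debug", type=str_to_bool, default=False,
                        help="Additional log output (True/False, case-insensitive)")
    return parser


args = build_parser().parse_args(
    ["--board_name", "pol", "--num_threads", "5", "--debug", "FALSE"]
)
print(args.board_name, args.num_threads, args.debug)  # pol 5 False
```

A converter function is needed here because type=bool would not work: bool("False") is True in Python, since any non-empty string is truthy.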

Example Usage:

NOTE: Run python 4chan_scraper.py -h to list all available options.

Scraping the first 5 threads of /pol/
- python 4chan_scraper.py --board_name "pol" --num_threads 5 --debug "False"

Runtime:

When run, the script creates a folder in the current working directory containing all associated data, in the following hierarchical structure:

Thread ID plus subject, if the thread has one (Folder)
- Thread ID files (Folder)
  - File Data
- CSV with comments/replies from posts
- JSON formatted output of thread
- File Metadata
- Thread Metadata
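The layout above can be sketched roughly as follows. This is a hypothetical illustration, not the script's actual code. The file names comments.csv and thread.json, the "<id>_files" folder name, and the shape of each post dictionary are all assumptions, and the two metadata files are left out for brevity.

```python
import csv
import json
import re
import tempfile
from pathlib import Path
from typing import Optional


def thread_folder_name(thread_id: int, subject: Optional[str]) -> str:
    """Thread ID, plus the subject when there is one, made filesystem-safe."""
    if not subject:
        return str(thread_id)
    safe = re.sub(r'[<>:"/\\|?*]', "_", subject).strip()
    return f"{thread_id} {safe}" if safe else str(thread_id)


def write_thread(base: Path, thread_id: int, subject: Optional[str], posts: list) -> Path:
    """Create the per-thread folder tree (hypothetical names throughout)."""
    thread_dir = base / thread_folder_name(thread_id, subject)
    files_dir = thread_dir / f"{thread_id}_files"  # downloaded file data would go here
    files_dir.mkdir(parents=True, exist_ok=True)

    # CSV with one row per post's comment/reply
    with open(thread_dir / "comments.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["post_id", "comment"])
        writer.writeheader()
        for post in posts:
            writer.writerow({"post_id": post["post_id"], "comment": post["comment"]})

    # JSON-formatted dump of the whole thread
    with open(thread_dir / "thread.json", "w", encoding="utf-8") as fh:
        json.dump({"thread_id": thread_id, "subject": subject, "posts": posts}, fh, indent=2)
    return thread_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_thread(Path(tmp), 123456, "Example subject",
                           [{"post_id": 123456, "comment": "first post"}])
        print(out.name)                               # 123456 Example subject
        print(sorted(p.name for p in out.iterdir()))  # ['123456_files', 'comments.csv', 'thread.json']
```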
