-
Notifications
You must be signed in to change notification settings - Fork 1
Searching for misspelling, bad grammar, and violations of the Manual of Style in Wikipedia
License
cdbeland/moss
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
See https://en.wikipedia.org/wiki/Wikipedia:Typo_Team/moss HARDWARE REQUIREMENTS * It takes at least ~6GB of RAM just for the spell-check dictionary to load * Parallelized Python code is expecting 8 available CPU cores for optimal performance (if tweaking, look for "Pool(8)" in Python scripts) UBUNTU DEPENDENCIES sudo apt-get install git pyflakes3 pylint python3-pep8 python3-dev g++ protobuf-c-compiler FEDORA DEPENDENCIES sudo dnf install git python3-flake8 protobuf-c-compiler pylint python3-devel protobuf-devel GIT SETUP (optional) git config --global color.ui true git config --global push.default simple CLONE AND SET UP: cd ~ # Or wherever you'd like the clone to be parented git clone https://github.com/cdbeland/moss.git cd moss/ ./reset_environment.sh sudo mkdir /var/local/moss/bulk-wikipedia/ sudo chown $USER /var/local/moss/bulk-wikipedia/ Run "FIRST TIME SETUP" steps in ./update_downloads_parallel.sh DOWNLOAD DATA: # Do not do this if Wikipedia or Wiktionary dumps are in progress at: # https://dumps.wikimedia.org/backup-index-bydb.html ./update_downloads_parallel.sh # Wait for download to complete; you can watch progress with: # tail -f /var/local/moss/bulk-wikipedia/*log RUN SPELL CHECK AND FRIENDS: ./run_moss_parallel.sh TO SPELL CHECK ONLY ARTICLES WITH TITLES STARTING WITH "X": ./run_moss_parallel.sh --spell-check-only X Omit X to run spell check only and skip other reports. You can also specify any letter or number, or "BEFORE_A" and "AFTER_Z".
About
Searching for misspelling, bad grammar, and violations of the Manual of Style in Wikipedia
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published