ScrapydWeb: Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization.

🔤 English | 🀄 简体中文

Scrapyd ❌ ScrapydWeb ❌ LogParser

📖 Recommended Reading

🔗 How to efficiently manage your distributed web scraping projects

🔗 How to set up Scrapyd cluster on Heroku

👀 Demo

🔗 scrapydweb.herokuapp.com

⭐ Features

💠 Scrapyd Cluster Management
- 💯 All Scrapyd JSON API Supported
- ☑️ Group, filter and select any number of nodes
- 🖱️ Execute commands on multiple nodes with just a few clicks
🔍 Scrapy Log Analysis
- 📊 Stats collection
- 📈 Progress visualization
- 📑 Logs categorization
🔋 Enhancements
- 📦 Auto packaging
- 🕵️‍♂️ Integrated with 🔗 LogParser
- ⏰ Timer tasks
- 📧 Monitor & Alert
- 📱 Mobile UI
- 🔐 Basic auth for web UI

💻 Getting Started

⚠️ Prerequisites

❗ Make sure that 🔗 Scrapyd has been installed and started on all of your hosts.

‼️ Note that for remote access, you have to manually set 'bind_address = 0.0.0.0' in 🔗 the configuration file of Scrapyd and restart Scrapyd to make it visible externally.
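Before starting ScrapydWeb, it can help to confirm that every Scrapyd host actually answers from the machine that will run it. The sketch below queries Scrapyd's daemonstatus.json endpoint with the Python standard library. The host list is a placeholder, and the helper names daemonstatus_url, is_healthy and check_host are illustrative assumptions, not part of Scrapyd or ScrapydWeb.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

# Placeholder hosts (assumption); replace with your own Scrapyd nodes.
SCRAPYD_HOSTS = ["127.0.0.1:6800"]


def daemonstatus_url(host):
    """Build the URL of Scrapyd's daemonstatus.json endpoint for a host."""
    return f"http://{host}/daemonstatus.json"


def is_healthy(payload):
    """Return True if a daemonstatus.json response body reports status 'ok'."""
    try:
        return json.loads(payload).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not valid JSON, or valid JSON that is not an object.
        return False


def check_host(host, timeout=5):
    """Query one Scrapyd host; True only if it answers with status 'ok'."""
    try:
        with urlopen(daemonstatus_url(host), timeout=timeout) as response:
            return is_healthy(response.read().decode("utf-8"))
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error, etc.
        return False


if __name__ == "__main__":
    for host in SCRAPYD_HOSTS:
        print(host, "reachable" if check_host(host) else "NOT reachable")
```

A remote host that is up but reported as not reachable is often still bound to the loopback address, which is the case the bind_address note above addresses.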

⬇️ Install

Use pip:

pip install scrapydweb

❗ Note that you may need to run python -m pip install --upgrade pip first in order to get the latest version of scrapydweb. Alternatively, download the tar.gz file from https://pypi.org/project/scrapydweb/#files and install it via pip install scrapydweb-x.x.x.tar.gz

Use git:

pip install --upgrade git+https://github.com/my8100/scrapydweb.git

Or:

git clone https://github.com/my8100/scrapydweb.git
cd scrapydweb
python setup.py install

▶️ Start

Start ScrapydWeb with the scrapydweb command. (A config file for customizing settings is generated at the first startup.)
Visit http://127.0.0.1:5000 (Google Chrome is recommended for a better experience.)

🌐 Browser Support

The latest versions of Google Chrome, Firefox, and Safari.

✔️ Running the tests

$ git clone https://github.com/my8100/scrapydweb.git
$ cd scrapydweb

# To create isolated Python environments
$ pip install virtualenv
$ virtualenv venv/scrapydweb
# Or specify your Python interpreter: $ virtualenv -p /usr/local/bin/python3.7 venv/scrapydweb
$ source venv/scrapydweb/bin/activate

# Install dependent libraries
(scrapydweb) $ python setup.py install
(scrapydweb) $ pip install pytest
(scrapydweb) $ pip install coverage

# Make sure Scrapyd has been installed and started, then update the custom_settings item in tests/conftest.py
(scrapydweb) $ vi tests/conftest.py
(scrapydweb) $ curl http://127.0.0.1:6800

# '-x': stop on first failure
(scrapydweb) $ coverage run --source=scrapydweb -m pytest tests/test_a_factory.py -s -vv -x
(scrapydweb) $ coverage run --source=scrapydweb -m pytest tests -s -vv --disable-warnings
(scrapydweb) $ coverage report
# To create an HTML report, check out htmlcov/index.html
(scrapydweb) $ coverage html

🏗️ Built With

Front End
- 🔗 Element
- 🔗 ECharts
Back End
- 🔗 Flask

📋 Local Environment

Recommended (and tested) approach to set up the local environment:
1. The project is slightly outdated, so for best results install Python 3.7 specifically for it.
2. Install the exact package versions listed in requirements.txt, as some of them are outdated.
3. The Flask-SQLAlchemy package may be incompatible with some other packages; try downgrading it if you run into errors.
4. In the project root directory, run scrapydweb to generate the settings config file.
5. Set DOMAIN, ALL_WORKERS and SCRAPYD_SERVERS as follows:
DOMAIN = 'smbots-international.com'
ALL_WORKERS = ['US,US-minibots,US-linear,AR,AT,AU,BE,BR,CA,CH,DE,DK,ES,FI,FR,GB,IE,IN,IT,JP,KR,MX,NL,NO,PL,PT,SE,NZ,TR']

SCRAPYD_SERVERS = [
  ScrapydServer(
      "scrapyd-us",
      f"scrapyd-us.{DOMAIN}",
      80,
      (USERNAME, PASSWORD),
      'US'
  )
]

6. Re-run the project with scrapydweb; it should start the server on port 6800.
7. To see timer tasks locally, export them as CSV from the prod database and import the CSV into the task.db SQLite database. The default database table is in /root/scrapydweb/scrapydweb/data/database/timer_tasks.db.
8. To run the Flask app in a debugger or IDE, set FLASK_APP=/root/scrapydweb/scrapydweb/run.py.
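
The CSV import in step 7 can be scripted with Python's standard csv and sqlite3 modules. This is a minimal sketch under stated assumptions: import_csv is a hypothetical helper (not part of ScrapydWeb), the target table must already exist, and the CSV header row is assumed to match the table's column names. Check both against the real schema before using it.

```python
import csv
import sqlite3


def import_csv(conn, csv_lines, table):
    """Insert every row of a CSV source into an existing SQLite table.

    `csv_lines` is any iterable of CSV text lines, e.g. an open file.
    Assumes the header row names columns that exist in `table`.
    Returns the number of rows inserted.
    """
    reader = csv.reader(csv_lines)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Table and column names cannot be bound as SQL parameters, so they are
    # interpolated here; only use this with trusted input.
    sql = f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})'
    rows = [row for row in reader if row]  # skip blank lines
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, rows)
    return len(rows)
```

Typical usage would be to open the database with sqlite3.connect(...) using the path from step 7, open the exported file with open("task.csv", newline="", encoding="utf-8"), and pass both to import_csv along with the table name. The file name task.csv and the table name are assumptions here, so confirm them first.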

📋 Changelog

Detailed changes for each release are documented in the 🔗 HISTORY.md.

👨‍💻 Author

my8100

👥 Contributors

Kaisla

©️ License

This project is licensed under the GNU General Public License v3.0 - see the 🔗 LICENSE file for details.
