Ohio Energy Provider Comparison

Scrapes Energy Choice Ohio's provider comparison tables.

ELECTRIC
GAS

Usage

With PDM

Setup environment
- $ pdm install
Run start script
- $ pdm start

Without PDM

Create venv
- $ virtualenv .venv
Activate venv
- Linux: $ . .venv/bin/activate
Install requirements
- $ pip install -r requirements.txt
cd to app directory
- $ cd ohioenergy
Run crawler(s)
- $ python main.py
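
The manual steps above can also be sketched as a small Python script. This is a minimal sketch under a few assumptions: it uses the standard library's `venv` module in place of `virtualenv`, and it calls the environment's own interpreter directly instead of activating the environment first. The helper names `venv_python`, `build_steps`, and `run_steps` are hypothetical and not part of this project.

```python
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir: Path, platform: str = sys.platform) -> Path:
    """Return the path of the interpreter inside a virtual environment."""
    if platform.startswith("win"):
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def build_steps(project_root: Path, platform: str = sys.platform) -> list:
    """Return (argv, cwd) pairs mirroring the manual steps, in order."""
    root = project_root.resolve()
    python = str(venv_python(root / ".venv", platform))
    return [
        # Create the environment (stdlib venv in place of virtualenv)
        ([sys.executable, "-m", "venv", str(root / ".venv")], root),
        # Install requirements with the environment's own interpreter,
        # which removes the need for a separate "activate" step
        ([python, "-m", "pip", "install", "-r", "requirements.txt"], root),
        # Run the crawler script from the app directory
        ([python, "main.py"], root / "ohioenergy"),
    ]


def run_steps(project_root: Path) -> None:
    """Execute every step, stopping at the first failure."""
    for argv, cwd in build_steps(project_root):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_steps(Path.cwd())` from the repository root should perform the same three actions as the shell commands. `build_steps` only builds the command list, so the commands can be inspected before anything is executed.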

Notes

Run Scrapy spiders from a Python script

Scrapy's CrawlerRunner, for running multiple crawlers

CrawlerRunner relies on Twisted for asynchronous crawls, so the script must start and stop the Twisted reactor itself.

Example single crawler, using the ohioenergy.spiders.ohioenergyproviders.OhioenergyprovidersSpider spider:

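from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

from ohioenergy.spiders.ohioenergyproviders import OhioenergyprovidersSpider

## Log format passed to configure_logging (assumed value; the original is not shown)
default_fmt = "%(levelname)s: %(message)s"

if __name__ == "__main__":

    configure_logging({"LOG_FORMAT": default_fmt})
    settings = get_project_settings()

    runner = CrawlerRunner(settings=settings)

    ## Schedule the crawl; crawl() returns a Twisted Deferred
    electric_providers = runner.crawl(OhioenergyprovidersSpider)

    ## Stop the reactor when the crawl finishes, whether it succeeds or fails
    electric_providers.addBoth(lambda _: reactor.stop())

    ## Start the reactor; blocks until reactor.stop() is called
    reactor.run()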

Example multiple crawlers, using hypothetical Spider1 and Spider2:

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings


class Spider1(scrapy.Spider):
    ...

class Spider2(scrapy.Spider):
    ...

if __name__ == "__main__":
    settings = get_project_settings()
    ## CrawlerRunner does not set up logging on its own
    configure_logging(settings)
    runner = CrawlerRunner(settings)

    ## Add spiders to runner
    runner.crawl(Spider1)
    runner.crawl(Spider2)

    ## Join crawlers
    crawl = runner.join()

    ## Set Twisted's reactor.stop()
    crawl.addBoth(lambda _: reactor.stop())

    ## Run crawler
    reactor.run()
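
The example above starts both spiders at once. Below is a minimal sketch of a sequential variant, assuming the crawls should run one after another. Awaiting the Deferred returned by `runner.crawl()` inside a coroutine waits for each crawl to finish before the next one begins. The names `crawl_sequentially` and `main` are hypothetical. Scrapy and Twisted are imported inside `main` so that the coroutine itself has no import-time dependencies.

```python
async def crawl_sequentially(runner, spider_classes):
    """Run each spider to completion before starting the next one."""
    for spider_cls in spider_classes:
        # runner.crawl() returns a Twisted Deferred, which can be awaited
        await runner.crawl(spider_cls)


def main(spider_classes):
    """Drive the coroutine with the Twisted reactor (blocks until done)."""
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging
    from scrapy.utils.project import get_project_settings
    from twisted.internet import defer, reactor

    settings = get_project_settings()
    configure_logging(settings)
    runner = CrawlerRunner(settings)

    # ensureDeferred wraps the coroutine in a Deferred the reactor can track
    finished = defer.ensureDeferred(crawl_sequentially(runner, spider_classes))

    # Stop the reactor once every crawl has finished or one has failed
    finished.addBoth(lambda _: reactor.stop())
    reactor.run()
```

With the hypothetical spiders from the example above, this would be called as `main([Spider1, Spider2])`.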

Scrapy's CrawlerProcess

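Use scrapy.crawler.CrawlerProcess to run spiders from a script. It starts and stops the Twisted reactor for you, so no explicit reactor handling is needed. Make sure to import each spider class into the script.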

Example using the ohioenergy.spiders.ohioenergyproviders.OhioenergyprovidersSpider spider:

## main.py

import scrapy
## Import CrawlerProcess
from scrapy.crawler import CrawlerProcess
## Import scrapy project's settings
from scrapy.utils.project import get_project_settings

## Import OhioenergyprovidersSpider
from ohioenergy.spiders.ohioenergyproviders import OhioenergyprovidersSpider

if __name__ == "__main__":
    
    ## Create CrawlerProcess object. Initialize with Scrapy project's settings
    process = CrawlerProcess(get_project_settings())
    
    ## Prepare crawl
    process.crawl(OhioenergyprovidersSpider)
    ## Start crawl
    process.start()
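
The CrawlerProcess example above uses the project settings unchanged. As a minimal sketch, assuming the scraped items should be written to a JSON file, per-run overrides can be layered on top of the project settings before the process is created. The helpers `feed_overrides` and `run` and the file name `providers.json` are hypothetical. `FEEDS` is Scrapy's feed export setting, and its `overwrite` option needs Scrapy 2.4 or later. Scrapy is imported inside `run` so that `feed_overrides` can be used on its own.

```python
def feed_overrides(output_path: str) -> dict:
    """Build settings overrides that export scraped items to a JSON file.

    Hypothetical helper, not part of the project.
    """
    return {
        "FEEDS": {
            output_path: {"format": "json", "overwrite": True},
        }
    }


def run(output_path: str = "providers.json") -> None:
    """Run the provider spider with the overrides applied (blocks until done)."""
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    from ohioenergy.spiders.ohioenergyproviders import OhioenergyprovidersSpider

    settings = get_project_settings()
    # "cmdline" priority wins over values from the project's settings module
    settings.setdict(feed_overrides(output_path), priority="cmdline")

    process = CrawlerProcess(settings)
    process.crawl(OhioenergyprovidersSpider)
    process.start()
```

Keeping the overrides in a plain dictionary makes them easy to inspect or extend without touching the project's settings module.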

redjax/ohio_utility_scraper
