Added maximum idle waiting time MAX_IDLE_TIME_BEFORE_CLOSE. #193

nieweiming · 2021-04-26T07:56:13Z

Added a maximum idle waiting time setting, MAX_IDLE_TIME_BEFORE_CLOSE.
Set MAX_IDLE_TIME_BEFORE_CLOSE in the settings to the maximum number of seconds to wait while idle.
If it is not set, or is set to 0, the spider waits forever.
MAX_IDLE_TIME_BEFORE_CLOSE does not affect the use of SCHEDULER_IDLE_BEFORE_CLOSE.
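A minimal sketch of the rule described above, assuming the setting is read as an integer number of seconds and that the limit is inclusive. The helper `idle_limit_reached` is a hypothetical name used only for illustration; it is not part of scrapy-redis.

```python
from typing import Optional


def idle_limit_reached(idle_seconds: float, max_idle_time: Optional[int]) -> bool:
    """Return True when the spider has been idle long enough to close.

    Hypothetical helper: an unset (None) or 0 MAX_IDLE_TIME_BEFORE_CLOSE
    means "wait forever", so it never reports the limit as reached.
    """
    if not max_idle_time:  # None or 0: never close because of idleness
        return False
    return idle_seconds >= max_idle_time


# In a Scrapy project this line would live in settings.py:
# close after 5 minutes without new requests.
MAX_IDLE_TIME_BEFORE_CLOSE = 300

print(idle_limit_reached(120, MAX_IDLE_TIME_BEFORE_CLOSE))
print(idle_limit_reached(301, MAX_IDLE_TIME_BEFORE_CLOSE))
print(idle_limit_reached(10_000, 0))
```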

rmax · 2021-04-26T12:33:15Z

Thanks for taking the time to send the PR.

How is this different from using the SCHEDULER_IDLE_BEFORE_CLOSE setting? See https://github.com/rmax/scrapy-redis#usage

That feature uses a blocking Redis operation to wait for the next request:

scrapy-redis/src/scrapy_redis/scheduler.py, lines 163 to 164 in fff0d82:

```python
block_pop_timeout = self.idle_before_close
request = self.queue.pop(block_pop_timeout)
```
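To illustrate what a blocking pop with a timeout does, here is a stdlib-only stand-in that uses `queue.Queue.get(timeout=...)` in place of the Redis blocking operation (an assumption for illustration; scrapy-redis talks to Redis, not to `queue.Queue`). Each call waits at most `timeout` seconds and then gives up, so a timeout like SCHEDULER_IDLE_BEFORE_CLOSE bounds a single pop, not how long the spider stays alive.

```python
import queue
import time
from typing import Optional


def blocking_pop(q: "queue.Queue[str]", timeout: float) -> Optional[str]:
    """Wait up to `timeout` seconds for an item; return None if none arrives."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None


requests_queue: "queue.Queue[str]" = queue.Queue()

# Empty queue: the call blocks for roughly the timeout, then returns None.
start = time.monotonic()
result = blocking_pop(requests_queue, timeout=0.2)
elapsed = time.monotonic() - start
print(result, round(elapsed, 2))
```

Nothing in the pop itself decides whether the spider closes; that decision is made by whatever handles the idle signal.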

nieweiming · 2021-04-26T13:34:34Z

```python
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

from scrapy_redis import connection


class RedisMixin(object):
    def setup_redis(self, crawler=None):
        ...
        self.server = connection.from_settings(crawler.settings)
        # The idle signal is called when the spider has no requests left,
        # that's when we will schedule new requests from redis queue
        crawler.signals.connect(self.spider_idle, signal=signals.spider_idle)

    def schedule_next_requests(self):
        """Schedules a request if available"""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            self.crawler.engine.crawl(req, spider=self)

    def spider_idle(self):
        """Schedules a request if available, otherwise waits."""
        # XXX: Handle a sentinel to close the spider.
        self.schedule_next_requests()
        # Raised unconditionally, so the spider is never allowed to close.
        raise DontCloseSpider
```

SCHEDULER_IDLE_BEFORE_CLOSE will not stop the crawler, because spider_idle always raises DontCloseSpider.
So I would like the spider to close by itself once the queue has been idle for a certain period.
Otherwise the task is finished but the spider stays in the running state, which keeps occupying resources.
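A minimal sketch of the idea in this thread: remember when the spider first went idle and stop raising DontCloseSpider once MAX_IDLE_TIME_BEFORE_CLOSE seconds have passed. This is not the merged scrapy-redis code. To keep it self-contained, `DontCloseSpider` below is a local stand-in for `scrapy.exceptions.DontCloseSpider`, and `IdleAwareSpider`, `fetch_requests`, `would_close` and the injectable `clock` are hypothetical names. It measures idle time with `time.monotonic()`, which is not affected by wall-clock adjustments.

```python
import time
from typing import Callable, List, Optional


class DontCloseSpider(Exception):
    """Local stand-in for scrapy.exceptions.DontCloseSpider."""


class IdleAwareSpider:
    """Hypothetical spider_idle handler with a maximum idle time.

    `fetch_requests` plays the role of RedisMixin.next_requests(); `clock`
    is injectable so the behaviour can be checked without sleeping.
    """

    def __init__(self, fetch_requests: Callable[[], List[str]],
                 max_idle_time: int = 0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch_requests = fetch_requests
        self.max_idle_time = max_idle_time
        self.clock = clock
        self.idle_since: Optional[float] = None
        self.scheduled: List[str] = []

    def spider_idle(self) -> None:
        reqs = self.fetch_requests()
        if reqs:
            # Work was found: schedule it and reset the idle timer.
            self.scheduled.extend(reqs)
            self.idle_since = None
            raise DontCloseSpider
        now = self.clock()
        if self.idle_since is None:
            self.idle_since = now  # first idle call with an empty queue
        # 0 (or unset) keeps the old behaviour: wait forever.
        if self.max_idle_time and now - self.idle_since >= self.max_idle_time:
            return  # not raising lets the spider close
        raise DontCloseSpider


def would_close(spider: IdleAwareSpider) -> bool:
    """Return True if the idle handler lets the spider close."""
    try:
        spider.spider_idle()
    except DontCloseSpider:
        return False
    return True
```

In Scrapy, once every `spider_idle` handler has returned without raising `DontCloseSpider`, the engine starts closing the spider, which is why the sketch simply returns when the limit is reached.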

rmax

Thanks for the contribution. Just a few points on time_ns() and the non-English comments.

Could you add tests to ensure that further changes do not break this feature?

For reference, see https://github.com/rmax/scrapy-redis/blob/master/tests/test_spiders.py#L113

src/scrapy_redis/spiders.py

rmax · 2021-04-26T14:07:53Z

Oh, please also update the README with this new setting 🚀

rmax

Looks good to me! Thanks 🙏🏽

rmax requested changes Apr 26, 2021


src/scrapy_redis/spiders.py (outdated, resolved)

src/scrapy_redis/spiders.py (outdated, resolved)

nieweiming force-pushed the master branch from a18ad4f to db4ba2d on April 27, 2021 02:44

nieweiming closed this May 8, 2021

nieweiming force-pushed the master branch from 1b333a2 to 1d0fab0 on May 8, 2021 09:51

Added maximum idle waiting time MAX_IDLE_TIME_BEFORE_CLOSE.

b344cc4

nieweiming reopened this May 8, 2021

nieweiming requested a review from rmax May 8, 2021 10:08

rmax approved these changes May 8, 2021


rmax merged commit 6067555 into rmax:master May 8, 2021

Germey mentioned this pull request Dec 26, 2021

Release Patch Version 0.7.2 #206

Merged
