Fix/handle ratelimit many crawlers #13

Merged 1 commit on Jul 13, 2021

Conversation

geoheelias
Contributor

No description provided.

@geoheelias geoheelias requested a review from puhoy July 13, 2021 22:14
@geoheelias geoheelias self-assigned this Jul 13, 2021
@geoheelias geoheelias changed the base branch from master to fix/github_retry_on_abuse July 13, 2021 22:15
]
}
"""
return list(map(lambda d: d["type"], errors))
Contributor Author

Moved this into its own function even though it's just a map, so we can document and explain it without cluttering the main function.
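For readers without the full diff, here is a minimal sketch of what such a helper could look like. The function name `get_error_types`, the sample payload, and the `"RATE_LIMITED"` value are assumptions for illustration; only the `d["type"]` mapping comes from the diff.

```python
from typing import Any, Dict, List


def get_error_types(errors: List[Dict[str, Any]]) -> List[str]:
    """Return the "type" field of every entry in a GraphQL error list.

    Hypothetical name; behaves like the map in the diff, written as a
    list comprehension.
    """
    return [error["type"] for error in errors]


# Assumed shape of the "errors" list in a GraphQL response body.
sample_errors = [
    {"type": "RATE_LIMITED", "message": "API rate limit exceeded"},
    {"type": "NOT_FOUND", "message": "Could not resolve to a Repository"},
]

print(get_error_types(sample_errors))
```

Like the original map, this raises `KeyError` if an error entry has no `"type"` key, so a caller that cannot rely on that field may prefer `error.get("type")`.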

logger.debug(f"{error_types} - ratelimit was reached elsewhere - retry in {GITHUB_RATELIMIT_SLEEP}s")
time.sleep(GITHUB_RATELIMIT_SLEEP)
logger.debug("long ratelimit sleep over, retry query")
response = send_query()
Contributor Author

So I have to look for this crappy error type in that list to know that something else consumed all of our rate limit. Then I'll guess it happened just now and sleep for 1h before retrying. It sucks, but there's no alternative when they don't give out that info after you've reached the limit.

Contributor Author

Also, we only retry once. After that we get a skipped block chunk (and a hole in our data).
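The behaviour described in these comments (detect the rate-limit error type, sleep, retry exactly once, then give up on the chunk) could be sketched roughly as follows. This is not the merged code: `query_with_single_retry`, `_is_rate_limited`, the `"RATE_LIMITED"` string, the 3600 s value, and the assumption that `send_query()` returns the parsed JSON response as a dict are all illustrative.

```python
import logging
import time
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)

GITHUB_RATELIMIT_SLEEP = 3600  # assumed value: the 1h sleep mentioned above
RATE_LIMITED = "RATE_LIMITED"  # assumed error type string


def _is_rate_limited(response: Dict[str, Any]) -> bool:
    """True if any entry in the response's error list has the rate-limit type."""
    return any(e.get("type") == RATE_LIMITED for e in response.get("errors", []))


def query_with_single_retry(
    send_query: Callable[[], Dict[str, Any]],
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict[str, Any]]:
    """Send a query; on a rate-limit error, sleep once and retry once.

    Returns the response, or None if the retry is rate limited too, in
    which case the caller skips the chunk.
    """
    response = send_query()
    if not _is_rate_limited(response):
        return response

    logger.debug("ratelimit was reached elsewhere - retry in %ss", GITHUB_RATELIMIT_SLEEP)
    sleep(GITHUB_RATELIMIT_SLEEP)
    logger.debug("long ratelimit sleep over, retry query")

    response = send_query()
    if _is_rate_limited(response):
        logger.warning("still rate limited after one retry, skipping chunk")
        return None
    return response
```

Passing `sleep` as a parameter is only a convenience for this sketch, so the logic can be exercised without actually waiting an hour.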

Base automatically changed from fix/github_retry_on_abuse to master July 13, 2021 22:23
@puhoy puhoy merged commit 86b37e2 into master Jul 13, 2021
@puhoy puhoy deleted the fix/handle_ratelimit_many_crawlers branch July 13, 2021 22:25
3 participants