
avoid emitting results on response.ok == False #5

Merged: 1 commit, Jul 1, 2021


geoheelias (Contributor)

We sent back "error" strings instead of repos because we didn't check whether response.ok was true after each API request.

geoheelias requested a review from puhoy on July 1, 2021 15:51

```python
# otherwise spam&sleep
sleep_s = reset_ts - time.time()
logger.info(f"ratelimit exceeded for {self}, sleeping for {sleep_s} seconds...")
time.sleep(sleep_s)
```
Comment by the PR author (Contributor):


This keeps the old default sleep-throttling for when the rate-limit headers are missing. Otherwise, we react to the specific GitLab rate-limit headers.
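A minimal sketch of the throttling rule described in this comment, assuming GitLab's `RateLimit-Remaining` and `RateLimit-Reset` response headers (the latter a Unix timestamp). The function name `seconds_to_sleep` and the `DEFAULT_SLEEP_S` fallback value are hypothetical, not taken from the PR. The sketch also clamps the wait at zero, because `time.sleep` raises `ValueError` for a negative argument, which can happen if the reset time has already passed.

```python
import time
from typing import Mapping, Optional

# Hypothetical fallback delay used when the rate-limit headers are missing.
DEFAULT_SLEEP_S = 1.0


def seconds_to_sleep(headers: Mapping[str, str],
                     now: Optional[float] = None,
                     default_sleep_s: float = DEFAULT_SLEEP_S) -> float:
    """Return how long to wait before the next request (sketch).

    Assumes GitLab-style headers: RateLimit-Remaining (requests left in the
    current window) and RateLimit-Reset (Unix time when the quota resets).
    """
    remaining = headers.get("RateLimit-Remaining")
    reset = headers.get("RateLimit-Reset")
    if remaining is None or reset is None:
        # Headers missing: fall back to the old default sleep-throttling.
        return default_sleep_s
    if int(remaining) > 0:
        # Quota left: send the next request right away.
        return 0.0
    now = time.time() if now is None else now
    # Quota exhausted: wait until the reset timestamp, never a negative amount.
    return max(0.0, float(reset) - now)
```

With `requests`, `response.headers` is a case-insensitive mapping, so a call such as `time.sleep(seconds_to_sleep(response.headers))` after each request would find the headers regardless of their casing.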

@@ -42,6 +55,8 @@ def crawl(self, state: dict = None) -> Tuple[bool, List[dict], dict]:

```python
)
try:
    response = self.requests.get(self.request_url, params=params)
    if not response.ok:
        return False, [], state
```
Comment by the PR author (Contributor):


This is the bugfix for the "error" repos sent to the indexer: because we didn't return early, we would just yield whatever came out of response.json().
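One plausible reading of the "error" strings, assuming the failing responses carried a JSON object such as `{"error": "..."}`: iterating over a dict yields its keys, so consuming `response.json()` without checking the status produces the string `"error"` in place of a repo. The sketch below contrasts that with the early return from the hunk above. `FakeResponse`, `repos_before_fix` and `handle_page` are hypothetical names; `FakeResponse` only mimics the `ok` property and `json()` method of `requests.Response`.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class FakeResponse:
    """Hypothetical stand-in for requests.Response (only .ok and .json())."""
    status_code: int
    payload: Any = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Like requests.Response.ok: True for status codes below 400.
        return self.status_code < 400

    def json(self) -> Any:
        return self.payload


def repos_before_fix(response: FakeResponse) -> List[Any]:
    # Pre-fix behaviour (sketch): the status is never checked, so an error
    # body such as {"error": "..."} is iterated and yields its keys.
    return list(response.json())


def handle_page(response: FakeResponse,
                state: Optional[dict] = None) -> Tuple[bool, List[dict], dict]:
    state = {} if state is None else state
    if not response.ok:
        # The fix: bail out before treating the error body as repos.
        return False, [], state
    return True, list(response.json()), state
```

For example, `repos_before_fix(FakeResponse(401, {"error": "invalid_token"}))` returns `["error"]`, while `handle_page` on the same response returns `(False, [], {})`.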

geoheelias self-assigned this on Jul 1, 2021
puhoy merged commit 7651085 into master on Jul 1, 2021
puhoy deleted the feature/gitlab_ratelimit branch on July 1, 2021 18:06
Labels: none yet. Projects: none yet. Linked issues: none yet. 3 participants.