add machine-id to all "local" indexer-requests & refactor indexer-request headers #17

Merged
merged 4 commits on Aug 6, 2021

geoheelias
Contributor

No description provided.

@geoheelias geoheelias requested a review from puhoy July 26, 2021 14:24
@geoheelias geoheelias self-assigned this Jul 26, 2021
.env.dist Outdated
HUBGREP_CRAWLERS_USER_AGENT_SUFFIX="@ https://hubgrep.io/about"
HUBGREP_CRAWLERS_MACHINE_ID=no-id
Contributor Author

@geoheelias geoheelias Jul 26, 2021

I tried some fancy UUID auto-generation in docker-compose.yml instead of setting it manually, but I couldn't get it to run =(

Maybe you know how to do it?

This should work on Linux/macOS as a bash one-liner:
od -x /dev/urandom | head -1 | awk '{OFS="-"; print $2$3,$4,$5,$6,$7$8$9}'

Otherwise, this requires a dependency, and the result has to be exported into the env var:
uuid=$(uuidgen)

both found here:
https://serverfault.com/questions/103359/how-to-create-a-uuid-in-bash

Contributor Author

But it's probably better not to auto-generate it anyway, since the ID would then change whenever we restart things.
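
A minimal sketch of one way to get an auto-generated ID that still survives restarts: generate it once and store it in a file. The function name `load_or_create_machine_id` and the idea of an ID file are hypothetical and not part of this PR, which sets `HUBGREP_CRAWLERS_MACHINE_ID` manually.

```python
import uuid
from pathlib import Path


def load_or_create_machine_id(path):
    """Return the machine ID stored at `path`, creating it on first use.

    Hypothetical helper: neither the function nor the ID file exists in the PR.
    """
    id_file = Path(path)
    if id_file.exists():
        stored = id_file.read_text().strip()
        if stored:
            # Reuse the persisted ID so restarts keep the same machine ID.
            return stored
    # First start (or empty file): generate a random UUID and persist it.
    machine_id = str(uuid.uuid4())
    id_file.write_text(machine_id + "\n")
    return machine_id
```

With a file on a mounted volume, repeated calls (and container restarts) would return the same value, which is the property the manual setting gives today.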

"X-Request-ID": crawler_uuid
"User-Agent": current_app.config["USER_AGENT"],
"X-Correlation-ID": crawler_uuid, # specific crawler
"Hubgrep-Crawler-Machine-ID": current_app.config["MACHINE_ID"] # shared by "local" crawlers (all hosters)
Contributor Author

@geoheelias geoheelias Jul 26, 2021

I misunderstood "X-Request-ID" previously; it is only there to track one specific, unique request (of course).

"X-Correlation-ID" is instead a header intended to be used across many requests to "follow the red thread". For us, this is our specific crawler instance, which we should use in debugging logs. A crawler UUID is still a bit broader than what the header intends as a "cross-service transaction ID" kind of thing, but I think it fits for us, even if the same ID comes up again and again.

As for a higher-level global ID, I can't find any existing header, so I made our own "Hubgrep-Crawler-Machine-ID", which we will use to map api_keys.
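
The three ID scopes described above can be sketched as follows. This is only an illustration: `build_indexer_headers` and its parameters are hypothetical names, and the PR itself reads the user agent and machine ID from `current_app.config`.

```python
import uuid


def build_indexer_headers(user_agent, crawler_uuid, machine_id):
    """Build request headers with one ID per scope (hypothetical helper)."""
    return {
        "User-Agent": user_agent,
        # Narrowest scope: a fresh ID for every single request.
        "X-Request-ID": uuid.uuid4().hex,
        # Middle scope: the same ID for every request of one crawler instance.
        "X-Correlation-ID": crawler_uuid,
        # Broadest scope: shared by all "local" crawlers on one machine.
        "Hubgrep-Crawler-Machine-ID": machine_id,
    }
```

Calling the function twice with the same arguments yields the same "X-Correlation-ID" and "Hubgrep-Crawler-Machine-ID" but different "X-Request-ID" values.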

@@ -19,6 +20,7 @@

def _hoster_session_request(method, session, url, error_count=0, *args, **kwargs):
    try:
        session.headers.update({"X-Request-ID": uuid.uuid4().hex})
Contributor Author

update this header on each request to the indexer

@geoheelias geoheelias changed the base branch from master to fix/update_a_block_key July 26, 2021 16:00
@geoheelias geoheelias changed the base branch from fix/update_a_block_key to master July 26, 2021 16:01
 platform_type = platform_data["type"]
 api_url = platform_data["api_url"]
-api_key = platform_data["api_key"]
+api_key = platform_data.get("api_key", None)
Contributor Author

In general, providing an api_key is optional across all crawlers (but specific crawlers should throw if they require one).

This has to do with the way we assign api_keys: as you keep adding machines, they shouldn't share api_keys, so once we run out of api_keys, "None" will be used.
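
A rough sketch of the assignment rule described above, under the assumption that keys are handed out one per machine in order. Both `assign_api_keys` and `require_api_key` are hypothetical names for illustration and do not appear in the PR.

```python
def assign_api_keys(machine_ids, api_keys):
    """Give each machine its own api_key; machines beyond the supply get None."""
    assignments = {}
    for index, machine_id in enumerate(machine_ids):
        # Keys are never shared, so surplus machines fall back to None.
        assignments[machine_id] = api_keys[index] if index < len(api_keys) else None
    return assignments


def require_api_key(platform_data):
    """For a crawler that cannot work without a key: raise if it is missing."""
    api_key = platform_data.get("api_key", None)
    if api_key is None:
        raise ValueError("this crawler requires an api_key")
    return api_key
```

A crawler that works without a key would simply read `platform_data.get("api_key", None)` and continue, while one that needs it would call something like `require_api_key`.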

@puhoy puhoy merged commit b716654 into master Aug 6, 2021
@puhoy puhoy deleted the feature/machine_id branch August 6, 2021 11:46