Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

governance: noting CI is a maintainer responsibility, emailing failures to on-call #6342

Merged
merged 3 commits into from
Mar 21, 2019

Conversation

alyssawilk
Copy link
Contributor

@alyssawilk alyssawilk commented Mar 21, 2019

I believe per https://docs.opsgenie.com/docs/circleci-integration and https://app.opsgenie.com/integration#/edit/CircleCi/d9585dd7-b9c5-4723-abee-647a623639e4 (logged in users only) this will result in CircleCI failures getting emailed to maintainer-on-call

I think emails are better than relying on slack or polling to notice build failures

Risk Level: n/a for Envoy. Medium for annoying @jmarantz (or all of us!) if it ends up spammy :-P
Testing: Testing via submit / rollback. Whee!
Docs Changes: yep
Release Notes: nope
Fixes our build breakages going into the void

@alyssawilk
Copy link
Contributor Author

Oh wow, the defaults for notification are

0 m: alert on calls for Envoy
5m Next user in Envoy maintainer rotation, if not acknowledged
10m; All members of Envoy, if not acknowledged.

So.... I'm going to go ahead and fix that. For now, I'm going to remove forwarding to secondary and "all", and replace that with "also notify alyssa". If I can get high signal for build breakage I think it'd be cool to notify the maintainer team for full master breaks, but realistically I suspect I won't be able to do that without also including a bunch of flake-spam, I'll remove myself and call it a day.

jmarantz
jmarantz previously approved these changes Mar 21, 2019

notify:
webhooks:
-url: https://api.opsgenie.com/v1/json/circleci?apiKey=f45d58a7-15a4-4800-8d2c-f460ec02cf71
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for us to document where this magic key comes from in some place?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a security issue? Should this API key be injected via env variable? Do we need to revoke and generate a new key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the opsgenie doc linked in the description this is best practice for setting up build alerts.

Given the only thing we use the maintainer rotation for is for "who is triaging" and the only thing this does is hook circle up to opsgenie, I think it's Ok (happy to revoke and rerun if folks disagree). I don't know if opsgenie does client cert auth on this url (I could ask if we care) but I guess fundamentally I'm not terribly worried about malicious use of this. If people want to spam the Envoy maintainers there are easier routes and we can always disable it if we need to :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just paste that link in here as a comment? I suppose it's reasonable that future maintainers who need to adjust the key for whatever reason will follow source control to this PR and its description, but it might be easier if it was a direct comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, pasted both the instructions and the config link in my last update

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Alyssa.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh. For the record, they do not do client cert auth, based on some curl tests.

I'd still be inclined to check this in and deal with spam if we get it, but open to other ideas.

I'll also reach out to opsgenie and ask they do better here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you potentially add a comment here about how to revoke this if we end up with a spam/security issue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. I'm mildly appalled they do no checks on this but given we don't carry pagers I care less than I otherwise would :-P

Signed-off-by: Alyssa Wilk <[email protected]>
jmarantz
jmarantz previously approved these changes Mar 21, 2019
Signed-off-by: Alyssa Wilk <[email protected]>
@mattklein123 mattklein123 merged commit 9a2cd6b into envoyproxy:master Mar 21, 2019
@alyssawilk alyssawilk deleted the alerts branch May 7, 2019 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants