Feature/account pool metrics #271
This is very cool and useful. A few very minor changes requested; the other comments are mostly me reflecting. Can you please add a page of documentation for the monitoring and the dashboard? Based on a quick once-over of our current docs, it looks like no documentation exists yet for any of the dashboard work that has been done. Otherwise, this is awesome!
  account_pool_metrics_count = var.account_pool_metrics_toggle == "true" ? 1 : 0
}

module "account_pool_metrics_lambda" {
Not trying to let "perfect be the enemy of good" here, but it would be nice if the lambda also used the count parameter. That would require some changes to the lambda module and might also create wonkiness for the deploy/build scripts, so this is not a blocker or a requirement here.
Yeah, agreed. I tried doing that initially, but it would have required a lot of changes.
pkg/data/accounts.go
@@ -54,6 +55,7 @@ func (a *Account) queryAccounts(query *account.Account, keyName string, index st

	res, err = a.DynamoDB.Query(queryInput)
	if err != nil {
		log.Println("err: ", err)
Is this meant to be a permanent addition to the logging or was this added for debugging? If it is meant to be permanent, please add some context to the logger statement (e.g. "Error querying XYZ")
variable "orphaned_accounts_alarm_threshold" {
  type        = string
  description = "Alarm when number of orphaned accounts is greater than or equal to this threshold."
  default     = "1"
Is it possible to tie the alarm to a percentage of total accounts rather than an absolute number? I'm wondering because alerting on 1 orphaned account by default seems a little aggressive, although this feature is disabled by default, so that's not too big of a deal.
We could probably accomplish that by passing orphaned_accounts_alarm_threshold into the account_pool_metrics lambda as an environment variable, querying the CloudWatch API for this alarm, and then updating its threshold to the correct number based on the total number of accounts. However, that adds complexity and feels a bit wonky to me.
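The percentage idea above boils down to a small calculation the lambda would run before updating the alarm. A minimal Go sketch, assuming the percentage arrives as a plain number (for example via the environment variable suggested in the reply); the function name is hypothetical, and the CloudWatch call that would actually update the alarm is deliberately left out.

```go
package main

import (
	"fmt"
	"math"
)

// thresholdFromPercent (hypothetical) converts a percentage of the account
// pool into an absolute alarm threshold. It rounds up, and never returns less
// than 1: with "greater than or equal to" semantics, a threshold of 0 would
// keep the alarm firing even when no accounts are orphaned.
func thresholdFromPercent(percent float64, totalAccounts int) int {
	// Multiply before dividing so whole-number inputs stay exact in floating point.
	t := int(math.Ceil(percent * float64(totalAccounts) / 100))
	if t < 1 {
		return 1
	}
	return t
}

func main() {
	for _, total := range []int{0, 10, 50, 200} {
		fmt.Printf("5%% of %d accounts -> threshold %d\n", total, thresholdFromPercent(5, total))
	}
}
```

Rounding up errs toward fewer alarms on small pools, which addresses the "a little aggressive" concern without letting the threshold drop to zero.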
Proposed changes
This PR adds a lambda that queries the accounts table for each account status (Ready, NotReady, Leased, Orphaned). It then publishes the counts as metrics to CloudWatch so that users can monitor the state of their account pool.
Added the following variables for configuration:
account_pool_metrics_toggle: Toggle the CloudWatch scheduled event, effectively turning this feature on or off
account_pool_metrics_collection_rate_expression: Set the rate at which the lambda will collect and publish metrics
orphaned_accounts_alarm_threshold: Set a threshold to alarm when there are too many orphaned accounts
ready_accounts_alarm_threshold: Set a threshold to alarm when there are too few ready accounts
account_pool_metrics_widget_period: Set the period over which metrics are aggregated in the dashboard widget
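The two alarm thresholds above are evaluated by CloudWatch, not by the lambda, but their intended comparisons can be spelled out in plain Go for clarity. The orphaned comparison (greater than or equal) comes from the variable description in the diff; treating "too few ready accounts" as less than or equal to the threshold is an assumption. The thresholds are parsed from strings because the variables are declared with `type = string`. All function and type names here are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// AlarmConfig (hypothetical) holds the two thresholds as integers.
type AlarmConfig struct {
	OrphanedThreshold int
	ReadyThreshold    int
}

// parseAlarmConfig converts the string-typed Terraform values (e.g. "1").
func parseAlarmConfig(orphaned, ready string) (AlarmConfig, error) {
	o, err := strconv.Atoi(orphaned)
	if err != nil {
		return AlarmConfig{}, fmt.Errorf("invalid orphaned_accounts_alarm_threshold %q: %w", orphaned, err)
	}
	r, err := strconv.Atoi(ready)
	if err != nil {
		return AlarmConfig{}, fmt.Errorf("invalid ready_accounts_alarm_threshold %q: %w", ready, err)
	}
	return AlarmConfig{OrphanedThreshold: o, ReadyThreshold: r}, nil
}

// orphanedAlarm follows the variable description: fire when the count is
// greater than or equal to the threshold.
func orphanedAlarm(cfg AlarmConfig, orphaned int) bool {
	return orphaned >= cfg.OrphanedThreshold
}

// readyAlarm assumes "too few" means less than or equal to the threshold.
func readyAlarm(cfg AlarmConfig, ready int) bool {
	return ready <= cfg.ReadyThreshold
}

func main() {
	// "5" is an illustrative ready threshold, not a default from the PR.
	cfg, err := parseAlarmConfig("1", "5")
	if err != nil {
		panic(err)
	}
	fmt.Println(orphanedAlarm(cfg, 0), orphanedAlarm(cfg, 1)) // false true
	fmt.Println(readyAlarm(cfg, 10), readyAlarm(cfg, 5))      // false true
}
```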
Types of changes
Checklist
README.md, inline comments, etc.)
CHANGELOG.md under a ## next release, with a short summary of my changes
Relevant Links
Further comments