Feature/account pool metrics #271

Merged
merged 31 commits into from
Feb 11, 2020
Contributor

@joshmarsh joshmarsh commented Feb 6, 2020

Proposed changes

This PR adds a lambda that queries the accounts table for each account status (Ready, NotReady, Leased, Orphaned). It then publishes the resulting counts as metrics to CloudWatch so that users can monitor the state of their account pool.

Added the following variables for configuration:

account_pool_metrics_toggle: Toggle the CloudWatch scheduled event, effectively turning on/off this feature
account_pool_metrics_collection_rate_expression: Set the rate at which the lambda will collect and publish metrics
orphaned_accounts_alarm_threshold: Set a threshold to alarm when there are too many orphaned accounts
ready_accounts_alarm_threshold: Set a threshold to alarm when there are too few ready accounts
account_pool_metrics_widget_period: Set the period over which metrics are aggregated in the dashboard widget
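
The flow described above (count accounts per status, then publish one metric per status) could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `AccountCounter` and `MetricPublisher` interfaces, the in-memory fakes, and the metric names such as `ReadyAccounts` are all assumptions made for the example. A real lambda would back them with DynamoDB and CloudWatch clients.

```go
package main

import (
	"fmt"
	"log"
)

// AccountStatus mirrors the four statuses named in the PR description.
type AccountStatus string

const (
	StatusReady    AccountStatus = "Ready"
	StatusNotReady AccountStatus = "NotReady"
	StatusLeased   AccountStatus = "Leased"
	StatusOrphaned AccountStatus = "Orphaned"
)

// AccountCounter is a hypothetical stand-in for the accounts-table query.
type AccountCounter interface {
	CountByStatus(status AccountStatus) (int, error)
}

// MetricPublisher is a hypothetical stand-in for a CloudWatch client.
type MetricPublisher interface {
	Publish(name string, value float64) error
}

// publishPoolMetrics counts accounts for each status and publishes one
// metric per status. The "<Status>Accounts" naming is an assumption.
func publishPoolMetrics(c AccountCounter, p MetricPublisher) error {
	statuses := []AccountStatus{StatusReady, StatusNotReady, StatusLeased, StatusOrphaned}
	for _, s := range statuses {
		n, err := c.CountByStatus(s)
		if err != nil {
			return fmt.Errorf("counting %s accounts: %w", s, err)
		}
		name := string(s) + "Accounts"
		if err := p.Publish(name, float64(n)); err != nil {
			return fmt.Errorf("publishing %s: %w", name, err)
		}
	}
	return nil
}

// fakeCounter and fakePublisher are in-memory fakes used only for this sketch.
type fakeCounter map[AccountStatus]int

func (f fakeCounter) CountByStatus(s AccountStatus) (int, error) { return f[s], nil }

type fakePublisher map[string]float64

func (f fakePublisher) Publish(name string, v float64) error {
	f[name] = v
	return nil
}

func main() {
	counter := fakeCounter{StatusReady: 12, StatusLeased: 5, StatusOrphaned: 1}
	published := fakePublisher{}
	if err := publishPoolMetrics(counter, published); err != nil {
		log.Fatal(err)
	}
	fmt.Println(published)
}
```

Keeping the table query and the metric sink behind small interfaces is one way to make the collection logic testable without AWS access.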

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (changes to code, which do not change application behavior)

Checklist

  • I have filled out this PR template
  • I have read the CONTRIBUTING doc
  • I have added automated tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (README.md, inline comments, etc.)
  • I have updated the CHANGELOG.md under a ## next release, with a short summary of my changes

Relevant Links

Further comments

Screen Shot 2020-02-05 at 10 05 53 PM

@joshmarsh joshmarsh changed the title from "WIP: Feature/account pool metrics" to "Feature/account pool metrics" Feb 7, 2020
Contributor

@marinatedpork marinatedpork left a comment

This is very cool and useful. A few very minor changes requested. Other comments are mostly me reflecting. Can you please add a page of documentation for the monitoring and the dashboard? Based on a quick once over of our current docs, it does not look like the documentation exists for any of the dashboard work that has been done. Otherwise, this is awesome!

cmd/lambda/account_pool_metrics/main.go (outdated, resolved)
cmd/lambda/account_pool_metrics/main.go (outdated, resolved)
account_pool_metrics_count = var.account_pool_metrics_toggle == "true" ? 1 : 0
}

module "account_pool_metrics_lambda" {
Contributor

Not trying to let "perfect be the enemy of good" here, but it would be nice if the lambda also utilized the count parameter. That would require some changes to the lambda module and might also create wonkiness for the deploy / build scripts, so this is not a blocker or a requirement here.

Contributor Author

Yeah, agreed. I tried doing that initially, but it would have required a lot of changes.

@@ -54,6 +55,7 @@ func (a *Account) queryAccounts(query *account.Account, keyName string, index st

	res, err = a.DynamoDB.Query(queryInput)
	if err != nil {
		log.Println("err: ", err)
Contributor

Is this meant to be a permanent addition to the logging or was this added for debugging? If it is meant to be permanent, please add some context to the logger statement (e.g. "Error querying XYZ")
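
One way to add the requested context is to wrap the error with the query parameters before logging it. The sketch below is illustrative only: `describeQueryError`, the simulated error, and the key and index values passed in `main` are hypothetical, not names from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// errSimulated stands in for a failure returned by the DynamoDB query.
var errSimulated = errors.New("request timed out")

// describeQueryError wraps a query failure with the key and index that were
// being queried, so the log line explains what failed and not just "err:".
// The %w verb keeps the original error available to errors.Is / errors.As.
func describeQueryError(keyName, index string, err error) error {
	return fmt.Errorf("error querying accounts by %q on index %q: %w", keyName, index, err)
}

func main() {
	// The key and index names here are placeholders for illustration.
	err := describeQueryError("AccountStatus", "ExampleIndex", errSimulated)
	log.Println(err)
}
```

Wrapping with `%w` keeps the underlying error inspectable by callers while still producing a self-explanatory log message.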

variable "orphaned_accounts_alarm_threshold" {
  type        = string
  description = "Alarm when number of orphaned accounts is greater than or equal to this threshold."
  default     = "1"
Contributor

Is it possible to tie the alarm to a percent of total accounts as opposed to a number? Wondering this because alerting on 1 orphaned account by default seems a little aggressive - albeit this feature is disabled by default so that's not too big of a deal.

Contributor Author

We could probably accomplish that by passing orphaned_accounts_alarm_threshold into the account_pool_metrics lambda as an environment variable, querying the CloudWatch API for this alarm, and then updating its threshold to the correct number based on the total number of accounts. However, that adds complexity and feels a bit wonky to me.
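
For reference, the arithmetic behind a percentage-based threshold is small; the complexity mentioned above lies in reading and updating the alarm, which is not shown here. The following is a sketch under assumptions: `thresholdFromPercent` is a hypothetical helper, and rounding up with a floor of 1 is one possible policy rather than anything decided in this thread.

```go
package main

import (
	"fmt"
	"math"
)

// thresholdFromPercent converts a percentage of the account pool into an
// absolute alarm threshold. It rounds up and never returns less than 1, so
// an empty pool or a non-positive percentage does not yield a threshold of 0.
func thresholdFromPercent(totalAccounts int, percent float64) int {
	if totalAccounts <= 0 || percent <= 0 {
		return 1
	}
	return int(math.Ceil(float64(totalAccounts) * percent / 100))
}

func main() {
	fmt.Println(thresholdFromPercent(200, 5)) // 5% of 200 accounts
	fmt.Println(thresholdFromPercent(10, 5))  // small pool rounds up to 1
}
```

With a floor of 1, a small pool behaves the same as the current default of alarming on a single orphaned account, while larger pools tolerate proportionally more.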

@joshmarsh joshmarsh merged commit bd8fe90 into Optum:master Feb 11, 2020
@joshmarsh joshmarsh deleted the feature/account-pool-metrics branch February 11, 2020 13:55