Auto Scaling

Background

For those who aren't familiar with it, Auto Scaling is an automation service that Amazon provides as part of its cloud offering. It manages a pool of running servers: it can replace failed instances and automatically grow and shrink the size of the pool. For a more thorough description, see the Amazon documentation.


In case of emergency (turn it off)

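Set the group's min, desired, and max sizes to the same value
- Choose a value greater than the capacity you think you need, since the group can no longer scale up on its own
Suspend scaling processes
- Commands (the first suspends all of the group's processes; the second resumes them once the emergency is over):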

as-suspend-processes MyAutoScalingGroup
as-resume-processes MyAutoScalingGroup
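The same two steps can also be sketched with boto3, the AWS SDK for Python, instead of the legacy command-line tools. This is a minimal sketch, assuming boto3 is installed and credentials are configured; `pin_group` and `resume_scaling` are hypothetical helper names. The client is passed in rather than created inside the functions, so the logic can be exercised without calling AWS.

```python
from typing import Any, Dict, Optional, Sequence


def pin_group(client: Any, group_name: str, capacity: int,
              processes: Optional[Sequence[str]] = None) -> None:
    """Fix an Auto Scaling group at `capacity` instances and suspend scaling.

    `client` is assumed to be a boto3 Auto Scaling client, for example
    boto3.client("autoscaling"). `pin_group` is a hypothetical helper name.
    """
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    # min == desired == max: the group can neither grow nor shrink.
    client.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=capacity,
        MaxSize=capacity,
        DesiredCapacity=capacity,
    )
    # With no ScalingProcesses list, every process is suspended,
    # matching `as-suspend-processes MyAutoScalingGroup`.
    kwargs: Dict[str, Any] = {"AutoScalingGroupName": group_name}
    if processes:
        kwargs["ScalingProcesses"] = list(processes)
    client.suspend_processes(**kwargs)


def resume_scaling(client: Any, group_name: str) -> None:
    """Resume all suspended processes, like `as-resume-processes`.

    Note: this does not restore the original min/max sizes.
    """
    client.resume_processes(AutoScalingGroupName=group_name)
```

A call would look like `pin_group(boto3.client("autoscaling"), "MyAutoScalingGroup", 20)`, where 20 stands in for whatever capacity you choose. Suspending every process also stops the group from replacing failed instances; passing a narrower list such as `["AlarmNotification", "ScheduledActions"]` stops only the automatic scaling triggers.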

Metrics Published by Amazon

See Amazon's documentation for the default metrics published to CloudWatch.

What are good metrics to publish?

Throughput

Requests per second (RPS) can be an important measure of system load. RPS usually correlates closely with the availability of request-handling resources, such as Tomcat threads and Apache workers: as RPS rises, fewer of them are free.
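To make the definition concrete, here is a minimal sketch that counts requests in a sliding time window and divides by the window length. The class name `RequestRateMeter` and the 60-second default window are assumptions for illustration, not part of Servo or any Amazon library; it only shows the arithmetic behind an RPS figure.

```python
from collections import deque
from typing import Deque


class RequestRateMeter:
    """Sliding-window requests-per-second estimate (hypothetical helper)."""

    def __init__(self, window_seconds: float = 60.0) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window = window_seconds
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        # Timestamps are assumed to arrive in non-decreasing order.
        self._timestamps.append(timestamp)

    def rate(self, now: float) -> float:
        # Keep only requests in the window (now - window, now].
        while self._timestamps and self._timestamps[0] <= now - self.window:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window
```

Dividing by the full window length, rather than by the time since the first request, keeps the estimate stable when traffic is bursty.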

Latency

Response time (latency) is also an important metric to monitor. When possible, prefer percentiles over averages, specifically the 95th and 99th percentiles: an average can hide a slow tail that a minority of requests experience.
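To see why the tail matters, the sketch below computes the nearest-rank percentile and compares it with the mean on made-up data. Nearest-rank is one of several common percentile definitions; CloudWatch and monitoring libraries may interpolate differently. The latency values are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank, always >= 1
    return ordered[rank - 1]


# Hypothetical latencies in ms: 90 fast requests and 10 slow ones.
latencies_ms = [20.0] * 90 + [800.0] * 10
mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean_ms} p95={percentile(latencies_ms, 95)} p99={percentile(latencies_ms, 99)}")
# prints: mean=98.0 p95=800.0 p99=800.0
```

The mean of 98 ms describes no actual request: 90% of requests take 20 ms and the slowest 10% take 800 ms, which the 95th and 99th percentiles reveal.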

Publishing directly

To publish metrics directly to CloudWatch, refer to the documentation for the Amazon CloudWatch command-line tools or the programmatic interfaces (the CloudWatch API and the AWS SDKs).

Command line example:

mon-put-data --namespace "NFLX/TEST" --metric-name "Foo" --value 123
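For the programmatic route, here is a minimal sketch using boto3's CloudWatch `put_metric_data` call, assuming boto3 is installed and credentials are configured. The helper names `metric_datum` and `publish` are hypothetical; as above, the client is passed in so the payload-building logic can be tested without AWS.

```python
from typing import Any, Dict, List, Optional


def metric_datum(name: str, value: float, unit: Optional[str] = None,
                 dimensions: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build one entry of the MetricData list accepted by put_metric_data."""
    datum: Dict[str, Any] = {"MetricName": name, "Value": float(value)}
    if unit is not None:
        datum["Unit"] = unit  # e.g. "Count" or "Milliseconds"
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": k, "Value": v} for k, v in sorted(dimensions.items())
        ]
    return datum


def publish(client: Any, namespace: str, data: List[Dict[str, Any]]) -> None:
    """Send metrics; `client` is assumed to be boto3.client("cloudwatch")."""
    client.put_metric_data(Namespace=namespace, MetricData=data)
```

The rough equivalent of the command-line example would be `publish(boto3.client("cloudwatch"), "NFLX/TEST", [metric_datum("Foo", 123)])`.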

Using Servo to publish

Servo is an application monitoring library that allows you to register any fields in your Java code as custom metrics published to CloudWatch.

See the Servo documentation.

More Info

Refer to the wiki sidebar for more documentation about Auto Scaling.

A Netflix Original Production