Accessing Metrics Using Hawkular Metrics

If you wish to access and manage metrics directly, you can do so via the Hawkular Metrics REST API.

The Hawkular Metrics documentation covers how to use the API, but there are a few differences when dealing with the version of Hawkular Metrics configured for use on OpenShift.

Alerting is now part of the Hawkular Metrics distribution, but it currently lacks any integration with other OpenShift services and has no UI elements exposed in the console. If you wish to access the alerting API endpoints directly, you can do so by following the Hawkular Metrics Alerting documentation.

Note

By default, only read access is enabled for the Hawkular Metrics REST API.

To enable write access, you will need to add a Hawkular admin cluster policy and grant your user the hawkular-metrics-admin role. This will grant the user full access to write or modify any metrics or alerts for your project.

To add the Hawkular cluster policy:

$ oc create -f metrics-cluster-role.yaml

To grant the hawkular-metrics-admin role to a user:

$ oadm policy add-role-to-user hawkular-metrics-admin ${USER}

To grant the hawkular-metrics-admin cluster role to a service account (e.g. for management):

$ oadm policy add-cluster-role-to-user hawkular-metrics-admin ${SERVICEACCOUNT}

Note that typically the service account used for management is system:serviceaccount:management-infra:management-admin.

Hawkular Tenants & OpenShift Projects

Hawkular Metrics is a multi-tenanted application. It has been configured so that a project in OpenShift corresponds to a tenant in Hawkular Metrics. So if you are dealing with metrics in the awesome project, these would correspond to a tenant named awesome in Hawkular Metrics.

There is one special _system tenant in Hawkular Metrics which corresponds to system or node level metrics. This is where things like the CPU usage of the node are stored.

The tenant is sent to the Hawkular Metrics server via the Hawkular-Tenant header.

Authorization

Hawkular Metrics accepts a bearer token from the client and verifies that token with the OpenShift server using a SubjectAccessReview. If the user has proper read privileges for the project, they will be allowed to read the metrics for that project.

For the _system tenant, the user requesting to read from this tenant must have cluster-reader permission.

The bearer token is sent to the Hawkular Metrics server via the Authorization header.
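As a minimal sketch of how a client might assemble the two headers described above, the following Python builds a request without sending it. The helper name `hawkular_headers`, the placeholder token, and the hostname are assumptions made for illustration only.

```python
import urllib.request

def hawkular_headers(tenant, token):
    """Return the two headers Hawkular Metrics expects on each request."""
    return {
        "Hawkular-Tenant": tenant,           # OpenShift project name, or _system
        "Authorization": "Bearer " + token,  # token is checked against OpenShift
    }

# Placeholder values; a real token must be obtained separately.
headers = hawkular_headers("test", "XXXX")
req = urllib.request.Request(
    "https://hawkular-metrics.example.com/hawkular/metrics/metrics",
    headers=headers,
    method="GET",
)
print(req.full_url)
print(headers)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a cluster.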

Simple Examples

The following examples assume that the Hawkular Metrics route hostname is set to hawkular-metrics.example.com and that we are dealing with metrics from a project named test. Retrieval of the bearer token is outside the scope of these examples. They are very simple examples of how to query data directly from Hawkular Metrics and are by no means an exhaustive list. For more detailed information, please see the Hawkular Metrics documentation.

Accessing all metrics for a Project

If you wish to get a list of all metrics available under the project test, you can run the following command:

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics | python -m json.tool

This will return a list of metrics in the following format:

[
   ...
   },
   {
        "id": "hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/limit",
        "tags": {
            "container_base_image": "openshift/origin-metrics-cassandra:latest",
            "container_base_image_description": "User-defined image name that is run inside the container",
            "container_name": "hawkular-cassandra-1",
            "container_name_description": "User-provided name of the container or full container name for system containers",
            "descriptor_name": "cpu/limit",
            "group_id": "hawkular-cassandra-1/cpu/limit",
            "host_id": "192.168.122.1",
            "host_id_description": "Identifier specific to a host. Set by cloud provider or user",
            "hostname": "192.168.122.1",
            "hostname_description": "Hostname where the container ran",
            "labels": "...",
            "labels_description": "Comma-separated list of user-provided labels",
            "namespace_id": "fd5c9c31-8750-11e5-8b09-3c970e88a56f",
            "namespace_id_description": "The UID of namespace of the pod",
            "pod_id": "7186d9dc-9d30-11e5-90d9-3c970e88a56f",
            "pod_id_description": "The unique ID of the pod",
            "pod_name": "hawkular-cassandra-1-yoe3h",
            "pod_name_description": "The name of the pod",
            "pod_namespace": "openshift-infra",
            "pod_namespace_description": "The namespace of the pod",
            "resource_id_description": "Identifier(s) specific to a metric"
        },
        "tenantId": "openshift-infra",
        "type": "gauge"
    },
    {
  ...
]
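A client will usually want only a few fields from each metric definition. The sketch below parses a trimmed stand-in for the response shown above (most tags omitted) and extracts the id, type, and descriptor name. The function name `summarize` is hypothetical.

```python
import json

# Trimmed stand-in for the response body shown above; most tags are omitted.
sample = """[
  {"id": "hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/limit",
   "tags": {"descriptor_name": "cpu/limit",
            "pod_name": "hawkular-cassandra-1-yoe3h"},
   "tenantId": "openshift-infra",
   "type": "gauge"}
]"""

def summarize(metrics):
    """Return (id, type, descriptor_name) for each metric definition."""
    return [
        (m["id"], m["type"], m.get("tags", {}).get("descriptor_name"))
        for m in metrics
    ]

for row in summarize(json.loads(sample)):
    print(row)
```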

Accessing Node Metrics

As mentioned previously, there is a special tenant called _system which is used to store the metrics coming from the node itself rather than from a particular container.

The Hawkular API is used for the _system tenant in exactly the same way as for any other tenant. Access to this tenant requires cluster-reader or cluster-admin level access.

For instance, taking our example above, to get all the metrics from the _system tenant, you would just need to change the Hawkular-Tenant to _system:

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: _system" \
       -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics | python -m json.tool

The system level metrics contain a few more metrics than those gathered for containers. Specifically, they include network and disk usage metrics.

The following command will list the ids for all the machine level metrics:

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: _system" \
       -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics | python -m json.tool | grep -i \"id\" | grep -i machine
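The same filtering can be done on the parsed JSON rather than with grep. This is a rough equivalent, assuming the response has already been decoded into a list of dictionaries; the two metric ids in the sample are invented for illustration and do not come from a real cluster.

```python
def machine_metric_ids(metrics):
    """Return ids that contain 'machine', case-insensitively (like grep -i)."""
    return [m["id"] for m in metrics if "machine" in m["id"].lower()]

# Hypothetical ids, used only to exercise the filter.
sample = [
    {"id": "machine/node1.example.com/cpu/usage", "type": "counter"},
    {"id": "some-container/0000/cpu/usage", "type": "counter"},
]
print(machine_metric_ids(sample))
```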

For more information on what each of these metrics contains, please see the Heapster schema page.

Querying based on tag

You can narrow down the results further by querying based on tags.

Accessing all the cpu/usage metrics

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=descriptor_name:cpu/usage" | python -m json.tool

Accessing all the cpu/usage metrics in a pod named myPod

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=descriptor_name:cpu/usage,pod_name:myPod" | python -m json.tool

Regular Expressions: Accessing all pods where the container_base_image contains test

Regular expressions can also be used in tag queries. The following example will return all metrics where the container_base_image contains test:

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=container_base_image:.*test.*" | python -m json.tool

Regular Expressions: Accessing all pods where the container_base_image starts with test/

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/metrics?tags=container_base_image:test/.*" | python -m json.tool
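When building these URLs in a program, the tag filter can be assembled from a dictionary. The sketch below joins name:value pairs with commas, as in the queries above, and percent-encodes characters such as `*`. It assumes the server decodes percent-encoded query values in the usual way; `tags_query` is a hypothetical helper name.

```python
from urllib.parse import quote

def tags_query(tags):
    """Build a 'tags=' query parameter from a dict of tag name to value."""
    raw = ",".join(f"{name}:{value}" for name, value in tags.items())
    # Leave ':', '/' and ',' readable; other unsafe characters are encoded.
    return "tags=" + quote(raw, safe=":/,")

print(tags_query({"descriptor_name": "cpu/usage", "pod_name": "myPod"}))
print(tags_query({"container_base_image": "test/.*"}))
```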

Accessing A Specific Metric

From the querying results above, you will notice that each metric contains an id value. You can use this value to directly access the metric itself and the data it contains.

Accessing the counter metric with id 'hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage'

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET https://hawkular-metrics.example.com/hawkular/metrics/counters/hawkular-cassandra-1%2F7186d9dc-9d30-11e5-90d9-3c970e88a56f%2Fcpu%2Fusage | python -m json.tool
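Note that the slashes in the metric id are percent-encoded as %2F in the URL above. A small sketch of producing that encoding with the Python standard library follows; `counter_url` is a hypothetical helper name.

```python
from urllib.parse import quote

def counter_url(base_url, metric_id):
    """Return the counter URL with the metric id fully percent-encoded."""
    # safe="" makes quote() encode '/' as well, which it leaves alone by default.
    return base_url + "/hawkular/metrics/counters/" + quote(metric_id, safe="")

metric_id = "hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage"
print(counter_url("https://hawkular-metrics.example.com", metric_id))
```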

Accessing the metric data for a counter metric with id 'hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage'

The following command will return the data for the metric for the last 10 minutes, placed into 5 buckets of 2 minutes each.

Note: date -d -10minutes +%s%3N returns a start time of 10 minutes ago, in milliseconds since the epoch. This syntax requires GNU date.

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/counters/hawkular-cassandra-1%2F7186d9dc-9d30-11e5-90d9-3c970e88a56f%2Fcpu%2Fusage/data?buckets=5&start=$(date -d -10minutes +%s%3N)" | python -m json.tool
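On systems without GNU date, the start timestamp can be computed another way. The following sketch derives a start time in epoch milliseconds and the width of each bucket for a given window; both function names are hypothetical, and the bucket width is simple arithmetic on the window rather than anything reported by the server.

```python
import time

def start_millis(minutes_ago, now=None):
    """Epoch milliseconds for a point `minutes_ago` minutes before `now`."""
    now = time.time() if now is None else now
    return int((now - minutes_ago * 60) * 1000)

def bucket_width_millis(start_ms, end_ms, buckets):
    """Width of each bucket when the window is split evenly."""
    return (end_ms - start_ms) // buckets

start = start_millis(10)
end = start_millis(0)
print(start, bucket_width_millis(start, end, 5))
```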

Accessing the metric rate data for a counter metric with id 'hawkular-cassandra-1/7186d9dc-9d30-11e5-90d9-3c970e88a56f/cpu/usage'

The following command is the same as the previous one, but it returns rate data instead of the raw data. The rate data is the delta between consecutive values rather than the absolute value.

This is useful for graphing CPU usage data, as it gives you the usage between two points in time and not the absolute usage since the start of the container.

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/counters/hawkular-cassandra-1%2F7186d9dc-9d30-11e5-90d9-3c970e88a56f%2Fcpu%2Fusage/rate?buckets=5&start=$(date -d -10minutes +%s%3N)" | python -m json.tool

Consolidating Metric Data Across Multiple Containers

You may want to consolidate metric data across various individual metrics.

Determining the average CPU usage across multiple pods

For the following example, we want to determine what the average cpu usage is for all containers within the 'openshift-infra' project.

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_namespace:openshift-infra&buckets=3&start=$(date -d -10minutes +%s%3N)" | python -m json.tool

Get the CPU usage for all containers in a pod

Metric data is stored per individual container, but you may want to get metric data based on pods instead of containers.

The following example shows how to get the cpu/usage metric for all containers within a pod named myPod.

Note that since we are looking for the overall usage of a pod, and not just the average usage, we cannot use a query like the one in the previous example. Instead, we need to use the stacked option, which performs individual queries on the tags requested and then adds the resulting buckets together.

So if the tag query matches two metrics, and the average value for the bucket of the first metric is 5 and the average value for the bucket of the second metric is 10, then with stacked=true the bucket returned will be 15.

$ curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
       -H "Hawkular-Tenant: test" \
       -X GET "https://hawkular-metrics.example.com/hawkular/metrics/counters/data?tags=descriptor_name:cpu/usage,pod_name:myPod&stacked=true&buckets=3&start=$(date -d -10minutes +%s%3N)" | python -m json.tool
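To make the effect of stacking concrete, the sketch below adds per-container bucket averages position by position. It only illustrates the arithmetic described above, using made-up values for two containers; it is not a description of how the server implements the option.

```python
def stack_buckets(per_metric_avgs):
    """Sum bucket averages position by position across metrics."""
    return [sum(values) for values in zip(*per_metric_avgs)]

# Made-up bucket averages for two containers in the same pod.
container_a = [2.0, 4.0, 6.0]
container_b = [3.0, 1.0, 0.5]
print(stack_buckets([container_a, container_b]))
```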

Calculating Percentage CPU Usage

According to the Heapster schema, the cpu/usage metric stores the amount of time, in nanoseconds, that a CPU core has been in use since the container started.

With just this raw data, it is not intuitive what this data really means or what you can do with it.

The following examples will take this data, along with the uptime and cpu/limit metrics and transform it into something more consumable.

Calculating CPU Core Percentage

A more intuitive measurement would be to calculate the CPU core usage.

Note

The calculations are based on the percentage of use for a single CPU core; therefore, machines with multiple cores may report values higher than 100%.

We know from the cpu/usage metric how much of the CPU has been used in nanoseconds, but we also have access to the uptime metric which specifies how long the container has been running in milliseconds.

From this, it is a simple calculation to determine the fraction of a single CPU core that was in use (multiply the result by 100 to express it as a percentage):

% core = `cpu/usage` / ( `uptime` * 1000000 )

How these steps are carried out is up to the client. Hawkular Metrics does not currently support performing this type of calculation directly. The client has to fetch the metric values from Hawkular Metrics and then perform the calculations itself.

To determine the CPU usage between a particular start and stop time requires the following steps:

  • usageStart = the cpu/usage at the start time

  • usageEnd = the cpu/usage at the end time

  • uptimeStart = the uptime at the start time

  • uptimeEnd = the uptime at the end time

core_percentage = ( $usageEnd - $usageStart ) / ( ($uptimeEnd - $uptimeStart) * 1000000 )
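The steps above can be sketched as a small function. It assumes the four values have already been fetched from Hawkular Metrics, with cpu/usage in nanoseconds and uptime in milliseconds as stated earlier. The result is the fraction of one core in use, so 0.5 means half a core.

```python
NANOS_PER_MILLI = 1_000_000  # uptime is in ms, cpu/usage is in ns

def core_percentage(usage_start, usage_end, uptime_start, uptime_end):
    """Fraction of a single CPU core used between the two sample points."""
    elapsed_ns = (uptime_end - uptime_start) * NANOS_PER_MILLI
    return (usage_end - usage_start) / elapsed_ns

# 30 seconds of CPU time consumed during 60 seconds of uptime.
print(core_percentage(0, 30_000_000_000, 0, 60_000))
```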

If you want to do the calculation for the total since the container has been running, then you will only need to fetch the cpu/usage and uptime at the end time. You do not need to fetch anything at the start time (since at the start both of the values will be 0).

Important

When a container is restarted, its cpu/usage and uptime will go back to being zero. These values are reset and do not continue counting from where they were previously.
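One way a client might guard against this is to check whether the container has been up for less time than the window being measured, in which case the start values belong to an earlier run and only the end values should be used. This is a heuristic sketch, assuming the end sample was taken at the end of the window; the function name is hypothetical.

```python
def started_within_window(uptime_end_ms, window_start_ms, window_end_ms):
    """True if the container (re)started after the window began."""
    return uptime_end_ms < (window_end_ms - window_start_ms)

# A 10 minute window, but the container has only been up for 2 minutes.
print(started_within_window(120_000, 0, 600_000))
```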

Calculating CPU Millicore Percentage

If you want to convert the core usage calculated above into millicores used instead, you just need to multiply it by 1000:

millicore_percentage = core_percentage * 1000

Calculating Percentage Usage Based on Limit

It is possible to set the limit on a container for both the cpu and memory usage. For limiting based on CPU, you specify the maximum millicores that a container is allowed to use.

For example, the following will start up the ruby-hello-world container and limit its CPU usage to 300 millicores:

apiVersion: "v1"
kind: "Pod"
metadata:
  name: test
spec:
  containers:
  - image: openshift/ruby-hello-world
    name: test
    resources:
      limits:
        cpu: 300m

The cpu/limit metric returns the cpu limit for that container in millicores.

The previous step showed how to calculate the millicores used. To get the fraction of the limit that is in use, divide that value by the cpu/limit value:

limit_percentage = millicore_percentage / `cpu/limit`
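The last two formulas can be combined in a short sketch. It takes the core usage fraction computed earlier and a cpu/limit value in millicores. Returning None when the limit is not positive is an assumption made here to avoid dividing by zero; how a missing limit is actually reported is not covered in this document.

```python
def millicore_percentage(core_percentage):
    """Convert a fraction of one core into millicores used."""
    return core_percentage * 1000

def limit_percentage(millicores_used, cpu_limit_millicores):
    """Fraction of the CPU limit in use, or None if there is no usable limit."""
    if cpu_limit_millicores <= 0:
        return None  # assumption: treat a non-positive limit as "no limit"
    return millicores_used / cpu_limit_millicores

used = millicore_percentage(0.15)  # 0.15 of a core
print(used, limit_percentage(used, 300))
```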