Add option to enforce strict monotonicy in metrics #141

chrschn · 2023-12-11T15:11:21Z

This change adds the new option force_monotonicy to metric configurations. It is intended for quasi monotonic sources, such as monotonic counters which reset when the sensor is restarted.

When this option is set to true, the source metric value is regularly written to disk. This allows us to detect and compensate counter resets even between restarts. When a reset is detected, the last value before the reset becomes the new offset, which is added to the metric value going forth. The result is a strictly monotonic time series, like an ever increasing counter.

My use-case is measuring the energy production and consumption with Shelly devices. I want to keep track of the total energy ever produced by solar panels or consumed by certain devices in my household. When those time series are truly monotonic, it is much simpler to produce delta graphs over long periods (months, years) in Prometheus.

hikhvar · 2023-12-14T22:01:32Z

First of all, thank you for the PR.
From my understanding with prometheus, you don't need that logic in the exporter. All prometheus queries for counters, will handle counter resets. See for example increase(). According to the docs:

Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for

Due to that handling in Prometheus server, I will not merge this. From an architecture point of view, I thing this is something the prometheus server is capable of, and already implemented. I would like this exporter to not write to disk regularly, as many use this exporter on devices like Raspberry Pis with SD-cards.

chrschn · 2023-12-15T18:52:23Z

First of all, thank you for the PR. From my understanding with prometheus, you don't need that logic in the exporter. All prometheus queries for counters, will handle counter resets. See for example increase(). According to the docs:

Breaks in monotonicity (such as counter resets due to target restarts) are automatically adjusted for

It's correct that Prometheus handles counter resets, but increase() doesn't work well for very long time series (months, years) as the ability to detect resets depends on the sample resolution, which becomes very coarse. By default, the maximum no. of data points per time series in Prometheus is 11,000. For a time period of 1 year, the max. resolution is ~48 minutes. This leaves wide gaps for Prometheus to miss resets.

Due to that handling in Prometheus server, I will not merge this. From an architecture point of view, I thing this is something the prometheus server is capable of, and already implemented. I would like this exporter to not write to disk regularly, as many use this exporter on devices like Raspberry Pis with SD-cards.

I took care to only write data when the force_monotonicy setting is used (defaults to off), so existing users would not notice the difference unless they enable this feature. We could add a warning to that option, if you think it would help.

Actually, I've already implemented the next step to support arithmetic expressions that reuses the disk storage, but I haven't sent out a PR yet (see my branch).

My use-case is measuring the energy produced by my solar panels. I want to be able to see the total energy ever produced over months and years, as well as how much energy was used in my house vs. drained to the grid. My energy meter reports the total energy (kWh) obtained from the grid and it reports the current power usage (W) as a negative value when the solar panels produce more energy than is used. However, it does not integrate the current power to the energy that was "lost" to the grid. I'm currently using a cascade of Prometheus recording rules with several labeling hacks to integrate the values over time:

  - name: aggregation:1m
    interval: 1m
    rules:
      # Solar energy lost to the grid over 1m [kWh].
      - record: energy_loss:1m
        expr: avg_over_time(power_out{power_source="solar"}[1m]) / 60000
        labels:
          power_source: solar
          power_sink: grid

  - name: aggregation:5m
    interval: 5m
    rules:
      # Solar energy lost to the grid over 5m, with additional labels for day, month, year [kWh].
      - record: daily:energy_loss:5m
        expr: label_join(
                label_replace(
                  label_replace(
                    label_replace(
                      sum_over_time(energy_loss:1m[5m:1m]) + ignoring(year, month, day) group_right
                        count_values without() ("year", year(timestamp(
                          count_values without() ("month", month(timestamp(
                            count_values without() ("day", day_of_month(timestamp(
                              sum_over_time(energy_loss:1m[5m:1m])
                            )))
                          )))
                        ))) * 0,
                      "day", "0$1", "day", "^([1-9])$"),
                    "month", "0$1", "month", "^([1-9])$"),
                  "yy", "$1", "year", "^..(..)$"),
                "date", "-", "year", "month", "day"
              )

This is good enough to produce graphs like the one below, but this technique does not scale beyond one year. The cleanest solution is to have truly monotonic counters without resets for energy production and usage. That way, I can always see the correct totals and compute a delta without having to take the sampling resolution into account.

chrschn · 2023-12-19T11:31:07Z

I rebased my change on top of the current master. I'd be happy to hear your thoughts on my previous response.

hikhvar · 2024-01-13T08:43:03Z

Hello,

sorry for the late response. Didn't had the capacity in the past to read your long and good response!
I can see your use case.

Let us integrate this, and improve on it, if users report problems.

This patch renames the ``received_messages`` metric to ``mqtt2prometheus_received_messages_total``, making it a bit more in line with conventional Prometheus metric naming. I also slightly adjusted the descriptions.

JSON needs to be upper case, otherwiese "could not setup a metric extractor {"error": "unsupported object format: json"} " is thrown.

This change adds the new option `force_monotonicy` to metric configurations. It is intended for almost-but-not-really monotinic sources, such as counters which reset when the sensor is restarted. When this option is set to `true`, the source metric value is regularly written to disk. This allows us to detect and compensate counter resets even between restarts. When a reset is detected, the last value before the reset becomes the new offset, which is added to the metric value going forth. The result is a strictly monotonic time series, like an ever increasing counter.

chrschn · 2024-02-05T21:16:55Z

Hey there. Sorry, now I was the one who was busy. I changed the code to make sure it works with Go 1.14. LMK if you have any comments.

chrschn · 2024-03-21T14:17:28Z

Friendly ping. :)

chrschn force-pushed the monotonic_gauge branch from 5a21fb7 to 9b4d0a7 Compare December 19, 2023 09:33

alexbakker and others added 12 commits February 5, 2024 21:04

Rename received_messages metric to be more conventional

6d695bd

This patch renames the ``received_messages`` metric to ``mqtt2prometheus_received_messages_total``, making it a bit more in line with conventional Prometheus metric naming. I also slightly adjusted the descriptions.

Update Readme.md

d562b92

JSON needs to be upper case, otherwiese "could not setup a metric extractor {"error": "unsupported object format: json"} " is thrown.

Handle json payload in metric_per_topic mode

9da7b1b

Update dependency golang to v1.21

1f018bf

Update dependency prom/prometheus to v2.48.1

a963576

Bump newest version to go 1.21 in tests

70de9d6

Bump to Go 1.21 in tests in relase

7419ffa

Fix goreleaser deprecation

6e9f1a2

Fix go releaser config

c071e92

Bump action versions

5c1917f

Restrict to Go 1.14 functionality

84ed960

chrschn force-pushed the monotonic_gauge branch from 9b4d0a7 to 84ed960 Compare February 5, 2024 21:11

Merge branch 'master' into monotonic_gauge

72289ee

hikhvar merged commit 66d4028 into hikhvar:master Mar 25, 2024

chrschn mentioned this pull request Mar 25, 2024

Expressions #143

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add option to enforce strict monotonicy in metrics #141

Add option to enforce strict monotonicy in metrics #141

chrschn commented Dec 11, 2023

hikhvar commented Dec 14, 2023

chrschn commented Dec 15, 2023

chrschn commented Dec 19, 2023

hikhvar commented Jan 13, 2024

chrschn commented Feb 5, 2024

chrschn commented Mar 21, 2024

Add option to enforce strict monotonicy in metrics #141

Add option to enforce strict monotonicy in metrics #141

Conversation

chrschn commented Dec 11, 2023

hikhvar commented Dec 14, 2023

chrschn commented Dec 15, 2023

chrschn commented Dec 19, 2023

hikhvar commented Jan 13, 2024

chrschn commented Feb 5, 2024

chrschn commented Mar 21, 2024