
Add statsdreceiver skeleton #566

Merged: 21 commits, Aug 19, 2020
Conversation

sonofachamp (Contributor):

Description:

This PR adds the scaffolding and some basic initial parsing for a statsdreceiver plugin. It is functional but incomplete. I'm not yet sure how to handle aggregation, or whether it belongs in the receiver plugin or should be deferred to a processor plugin, so I've started with raw mappings.
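The raw mapping described above can be sketched roughly as follows. This is an illustrative simplification, not the receiver's actual API: `statsdMetric` and `parseLine` are hypothetical names, and the value is parsed as a float for brevity even though counters are discussed separately below.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// statsdMetric is a hypothetical raw mapping of a single statsd line,
// before any aggregation is applied.
type statsdMetric struct {
	Name  string
	Value float64
	Type  string // "c", "g", "ms", "h", ...
}

// parseLine splits a "<name>:<value>|<type>" line into its parts.
func parseLine(line string) (statsdMetric, error) {
	nameAndRest := strings.SplitN(line, ":", 2)
	if len(nameAndRest) != 2 {
		return statsdMetric{}, fmt.Errorf("missing ':' in %q", line)
	}
	parts := strings.Split(nameAndRest[1], "|")
	if len(parts) < 2 {
		return statsdMetric{}, fmt.Errorf("missing '|' in %q", line)
	}
	v, err := strconv.ParseFloat(parts[0], 64)
	if err != nil {
		return statsdMetric{}, err
	}
	return statsdMetric{Name: nameAndRest[0], Value: v, Type: parts[1]}, nil
}

func main() {
	m, err := parseLine("page.views:1|c")
	fmt.Println(m, err)
}
```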

Link to tracking Issue:
#290

Testing:
There are some basic unit tests, and I did some manual testing by writing ingested metrics out to a file locally.

Documentation:
For now, I added some basic details to the README outlining the configuration options and supported message formats.

@sonofachamp sonofachamp requested a review from a team as a code owner July 27, 2020 16:18
@project-bot project-bot bot added this to In progress in Collector Jul 27, 2020
linux-foundation-easycla bot commented Jul 27, 2020

CLA Check: the committers are authorized under a signed CLA.


codecov bot commented Jul 27, 2020

Codecov Report

Merging #566 into master will increase coverage by 0.10%.
The diff coverage is 91.54%.


@@            Coverage Diff             @@
##           master     #566      +/-   ##
==========================================
+ Coverage   87.57%   87.68%   +0.10%     
==========================================
  Files         211      217       +6     
  Lines       11488    11693     +205     
==========================================
+ Hits        10061    10253     +192     
- Misses       1087     1098      +11     
- Partials      340      342       +2     
| Flag | Coverage Δ |
|---|---|
| integration | 69.55% <ø> (ø) |
| unit | 87.49% <91.54%> (+0.11%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| receiver/statsdreceiver/transport/udp_server.go | 82.45% <82.45%> (ø) |
| receiver/statsdreceiver/reporter.go | 86.84% <86.84%> (ø) |
| receiver/statsdreceiver/transport/mock_reporter.go | 91.66% <91.66%> (ø) |
| receiver/statsdreceiver/protocol/statsd_parser.go | 95.74% <95.74%> (ø) |
| receiver/statsdreceiver/factory.go | 100.00% <100.00%> (ø) |
| receiver/statsdreceiver/receiver.go | 100.00% <100.00%> (ø) |
| processor/resourcedetectionprocessor/factory.go | 91.11% <0.00%> (-0.89%) ⬇️ |
| extension/observer/hostobserver/factory.go | 100.00% <0.00%> (ø) |
| ... and 5 more | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 19d53ca...0d26db0.


```go
switch metricType {
case "c":
	i, err := strconv.ParseInt(metricValue, 10, 64)
```
Contributor:

Does statsd restrict counter values to only ints?

sonofachamp (Author):

That's a good question; I'm not actually sure. I can't find anything that definitively states only ints, so I must have assumed this from all the examples showing ints. I'll expand this to support both.

Contributor:

According to influxdata/telegraf#556 at least consul emits floats for counter.

sonofachamp (Author), Jul 29, 2020:

It looks like in that scenario it's still being cast to an int64, so it's not exactly supporting a float counter, but is able to gracefully handle it. Do you think we should do the same here or look to support Point_DoubleValue metrics?

Contributor:

Yes, this should support integers and floating points using either the Int64 or Double data point representation.
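A minimal sketch of that int-then-float fallback, with a hypothetical `point` struct standing in for the Int64/Double data point representations (the struct and function names are illustrative, not the receiver's actual types):

```go
package main

import (
	"fmt"
	"strconv"
)

// point is a stand-in for a metric data point that can carry either
// an Int64 or a Double value.
type point struct {
	IntValue    int64
	DoubleValue float64
	IsDouble    bool
}

// parseCounterValue tries the Int64 representation first and falls
// back to Double, so integer counters stay exact while float
// counters (as emitted by e.g. consul) are still accepted.
func parseCounterValue(metricValue string) (point, error) {
	if i, err := strconv.ParseInt(metricValue, 10, 64); err == nil {
		return point{IntValue: i}, nil
	}
	f, err := strconv.ParseFloat(metricValue, 64)
	if err != nil {
		return point{}, fmt.Errorf("invalid counter value %q: %w", metricValue, err)
	}
	return point{DoubleValue: f, IsDouble: true}, nil
}

func main() {
	fmt.Println(parseCounterValue("8"))
	fmt.Println(parseCounterValue("0.5"))
}
```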

jmacd (Contributor) left a review:

This is really nice. Great work! 💯

I've written a few suggestions, but all of them can be treated as TODOs for a future PR.

Comment on lines 29 to 43
General format is:

`<name>:<value>|<type>|@<sample-rate>|#<tag1-key>:<tag1-value>,<tag2-k/v>`

### Counter

`<name>:<value>|c`

<!-- ### Gauge

`<name>:<value>|g`

### Timer/Histogram

`<name>:<value>|<ms/h>|@<sample-rate>` -->
Contributor:

The @sample-rate parameter won't have a place to go in the protocol yet, but we can leave a TODO about how to handle it.

These Timer/Histogram events should be translated the same way a ValueRecorder event from the OTel API would be, and we are meant to multiply the Count of this event by 1/sample-rate. In open-telemetry/opentelemetry-proto#159 we have discussed a potential future for OTLP exemplars that include this inverse-of-sample-rate value as well, which we termed sample-count. In other words, I like the idea of sample-rate, but I prefer to think in terms of its inverse; sample-count is a floating-point > 1, whereas sample-rate is a floating-point > 0 and < 1. (It's one less byte when sample-rate is the inverse of an integer, due to the ., which is a common case for simple probability sampling.)

What interested me the most here was that you omitted @sample-rate from Counter and Gauge events. I believe the @sample-rate may be used with Counter and Gauge events as well as Timer/Histogram events. In the PR linked above, any raw or exemplar value could include an optional sample-count > 1 to indicate that it was probabilistically sampled. There are not many (if any) definitive documents about the statsd format across the internet, so this README will be a great resource. (I think I'm asking that you add @sample-rate to the Counter and Gauge syntax above, otherwise to be proven wrong and that there exists a StatsD specification 😀 .)
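The 1/sample-rate scaling described above can be sketched as follows; `effectiveCount` is an illustrative helper, not part of the receiver:

```go
package main

import "fmt"

// effectiveCount scales an observed count by the inverse of the
// sample rate: a counter received with @0.1 represents roughly 10
// actual events per observed event. The inverse (1/sampleRate) is
// what the discussion above calls sample-count.
func effectiveCount(observed int64, sampleRate float64) float64 {
	if sampleRate <= 0 || sampleRate > 1 {
		// Invalid or absent rate: treat the data as unsampled.
		return float64(observed)
	}
	return float64(observed) / sampleRate
}

func main() {
	fmt.Println(effectiveCount(8, 0.1)) // 8 observed events at a 1-in-10 rate
}
```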

sonofachamp (Author):

Are you suggesting that we should expect @<sample-count> instead of @<sample-rate> on incoming messages, or that we should translate incoming @<sample-rate> into sample-count when processing? I currently expect sample-rate on incoming messages, as that seems to be standard.

I will update the Counter and Gauge examples to include sample rates 👍

Contributor:

I expect @<sample-rate>, I was just explaining my thoughts for how to get this information into OTLP at a later date.


```go
metricType := parts[1]

// TODO: add sample rate and tag parsing
```
Contributor:

Yes, let's focus on tag parsing first. 😁

keitwb (Contributor), Aug 14, 2020:

I would add to this the ability to generate OTel labels from metric name patterns. For example, when monitoring AWS AppMesh (and the underlying Envoy instances), you will encounter metric names like

`cluster.cds_egress_ecommerce-demo-mesh_gateway-vn_tcp_8080.upstream_rq_3xx:8|c`

where the metric name is of the form:

`cluster.cds_{traffic}_{mesh}_{service}-vn_{port}.{action}`

Ideally this would be an OTel metric with a name like `cluster_cds_{action}`, with the labels being all the other pattern groups.

We have seen this not infrequently with other statsd services as well.

I looked at the Metric Transform Processor proposal but didn't see anything that would let you derive labels from a metric name pattern, but perhaps that should be part of that processor. Either way, I think this type of transformation is essential for avoiding metric name cardinality explosion and better usability of metrics.
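As a rough sketch of this kind of pattern-based mapping (the regular expression and helper names are hypothetical, loosely modeled on statsd_exporter-style mapping rules, and hard-coded here for the AppMesh example above):

```go
package main

import (
	"fmt"
	"regexp"
)

// appMeshPattern captures the variable parts of the example name form
// cluster.cds_{traffic}_{mesh}_{service}-vn_{port}.{action}.
var appMeshPattern = regexp.MustCompile(
	`^cluster\.cds_(?P<traffic>[^_]+)_(?P<mesh>[^_]+)_(?P<service>.+)-vn_(?P<port>[^.]+)\.(?P<action>.+)$`)

// extractLabels turns named capture groups into labels and collapses
// the variable parts out of the metric name, reducing name cardinality.
func extractLabels(name string) (metricName string, labels map[string]string, ok bool) {
	m := appMeshPattern.FindStringSubmatch(name)
	if m == nil {
		return "", nil, false
	}
	labels = map[string]string{}
	for i, key := range appMeshPattern.SubexpNames() {
		if i > 0 && key != "" {
			labels[key] = m[i]
		}
	}
	return "cluster_cds_" + labels["action"], labels, true
}

func main() {
	name, labels, ok := extractLabels(
		"cluster.cds_egress_ecommerce-demo-mesh_gateway-vn_tcp_8080.upstream_rq_3xx")
	fmt.Println(name, labels, ok)
}
```

In practice the pattern and the resulting name template would come from user configuration rather than being compiled in.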

Contributor:

I'd like to see if we can incorporate logic and configuration from https://github.com/prometheus/statsd_exporter for this task. Hopefully we can think of this as a separate processor stage to be combined with the statsdreceiver, and I would still prioritize parsing DogStatsd-style tags.


Review thread on receiver/statsdreceiver/receiver.go (outdated, resolved).
sonofachamp (Author):

@bogdandrutu can we merge this? I've added TODOs for @jmacd's feedback, and I believe everything else has been addressed for now.

Review thread on receiver/statsdreceiver/factory.go (outdated, resolved).
Collector automation moved this from In progress to Review in progress Aug 19, 2020
sonofachamp (Author):

It seems the Lint step failed in prometheusexecreceiver, unrelated to these changes:

```
../../Makefile.Common:110: recipe for target 'lint' failed
make[1]: *** [lint] Killed
make[1]: Leaving directory '/home/circleci/project/receiver/prometheusexecreceiver'
Makefile:91: recipe for target 'for-all-target-./receiver/prometheusexecreceiver' failed
make: *** [for-all-target-./receiver/prometheusexecreceiver] Error 2
make: *** Waiting for unfinished jobs....
```

Is there a way to kick off the CircleCI build manually? I don't think I have permission to do so.

Collector automation moved this from Review in progress to Reviewer approved Aug 19, 2020
@bogdandrutu bogdandrutu merged commit 2559612 into open-telemetry:master Aug 19, 2020
Collector automation moved this from Reviewer approved to Done Aug 19, 2020
@sonofachamp sonofachamp deleted the statsd branch August 19, 2020 20:50
bogdandrutu pushed a commit that referenced this pull request Aug 25, 2020
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
* Update README.md
* Update extension/README.md

Co-authored-by: Tigran Najaryan <[email protected]>
codeboten pushed a commit that referenced this pull request Nov 23, 2022
Labels: none. Project: Collector (Done). 6 participants.