
TLS config in TA receiver doesn't take effect #33370

Open
ItielOlenick opened this issue Jun 4, 2024 · 19 comments · May be fixed by #34035
Labels: bug (Something isn't working), receiver/prometheus (Prometheus receiver)

Comments

@ItielOlenick

Component(s)

receiver/prometheus

What happened?

Description

After #31449 was merged, the client is expected to use the TLS settings when communicating with the TA.

When TLS config is added to the config.yaml file, the client does not send the TLS certificates in its requests to the Target Allocator.

In practice, the TA server (or a mock of it) complains that the client does not provide certificates in its requests.

Steps to Reproduce

Set up a TA (or mock) with mTLS, forcing the client to provide certificates (a minimal mock server sketch follows the config below).

Set up the collector with the tls key in the config:

collector.yaml

receivers:
    prometheus:
        config: {}
        target_allocator:
            collector_id: ${POD_NAME}
            endpoint: https://tls-debug-service:443
            interval: 30s
            tls:
                ca_file: /tls/ca.crt
                cert_file: /tls/tls.crt
                key_file: /tls/tls.key
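
For the first step, a minimal mock server such as the Go sketch below can stand in for the TA. It is only an illustration: the listen address, handler payloads, and certificate paths are placeholders, and the one essential detail is tls.RequireAndVerifyClientCert, which makes the server reject clients that do not present a certificate.

// mock_ta.go - a minimal stand-in for the Target Allocator that requires
// client certificates (mTLS). Paths and the listen address are placeholders.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

func main() {
	// Trust the same CA that signed the collector's client certificate.
	caPEM, err := os.ReadFile("/tls/ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	mux := http.NewServeMux()
	// The receiver first fetches scrape configs, then polls per-job targets.
	mux.HandleFunc("/scrape_configs", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{}`))
	})
	mux.HandleFunc("/jobs/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`[]`))
	})

	srv := &http.Server{
		Addr:    ":8443",
		Handler: mux,
		TLSConfig: &tls.Config{
			ClientCAs: caPool,
			// Reject clients that do not present a certificate signed by the CA.
			ClientAuth: tls.RequireAndVerifyClientCert,
		},
	}
	// Server certificate and key, signed by the same CA as above.
	log.Fatal(srv.ListenAndServeTLS("/tls/tls.crt", "/tls/tls.key"))
}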

Expected Result

TLS certificates added to the request

Actual Result

Server complains that the client didn't provide a certificate

From the TA:

http: TLS handshake error from 10.42.0.201:40804: tls: client didn't provide a certificate

From the mock, using openssl:

4007B2B0CD7F0000:error:0A0000C7:SSL routines:tls_process_client_certificate:peer did not return a certificate:../ssl/statem/statem_srvr.c:3510:

Collector version

0.101.0

Environment information

Environment

Kubernetes

OpenTelemetry Collector configuration

exporters:
    debug:
        verbosity: detailed
receivers:
    prometheus:
        config: {}
        target_allocator:
            collector_id: ${POD_NAME}
            endpoint: https://tls-debug-service:443
            interval: 30s
            tls:
                ca_file: /tls/ca.crt
                cert_file: /tls/tls.crt
                key_file: /tls/tls.key
service:
    pipelines:
        metrics:
            exporters:
                - debug
            processors: []
            receivers:
                - prometheus

Log output

From the Collector:

2024-06-04T13:18:58.967Z info [email protected]/service.go:102 Setting up own telemetry...
2024-06-04T13:18:58.967Z info [email protected]/telemetry.go:103 Serving metrics {"address": ":8888", "level": "Normal"}
2024-06-04T13:18:58.967Z info [email protected]/exporter.go:275 Development component. May change in the future. {"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-06-04T13:18:58.968Z info [email protected]/service.go:169 Starting otelcol... {"Version": "0.101.0", "NumCPU": 12}
2024-06-04T13:18:58.968Z info extensions/extensions.go:34 Starting extensions...
2024-06-04T13:18:58.969Z info [email protected]/metrics_receiver.go:275 Starting discovery manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-06-04T13:18:58.970Z info [email protected]/metrics_receiver.go:121 Starting target allocator discovery {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-06-04T13:18:58.980Z info [email protected]/metrics_receiver.go:253 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "serviceMonitor/default/nginx-servicemonitor/0"}
2024-06-04T13:18:58.980Z info [email protected]/service.go:195 Everything is ready. Begin running and processing data.
2024-06-04T13:18:58.980Z warn localhostgate/featuregate.go:63 The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default. {"feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-06-04T13:18:58.980Z info [email protected]/metrics_receiver.go:340 Starting scrape manager {"kind": "receiver", "name": "prometheus", "data_type": "metrics"}
2024-06-04T13:18:58.995Z error refresh/refresh.go:71 Unable to refresh target groups {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "http", "config": "serviceMonitor/default/nginx-servicemonitor/0", "err": "Get \"https://tls-debug-service:443/jobs/serviceMonitor%2Fdefault%2Fnginx-servicemonitor%2F0/targets?collector_id=collector-with-ta-collector-0\": remote error: tls: certificate required"}
github.com/prometheus/prometheus/discovery/refresh.(*Discovery).Run

Additional context

No response

Contributor

github-actions bot commented Jun 4, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@swiatekm
Contributor

swiatekm commented Jun 4, 2024

I honestly can't see how this can be an issue with the prometheus receiver. The use of confighttp in #31452 is incredibly basic, and there's even a test verifying that the TLS config does in fact get read. So it's either a bug in confighttp, or the test methodology is wrong somehow. @ItielOlenick are you using the standalone collector for your tests? Or the Operator?

@ItielOlenick
Author

ItielOlenick commented Jun 4, 2024

@swiatekm-sumo I'm using both.

The TLS cert is in fact read from the config: when testing with a cert file that doesn't exist, it complains about the missing file. So the setting is read but somehow ignored. Could this be due to the fact that it is used for servers, while in this case the TA is the server and the receiver is the client?

I've tested with a standalone TA, an operator-deployed one, and a mock server serving the same endpoints.
In all cases I used the same certs (both server and client), and in all cases I am able to get a response using curl or openssl, both from within the cluster and via port forwarding.

I've tried everything I can think of.

@swiatekm
Contributor

swiatekm commented Jun 4, 2024

It looks like it's loaded and added to the transport here: https://github.com/open-telemetry/opentelemetry-collector/blob/15faf6206bedc9d5a188f131fc5891904e29a34b/config/confighttp/confighttp.go#L122. Not sure what the problem could be. I'll try to do a minimal reproduction tomorrow using the otlp exporter.
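
For context, "loaded and added to the transport" amounts to roughly the following standard-library setup. This is a conceptual sketch of client-side mTLS, not the actual confighttp code; the certificate paths are the ones from the config above.

// Conceptual sketch of a client configured for mTLS with the standard
// library. confighttp does the equivalent work internally.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"net/http"
	"os"
)

// newMTLSClient builds an http.Client whose transport presents a client
// certificate and verifies the server against the given CA.
func newMTLSClient() (*http.Client, error) {
	cert, err := tls.LoadX509KeyPair("/tls/tls.crt", "/tls/tls.key")
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile("/tls/ca.crt")
	if err != nil {
		return nil, err
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	transport := &http.Transport{TLSClientConfig: &tls.Config{
		Certificates: []tls.Certificate{cert}, // sent during the TLS handshake
		RootCAs:      caPool,                  // used to verify the server
	}}
	return &http.Client{Transport: transport}, nil
}

func main() {
	client, err := newMTLSClient()
	if err != nil {
		log.Fatal(err)
	}
	// Requests made with this client present the certificate; requests made
	// by a differently built client (e.g. Prometheus HTTP SD) do not.
	_ = client
}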

@ItielOlenick
Author

ItielOlenick commented Jun 4, 2024

It seems like the first call to /scrape_configs is OK:

info [email protected]/metrics_receiver.go:253 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "serviceMonitor/default/nginx-servicemonitor/0"}

but then the call to the /jobs endpoint is the issue:

error refresh/refresh.go:90 Unable to refresh target groups {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "http", "config": "serviceMonitor/default/nginx-servicemonitor/0", "err": "Get \"https://tls-debug-service:443/jobs/serviceMonitor%2Fdefault%2Fnginx-servicemonitor%2F0/targets?collector_id=collector-with-ta-collector-0\": remote error: tls: certificate required"}
github.com/prometheus/prometheus/discovery/refresh.(*Discovery).Run
github.com/prometheus/[email protected]/discovery/refresh/refresh.go:90

@swiatekm
Contributor

swiatekm commented Jun 4, 2024

It seems like the first call to /scrape_configs is OK:

info [email protected]/metrics_receiver.go:253 Scrape job added {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "jobName": "serviceMonitor/default/nginx-servicemonitor/0"}

but then the call to the /jobs endpoint is the issue:

error refresh/refresh.go:90 Unable to refresh target groups {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "discovery": "http", "config": "serviceMonitor/default/nginx-servicemonitor/0", "err": "Get \"https://tls-debug-service:443/jobs/serviceMonitor%2Fdefault%2Fnginx-servicemonitor%2F0/targets?collector_id=collector-with-ta-collector-0\": remote error: tls: certificate required"}
github.com/prometheus/prometheus/discovery/refresh.(*Discovery).Run
github.com/prometheus/[email protected]/discovery/refresh/refresh.go:90

That makes sense: the client config only applies to calls to /scrape_configs. The calls to /jobs are made by Prometheus' HTTP service discovery, and we don't configure TLS there.

In that case we have two simple solutions: we can either fetch targets over plain HTTP even when TLS is enabled, or we can configure the HTTP SD config to use TLS as well. The latter seems a bit hacky, in that we should probably make an effort to configure it the same way the confighttp client is configured, which may not be so obvious.
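
A minimal sketch of the second option, assuming the Prometheus discovery packages the receiver already depends on (github.com/prometheus/prometheus/discovery/http and github.com/prometheus/common/config); the helper name is made up for illustration and this is not a final implementation:

package targetallocator

import (
	promconfig "github.com/prometheus/common/config"
	promhttp "github.com/prometheus/prometheus/discovery/http"
)

// applyTLSToHTTPSD copies the TLS file paths into the HTTP SD client config.
// The field names mirror Prometheus' http_sd_config (tls_config: ca_file,
// cert_file, key_file).
func applyTLSToHTTPSD(httpSD *promhttp.SDConfig, caFile, certFile, keyFile string) {
	httpSD.HTTPClientConfig.TLSConfig = promconfig.TLSConfig{
		CAFile:   caFile,
		CertFile: certFile,
		KeyFile:  keyFile,
	}
}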

@ItielOlenick
Author

ItielOlenick commented Jun 4, 2024

The first one sounds easier, but the second sounds more complete.
Implementing the first suggestion would mean also providing the regular, non-secure endpoint, I assume.
At any rate, now that we know the source of the issue, I'll move forward with open-telemetry/opentelemetry-operator#1669

@Frapschen removed the needs triage (New item requiring triage) label on Jun 5, 2024
@ItielOlenick
Author

@swiatekm-sumo looking over this again, couldn't we add the TA client TLS config we provided to the httpSD?

httpSD.HTTPClientConfig.TLSConfig has a TLS config similar to what was implemented in #31449

@ItielOlenick
Author

Tested with

httpSD.HTTPClientConfig.TLSConfig.CAFile = allocConf.TLSSetting.CAFile
httpSD.HTTPClientConfig.TLSConfig.CertFile = allocConf.TLSSetting.CertFile
httpSD.HTTPClientConfig.TLSConfig.KeyFile = allocConf.TLSSetting.KeyFile

Works.

@swiatekm
Contributor

swiatekm commented Jun 5, 2024

Yeah, that does work. My point was more that if we start applying settings from confighttp to httpSD, then we should try to do it as comprehensively as possible, or at least document that only the TLS parts carry over.

@ItielOlenick
Author

Yeah, that does work. My point was more that if we start applying settings from confighttp to httpSD, then we should try to do it as comprehensively as possible, or at least document that only the TLS parts carry over.

I'd like to work on this so we can move forward with open-telemetry/opentelemetry-operator#3015

@swiatekm
Contributor

swiatekm commented Jul 6, 2024

@dashpole @Aneurysm9 are you ok with this approach?

@dashpole
Contributor

dashpole commented Jul 8, 2024

Can you just configure http_sd_config's TLS settings from the operator?

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#http_sd_config

@ItielOlenick
Author

@dashpole as in:
httpSD.HTTPClientConfig.TLSConfig = allocConf.TLSSetting ?

If not, I'm not sure how you intend the collector to read and apply this setting.

@dashpole
Contributor

dashpole commented Jul 8, 2024

Sorry, never mind. I agree with #33370 (comment) that we should ideally translate as much of the httpSD config as possible.

@ItielOlenick
Author

Ok, makes sense. I'll draft something up.

@ItielOlenick
Author

Just to clarify, we are talking about using allocConf's HTTP-related config in httpSD.HTTPClientConfig?

@dashpole
Contributor

dashpole commented Jul 8, 2024

yes.

@ItielOlenick
Author

It seems like the only fields from allocConf's config that are relevant to translate into httpSD are in fact TLSSetting and maybe ProxyURL.

Am I missing something?
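
For illustration, a hypothetical translation helper along those lines could look like the sketch below. The allocClientSettings struct is a stand-in for the relevant parts of the receiver's target_allocator configuration, not the real type, and only the TLS file paths and the proxy URL are carried over, matching the observation above.

package targetallocator

import (
	"net/url"

	promconfig "github.com/prometheus/common/config"
	promhttp "github.com/prometheus/prometheus/discovery/http"
)

// allocClientSettings stands in for the relevant parts of the receiver's
// target_allocator configuration; it is not the real struct.
type allocClientSettings struct {
	CAFile   string
	CertFile string
	KeyFile  string
	ProxyURL string // optional
}

// translateToHTTPSD carries over only what the discussion above identified
// as relevant: the TLS file paths and, if set, the proxy URL.
func translateToHTTPSD(httpSD *promhttp.SDConfig, s allocClientSettings) error {
	httpSD.HTTPClientConfig.TLSConfig = promconfig.TLSConfig{
		CAFile:   s.CAFile,
		CertFile: s.CertFile,
		KeyFile:  s.KeyFile,
	}
	if s.ProxyURL != "" {
		u, err := url.Parse(s.ProxyURL)
		if err != nil {
			return err
		}
		// Prometheus wraps *url.URL in its own config.URL type.
		httpSD.HTTPClientConfig.ProxyURL = promconfig.URL{URL: u}
	}
	return nil
}

Anything beyond these two fields (auth, headers, timeouts) would either need the same treatment or a documented note that it does not carry over.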
