
Using request.host in expressions for metrics is problematic at ingress gateway #18818

Closed
douglas-reid opened this issue Nov 9, 2019 · 14 comments · Fixed by #18821

douglas-reid (Contributor) commented Nov 9, 2019

Bug description

In #16892, we added request.host to the expression for destination_service labels, as a fallback for cases where destination.service.host would not be set (pass-through, blackhole, NR).

In theory, this is a nice idea. However, it is problematic at ingress, as the Host header can be set to arbitrary values.
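For reference, the destination_service label comes from the dimensions of the Mixer metric instances (requestcount and friends). Roughly, the fallback ends up looking like the sketch below; the instance name and surrounding fields are illustrative approximations, not the verbatim shipped default.

```yaml
# Illustrative sketch of a Mixer metric instance carrying the fallback.
# The instance name, the extra dimensions, and the exact expression are
# approximations for illustration, not the verbatim shipped config.
apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
  name: requestcount
  namespace: istio-system
spec:
  compiledTemplate: metric
  params:
    value: "1"
    dimensions:
      # destination.service.host is preferred; request.host (client-controlled) is the fallback
      destination_service: destination.service.host | request.host | "unknown"
      response_code: response.code | 200
    monitored_resource_type: '"UNSPECIFIED"'
```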

Example of sending such a request:

curl --verbose --header 'Host: %22%3E%3Cscript%3Ealert('Qualys_XSS_Joomla_2.5.3')%3C%2Fscript%3E' <gateway ip>/no-route

This results in metrics with a destination service label containing junk -- some of it potentially harmful.

Example from Prometheus:

[screenshot: Prometheus results showing destination_service label values containing the injected script payload]

This is problematic for a number of reasons:

  • allows arbitrary explosion of metric cardinality, as bad actors could send an unending stream of requests with different Host headers
  • pollutes the service template variables in Grafana dashboards
  • creates potential for XSS, etc., when rendering metrics containing a bad request.host

Expected behavior
Arbitrary NR traffic at ingress should not be able to pollute Istio-generated metrics with junk destination_service labels.

Either we should set destination.service.host to "BlackHole" or similar, or we should exclude request.host from the expression for destination_service in the configuration for the ingressgateway (or adopt a third, better option that someone may have).

Steps to reproduce the bug

Install Istio, with a gateway for a wildcard host on port 80. Create a virtual service with a single exact-match route and nothing else. Send traffic to the gateway that does not match the route, with a Host header containing junk info. Observe the results.
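A minimal sketch of such a setup (resource names, the match path, and the backing service are placeholders made up for illustration):

```yaml
# Hypothetical repro config: wildcard Gateway on port 80 plus a VirtualService
# with a single exact-match route. All names, hosts, and ports are placeholders.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: wildcard-gateway
  namespace: default
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: exact-route
  namespace: default
spec:
  hosts:
  - "*"
  gateways:
  - wildcard-gateway
  http:
  - match:
    - uri:
        exact: /productpage
    route:
    - destination:
        host: productpage.default.svc.cluster.local
        port:
          number: 9080
```

Requests to the gateway that miss the exact-match route, like the junk-Host curl above, should then show up in the metrics with the raw Host header as the destination_service.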

Version (include the output of istioctl version --remote and kubectl version)
1.4.0-beta.2, but any version of Istio with the linked PR is affected.

How was Istio installed?
helm template

Environment where bug was observed (cloud vendor, OS, etc)
GKE

douglas-reid (Contributor, Author) commented:

Compounding this issue, our Grafana dashboards render destination_service labels directly as template variables for individual dashboards.

douglas-reid (Contributor, Author) commented:

For the Grafana issue, a short-term workaround is to replace all service templating as follows:

  • replace: `label_values(destination_service)`
  • with: `query_result(sum(istio_requests_total{source_app!="istio-ingressgateway"}) by (destination_service) or sum(istio_tcp_connections_closed_total{source_app!="istio-ingressgateway"}) by (destination_service))`
  • and add a regex of `{destination_service="(.*)".*`

I'll want someone to double-check that, however.

Stono (Contributor) commented Nov 9, 2019

I fully understand the problem; however, I can't stress enough how useful this feature is for us in identifying black hole traffic. As we don't use the ingress gateway (we use Nginx), this isn't much of a concern for us either, as random external hosts won't get to Envoy.

I would ask that you at least make the feature optional so the operator can make the call.

nrjpoddar (Member) commented Nov 9, 2019

@douglas-reid IIUC, the issue occurs only in ingress gateways when you have non-wildcard host matching in Gateways and VirtualServices: any traffic with a non-matching host hits the blackhole cluster, and you can get a metric cardinality explosion from random internet traffic hosts being populated in the destination service label.

It's very easy to set the destination service label to BlackHoleCluster only for the ingress gateway but leave it as is for sidecars, like you suggested. This means that for any outbound traffic from the mesh sidecars we still capture the host header, @Stono. I can get that in today before 1.4 goes out.

Based on our conversation yesterday, Doug, you were saying that we never capture inbound traffic metrics for the ingress gateway; isn't this inbound?

howardjohn (Member) commented:

I believe this is happening at ingress when traffic does not hit the BlackHole cluster but is 404ing as well. I will double-check the config and add more details here later.

nrjpoddar (Member) commented:

You might be right; do ingress gateways even have a concept of a BlackHoleCluster? AFAIK, ingress gateways just return a direct 404 response for non-matching traffic.

mandarjog (Contributor) commented:

Gateways do not have a black hole cluster. So these are real 404 NRs.

mandarjog (Contributor) commented:

For v1 Mixer this is purely a configuration issue.
@Stono, you can set the Mixer rule to include the host header fallback. That way we can remove it from 1.4. Is that OK for you?
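For example, something along these lines, where the operator maintains their own copy of the metric instance with the fallback expression kept in place (the expression actually lives in the metric instance's dimensions rather than the rule itself; a sketch only, not verified config):

```yaml
# Operator-maintained metric instance that keeps the Host-header fallback for
# destination_service. Instance name and remaining params are placeholders.
apiVersion: config.istio.io/v1alpha2
kind: instance
metadata:
  name: requestcount
  namespace: istio-system
spec:
  compiledTemplate: metric
  params:
    value: "1"
    dimensions:
      destination_service: destination.service.host | request.host | "unknown"
    monitored_resource_type: '"UNSPECIFIED"'
```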

nrjpoddar (Member) commented Nov 9, 2019

@mandarjog I agree, we can update the default config to be this:

`destination_service: destination.service.host | conditional(destination.service.name == "", "unknown", request.host)`

This will make sure that the value is set to unknown in all cases where destination.service.name is not explicitly set or is BlackHoleCluster/PassthroughCluster, which addresses the ingressgateway concerns.
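Spelling the proposed expression out mechanically (a sketch of my reading, assuming the usual Mixer semantics where `|` picks the first attribute that is present):

```yaml
dimensions:
  destination_service: destination.service.host | conditional(destination.service.name == "", "unknown", request.host)
  # destination.service.host present                 -> destination.service.host
  # host absent, destination.service.name == ""      -> "unknown"
  # host absent, destination.service.name non-empty  -> request.host
```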

mandarjog (Contributor) commented:

Either that, or use `response_flag == NR`.

nrjpoddar (Member) commented:

OK, I will send a PR soon after verifying the configuration changes work, and we can cherry-pick into the release-1.3 and release-1.4 branches.

Stono (Contributor) commented Nov 9, 2019

@mandarjog as long as there is a way to get the host header for BlackHoleCluster, I'm not fussed about the implementation.

I (and multiple people) tweeted about this feature and its value, so please make sure you clearly document how to keep the capability for those who are using it, or want to!

nrjpoddar (Member) commented:

> I (and multiple people) tweeted about this feature and its value, so please make sure you clearly document how to keep the capability for those who are using it, or want to!

Don't worry, you won't lose this functionality. :)

@nrjpoddar nrjpoddar self-assigned this Nov 9, 2019
nrjpoddar pushed a commit to nrjpoddar/istio that referenced this issue Nov 9, 2019
Updated the default metrics to only use `request.host` in `destination_service`
if the cluster is known (including BlackHole/Passthrough); otherwise set it to
`unknown`.

This prevents random traffic via the ingress gateway from reporting `request.host` in
telemetry, which can cause cardinality explosion.

Fixes: istio#18818
istio-testing pushed commits with the same change that referenced this issue Nov 9 and Nov 11, 2019
sdake pushed a commit with the same change to sdake/istio that referenced this issue Dec 1, 2019