Customizing SPIFFE ID format if using an external SPIFFE-compliant SDS should be supported #43105

Open
bleggett opened this issue Feb 2, 2023 · 41 comments
Labels: area/environments, area/security, area/user experience, kind/docs, kind/enhancement, lifecycle/staleproof

Comments

@bleggett
Contributor

bleggett commented Feb 2, 2023

Summary

  1. When using the default Istio SDS, the current SPIFFE ID format should be the default
  2. If using an alternative SPIFFE-compliant SDS, using an alternative SPIFFE ID format should be allowed without having to resort to DestRule hacks
  3. Other parts of Istio should actively avoid introducing unnecessary implicit assumptions about the SPIFFE ID format and what is minting them to avoid creating more de-facto restrictions around their use, format, and the level of attestation a given user-supplied SPIFFE-compliant SDS server is actually configured for.

Detail

Currently, Istio uses a nonstandard variant of the SPIFFE ID spec that mandates the following SPIFFE ID format in the URI SAN field of the x509 workload certs:

spiffe://<trust_domain>/ns/<workload_namespace>/sa/<workload_service_account>

This means that workload certs minted by the default Istio SDS are indistinguishable - if I have 5 pods under the same service account, they share the same credentials, even if they may have different containers, run on different nodes, etc etc.

That is because the default Istio SDS is simplistic, does no granular workload identity attestation, and merely passes through trust and workload identity to K8S service accounts, which is Good Enough Most Of The Time.

Now that Istio supports replacing the default SDS provider with alternative SPIFFE-compliant SDS servers, such as SPIRE, this restriction makes less sense - the SDS server does (and should) control the format of the SPIFFE ID, and the granularity of the workload identity - for instance, if I use SPIRE with Istio and want to do workload attestation beyond just the service account level, I can easily do that today, and the SPIFFE ID format is defined with SPIRE, not Istio.

In fact, it is perfectly possible to do this today - I can integrate SPIRE with Istio as per our current docs, and configure SPIRE to mint SPIFFE IDs in a non-Istio-standard format, appending more granularity to the SPIFFE identifier to suit the level of attestation granularity my SPIFFE authority is actually engaging in:

spiffe://<trust_domain>/ns/<workload_namespace>/sa/<workload_service_account>/nodeid/<node_id>/wl/<workload_name> - for instance

This works just fine with Istio, with the following exception - SPIFFE SAN validation is a hardcoded Envoy config that requires an exact match on spiffe://<trust_domain>/ns/<workload_namespace>/sa/<workload_service_account> - even though other forms of matching for SANs are supported by Envoy, we do not support them or expose them as configurable options.
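To make the difference concrete, here is a minimal sketch of exact matching versus a segment-aware prefix match on the URI SAN. The helper names and example IDs are hypothetical and only illustrate the idea - this is not Envoy's or Istio's actual matcher code.

```python
def matches_exact(san: str, expected: str) -> bool:
    """Exact match: the whole URI SAN must equal the expected SPIFFE ID."""
    return san == expected


def matches_prefix(san: str, expected: str) -> bool:
    """Prefix match on whole path segments: extra trailing segments are allowed."""
    base = expected.rstrip("/")
    # Require a "/" boundary so ".../sa/bar-admin" does not match ".../sa/bar".
    return san == base or san.startswith(base + "/")


# Hypothetical example IDs.
expected = "spiffe://example.org/ns/foo/sa/bar"
extended = "spiffe://example.org/ns/foo/sa/bar/nodeid/node-1/wl/web"
lookalike = "spiffe://example.org/ns/foo/sa/bar-admin"

print(matches_exact(extended, expected))    # False
print(matches_prefix(extended, expected))   # True
print(matches_prefix(lookalike, expected))  # False
```

With exact matching the extended ID is rejected even though it carries the same namespace and service account; the prefix form accepts it while still rejecting a different service account that merely shares a string prefix.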

This can be worked around with a DestinationRule such as the following:

# TODO destination rules need to be created for any SPIFFE IDs that don't follow the
# format that Istio expects (ns/NAMESPACE/sa/TARGET_POD_SVC_ACCOUNT)
# because ATM Istio defaults to clientside SAN checks that assume that SPIFFE ID format
# and this is not currently configurable
#
# Additionally, since DestinationRules override Istio's "default auto mTLS" settings, we need `mode: ISTIO_MUTUAL`
# in each DestRule to tell Istio that even though we have a custom destination config, we still want mTLS.
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: {{ .serviceName }}-custom-spire-destrule
spec:
  host: {{ .serviceName }}
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
      subjectAltNames:
      - spiffe://example.org/ns/{{ $.Release.Namespace }}/sa/{{ .serviceName }}/wl/foo

Once you do this, SPIFFE IDs can be constructed with whatever level of granularity you desire, and workload certs can be distinguishable by consumers at the level of attestation that is actually performed by the SDS, rather than the level of attestation that Istio's default SDS performs.

There has been resistance to changing the default SPIFFE ID format due to back compat with existing customer rules that also hardcode SPIFFE IDs in the format that the default Istio SDS emits, and that's reasonable - but given that we support pluggable SPIFFE-compliant SDS implementations there is no good reason why Istio itself should forbid or otherwise prevent customers who use an alternate SDS from using more granular SPIFFE IDs than the default.

Especially since this works fine today with a simple DestinationRule tweak, indicating that the problem is a simple set of currently-unconfigurable defaults, and not a systemic obstacle.

Frankly, outside of maybe requiring that a SPIFFE ID have at a minimum several expected parsable fields in it so Istio itself can extract the information it needs from SPIFFE IDs (/ns/<namespace> and /sa/<serviceaccount>), it isn't really Istio's business what the SPIFFE ID format is - the SPIFFE ID format is and should be owned by the SPIFFE-compliant SDS instance, and we support more than one SPIFFE-compliant SDS instance. We just make bad assumptions elsewhere in the code that force those compliant instances to hew exclusively to the SPIFFE ID format our default SDS emits, which is an unnecessary restriction.

Affected product area (please put an X in all that apply)

[x] Ambient
[x] Docs
[x] Installation
[ ] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[x] Security
[ ] Test and Release
[x] User Experience
[ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane

Additional context

@bleggett
Contributor Author

bleggett commented Feb 2, 2023

Some context:

#28712

#42114

bleggett changed the title from "Customizing SPIFFE ID format if using an external SPIFFE-compliant SDS should be possible" to "Customizing SPIFFE ID format if using an external SPIFFE-compliant SDS should be supported" Feb 2, 2023
@kyessenov
Contributor

kyessenov commented Feb 2, 2023

Our authz API requires extracting the source principal from SPIFFE ID, for example, to restrict access by the source namespace. How would you support this API with custom SPIFFE IDs?

Similarly, our telemetry is very much "workload" oriented, meaning we drop the qualified pod name as soon as possible and only report the deployment name, in order to reduce the metric cardinality.

In general, using per-pod identity might give a false sense of security. Kubernetes itself doesn't distinguish between pods of the same KSA from authorization perspective (in RBAC, etc), and even then, tenancy by SA is very weak, and namespaces are much closer to an isolation unit.

@bleggett
Contributor Author

bleggett commented Feb 2, 2023

> Our authz API requires extracting the source principal from SPIFFE ID, for example, to restrict access by the source namespace. How would you support this API with custom SPIFFE IDs?

Couple of ways I see

  1. Allowing the AuthZ API to use SPIFFE ID URI matching to allow users to customize for themselves what constitutes "source principal" as defined by whatever SDS/workload CA they are using, if they want.

  2. Or simply saying "whatever SPIFFE ID your workload CA mints, it MUST have /sa/<service_account> and /ns/<namespace> fields in it somewhere so K8S-service account authZ rules can work, but your workload CA/SDS server is free to add as many other fields to the SPIFFE ID as you need"
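The second option can be sketched roughly as follows: look for the named `ns` and `sa` segments wherever they appear in the path and ignore everything else. The function name, the returned dict shape and the example IDs are all hypothetical; this is an illustration of the idea, not Istio's parsing code.

```python
from urllib.parse import urlparse

# Segment names Istio would require; everything else is passed through untouched.
REQUIRED_KEYS = ("ns", "sa")


def extract_identity(spiffe_id: str) -> dict:
    """Pull trust domain, namespace and service account out of a SPIFFE ID,
    tolerating extra segments before or after the ns/sa pairs."""
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not parsed.netloc:
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")

    segments = [s for s in parsed.path.split("/") if s]
    fields = {}
    i = 0
    while i < len(segments) - 1:
        key = segments[i]
        if key in REQUIRED_KEYS and key not in fields:
            # Consume the key and its value together so a value that happens
            # to be named "ns" or "sa" is not re-read as a key.
            fields[key] = segments[i + 1]
            i += 2
        else:
            i += 1

    missing = [k for k in REQUIRED_KEYS if k not in fields]
    if missing:
        raise ValueError(f"SPIFFE ID is missing required segments: {missing}")

    return {
        "trust_domain": parsed.netloc,
        "namespace": fields["ns"],
        "service_account": fields["sa"],
    }


print(extract_identity("spiffe://example.org/ns/foo/sa/bar/nodeid/node-1/wl/web"))
```

One caveat with this naive scan: an unrelated segment whose value is literally `ns` or `sa` ahead of the real pair would be misread, so a real implementation would need a rule for that (for example, fixing where the required pairs may appear).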

> Similarly, our telemetry is very much "workload" oriented, meaning we drop the qualified pod name as soon as possible and only report the deployment name, in order to reduce the metric cardinality.

Do our telemetry APIs rely on parsing the ISTIO-SPIFFE ID format today? If they do, the above solutions would work. If they do not, they shouldn't be affected.

> In general, using per-pod identity might give a false sense of security. Kubernetes itself doesn't distinguish between pods of the same KSA from authorization perspective (in RBAC, etc), and even then, tenancy by SA is very weak, and namespaces are much closer to an isolation unit.

Depends - my point is that attesting pod identity is the province and sole responsibility of the SDS server/workload CA you happen to be using - and the granularity to which identity is attested also belongs to that. Even today - the default istiod SDS/workload CA owns those guarantees, and is the trust root of all of them.

If we support pluggable SDS servers, and we do, we should consider respecting whatever the workload CA attests (or at least just respect the parts we care about and ignore the rest), rather than creating downstream de-facto assumptions that constrain what the workload CA can attest and encode in the cert, which is frankly backwards.

If you use the default istiod SDS workload CA, all we attest is Kubernetes service account, as attested by the Kubernetes API - but we are necessarily trusting the SDS server to attest those things. If you swap that out for another SPIFFE-compliant SDS server, like say SPIRE, it can attest a superset of that - we don't have to care about the superset, but we shouldn't prevent the superset from being represented, which is what we do today.

@kyessenov
Contributor

> 1. Allowing the AuthZ API to use SPIFFE ID URI matching to allow users to customize for themselves what constitutes "source principal" as defined by whatever SDS/workload CA they are using, if they want.

Yes, but that's an API change. We're very wary of making any semantic changes to the existing APIs since any change can potentially break users.

> 2. Or simply saying "whatever SPIFFE ID your workload CA mints, it MUST have /sa/<service_account> and /ns/<namespace> fields in it somewhere so K8S-service account authZ rules can work, but your workload CA/SDS server is free to add as many other fields to the SPIFFE ID as you need"

That could work, but our implementation does strict regex matching I think. We'd need to make sure pattern matching is backwards compatible.

> Do our telemetry APIs rely on parsing the ISTIO-SPIFFE ID format today? If they do, the above solutions would work. If they do not, they shouldn't be affected.

It matters because we report principals literally as primary metric tags. Having a pod name as a principal will overwhelm the metric systems (none of them scale well to POD^2 cardinality).
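As a back-of-the-envelope illustration of that cardinality concern (the numbers are made up, and this assumes the worst case where every source principal talks to every destination principal):

```python
def worst_case_series(num_principals: int) -> int:
    """Worst-case count of (source principal, destination principal) label pairs."""
    return num_principals * num_principals


# Hypothetical mesh: 50 workloads (service accounts), 20 pods each.
workloads = 50
pods_per_workload = 20

per_workload = worst_case_series(workloads)
per_pod = worst_case_series(workloads * pods_per_workload)

print(per_workload)             # 2500
print(per_pod)                  # 1000000
print(per_pod // per_workload)  # 400
```

In this made-up mesh, moving from per-service-account principals to per-pod principals multiplies the worst-case series count by 400 for a single metric.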

> If you use the default istiod SDS workload CA, all we attest is Kubernetes service account, as attested by the Kubernetes API - but we are necessarily trusting the SDS server to attest those things. If you swap that out for another SPIFFE-compliant SDS server, like say SPIRE, it can attest a superset of that - we don't have to care about the superset, but we shouldn't prevent the superset from being represented, which is what we do today.

If SPIFFE certs are only used by Istio, then it's better to propose to SPIRE to generate Istio-compatible identities, because Istio in general simply doesn't make use of pod names in the APIs. The only issue is inter-op with another system that shares the identities, and for that, we'd need more details on what the other system is.

@bleggett
Contributor Author

bleggett commented Feb 2, 2023

> > 1. Allowing the AuthZ API to use SPIFFE ID URI matching to allow users to customize for themselves what constitutes "source principal" as defined by whatever SDS/workload CA they are using, if they want.

> Yes, but that's an API change. We're very wary of making any semantic changes to the existing APIs since any change can potentially break users.

> > 2. Or simply saying "whatever SPIFFE ID your workload CA mints, it MUST have /sa/<service_account> and /ns/<namespace> fields in it somewhere so K8S-service account authZ rules can work, but your workload CA/SDS server is free to add as many other fields to the SPIFFE ID as you need"

> That could work, but our implementation does strict regex matching I think. We'd need to make sure pattern matching is backwards compatible.

Yep. Also, this would only matter to people using a nonstandard SDS. Anyone continuing to use the default istiod SDS shouldn't be affected - if you make an explicit choice to use a different SDS than the one we ship, we should support that and document it, but it (and the config required to support extended SPIFFE IDs) doesn't need to be the default.

> > Do our telemetry APIs rely on parsing the ISTIO-SPIFFE ID format today? If they do, the above solutions would work. If they do not, they shouldn't be affected.

> It matters because we report principals literally as primary metric tags. Having a pod name as a principal will overwhelm the metric systems (none of them scale well to POD^2 cardinality).

That is useful info and something to consider.

> > If you use the default istiod SDS workload CA, all we attest is Kubernetes service account, as attested by the Kubernetes API - but we are necessarily trusting the SDS server to attest those things. If you swap that out for another SPIFFE-compliant SDS server, like say SPIRE, it can attest a superset of that - we don't have to care about the superset, but we shouldn't prevent the superset from being represented, which is what we do today.

> If SPIFFE certs are only used by Istio, then it's better to propose to SPIRE to generate Istio-compatible identities, because Istio in general simply doesn't make use of pod names in the APIs. The only issue is inter-op with another system that shares the identities, and for that, we'd need more details on what the other system is.

Or, we could not care about what other identity properties other systems want or need and simply act as a minimal intermediary between whatever workload CA you want to use that is compatible with us, and whatever external requirements you have - when it comes to SPIFFE, we are the ones that aren't compliant with (or don't allow fully-compliant implementations of) the published standard, and IMO that's on us, not SPIRE (or any other SDS).

We may require a subset of the spec for our own purposes, but we should not disallow (or refuse to pass thru) a superset of our requirements that are fully within the spec we support just due to some naive validation rules on our part - that's what we do today, and IMO that's a bug and demonstrably not strictly necessary if you are already using an alternate SDS/workload CA due to nonstandard requirements.

We should expect the things we need in the cert to be in the cert - if there are more things that external entities might want, that should be negotiated between the custom workload CA you are using to mint Istio workload certs, and your external entities - we shouldn't get in the middle of that and block it, or try to support all permutations of that ourselves.

@hzxuzhonghu
Member

For what case do you need more granular workload identity? For stateless applications, k8s designed the deployment as a logical concept for a group of instances, each with the same permissions. Why does Istio need to separate them for auth?

@bleggett
Contributor Author

bleggett commented Feb 3, 2023

> For what case do you need more granular workload identity? For stateless applications, k8s designed the deployment as a logical concept for a group of instances, each with the same permissions. Why does Istio need to separate them for auth?

It doesn't - but there are external systems or integrations that will handle workload certs that might want or need that (see #42114 and @costinm's use cases), and Istio should not prevent you from putting more granular workload identity in the certs than what Istio itself needs. Especially since we support alternative workload CAs that provide more granularity - we just make it unnecessarily difficult to use that additional granularity.

Today Istio does prevent you from doing that, practically speaking, even if you use a non-default workload CA that supports this.

Additionally, we support replacing the istiod-default workload CA with alternate workload CAs which can attest a much more granular identity than istiod can - which is good - but rather than passing thru additional granularity that the customizable workload CA might put in workload certs, we put an upper bound on it.

  1. Istio itself doesn't need more granular identity attestation than the identity attestation that istiod's default workload CA supports.
  2. Istio supports alternative workload CAs
  3. Istio currently prevents you from using the more granular identity attestation of those external workload CAs because it insists on a de-facto standard, rather than simply insisting that the fields it needs are present, and ignoring+passing thru more granular specifiers.

howardjohn removed the area/ambient label Feb 13, 2023
@dafang982

> We may require a subset of the spec for our own purposes, but we should not disallow (or refuse to pass thru) a superset of our requirements that are fully within the spec we support just due to some naive validation rules on our part - that's what we do today, and IMO that's a bug and demonstrably not strictly necessary if you are already using an alternate SDS/workload CA due to nonstandard requirements.

@bleggett, you have hit the nail on the head. I'm working on a project now and would like to use Istio, but this Istio limitation stops me from choosing it, as we have our own SPIFFE CA that creates identities that don't follow the Istio-required pattern, even though they are 100% SPIFFE compliant!

JamesCallaghan added a commit to controlplaneio/threat-modelling-zero-trust-talk that referenced this issue Apr 24, 2023
- modify cluster spiffe ids to use custom format
- modify federation trust relationships to use new ids
- add templated destination rule to workloads 1 and 2 with a DestinationRule as suggested in istio/istio#43105
@costinm
Contributor

costinm commented Apr 28, 2023

I am quite in favor of having more flexibility in how we check identities and apply authz - but I am not sure a SPIFFE URL with the 'workload name' as part of the URL is the right solution.

It will be critical for interop with other mesh implementations - that may not use SPIFFE but DNS or other identities, and it will also allow passing secured info about node, cluster, etc which are missing.

One proposal that I think would solve this nicely (and much more) is to add a second SAN with the fully qualified pod hostname - and/or add custom extensions for the extra info, and extend the API to allow the use of such extensions.

Having a URL with a hard-to-predict format and regex or other ugly ways to guess what variant of SPIFFE was used is quite dangerous and complicated.

@bleggett
Contributor Author

bleggett commented Apr 28, 2023

> Having a URL with a hard-to-predict format and regex or other ugly ways to guess what variant of SPIFFE was used is quite dangerous and complicated.

You'd have to do practically the same thing in the same way with a fully-qualified DNS name in most cases if you wanted to extract parts of the hostname identity as "descriptive metadata" - e.g. parsing segments out of pod-ip-address.my-namespace.pod.cluster-domain.example doesn't strike me as inherently more efficient than doing the same with spiffeid://cd/<cluster-domain>/ns/<namespace>/<...>/pod/<podid>, so I'm not sure it's realistically better to use DNS - tho it is more in line with what K8S has (for now) chosen to standardize on, I'll grant you that.

In general the problem here is that Istio is overly prescriptive of the SPIFFE format - it doesn't just expect certain fields, it precludes any other fields or additional specifiers from being used and imposes an ordering on the fields which are present, which is an unnecessary/overly-opinionated fragility that makes it impossible to consume SPIFFE IDs generated outside of Istio, among other things.

Istio does not need to predict the SPIFFE format at all - it can simply expect certain named segments to exist in whatever SPIFFE URI it gets, as per the SPIFFE spec, and complain if they are not there.

The fact that it mandates a complete SPIFFE format today is an Istio bug, really - if we need to integrate with things like e.g. Cilium I expect we will have to fix this and become more flexible WRT the SPIFFE formats we handle anyway in a way similar to what I'm describing above.

Instead of this, we could add a SAN, or extra non-compliant x509 cert fields - the thing that bothers me is that we don't really need to, if we fix the above.

At the end of the day though, I am interested in a standardized, not-just-Istio-parsable form of globally-unique workload identity, however we can get there. And I am very interested in Istio not coming up with its own mechanism for this. SPIFFE is designed to address exactly this problem, we already use it in spots, other projects also use it, and it's a well-defined CNCF spec - so IMO a clear and compelling argument needs to exist for why we shouldn't use it, if we don't want to.

The best argument against it so far is "because K8S chose not to follow it as a standard" - which is fine, but since we span clusters and might have mesh interop concerns, we might have needs that extend beyond current K8S requirements.

@bleggett
Contributor Author

bleggett commented Apr 28, 2023

The other nice thing about SPIFFE is that identity can be described in a way that is not inherently rooted in DNS server trust, which is nice because not everyone can rely on a fully end-to-end attestably-secure DNS stack in all scenarios - Google and other cloud providers naturally do not have this problem within their own clouds.

@costinm
Contributor

costinm commented Apr 29, 2023 via email

@costinm
Contributor

costinm commented Apr 29, 2023 via email

@bleggett
Contributor Author

bleggett commented May 1, 2023

> I don't see why a name in the hostname syntax is 'rooted' in DNS - it's also an opaque identifier like an email address or URL. You don't need to do any DNS lookup in any of the verifications you do.

If you want to do any sort of attesting that a specific identifier belongs to a specific service and all you are using is DNS, you have to trust the DNS server to make that attestation, and all it can reasonably express is a name <-> IP mapping.

I think we're saying the same thing here.

SPIRE identifiers are also fully opaque, but unlike DNS, SPIRE offers many forms of workload attestation that go far beyond trusting the DNS records a given server possesses. All DNS can do is attest the validity of a name <-> IP map entry. That's a weak form of workload identity, and is not multifactor.

> That is not actually the case with Istio and SPIFFE today - if DNS is hacked, the VIP of a different service can be returned and the entire security and checks are messed up. This is well known and why Istio REQUIRES a secure DNS to be secure.

This is a constraint of Istio (and kubernetes) yes. It has very little to do with SPIRE - I think the point you are making is that it will always be a constraint of Istio and Kubernetes, whether we used DNS or SPIRE to identify workloads, which I would certainly agree with - workload identity is one part of the puzzle.

> There is nothing special about expressing something as URL instead of hostname or email, from a security perspective. The advantage of DNS over URL, when client authenticates the server, is that it is independent of a discovery server mapping DNS to VIP to URLs, and fully interoperable and well known mechanism.

Correct - the difference is in what attestations you can practically cryptographically attest against that identifier - DNS is not designed to attest anything besides a name <-> IP mapping, which by itself is not sufficient for attesting workload identity.

@bleggett
Contributor Author

bleggett commented May 1, 2023

> I think the root problem is that SPIFFE is over-selling the use of a URL (that in most cases is NOT a workload identity) to magically make things secure, and ignoring the complexities and insecure side-channels it introduces.

> Even if they had a well defined schema - like the distinguished name or JWT claims - it would still be tied to a discovery system to map what users want - access example.namespace.svc - to the URLs representing identity in whatever control plane is used.

Sure - parsing the specific fields of the identifier is out of scope of what SPIFFE (and SPIRE) offers. Same with the DNS naming you're suggesting - DNS has no such constraint or standard but conventions can be overlaid on it.

The difference is that DNS identifiers are only designed to attest a single identity factor historically (and until recently didn't even offer any real security guarantees about that attestation), and SPIFFE is expressly more general than that - cryptographically binding attestations of multiple factors of workload identity (as SPIRE does to SPIFFE IDs) to DNS records is simply not something you can or will ever be able to do within the DNS standard.

It's a fundamentally unsound basis for workload identity - unless you invent several layers of de-facto standards that live outside normative DNS implementations, at which point you arrive at something that looks exactly like what SPIFFE/SPIRE already is, but with DNS names instead of SPIRE IDs. Which seems like a rather extreme and unproductive form of NIH.

And at that point what you have done is invest a lot of work to avoid using an existing standard, so you can craft another, even more de-facto standard around DNS records that is potentially worse, and certainly no better.

I'm not against putting DNS records in certs as a shortcut, or for admitting that we are probably, in the short term, bound to what K8S has decided to do - but I am saying that (vanilla, secure or not) DNS records are not, in the long term, a sufficient mechanism for representing workload identity (or for acting as a generic identifier that more specific forms of workload identity attestations can be cryptographically bound to), unless we invent several layers of nonstandard extensions to/assumptions around DNS. And if we do that, we have essentially reinvented SPIFFE/SPIRE but done some violence to an older, established, and simpler standard to get there.

@costinm
Contributor

costinm commented May 1, 2023 via email

@costinm
Contributor

costinm commented May 1, 2023 via email

@costinm
Contributor

costinm commented May 1, 2023 via email

@elinesterov

I think most of the discussion here was about a format that appends additional information to the end of the SPIFFE path, but it could also be added at the beginning, e.g., /cluster-id/ns/namespace/sa/service-account. cluster_id is already part of the Istio configuration and env variables, and it makes total sense in multicluster scenarios.

Adding it would also solve the problem of multicluster deployment when the limitation is that you HAVE TO avoid namespace collision.

Adding it will not have to change a lot of internals here because it is already part of the configuration.

I understand that supporting arbitrary SPIFFE IDs, as @bleggett mentioned, has some challenges around telemetry and access control. Still, using Istio for mTLS only and other systems like OPA for authorization might be a conscious choice. In this case, the user can opt in to disabling the default Istio SPIFFE ID template scheme.

@elinesterov

@costinm

> I think the root problem is that SPIFFE is over-selling the use of a URL (that in most cases is NOT a workload identity) to magically make things secure, and ignoring the complexities and insecure side-channels it introduces.

If you read the SPIFFE specification, you can see that it is not only about URLs, and not only about X.509 as an identity document. SPIFFE is the only mechanism that can easily enable federation for Istio, and in federated environments with multiple CAs, SPIFFE authentication prevents identity spoofing because identities are scoped to a trust domain.

> and insecure side-channels it introduces.

I would love to learn more about side channels here :) In the case of Istio and SPIRE, the same mechanism is used to deliver the X509-SVID to Envoy. It actually makes things better from a security standpoint because it doesn't need to rely on service accounts alone as a security mechanism (which forces Istio users to create SAs they might not even use, or they all just use default).

@elinesterov

I was thinking: what if we allowed users to provide a scheme for Istio to use, and left the default as it is today: ns/namespace/sa/service-account? In this case, if I opt in to a different scheme, I just need to provide it to Istio, and as long as it has the namespace and service account in the SPIFFE ID, everything should function as it does now.
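A rough sketch of what such an opt-in scheme could look like, assuming a hypothetical `{placeholder}` template syntax that gets compiled into a regex. None of these names exist in Istio today; the only rule enforced is that the template must contain the namespace and service account.

```python
import re

# Hypothetical default, mirroring today's ns/<namespace>/sa/<service-account> layout.
DEFAULT_TEMPLATE = "ns/{namespace}/sa/{service_account}"
REQUIRED_FIELDS = ("namespace", "service_account")


def compile_template(template: str = DEFAULT_TEMPLATE) -> re.Pattern:
    """Turn a path template with {placeholders} into a SPIFFE ID regex."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    names = parts[1::2]
    for field in REQUIRED_FIELDS:
        if field not in names:
            raise ValueError(f"template must contain {{{field}}}")

    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)          # literal path text
        else:
            pattern += f"(?P<{part}>[^/]+)"     # one path segment per placeholder
    return re.compile(f"^spiffe://(?P<trust_domain>[^/]+)/{pattern}$")


custom = compile_template("cluster/{cluster}/ns/{namespace}/sa/{service_account}")
match = custom.match("spiffe://example.org/cluster/c1/ns/foo/sa/bar")
print(match.group("namespace"), match.group("service_account"))  # foo bar
```

Users who never set a template would get the default pattern and see no change in behavior.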

@costinm
Contributor

costinm commented Sep 27, 2023 via email

@costinm
Contributor

costinm commented Sep 27, 2023 via email

@elinesterov

> AFAIK you can already use any SAN you want - using the explicit APIs we support.

@costinm do you mean by using the DestinationRule, or some other mechanism? Would you mind pointing me to where I can read more about it?

@elinesterov

Also, is it possible to configure Envoy through Istio to use different SPIFFE ID formats in different contexts (e.g. internal inside the cluster and external when talking to services outside of the mesh, e.g. via service records + destination rules)?

@costinm
Contributor

costinm commented Sep 28, 2023 via email

@costinm
Contributor

costinm commented Sep 28, 2023 via email

@elinesterov

> "Identity Federation" usually means communicating with a peer that has a different identity provider/roots. It has 2 sides - client verifying the server and server authorizing a client.

> DestinationRule allows you to specify which root CAs to trust and any SAN you want - URL or DNS are both fine.

@costinm I think this flexibility only works when you use secrets in the destination rule.
Something like:

spec:
  host: mydbserver.prod.svc.cluster.local
  trafficPolicy:
    tls:
      mode: MUTUAL
      clientCertificate: /etc/certs/myclientcert.pem
      privateKey: /etc/certs/client_private_key.pem
      caCertificates: /etc/certs/rootcacerts.pem

When it comes to using SDS I cannot find a way to do that. I think using different contexts would solve that, e.g. using builtin://external would tell SDS to use context foo so that Envoy can get the client cert/key and root for that external out-of-cluster mTLS (this is where a different SPIFFE ID and any other setting can be used). I think that wouldn't mess with the internal Istio SPIFFE ID format.

@bleggett
Contributor Author

bleggett commented Sep 28, 2023

The DestinationRule approach is per-workload. That means if you have 1000 workloads you need to create 1000 DestinationRule overrides to "fix" the Istio-hardcoded SAN format. That's a kludge, not an API.
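To illustrate the fan-out, here is a small sketch that renders one DestinationRule override per workload, reusing the fields from the DestinationRule earlier in this issue. The function name, the workload names and the `/wl/foo` suffix are hypothetical.

```python
# One DestinationRule per workload, same shape as the workaround shown earlier.
TEMPLATE = """\
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: {name}-custom-spire-destrule
spec:
  host: {name}
  trafficPolicy:
    tls:
      mode: ISTIO_MUTUAL
      subjectAltNames:
      - {san}
"""


def render_destination_rules(trust_domain: str, namespace: str, workloads: dict) -> str:
    """Render a multi-document YAML string with one override per workload.

    `workloads` maps a service name to the extra SPIFFE path suffix its
    workload CA appends (hypothetical, e.g. "/wl/foo").
    """
    docs = []
    for name, suffix in workloads.items():
        san = f"spiffe://{trust_domain}/ns/{namespace}/sa/{name}{suffix}"
        docs.append(TEMPLATE.format(name=name, san=san))
    return "---\n".join(docs)


# 1000 hypothetical workloads -> 1000 DestinationRules to generate and maintain.
workloads = {f"svc-{i}": "/wl/foo" for i in range(1000)}
rendered = render_destination_rules("example.org", "default", workloads)
print(rendered.count("kind: DestinationRule"))  # 1000
```

Every new workload, or any change to the ID suffix, means regenerating and reapplying these objects.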

The hardcoded SAN format is:

  • Istio specific
  • Not sufficient for all use cases
  • Enforced in one spot in the Istio envoy config
  • Implicitly assumed everywhere else

It's just not particularly robust as-implemented - the inflexibility is a side effect of the fragility.

While changing the default format would be invasive (though certainly not the most invasive change Istio has ever presented users with), there are clearly several other opt-in options worth pursuing here that wouldn't harm the default functionality.

The tl;dr is that if istiod is currently the only thing that can act as a CA for workloads, that's tech debt - not essential protected functionality. "You must use our CA because our CA is special" is a bug. Nothing about Istio's functionality needs to depend on you selecting a specific workload CA. We simply overindex on our built-in implementation of the workload CA.

It simply should not be Istio's business how a CA ties a workload to a cert. Currently, it is, and that creates a lot of problems - not just "I have a niche use case" problems, but also general fragility and scoping problems from the auth APIs on down.

@costinm
Contributor

costinm commented Sep 29, 2023 via email

@costinm
Contributor

costinm commented Sep 29, 2023 via email

@bleggett
Contributor Author

bleggett commented Sep 29, 2023

> The Istio spiffe format IS Istio specific. There is no standard URL format.

There is nothing Istio specific about the format. There is a hardcoded default in Istio, which is not exposed in any API, but which several of our APIs have to implicitly assume and be aware of.

Istio currently DOES hardcode the SPIFFE ID format - prefix AND postfix - which some code (and some APIs) opaquely assume.

There are only 2 questions here:

  1. Does that code actually NEED to make that assumption?
  2. If it does, does it NEED to make that assumption about the prefix and the postfix, or just one or the other?

The answer to 2 is clearly "no, it does not".

The answer to 1 is likely "no, it does not, but it's a little hard to fix".

There's not much more to argue about here from a design perspective.

A SPIFFE ID is a pointer. It is not an identity. All Istio really, genuinely needs to do is hand that pointer to a workload CA and get an x509 cert back. The degree to which Istio cares about the contents of the pointer, versus the thing it points to, is the degree to which Istio is making fragile assumptions. We should minimize, not maximize, the number of fragile assumptions.

This is exactly analogous to DNS - assumptions should be made about the thing the pointer points to, and not the format of the pointer itself. k8s needs to force a DNS name postfix format, and does, but it hardcodes/forces a minimal opinion - just the postfix is non-negotiable, it has no opinion about any other part of the DNS name.

There's no reason why we can't, or shouldn't, follow that for SPIFFE ID formats.

> If you have 1000 workloads not using Istio SPIFFE - maybe you should not use Istio, or modify whatever is generating the certs to use Istio format.

Again - Istio's value is not in being a workload CA. Nor is Istio's value in being a remarkably inflexible workload CA. It's 5% of the functionality and there are many implementations. It really shouldn't matter what workload CA Istio uses at all. Istio binds policy to workload identities. How those workload identities are generated, or mapped to workloads, is an implementation detail of the workload CA. Istio provides a basic workload CA implementation, much like it provides a basic waypoint implementation. Istio currently overindexes on its own basic workload CA implementation as a de-facto standard, when that is not strictly required.

Istio's invariants around the workload CA are:

  • Every workload should have a cryptographic identity (how the workload is bound to the identity -> owned by the workload CA, not Istio's CP).
  • Istio should be able to figure out which workload CA to talk to.
  • Istio should be able to obtain a cryptographic identity from the workload CA by giving it some pointer.
  • The pointer should be "sufficiently" (where "sufficient" is determined by the workload CA) unambiguous, and return a single cryptographic identity (which encodes the pointer)
  • The cryptographic identity should be validate-able by Istio.

That's it. Nothing there requires the use of a specific workload CA. It requires a (very basic, probably x509) contract between Istio and the workload CA.

> Why would we take the risks and do the work of maintaining a separate format (one? 10? how many ways do you want to parse a URL - and make sure this is reflected in all code that deals with certs)? Are we going to have regexes to extract arbitrary URLs? And the format is the least of your problems - how do you discover what identity to use for each of the 1000 workloads?

Again - see DNS and the previous examples. How many "DNS formats" does Istio (or K8S) support? What parts of the DNS name MUST FOLLOW a fixed format for Istio to function? It's not a matter of "what Istio must support" - it's a matter of Istio making minimal, versus maximal, assumptions for the constraints it places around an external system (DNS, SPIFFE, x509, etc).

> Secure naming generates Istio-style URLs.

No. Secure naming, from Istio's current perspective, generates URLs that contain at least a trust domain, a service account name, and a namespace. We have some code that confuses "at least" with "at most", for no particularly defensible reason.
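
The difference between "at least" and "at most" can be sketched with two patterns. This is an illustration only: both regexes, the `parse` helper, and the example IDs are assumptions made up for this sketch, not code taken from Istio.

```python
import re

# Hypothetical patterns for illustration; not Istio's actual implementation.
# "At most": the ID must be exactly spiffe://<td>/ns/<ns>/sa/<sa> and nothing else.
STRICT = re.compile(
    r"^spiffe://(?P<td>[^/]+)/ns/(?P<ns>[^/]+)/sa/(?P<sa>[^/]+)$"
)
# "At least": the ID must start with that shape, but extra path segments may follow.
LENIENT = re.compile(
    r"^spiffe://(?P<td>[^/]+)/ns/(?P<ns>[^/]+)/sa/(?P<sa>[^/]+)(?P<extra>(?:/[^/]+)*)$"
)


def parse(pattern, spiffe_id):
    """Return the named groups as a dict, or None if the ID does not match."""
    match = pattern.match(spiffe_id)
    return match.groupdict() if match else None


# Example IDs (made up): the default shape, and one with extra trailing segments.
default_id = "spiffe://cluster.local/ns/prod/sa/db"
extended_id = "spiffe://cluster.local/ns/prod/sa/db/node/worker-1"

strict_default = parse(STRICT, default_id)
strict_extended = parse(STRICT, extended_id)    # rejected by the "at most" reading
lenient_extended = parse(LENIENT, extended_id)  # accepted by the "at least" reading
```

Under the strict pattern the extended ID is rejected outright, even though it still carries the trust domain, namespace, and service account that the rest of the system needs.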

And again - the pointer is not "secure", it just needs to be "sufficiently" unambiguous, and tightly bound to the cryptographic identity the workload CA resolves it to. The latter is where the "secure" comes from.

  • Istio overspecifies an external system's spec, which creates needless inflexibility and API problems.
  • Istio does not actually need to engage in that overspecification, but does.

> I'm all for improving the UX for DestinationRule loading certificates. It's a pretty stable API/feature, we went through this with gateway, so if anything is broken or hard to use - I would rather fix it than try to hack the URL format in the spiffe cert and any associated code.

The current URL format is an over-fragile hack, with no API.

We're right back at what I said previously:

> that creates a lot of problems - not just "I have a niche use case" problems, but also general fragility and scoping problems from the auth APIs on down.

Why on earth would we break a very well-scoped public API (arguably one of the few well-scoped APIs we have) in order to work around an unnecessarily hard-coded internal default for a workload's SPIFFE ID format?

DestinationRules are a symptom, they aren't the problem here.

If I have a system where the requirement is that "DNS names MUST HAVE AT LEAST a postfix that matches this pattern" and I write code that creates an implicit requirement that "DNS names CAN ONLY match this pattern" - I've made a mistake, not created a defensible internal standard.

@kyessenov
Contributor

What exactly assumes the format in Istio? My understanding is that it's fairly small:

  • Authz API translates source principal namespace constraints into regex templates.
  • Ambient translates trust domain constraints into regex / prefix template.
  • Telemetry code validates namespace from SPIFFE against peer header.

Is there anything consuming the principal as a non-opaque string? There's a lot of infrastructure that produces the principals (CA, constraints, etc), but I'm asking about the consumers strictly.
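
As a rough illustration of the first bullet, a namespace constraint could be turned into a regex over the principal string along these lines. The function name and the exact pattern shape are assumptions for this sketch, not a statement of what the Authz code actually emits.

```python
import re


def namespace_constraint_to_regex(namespace):
    """Hypothetical translation of a source-namespace constraint into a regex.

    Assumes principals embed the namespace as a /ns/<namespace>/ path segment.
    """
    return re.compile(r".*/ns/" + re.escape(namespace) + r"/.*")


pattern = namespace_constraint_to_regex("prod")

# Matches the default Istio-style ID.
in_prod = bool(pattern.fullmatch("spiffe://cluster.local/ns/prod/sa/db"))
# A different namespace does not match, because the segment is delimited by slashes.
in_production = bool(pattern.fullmatch("spiffe://cluster.local/ns/production/sa/db"))
# An ID that does not encode the namespace at all (made-up format) never matches.
opaque_format = bool(pattern.fullmatch("spiffe://example.org/workload/db"))
```

A consumer like this does not treat the principal as opaque: a policy written against a namespace silently stops matching if the ID format drops the `/ns/<namespace>/` segment.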

@keithmattix
Contributor

SAN matching for TLS context is the one that comes to my mind; you may have implied that in your first bullet though

@howardjohn
Member

howardjohn commented Sep 29, 2023 via email

@kyessenov
Contributor

@keithmattix Where is it parsing SPIFFE ID? I only see exact matching (and some trust domain stuff, which is also precise match). Ambient does something different (point 2), but we can ignore that

@keithmattix
Contributor

keithmattix commented Sep 29, 2023

> Where is it parsing SPIFFE ID

Ah I see what you mean. In that case, I don't think we have a ton of parsing; SPIFFE as a whole just seems like a convenient format to store certain information.

@kyessenov
Contributor

Yeah, it's an encoding of certain attested workload attributes. John's point is valid - the xDS control plane attests the server SVIDs on the client by using the service registry and filling out the templates, which requires that the client and the server agree on the template format.

So there are two major places:

  1. A client must attest server SPIFFE ID from the server service account (fill the template on the client).
  2. A server must be able to extract the namespace from the client SPIFFE ID (parse the template on the server). This is both to express a policy and to emit telemetry.
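
The two places above can be sketched as a fill step and a parse step. The template string mirrors the Istio format quoted earlier in this issue; the function names and the exact-match check are assumptions for illustration, not Istio's implementation.

```python
from urllib.parse import urlparse

# Template shared by client and server; both sides must agree on it.
TEMPLATE = "spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


def expected_server_id(trust_domain, namespace, service_account):
    """Client side: fill the template from service registry data."""
    return TEMPLATE.format(
        trust_domain=trust_domain,
        namespace=namespace,
        service_account=service_account,
    )


def client_accepts(presented_san, trust_domain, namespace, service_account):
    """Client side: exact match of the presented SAN against the filled template."""
    return presented_san == expected_server_id(trust_domain, namespace, service_account)


def namespace_from_client_id(spiffe_id):
    """Server side: parse the template to recover the client's namespace."""
    parsed = urlparse(spiffe_id)
    segments = parsed.path.strip("/").split("/")
    if (
        parsed.scheme != "spiffe"
        or len(segments) < 4
        or segments[0] != "ns"
        or segments[2] != "sa"
    ):
        return None
    return segments[1]


server_id = expected_server_id("cluster.local", "prod", "db")
accepted = client_accepts(server_id, "cluster.local", "prod", "db")
# A server presenting extra segments fails the exact match on the client.
accepted_extended = client_accepts(server_id + "/pod/abc", "cluster.local", "prod", "db")
client_namespace = namespace_from_client_id("spiffe://cluster.local/ns/prod/sa/web")
```

The exact match in step 1 is what makes the client sensitive to any change in the server's ID format, while the parse in step 2 only needs the `ns/<namespace>` portion to be where it expects.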

@linsun
Member

linsun commented Dec 15, 2023

@EItanya any update on the design doc for this?

@istio-policy-bot added the lifecycle/stale label Jun 13, 2024
@howardjohn added the lifecycle/staleproof label and removed the lifecycle/stale label Jun 13, 2024
@bleggett
Contributor Author

bleggett commented Jul 26, 2024

Update on this - I think we have settled on the following:

  • Allowing completely arbitrary SPIFFE IDs is problematic: it requires quite a bit of rework, breaks existing policies, and potentially has several knock-on effects on the trust model. We probably will not do this in the short term.
  • Allowing arbitrary segments to be added as a prefix or suffix to the Istio SPIFFE ID should be pretty doable (the suffix might just need a slight Envoy tweak to support). We probably will do this in the medium term.
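
One possible reading of the second bullet, as a sketch: keep the `ns/<namespace>/sa/<service account>` core intact and tolerate extra path segments on either side of it. The function name, the returned fields, and the example IDs are hypothetical; this does not describe a settled design.

```python
def split_extended_id(spiffe_id):
    """Split an ID into trust domain, prefix segments, Istio core, and suffix segments.

    Hypothetical helper: returns None if no ns/<ns>/sa/<sa> run is found.
    """
    scheme, sep, rest = spiffe_id.partition("://")
    if scheme != "spiffe" or not sep:
        return None
    trust_domain, _, path = rest.partition("/")
    segments = path.split("/") if path else []
    # Look for the first ns/<namespace>/sa/<service account> run in the path.
    for i in range(len(segments) - 3):
        if segments[i] == "ns" and segments[i + 2] == "sa":
            return {
                "trust_domain": trust_domain,
                "prefix": segments[:i],
                "namespace": segments[i + 1],
                "service_account": segments[i + 3],
                "suffix": segments[i + 4:],
            }
    return None


plain = split_extended_id("spiffe://cluster.local/ns/prod/sa/db")
extended = split_extended_id("spiffe://cluster.local/region/us/ns/prod/sa/db/pod/abc")
arbitrary = split_extended_id("spiffe://example.org/workload/db")
```

Existing policies keyed on namespace and service account would keep working under this reading, since those two fields are still recoverable; fully arbitrary IDs (the first bullet) are rejected.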

@costinm
Contributor

costinm commented Jul 28, 2024 via email
