-
Notifications
You must be signed in to change notification settings - Fork 38.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow use of $ref in in CRD validation schema #62872
Comments
/sig api-machinery |
/assign mbohlool |
/assign @mbohlool I agree it seems good to allow references within a single CRD (as opposed to across resources, which I think we're not ready for). |
The reference-expanding code in the schema lib we're using for CRD validation is scary… if we do this, we shouldn't use an expander that has any possibility of making network or filesystem checks in the process |
👍 /area custom-resources |
/area custom-resources |
Sounds like we have to abstract the reference resolver in kube-apiserver and instantiate it with something less scary. |
/cc @ayushpateria |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
/remove-lifecycle stale Will need to investigate if/how we want to do this. |
Has there been any progress on this investigation? We have a usecase where using references from within a single CRD would be useful. |
The motivation behind this is better user experience when dealing with Kubernetes CRDs. CRDs in an openapi definition are expressed as one massive definition rather than the Kubernetes builtin types which rarely nest and instead make use of references within openapi. These references are desirable as it makes it more manageable to sparsely define resources utilizing the nested defaults as you go down the structure. Unfortunately it looks like support for references wrt CRDs is unlikely to be supported in the near future: kubernetes/kubernetes#62872 As a workaround, this adds a new option `--splitPaths` that takes a path and an optional new model name. The path roughly emulates the idea behind `kubectl explain` but I ended up using the `~` character instead of `.` such that the full parent model name could still be used without ambiguities. During type conversion any matches of path will cause a Dhall reference to be injected with the nested definition being pushed back on the stack of definitions to convert. The top level model name can either be specified on the command line via `=com.some.ModelName` or be guessed in the case the CRD author follows the best practice of using the field name as the first word in the description. Some future work that might need exploring related to this are: 1. Allow specifying `splitPaths` from a file 2. Clean up which types get accumulated into the `typesUnion`. I noticed while doing this that all? top level definitions are being dumped into this union where really it should only be top level definitions that have an `apiVersion` and `kind` 3. See if the `mergeNoConflicts` function needs to be improved such that it takes semantic equality (traversing imports) into account.
The motivation behind this is better user experience when dealing with Kubernetes CRDs. CRDs in an openapi definition are expressed as one massive definition rather than the Kubernetes builtin types which rarely nest and instead make use of references within openapi. These references are desirable as it makes it more manageable to sparsely define resources utilizing the nested defaults as you go down the structure. Unfortunately it looks like support for references wrt CRDs is unlikely to be supported in the near future: kubernetes/kubernetes#62872 As a workaround, this adds a new option `--splitPaths` that takes a path and an optional new model name. The path roughly emulates the idea behind `kubectl explain` but () must be used around some model names to disambiguate with a `.` character. During type conversion any matches of path will cause a Dhall reference to be injected with the nested definition being pushed back on the stack of definitions to convert. The top level model name can either be specified on the command line via `=com.some.ModelName` or be guessed in the case the CRD author follows the best practice of using the field name as the first word in the description. Some future work that might need exploring related to this are: 1. Allow specifying `splitPaths` from a file 2. Clean up which types get accumulated into the `typesUnion`. I noticed while doing this that all? top level definitions are being dumped into this union where really it should only be top level definitions that have an `apiVersion` and `kind` 3. See if the `mergeNoConflicts` function needs to be improved such that it takes semantic equality (traversing imports) into account.
The motivation behind this is better user experience when dealing with Kubernetes CRDs. CRDs in an openapi definition are expressed as one massive definition rather than the Kubernetes builtin types which rarely nest and instead make use of references within openapi. These references are desirable as it makes it more manageable to sparsely define resources utilizing the nested defaults as you go down the structure. Unfortunately it looks like support for references wrt CRDs is unlikely to be supported in the near future: kubernetes/kubernetes#62872 As a workaround, this adds a new option `--splitPaths` that takes a path and an optional new model name. The path roughly emulates the idea behind `kubectl explain` but () must be used around some model names to disambiguate with a `.` character. During type conversion any matches of path will cause a Dhall reference to be injected with the nested definition being pushed back on the stack of definitions to convert. The top level model name can either be specified on the command line via `=com.some.ModelName` or be guessed in the case the CRD author follows the best practice of using the field name as the first word in the description. Some future work that might need exploring related to this are: 1. Allow specifying `splitPaths` from a file 2. Clean up which types get accumulated into the `typesUnion`. I noticed while doing this that all? top level definitions are being dumped into this union where really it should only be top level definitions that have an `apiVersion` and `kind` 3. See if the `mergeNoConflicts` function needs to be improved such that it takes semantic equality (traversing imports) into account.
Is there still any chance you let the users use I see you points @liggitt, still would love any feedbeek if anyone will be implementing this. |
This doesn't seem likely to make progress in an issue (no one is working on a design that addresses the significant additional complexity and failure modes introduced). I'd suggest closing this issue, and bringing up the topic in a sig-api-machinery meeting to gauge whether addition of this feature is likely to be accepted and whether there is priority / bandwidth to shepherd a design and implementation for it now |
I think this issue has value as the place everyone is apparently watching. I agree it's not making progress right now and to make progress needs an owner to drive it. The first hurdle here is to gather requirements and demonstrate that some (possibly limited) version of this would be very helpful to a sufficient number of users. This could either start with a SIG meeting agenda item, or end with a SIG meeting agenda item once evidence has been collected, or both. Schema recursion seems actually necessary to me to express some schemas? So I expect some utility to be forthcoming. However it is (obviously) extremely dangerous, too. |
I would call any schema meant to describe an object hierarchy, that can't handle self-referential recursion... impoverished. Would it really be that hard to handle a $ref that just recursively imported the schema pointed to by $ref? Inline the entire thing and then run the validation, which should be the same as having to write the whole thing out, which is currently allowed. I'm just looking for a way to avoid massive cut and paste that I need to find/replace on whenever I make a change. It can not be that hard to allow $ref in the same file as a starting point. I would do it but I find Go impoverished and hate grinding my teeth while I code. I know, that last comment is both unfair and unnecessary, and I apologize in advance, but I'm leaving it because it's true. |
Some such references expand infinitely and can't be expanded, that's a large part of what makes it dangerous. |
I don't want to exclude others' use cases, but I would be satisfied with a mechanism to reference types from built-in schemas. For instance, we have a CRD where our controller will create Pods based on the contents of the custom resource. It would be very convenient to be able to reference existing types for volume and volume mounts, etc. These types (e.g. volumes) if properly expressed in a CRD's schema are very large. Note: there are other solutions such as having the custom resource reference a PodTemplate, but this isn't as convenient for our users. |
github helpfully hid my comment at #62872 (comment), but even referencing types from built-in schemas has a lot of potential problems |
How about a recursion limit as a tried and tested solution? Expand, look for expansions, expand... oh we've done that 10 times, throw an error. I mean, it's my machine and my compute time so I would appreciate a simple argument to set the limit, but I'd happily take a reasonable static limit. |
I'm facing a similar situation where Specifically e.g. we have rules to validate |
This gets dicey around upgrades when a k8s type does some (legal) modification like add a field. This is not an easy change to make and it's not currently prioritized (but we might do it eventually). |
We've also seen some general problems with type sharing in the API. This came up recently with pod containers and initContainers sharing the same type, which caused problems when wanting to add a field needed only for initContainers (xref #115362). That said, I can see some value to doing a $ref to things like podTemplate, which seems reasonable if the goal is to just envelope the podTemplate without needing to care about the contents. |
This use case would really help. We have a variety of operators that create pods and deployments and where our customers would like to customize the generated pods by specifying sidecar containers, volumes, etc. We've experimented with letting the customer specify a reference to a pod template, but they found this confusing. So, instead, we've reproduced the rather large schemas of these types in our CRD. |
(combined with #76965)
$ref
scenario 1:In go types, we can define a type and use it in several other types. When generating CRD from this go types, we want to use reference for that type in the validation so that the schema has better readability.
What you expected to happen:
In the CRD for this type, we expect to see the validation as
$ref
scenario 2:Original type information is lost when CRD authors expand their openapi definitions.
openapi-gen
-based operator code is similarly affected. CRDs forbid$ref
(implicit GVK link) andx-*
properties (GVK hints).This information enables early, developer-friendly policy checks and useful manifest transforms. Our CICD automation extends the base k8s schema with custom policy and relies on the schema's
$ref
-erential knowledge to inspect and operate on target resource types, anywhere they appear, eg. deeply nested or embedded within custom resources. Prior to validation, a simple callback might first:ObjectMeta
app.kubernetes.io/*
labels.app
,app.kubernetes.io/name
,team
, ...Container
resources.limits.*
are set.PodSpec ... PodSecurityContext ... Volume ...
Deployment ... CronJobSpec ... JobSpec ...
ServiceSpec ... SecurityContext ... (many more) ...
$ref
expansion makes CRDs opaque to our automation. A pre-processing step is required to repair the type information and restore visibility into CRDs. Options considered are:openapi-gen
'ed CRDs directly (wherever they writeopenapi_generated.go
) and call eachGetOpenAPIDefinitions
function. Merge all results.openapi-gen
on k8s and CRD sources ourselves. Avoids multipleGetOpenAPIDefinitions
calls and potential linking problems.kube-apiserver
. Runhack/update-openapi-spec.sh
with extra CRDs "builtin" and enabled soGET /openapi/v2
simply returns what we need.I'd love
$ref
support, but I understand why that could be difficult or undesirable. Alas, if CRDs permitted a known property, eg.x-kubernetes-group-version-kind
(or similar), CRD libraries can set it accordingly for each unique$ref
expansion (no need to nest GVK hints), allowing consumers (or even k8s itself) to restore this relationship if and when useful.Are there options we can move on?
The text was updated successfully, but these errors were encountered: