Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix schema definitions #215

Merged
merged 1 commit into from
Jun 17, 2024
Merged

Conversation

tomleb
Copy link
Contributor

@tomleb tomleb commented May 16, 2024

Issue rancher/rancher#45157

Background

The schema definition endpoint in steve (/v1/schemaDefinitions/<schema ID>) provide a schema definition for a Kubernetes resource. The UI can then use this definition for whatever it wants. For example, the endpoint /v1/schemaDefinitions/management.cattle.io.projects will provide a schema definition for the resource Projects in the management.cattle.io group.

Here's an example schema definition for that object (trimmed because it's a big output):

{
  "id": "management.cattle.io.projects",
  "type": "schemaDefinition",
  "links": {
    "self": "https://schema-def.cattle.tomlebreux.com/v1/schemaDefinitions/management.cattle.io.projects"
  },
  "definitionType": "io.cattle.management.v3.Project",
  "definitions": {
    "io.cattle.management.v3.Project": {
      "resourceFields": {
        "apiVersion": {
          "type": "string",
          "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources"
        },
        "kind": {
          "type": "string",
          "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds"
        },
        "metadata": {
          "type": "io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta",
          "description": "Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata"
        },
        "spec": {
          "type": "io.cattle.management.v3.Project.spec",
          "description": "Spec is the specification of the desired configuration for the project."
        },
        "status": {
          "type": "io.cattle.management.v3.Project.status",
          "description": "Status is the most recently observed status of the project."
        }
      },
      "type": "io.cattle.management.v3.Project",
      "description": "Project is a group of namespaces. Projects are used to create a multi-tenant environment within a Kubernetes cluster by managing namespace operations, such as role assignments or quotas, as a group."
    },
    "io.cattle.management.v3.Project.spec": {
      "resourceFields": {
        "clusterName": {
          "type": "string",
          "description": "ClusterName is the name of the cluster the project belongs to. Immutable.",
          "required": true
        }
      },
      "type": "io.cattle.management.v3.Project.spec",
      "description": "Spec is the specification of the desired configuration for the project."
    },
  
    "io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta": {
        "name": {
          "type": "string",
          "description": "Name must be unique within a namespace. Is required when creating resources, although some resources may allow a client to request the generation of an appropriate name automatically. Name is primarily intended for creation idempotence and configuration definition. Cannot be updated. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names#names"
        }
      },
      "type": "io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta",
      "description": "ObjectMeta is metadata that all persisted resources must have, which includes all objects users must create."
    }
  }
}

It's essentially flattening the Kubernetes resource structure so that each nested objects becomes a top-level object in the schema.

We obtain this information by polling the OpenAPI V2 endpoint exposed by the Kubernetes API server. You can have a look at it yourself by doing the following command (warning: very long output):

# Get the whole thing
$ kubectl get --raw /openapi/v2

# Filter for a specific resource (in this case, ConfigMap)
$ kubectl get --raw /openapi/v2 | jq '.definitions."io.k8s.api.core.v1.ConfigMap"'

The problems

Unrepresentable fields in OpenAPI V2

CustomResourceDefinition objects (CRDs) are defined by an OpenAPI V3 schema. There are major differences between V2 and V3, such that some fields representable by V3 are not representable by V2. For example, a field will nullable: true in V3 is not representable in V2.

Here's an example for the Cluster object in provisioning.cattle.io/v1 group.

OpenAPI V2

$ kubectl get --raw /openapi/v2 | jq '.definitions."io.cattle.provisioning.v1.Cluster"'
{
  "type": "object",
  "properties": {
    ...
    "spec": {
      "type": "object",
      "properties": {
        ...
        "rkeConfig": {}         <---- Notice the empty object here, this is wrong!
      }
    }
  },
  ...
}

OpenAPI V3

$ kubectl get --raw /openapi/v3/apis/provisioning.cattle.io/v1 | jq '.components.schemas."io.cattle.provisioning.v1.Cluster"'
{
  "type": "object",
  "properties": {
    "spec": {
      "type": "object",
      "properties": {
        "rkeConfig": {
          "type": "object",
          "nullable": true,
          "properties": {
             (The fields are actually defined here)
          }
        }
      }
    },
  }
}

CRD that is NOT a proto.Kind

We're parsing the OpenAPI V2 schemas of some objects. The parsed document is represented by the interface proto.Schema, with the top-level object generally being a *proto.Kind.

For some odd reason, the UserAttribute object is seen as a *proto.Map instead. The previous code assumed we would get a Kind, so the UserAttribute object returns an empty object.

In addition, it seems like because this is treated as a proto.Map, the API server does not automatically inject the usual metadata, apiVersion and kind fields. We can see the difference again between OpenAPI V2 and OpenAPI V3:

CRD

The kind, apiVersion and metadata fields are not set here.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: userattributes.management.cattle.io
spec:
  ...
  versions:
  - name: v3
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true

OpenAPI V2

The kind, apiVersion and metadata fields are not set here.

$ kubectl get --raw /openapi/v2 | jq '.definitions."io.cattle.management.v3.UserAttribute"'
{
  "type": "object",                                                              <----- Notice no fields injected here
  "x-kubernetes-group-version-kind": [
    {
      "group": "management.cattle.io",
      "kind": "UserAttribute",
      "version": "v3"
    }
  ]
}

OpenAPI V3

The kind, apiVersion and metadata fields are set here (automagically added by the API server).

$ kubectl get --raw /openapi/v3/apis/management.cattle.io/v3 | jq '.components.schemas."io.cattle.management.v3.UserAttribute"'
{
  "type": "object",
  "properties": {
    "apiVersion": {
      "description": "APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources",
      "type": "string"
    },
    "kind": {
      "description": "Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds",
      "type": "string"
    },
    "metadata": {
      "description": "Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata",
      "allOf": [
        {
          "$ref": "#/components/schemas/io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta"
        }
      ]
    }
  }
  ..
}

The fix

It's pretty clear that OpenAPI V2 lacks information, so we need to add another source. We could use OpenAPI V3, but we opted not too because it wasn't clear what minimal version of Kubernetes we were tackling for the release (OpenAPI V3 is GA only in k8s 1.27). So for now, we're going to use the CRD definition instead (which is technically OpenAPI V3, but at least it's GA already).

The high-level view of the changes are the following:

  1. byHandler function now does the following:
  • If the schema is from a BaseSchema (a schema that doens't exist in Kubernetes), then build the schema definition for it and return early. Otherwise:
  • Build a schema definition from the OpenAPI V2 info.
  • If the schema is also a CRD (remember, built-in resources like Pods, Deployments, etc aren't CRDs), build a schema definition from it
  • Merge both schema definitions from the OpenAPI V2 and CRD. CRD will ALWAYS override whatever is in OpenAPI V2. This makes sense because CRD is defined by OpenAPI V3, so has more information.

This fixes the rkeConfig case.

  1. Inject the apiVersion, kind and metadata fields in the OpenAPI V2 definition builder. To do this, we take those fields from a known object (ConfigMap) and just inject them as fields. To avoid those fields being overriden by the CRD (which ends up having less information in that case... 😕 ...), we delete those fields in the CRD definition builder. (Yes, really, lol)

This fixes the case of missing apiVersion, kind and metadata fields, specifically for the UserAttribute object.

  1. Treat a *proto.Map with a GVK extension as a *proto.Kind. So now we can reuse the visitor code to correctly handle the UserAttribute object.

@tomleb tomleb force-pushed the fix-schema-definitions branch 9 times, most recently from 5468cbf to daa89b4 Compare May 16, 2024 15:25
}
}

func Test_byID(t *testing.T) {
defaultDocument, err := openapi_v2.ParseDocument([]byte(openapi_raw))
func TestRefresh(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous test code was using private fields of the handler to change the behavior. This made it difficult for me when I was working on this since initially I changed the implementation quite a bit. (eg: the models/schemaToModel fields were no longer in the handler).

Those fields are now back in the handler so I could revert the change here, but I felt this way also worked fine. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things to keep in mind:

  • It's ok to refactor tests if the old way was making it hard to test
  • It's best to avoid refactoring tests if you are also changing the production code. In this case, since you changed the prod code (not too much on this specific function), I'd personally say you should avoid refactoring the test.
  • You can't drop coverage if you refactor the tests. When checking the coverage there's a few error cases of the refresh function you no longer seem to cover.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, once I have a design I'm okay with code wise I'll revisit this. I'll probably be reverting the changes to the test.

@tomleb tomleb marked this pull request as ready for review May 16, 2024 15:37
@tomleb tomleb requested a review from a team as a code owner May 16, 2024 15:37
@MbolotSuse MbolotSuse self-requested a review May 20, 2024 13:36
Copy link
Contributor

@MbolotSuse MbolotSuse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall comments:

  • I'd like to see if we can share as much logic as possible between the v2 approach and the CRDs. I think this will be easier to maintain if we can do that.
  • Overall, there's some tests that would be good to add.
  • Some minor code organization things to improve.
  • There's a changed return code - that will be a problem. I think this will need to be changed back.

pkg/schema/converter/k8stonorman.go Show resolved Hide resolved
pkg/schema/definitions/fixtures_test.go Show resolved Hide resolved
pkg/schema/definitions/handler.go Outdated Show resolved Hide resolved

// mergeSchemaDefinitions the given schemaDefinitions in order, overriding values
// when they are set.
func mergeSchemaDefinitions(schemas ...schemaDefinition) schemaDefinition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't think that this function belongs in this file. It's not used anywhere in here. It seems like something that should go in the handler, since the handler is the only place that needs to do this.

There are a few other options here (like having this be a method on the schemaDefinition struct, or having a separate types.go file) but I think the current place isn't optimal in any other place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to be a method on the schemaDefinition struct. It also no longer accepts many schemaDefinition since that's not really needed.

return merged
}

merged.DefinitionType = schemas[0].DefinitionType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I am wrong, but it seems like we could makemerged simply be schemas[0] rather than selecting the definition type, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the new .Merge method just does an inplace merge

pkg/schema/definitions/converter.go Show resolved Hide resolved
}

func convertJSONSchemaPropsToDefinition(props apiextv1.JSONSchemaProps, path proto.Path, definitions map[string]definition) {
if props.Type != "array" && props.Type != "object" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere we seem to only cover the following types:

  • array
  • object
  • string
  • boolean
  • integer
  • number
    How do we know that these are the only types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the only types defined by the OpenAPI V3 spec. I have added a comment that describes the available types and where they come from.

// This supports all OpenAPI V3 types: boolean, number, integer, string, object and array
// as defined here: https://swagger.io/specification/v3/

pkg/schema/definitions/converter.go Show resolved Hide resolved
pkg/schema/definitions/fixtures_test.go Show resolved Hide resolved
}
}

func Test_byID(t *testing.T) {
defaultDocument, err := openapi_v2.ParseDocument([]byte(openapi_raw))
func TestRefresh(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things to keep in mind:

  • It's ok to refactor tests if the old way was making it hard to test
  • It's best to avoid refactoring tests if you are also changing the production code. In this case, since you changed the prod code (not too much on this specific function), I'd personally say you should avoid refactoring the test.
  • You can't drop coverage if you refactor the tests. When checking the coverage there's a few error cases of the refresh function you no longer seem to cover.

@tomleb tomleb force-pushed the fix-schema-definitions branch 6 times, most recently from fb97cb9 to 7b8e6e8 Compare May 28, 2024 14:47
@tomleb tomleb force-pushed the fix-schema-definitions branch 2 times, most recently from 00e3fca to 85632ec Compare May 28, 2024 15:50
@tomleb
Copy link
Contributor Author

tomleb commented May 28, 2024

Made some changes. The refresh does most of the job now since we have to do OpenAPI requests and merge. The byHandler handler simply returns the latest computed merged result.

What's missing:

  • Tests for crdToDefinition
  • More test for CRD
  • Need to find more information on handling SchemaOrArray
  • Need to do something about weird union of merge
  • EDIT: And revert and add more tests for Refresh()

@tomleb tomleb force-pushed the fix-schema-definitions branch 2 times, most recently from b03b163 to ca9ad29 Compare June 12, 2024 16:26
@tomleb tomleb requested a review from MbolotSuse June 12, 2024 16:41
Copy link
Contributor

@MbolotSuse MbolotSuse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, mostly nits and some test cases to add.

Comment on lines 44 to 46
// TODO: What do we mean by required? OpenAPI has both concept:
// - required: The field must be present, whatever its value is (eg: null would be a valid value)
// - nullable: The field can be null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the TODO.

Comment on lines 50 to 51
// mergeSchemaDefinitions the given schemaDefinitions in order, overriding values
// when they are set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
// mergeSchemaDefinitions the given schemaDefinitions in order, overriding values
// when they are set.
// Merge merges the provided schema into s. All conflicting values (i.e. that are in both schema and s)
// are replaced with the values from s.

func TestSchemaDefinitionMerge(t *testing.T) {
tests := []struct {
name string
schemas [2]schemaDefinition
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since we only ever can merge two schemas, and the direction of merge is important (since it determines which values are overridden), I think separating these out into two params makes sense.

},
},
{
name: "empty definition type",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think we also want a case where there is a definition type, but it's not the same between the two schemas.

// models that has a GVK. The schemaDefinition is the result of merging
// OpenAPI V2 information with the CRD information (if any).
gvkModels map[string]gvkModel
models proto.Models
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the comment here was removed by accident, can you re-add it?

pkg/schema/definitions/converter.go Show resolved Hide resolved
field.SubType = getPrimitiveType(item.Type)
}
}
case "object":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the swagger docs it might be possible for us to identify some of these objects as maps.

Comment on lines 305 to 309
name: "not refreshed",
schemaName: "management.cattle.io.globalrole",
wantError: true,
wantErrorCode: intPtr(503),
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests seems to have been removed. I think the case is still valid.

require.Equal(t, test.wantModels, handler.models)
require.Equal(t, test.wantSchemaToModel, handler.schemaToModel)
require.Equal(t, test.wantGVKModels, handler.gvkModels)
})

}
}

func Test_byID(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other tests that would be good to add:

  • Error when trying to build definition for model
    • Error when trying to construct opnapiV2 definition
    • Error when trying to merge definitions

apiextv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

func TestCRDToDefinition(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other cases to cover:

  • Array of type arrray/object
  • Required field
  • No items schema in the array
  • For an array props.Items.JSONSchemas has some values.

@MbolotSuse MbolotSuse requested a review from nflynt June 12, 2024 21:21
@tomleb tomleb force-pushed the fix-schema-definitions branch 2 times, most recently from 118285e to 93ac195 Compare June 13, 2024 19:26
Copy link
Contributor

@nflynt nflynt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! This is simultaneously a metric ton of tests, and not quite enough tests, but I'm not going to block the PR on that; The happy path is covered decently enough, and we can always add more tests as future work.

@tomleb tomleb merged commit 9ac9be9 into rancher:master Jun 17, 2024
1 check passed
@tomleb tomleb deleted the fix-schema-definitions branch June 17, 2024 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants