OpenAPI vocabulary or dialect for code generation #2542

Open
mkistler opened this issue Apr 15, 2021 · 12 comments

@mkistler

mkistler commented Apr 15, 2021

Code generation tools often have special requirements or restrictions on the structure of an OpenAPI definition (document?) that improve the generated code. Here are some examples of restrictions from the IBM OpenAPI SDK generator:

  • Parameters must be unique by name only, irrespective of "in".

    Rationale: Operation parameters are often rendered as the parameters on a function or method in the target language of the code generator. Since most languages require parameters to have unique names, the code generator would need to incorporate the in of a parameter into its name to prevent name collisions. This is undesirable, since it exposes the mechanics of the API without adding any value.

  • There should be at most one success response with a response body. A 204 and other 2XX is okay, but no other combination of two or more 2XX responses.

    Rationale: In statically-typed languages like Java, the return value of a method must have a single static type. This makes it difficult to represent an operation with two different response schemas as a single method returning a single response type.

  • Property names and parameter names must be "case-insensitive" unique

    Rationale: Code generators often reformat the names of parameters, properties, and schemas to use idiomatic case formatting for the target language: lower_snake_case for Python, lowerCamelCase for Java, etc. But this reformatting could introduce naming conflicts if two parameters, e.g. "foo_bar" and "fooBar", are not "case-insensitive" unique.

  • Arrays must contain items of a single type

    Rationale: Many languages require an array to contain only values of a single type.

  • Schema type must specify a single type -- no type arrays

    Rationale: Some widely-used statically-typed languages (e.g. Java and Go) have no provision for "union" types, making it impossible to define a type: [ integer, string ] typed property or parameter.

  • Don't use "nullable"

    Rationale: it's deprecated, and is just an alternate way of expressing type arrays

  • Don't use JSON schema "not"

    Rationale: There's no obvious way to represent this in many widely used programming languages.

  • No "if-then-else" in JSON schema

    Rationale: There's no obvious way to represent this in many widely used programming languages.

  • The API document should be "self contained" (no external "$refs")

    Rationale: External refs can easily create multiple namespaces for schemas, parameters, security schemes, etc. These are unnecessary complications for code generators.

  • All "$refs" must be to elements in the "components" section of the document

    Rationale: "$ref" targets outside of "components" are unnecessary complications for code generators.

It would be nice to have a common set of rules like this that could be codified into a "Code generation" vocabulary or dialect for OpenAPI.
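
As a rough illustration of how a few of these rules could be checked mechanically, here is a minimal Python sketch. It is not taken from the IBM generator; the function names are hypothetical, and it assumes the operation's parameters and responses are already inline (no unresolved "$ref" objects). It covers the parameter-name rule, the case-insensitive rule, and the single-success-body rule.

```python
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Collapse case and separators so 'foo_bar' and 'fooBar' compare equal."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def check_operation(operation: dict) -> list:
    """Return a list of rule violations for one OpenAPI operation object."""
    problems = []

    # Parameter names must be unique regardless of "in" and of casing.
    # Assumption: every parameter is an inline object with "name" and "in".
    seen = defaultdict(list)
    for param in operation.get("parameters", []):
        seen[normalize(param["name"])].append(f'{param["name"]} (in: {param["in"]})')
    for group in seen.values():
        if len(group) > 1:
            problems.append("parameter name collision: " + ", ".join(group))

    # At most one 2XX response may declare a body ("content").
    with_body = [
        str(status)
        for status, response in operation.get("responses", {}).items()
        if str(status).startswith("2") and response.get("content")
    ]
    if len(with_body) > 1:
        problems.append(
            "multiple success responses with a body: " + ", ".join(sorted(with_body))
        )

    return problems
```

An operation with query parameter "foo_bar" and header parameter "fooBar" would be reported once for the name collision, while a 200 with a body next to a bare 204 passes the response check.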

@mkistler
Author

Related:

stoplightio/spectral#476

@handrews
Member

@mkistler great to see you getting this going! I have some thoughts, but take them with a grain of salt as I'll probably just drop in on this discussion periodically and don't have the bandwidth to push anything. I'm just offering some ideas in case they help.

I think that this is a good comprehensive overview in what's needed to create successful tooling, but there's also a separation of concerns here in how it might be best approached. I would see three components:

  • A JSON Schema vocabulary. This adds new JSON Schema keywords (which would no longer need an entirely new OAS version to use) to make code generation from JSON Schemas easier (or in some cases, possible at all). This could be used both inside and outside of OAS. A vocabulary defines keyword semantics, and enforces syntax with a corresponding meta-schema. It does not restrict usage of existing keywords (although a meta-schema can do so by forbidding some syntaxes- see the next bullet point).
  • A JSON Schema linter rule set. This would restrict usage, and could be used inside or outside of OAS. Some of this could be enforced through a meta-schema (e.g. only allowing a string for type and not an array of strings, or forbidding if/then/else and "not" entirely). Other things would have to be done in a linter if they can't be described by a meta-schema, for example case-insensitive uniqueness of property names.
  • An OAS linter rule set. Parameter uniqueness regardless of in or case, restrictions on responses, self-contained references would all go in this category (b/c your reference conditions are at least partially OAS-specific, not generic JSON Schema $ref usage).
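
To make the second bullet more concrete, here is a small Python sketch of a JSON Schema lint pass. It is a hand-rolled stand-in, not a real meta-schema or an existing linter: the function name and the keyword groupings are assumptions chosen for illustration. It reports type arrays and the if/then/else and "not" keywords (which a meta-schema could also forbid), plus case-insensitive property-name collisions (which a meta-schema cannot express).

```python
FORBIDDEN_KEYWORDS = {"not", "if", "then", "else"}
# Keywords whose value is one subschema, a list of subschemas, or a map of them.
SINGLE = {"items", "additionalProperties", "contains", "propertyNames"}
LISTS = {"allOf", "anyOf", "oneOf", "prefixItems"}
MAPS = {"properties", "patternProperties", "$defs", "dependentSchemas"}


def restricted_usage(schema, path="#"):
    """Yield (location, message) for constructs the restricted rule set disallows."""
    if not isinstance(schema, dict):  # boolean schemas have nothing to inspect
        return
    if isinstance(schema.get("type"), list):
        yield path, "type must be a single string, not an array"
    for keyword in sorted(schema.keys() & FORBIDDEN_KEYWORDS):
        yield path, f"keyword '{keyword}' is not allowed"

    # Case-insensitive uniqueness of property names: linter territory.
    props = schema.get("properties")
    if isinstance(props, dict):
        lowered = {}
        for name in props:
            other = lowered.setdefault(name.lower(), name)
            if other != name:
                yield path, f"properties '{other}' and '{name}' differ only by case"

    # Recurse into subschemas (names are not pointer-escaped in this sketch).
    for keyword, value in schema.items():
        if keyword in SINGLE:
            yield from restricted_usage(value, f"{path}/{keyword}")
        elif keyword in LISTS and isinstance(value, list):
            for index, sub in enumerate(value):
                yield from restricted_usage(sub, f"{path}/{keyword}/{index}")
        elif keyword in MAPS and isinstance(value, dict):
            for name, sub in value.items():
                yield from restricted_usage(sub, f"{path}/{keyword}/{name}")
```

Keeping the checks as a separate pass, rather than changing the schema language itself, matches the idea that these are usage restrictions layered on top of JSON Schema.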

I admit to being confused over what $ref target location has to do with APIs. If you can load the resource, who cares where it came from? But perhaps there's something about how OAS tooling handles $ref that I don't fully understand- I see that restriction is in the Azure thing as well.

I would caution against calling this the code generation system, as requirements will vary and not all tools will target the same language(s). A code generator targeting Python will have different needs and capabilities from one targeting Java.

Regarding JSON Schema vocabularies, the approach that looks most promising is one that disambiguates JSON Schema validation constructs that are challenging for code generation by placing new annotation keywords adjacent to (in the same schema object as) the keyword being disambiguated. The latest JSON Schema specification gives an example of this approach. In some cases ("not" comes to mind) it's probably better to just exclude usage altogether. But for others (if/then/else, perhaps), flagging usage patterns that map well to coding idioms (and excluding non-flagged usage) might be a better option.

@jdesrosiers
Contributor

jdesrosiers commented Apr 19, 2021

To me, an ideal code gen vocabulary is one that allows me to use the full power of JSON Schema for validation while also allowing me to use the same schemas for code gen. That would mean that tooling would have to ignore some things that only relate to validation. It also means tooling can't make assumptions about how a pattern is to be interpreted in OO. For example, if/then can be used to express the same thing (and more) as discriminator, but if tooling doesn't recognize if/then, the expressive power for both code gen and validation is limited. Another example is tooling that assumes that allOf means an intersection type and anyOf/oneOf means a union type. That's not always true. The OpenAPI schema includes an example where anyOf is used to express that at least one of "paths", "components", or "webhooks" is required. A third example is anyOf/oneOf being used to emulate enum when you want to give each option a description.

I believe that the way forward is a vocabulary of annotation keywords that allows you to be explicit about how you expect a schema to be used for code gen without it having an effect on validation. Here are a couple examples of the kind of thing I'm thinking of.

{
  "$comment": "Example of anyOf expressing an enum",
  "interpretAs": "enum",
  "anyOf": [
    { "const": 0, "title": "Sunday" },
    { "const": 1, "title": "Monday" },
    { "const": 2, "title": "Tuesday" },
    { "const": 3, "title": "Wednesday" },
    { "const": 4, "title": "Thursday" },
    { "const": 5, "title": "Friday" },
    { "const": 6, "title": "Saturday" }
  ]
}

{
  "$comment": "className is used for the internal name because URI $ids don't work as class names",
  "className": "Foo",
  "type": "object",
  "properties": {
    "foo": { "type": "string" }
  }
}

{
  "$comment": "The baseClass keyword makes it explicit that a reference is intended to represent an inheritance relationship",
  "className": "FooBar",
  "allOf": [{ "$ref": "/schema/foo", "baseClass": true }],
  "properties": {
    "bar": {
      "$comment": "A reference that does not express inheritance. (In a way it does, but that's not how we would expect code to be generated)",
      "$ref": "/schema/common#/nonnegative",
      "maximum": 100
    }
  }
}

This is just off the top of my head. It's probably not the best approach and the names certainly will need some workshopping, but hopefully this gets the general idea across. I think it would be useful to get some details about the reason for each of the restrictions in the original proposal. That way we can work backwards to try to solve those problems in ways that don't require restrictions for JSON Schema validation.
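
To show how a generator might consume such an annotation, here is a minimal Python sketch. It treats "interpretAs" exactly as in the first example above; that keyword is only a proposal in this thread, and the function name enum_from_schema is hypothetical. The validation keywords (anyOf, const) are left untouched and only read.

```python
import enum
import re


def enum_from_schema(name: str, schema: dict) -> type:
    """Build an Enum class from an anyOf of {const, title} subschemas."""
    # "interpretAs" is the proposed annotation keyword, not a standard one.
    if schema.get("interpretAs") != "enum":
        raise ValueError("schema is not annotated as an enum")
    members = {}
    for option in schema["anyOf"]:
        # Turn the human-readable title into an identifier-style member name.
        member = re.sub(r"\W+", "_", option["title"]).strip("_").upper()
        members[member] = option["const"]
    # The functional Enum API accepts a mapping of member names to values.
    return enum.Enum(name, members)
```

Without the annotation, the same anyOf would have to be guessed at, which is the ambiguity the keyword is meant to remove.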

One more thing I want to point out is that the current proposal is coupled to the OpenAPI document. I believe that we should be solving the general case. OpenAPI users aren't the only JSON Schema users that are interested in code gen and it would be great if we could solve for their needs as well.

@MikeRalphson
Member

This is a note to hopefully remember a point for this discussion. Possibly we can learn something from TypeScript type annotations, as they add strong typing to an untyped language. Similarly structured annotations might help the code-generation case.

@landrito

For clarification:

All "$refs" must be to elements in the "components" section of the document

Should this be restricted to the specific top-level component sections?

ie: $ref: #/components/schemas/Pet

Or would references within a top level component be allowed?

ie: $ref: #/components/schemas/Pet/properties/name

@handrews
Member

@landrito it's not really a good practice to $ref things that are interior to a usable schema. It doesn't matter (to me) whether it's OAS's #/components/schemas or JSON Schema's #/$defs, but if you're going to re-use a schema, put it somewhere re-usable.

Of course nothing bad automatically happens if you don't. But it's like abusing the leading underscore convention in Python, which usually indicates a private method. You can call it like a public one, but you're doing something that most people reading or maintaining the code wouldn't expect. Most people will assume that a random property schema is not being re-used elsewhere and will feel free to change it without looking for $refs. But in a re-usable location, most people know that they should take re-use into account when making changes.
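
A lint rule for this convention could look roughly like the Python sketch below. The regular expression and the function name are assumptions for illustration: it treats a direct entry under #/components/<section> or #/$defs as "re-usable" and flags any other same-document $ref, such as one pointing at a property inside another schema. It walks every object in the document, so it would also inspect example payloads that happen to contain a "$ref" key.

```python
import re

# A reference straight to a named entry, e.g. "#/components/schemas/Pet" or "#/$defs/Pet".
REUSABLE_TARGET = re.compile(r"^#/(components/[A-Za-z]+|\$defs)/[^/]+$")


def interior_refs(node, path="#"):
    """Yield (location, target) for same-document $refs outside re-usable locations."""
    if isinstance(node, dict):
        target = node.get("$ref")
        if (
            isinstance(target, str)
            and target.startswith("#")
            and not REUSABLE_TARGET.match(target)
        ):
            yield path, target
        for key, value in node.items():
            yield from interior_refs(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from interior_refs(value, f"{path}/{index}")
```

With this rule, "#/components/schemas/Pet" passes and "#/components/schemas/Pet/properties/name" is reported.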

@MikeRalphson good idea on TypeScript. Which may be the only time I've ever said something positive about TypeScript but that's just my preference for loose/dynamic typing speaking 😝

@mkistler
Author

I just updated the original description to add rationale to each of my original bullet points.

@handrews
Member

The approach @jdesrosiers illustrated is very much the sort of thing I had in mind when I talked about disambiguating validation constructs. I'll probably get out of the way on this point now and let him carry it forward 😃

@mkistler Many of your rationales cite limitations in specific programming languages, or specific sorts of programming languages. If this is to be THE code generation system endorsed by OpenAPI, that would damage the specification for other environments. I recall you stating something to the effect that everything should be consumable by those languages, but that is a design choice that OAS should allow but not enforce.

I would really like to see some acknowledgement and discussion of this point. I have absolutely no objection to there being a strict no-union statically typed code gen vocabulary, as long as there is also one that allows for full idiomatic usage of loosely typed languages. These would likely build on each other in some way, perhaps a core shared vocabulary and an additional vocabulary for one approach, or two additional vocabularies, one for each, if each has additional needs.

Please do not limit OAS for those of us who live entirely in the loosely typed world.

@mkistler
Author

@handrews I have no intention and would indeed resist "limiting OAS" to adopt any of these restrictions in general.

At the same time, I think it would be helpful to have a means to describe subsets of OAS that are amenable to code generators. It was towards this end that I posted the issue.

I think it is likely that there will be many code generation vocabularies -- as you suggest, one might be for statically-typed languages, another for dynamically typed languages, and perhaps others for special situations.

@handrews
Member

Thanks for the clarification @mkistler. It just read as if it were a proposal for a single system to me, and I remember someone (I'm not even sure if it was in the context of OAS or on the JSON Schema slack or what) arguing that everything should be done based on strongly typed languages only.

@jdesrosiers
Contributor

@handrews I agree. I think we are very much on the same page.

@mkistler Thanks for adding rationale for each of the constraints you proposed. The first thing that jumps out at me is that we aren't just talking about a JSON Schema vocabulary/dialect here. Several of the points seem to apply only to OpenAPI. I assumed you were talking about a JSON Schema vocabulary because OpenAPI doesn't have a similar concept, but now I think you may be extending the concept to OpenAPI as a whole. Either way, I think it would be good to clarify the scope of what we are talking about here.

  • Arrays must contain items of a single type

    Rationale: Many languages require an array to contain only values of a single type.

If you are referring to an unconstrained array that allows items of any type, I don't know of any language that doesn't have such a concept. For example, in Java you can use Object[] or List<Object>. If you are referring to position based arrays, many languages have tuples as native types and if they don't, it's not hard to generate a type based on a tuple definition. This constraint is great for a style guide, but it would be an unnecessary restriction in a code generation library.

  • Schema type must specify a single type -- no type arrays

    Rationale: Some widely-used statically-typed languages (e.g. Java and Go) have no provision for "union" types, making it impossible to define a type: [ integer, string ] typed property or parameter.

It's not impossible, you just need to return an Either type of some sort. If the language doesn't have one built in, you can always provide one.
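
As a sketch of what a generator-provided Either type could look like, here is a small Python version. Python is used only for illustration; the same shape can be written as a pair of generic classes in Java. All names here (Left, Right, IntOrString, decode_int_or_string) are hypothetical and not from any existing generator.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

L = TypeVar("L")
R = TypeVar("R")


@dataclass(frozen=True)
class Left(Generic[L]):
    value: L


@dataclass(frozen=True)
class Right(Generic[R]):
    value: R


# Hypothetical generated alias for a property declared as type: [integer, string].
IntOrString = Union[Left[int], Right[str]]


def decode_int_or_string(raw: object) -> IntOrString:
    """Wrap a decoded JSON value in the matching case, rejecting other types."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(raw, int) and not isinstance(raw, bool):
        return Left(raw)
    if isinstance(raw, str):
        return Right(raw)
    raise TypeError(f"expected integer or string, got {type(raw).__name__}")
```

Callers then branch on which case they received, which keeps a single static return type for the generated method.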

  • Don't use JSON schema "not"

  • No "if-then-else" in JSON schema

    Rationale: There's no obvious way to represent this in many widely used programming languages.

Why does it have to be forbidden? Why can't it just ignore it? Code gen has no use for minimum/maximum either, but we wouldn't forbid those. Ideally I would be able to use the same schemas for validation and code gen. There's no reason to cripple validation just because certain keywords aren't useful for code gen. I may be missing something here.

  • The API document should be "self contained" (no external "$refs")

    Rationale: External refs can easily create multiple namespaces for schemas, parameters, security schemes, etc. These are unnecessary complications for code generators.

On the contrary, I consider external $refs extremely necessary and uncomplicated. I always define my schemas individually for the same reasons you would split up code in any other language. No one wants to have to deal with thousands of lines of code in one file. As for complication, following a reference is really easy, so I'm not sure what the concern is there.
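
For a sense of how little is involved in following a reference across documents, here is a minimal Python sketch. It assumes an in-memory registry that maps absolute URIs to already-parsed documents (a real tool would load files or fetch URLs), it handles only JSON Pointer fragments, and the example.com URIs in the usage note are placeholders.

```python
from urllib.parse import unquote, urldefrag, urljoin


def resolve_pointer(document, pointer: str):
    """Walk an RFC 6901 JSON Pointer such as '/nonnegative' through a document."""
    if pointer and not pointer.startswith("/"):
        raise ValueError("only JSON Pointer fragments are handled in this sketch")
    node = document
    for token in pointer.split("/")[1:]:
        # Undo percent-encoding from the URI, then the pointer escapes ~1 and ~0.
        token = unquote(token).replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def resolve_ref(ref: str, base_uri: str, registry: dict):
    """Resolve a $ref against the URI of the document that contains it."""
    # urljoin applies normal relative-URI resolution; urldefrag splits off '#...'.
    absolute, fragment = urldefrag(urljoin(base_uri, ref))
    return resolve_pointer(registry[absolute], fragment)
```

For example, resolving "/schema/common#/nonnegative" from a document at https://example.com/schema/foobar looks up https://example.com/schema/common in the registry and returns its "nonnegative" member. A same-document reference such as "#/$defs/x" goes through the same two steps.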

  • All "$refs" must be to elements in the "components" section of the document

    Rationale: "$ref" targets outside of "components" are unnecessary complications for code generators.

The biggest problem with this is that it's coupled to OpenAPI. Ideally, this would be a vocabulary that any JSON Schema users could use even if they aren't using OpenAPI. This constraint would couple this vocabulary to OpenAPI. Other than that, why do you think it's unnecessary and why do you think it's complicated? It seems to me an unnecessary complication for the processing engine to have to care where a reference is pointing.

@jonaslagoni

Just wanted to mention that I started a discussion in the JSON Schema community repository suggesting to work together on this problem of using JSON Schema in tooling. The outcome of the suggested process tackles the very issues talked about here.

This is especially worthwhile since it is a common issue that we are all currently trying to solve in parallel, each in our own way. Looking forward to hearing your thoughts on the matter! 👍

7 participants
@MikeRalphson @jdesrosiers @mkistler @handrews @landrito @jonaslagoni and others