Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dereference() fails to deference circular ref #37

Closed
bcherny opened this issue Apr 24, 2017 · 15 comments
Closed

dereference() fails to deference circular ref #37

bcherny opened this issue Apr 24, 2017 · 15 comments

Comments

@bcherny
Copy link
Contributor

bcherny commented Apr 24, 2017

Hi!

I'm using json-schema-ref-parser to power ref resolution in https://github.com/bcherny/json-schema-to-typescript. Thank you for the great work - it makes my life much easier.

I am running into an issue with circular $refs. Here is the simplest repro case:

Input:

{
  properties: {
    foo: {
      $ref: '#/properties/foo'
    }
  }
}

Output (actual):

{
  properties: {
    foo: {
      $ref: '#/properties/foo'
    }
  }
}

Output (expected):

{
  properties: {
    foo: {
      $ref: (reference to properties/foo)
    }
  }
}

This is likely related to #36.

@JamesMessinger
Copy link
Member

JamesMessinger commented Apr 24, 2017

When I first read your bug report, I agreed with your expected behavior. In fact, I was shocked that such an obvious bug had existed for so long in JSON Schema $Ref Parser. 😱

But then I created a quick test to make sure, and that's when I realized that the current behavior is actually correct, and it's actually the same thing as your expected behavior.

Effectively, your example schema equates to this:

schema.properties.foo = schema.properties.foo;

So, de-referencing the schema is effectively a no-op and has no effect. The $ref property simply points to the same object, which is just an object with a $ref property.

@bcherny
Copy link
Contributor Author

bcherny commented Apr 24, 2017

@BigstickCarpet Thanks for the quick response!

Reading through the JSON-Schema and JSON-Pointer specs, the behavior for this case does not seems to be well specified. However, the way json-schema-ref-parser currently handles this case can lead to a cascade of unresolved $refs, and seems somewhat internally inconsistent:

// 1. case in this issue (A -> A, silent failure):
{
  properties: {
    foo: { $ref: '#/properties/foo' }
  }
} => {
  properties: {
    foo: { $ref: '#/properties/foo' }
  }
}

// 2. related case (A -> B -> B, silent failure):
{
  properties: {
    foo: { $ref: '#/properties/bar' },
    bar: { $ref: '#/properties/bar' }
  }
} => {
  properties: {
    foo: { $ref: '#/properties/bar' },
    bar: { $ref: '#/properties/bar' }
  }
}

// 3. related case (A -> B -> A, stack overflow):
{
  properties: {
    foo: { $ref: '#/properties/bar' },
    bar: { $ref: '#/properties/foo' }
  }
} => [Stack Overflow]

// 4. related case (A -> B, C -> A, ok):
{
  properties: {
    foo: { $ref: '#/properties/bar' },
    bar: { properties: { baz: { $ref: '#/properties/foo' }}}
  }
} => {
  properties: {
    foo: [Ref to bar],
    bar: { properties: { baz: [Ref to foo] }}
  }
}

A couple of options come to mind:

  1. Handle 1-3 gracefully: do not throw, remove all $refs, and update references as you do for case 4 (ie. foo and bar would compile to empty objects in 1-3),
  2. Explicitly detect and throw errors when given unresolvable cycles (ie. 1-3 would throw, 4 would continue to work)

What do you think?

@JamesMessinger
Copy link
Member

Thanks for all the example cases. I'll add them to the test suite.

I think your question basically comes down to how a self-referencing $ref should be dereferenced. The spec doesn't explicitly say how this case should be handled, but it seems like one of these two solutions is best:

1) The $ref resolves to itself

This is the current behavior, which is basically a no-op. The $ref resolves to itself, which is an object with a $ref property. Note that I don't consider this to be a silent failure. The $ref is resolved successfully, so there is no error condition.

Pros:

  • It's technically correct. The $ref is correctly resolved to itself.
  • It's unambiguous. The resolved object is clearly a reference to itself, as the schema author intended.
  • It's un-intrusive. I don't attempt to "correct" the circular reference by changing it to an empty object, which could cause confusion and unintended behavior for the schema author.

Cons:

  • There is still a $ref in the schema, which seems counter-intuitive after calling .dereference()
  • Downstream tools/libraries might not be able to handle the remaining $refs, which could cause unexpected errors

2) The $ref resolves to an empty object

I think this is the behavior that you're suggesting (please correct me if I'm wrong).

Pros:

  • There is no $ref left in the schema, which makes sense after calling .dereference()
  • Downstream tools/libraries won't throw errors due to unexpected $refs

Cons:

  • It's not technically correct, since it resolves the $ref to an empty object rather than the object that it actually points to
  • The behavior is ambiguous and potentially confusing
  • It's intrusive, since it attempts to "auto-correct" the circular reference by changing it to an empty object, which the schema author may not have wanted

@bcherny
Copy link
Contributor Author

bcherny commented Apr 24, 2017

Thanks for the writeup. My vote is for (2). I'm not sure it's less correct than (1), since the behavior is unspecified, and as you noted, if we go with (1) then downstream software must always handle the case of unresolved $refs.

In any case, it seems that behavior would need to be updated for my 3rd example above.

I would defer to people closer to the spec. @handrews @dlax @awwright would one of you mind chiming in?

@bcherny
Copy link
Contributor Author

bcherny commented Apr 26, 2017

@handrews
Copy link

For this kind of self-reference, a parser MUST NOT get stuck in an infinite loop (so the stack overflow case is a bug in terms of implementing the schema), but aside from that the behavior is explicitly undefined. Schema authors SHOULD NOT produce such schemas:

https://tools.ietf.org/html/draft-wright-json-schema-01#section-8

A schema MUST NOT be run into an infinite loop against a schema. For
example, if two schemas "#alice" and "#bob" both have an "allOf"
property that refers to the other, a naive validator might get stuck
in an infinite recursive loop trying to validate the instance.
Schemas SHOULD NOT make use of infinite recursive nesting like this;
the behavior is undefined.

Do you have a use case for this self-reference? I can't figure out how it would ever be used, but perhaps I'm missing something?

@Relequestual
Copy link

Agreed.
Maybe we could add a clause that says something along the lines of, "Implementors may choose to protect against such infinite recursive loops, however this should not be expected by schema authors." or some such.

@epoberezkin Does ajv do anything about this?

@bcherny
Copy link
Contributor Author

bcherny commented Apr 26, 2017

@handrews Thanks for the comment.

I do not have a valid use case, and it also seems like an error case to me. For my application (https://github.com/bcherny/json-schema-to-typescript), I could either catch these types of cycles in the validation phase (hopefully deferring to json-schema-ref-parser), or be lenient and compile to empty objects. It was not clear to me what the right approach is, so I am deferring to @BigstickCarpet and everyone else in this thread.

@bcherny
Copy link
Contributor Author

bcherny commented Apr 30, 2017

@BigstickCarpet What do you think? Any chance you'll have time to resolve this in the near future?

@handrews
Copy link

@bcherny

or be lenient and compile to empty objects.

Were I using the program, I'd rather it error out because writing such a schema is almost certainly a mistake, or at best a misunderstanding of how the schema is supposed to work. Then again I generally prefer fail-fast over tolerance in machine communication (unlike, say, HTML, where a human can probably make sense of a best effort interpretation of malformed HTML anyway).

@bcherny
Copy link
Contributor Author

bcherny commented Apr 30, 2017

@handrews I agree! As long as traces/error messages are good, fail fast is almost certainly more helpful than silently correcting the user's input.

@Julian
Copy link

Julian commented Apr 30, 2017

+1 on what @handrews said about what the behavior should be, and that self referencing doesn't really make much sense.

FWIW the test suite does not currently cover circular references, simply because at the moment it has no way of representing the expected outcome (that nothing loops forever), but that was also not really explicitly in the spec before.

@awwright
Copy link

I should point out too, JSON Schema doesn't use JSON Reference (draft-pbryan-zyp-json-ref-03) anymore -- the expression is only allowed where a schema is expected, now.

And there's probably no good reason to ever have an infinite loop/a reference back to yourself, that should probably be a fatal error.

JSON Reference was removed because there's no way to have references without at least somewhat understanding the semantics of the JSON Schema document: It only made sense to have the reference in a place where a schema is expected, you have to know if that schema descends into an instance or not (items/properties vs. allOf/anyOf/oneOf) if you're going to handle infinite loops as a fatal error, and you need to know the current URI base.

@JamesMessinger
Copy link
Member

@bcherny - I will get started on this soon (hopefully this week), depending on how busy I am at work. I believe the stack overflow bug is already fixed in the 4.0 branch. But I'll also address the reference-to-self problem.

@philsturgeon
Copy link
Member

I think based on lack of everyones time and lack of interest from other people this is going to have to be closed. It's been almost 4 years. If anyone who's struggling with this problem can contribute with a pull request, that would be really appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants