Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validation is extremely slow for large queries with fragment spreads due to lack of implementers_map caching #862

Closed
xuorig opened this issue Jun 1, 2024 · 7 comments · Fixed by #863
Labels
apollo-compiler issues/PRs pertaining to semantic analysis & validation bug Something isn't working

Comments

@xuorig
Copy link

xuorig commented Jun 1, 2024

I've been investigating slow parsing time in our application. It appears to be due to validate_fragment_spread_type recomputing a large map of implementers everytime it is called. For large schemas, this creates a very large hashmap everytime. Combined with larger queries this can have a very large effect and we're seeing parse times in the seconds.

Instead, the implementers_map can be computed once either through a SchemaWithCache structure, which is used elsewhere already, or by passing along a reference to the pre-computed map during validation. It would be great if this map could be computed only once per schema rather than once per query.

I'm happy to contribute a fix if you are opened to it!

@xuorig xuorig added bug Something isn't working triage labels Jun 1, 2024
@goto-bus-stop goto-bus-stop added apollo-compiler issues/PRs pertaining to semantic analysis & validation and removed triage labels Jun 3, 2024
@goto-bus-stop
Copy link
Member

goto-bus-stop commented Jun 3, 2024

That's a not great oversight. thanks for the report!

One approach we could consider here is to add a cache to Schema itself and provide access to the cache through Valid<Schema> only. Inside schema validation we can manually pass a reference even if the cache is not populated yet (if we need it). operation validation requires a valid schema so we can update the function signatures and reuse the cache between different queries.

@goto-bus-stop
Copy link
Member

goto-bus-stop commented Jun 3, 2024

I think using the SchemaWithCache solution is also acceptable, though it would still require recomputing the map once for every query. If you'd like to contribute that then I'd be totally happy 😄

As a more concrete idea for caching on the schema itself so that it's reusable between different queries, we might have something like the below. If we cache something inside Schema, we have to guarantee that it is only cached when the schema is immutable through Valid<>, and we have to clear the cache when going back to a mutable schema through Valid::into_inner. that could be achieved with a trait.

struct Schema {
    // .. all the current stuff ..
    /// Implementers cache: this must only be accessed through Valid<Schema>.
    implementers_map: OnceLock<HashMap<Name, Implementers>>,
}

impl Valid<Schema> {
    // This could have the API that SchemaWithCache has today, that populates the cache on demand.
    fn implementers_of(&self, name: Name) {}
}

// & then the stuff to support invalidating the cache:
struct Valid<T: Invalidate>(T);
impl<T> Valid<T> {
    fn into_inner(self) -> T {
        self.0.invalidate()
    }
}
trait Invalidate {
    // default implementation for types that can be used with Valid<> but do not require invalidation
    fn invalidate(self) -> Self { self }
}

impl Invalidate for Schema {
    fn invalidate(mut self) -> Self {
        self.implementers_map.take();
        self
    }
}
// this can use the default implementation.
impl Invalidate for ExecutableDocument {}

@xuorig
Copy link
Author

xuorig commented Jun 3, 2024

@goto-bus-stop I was thinking about this over the weekend, another option is to not cache it but instead have it as a living thing in the schema, that gets built along the schema.

Of course it's a bit more complex, again because of mutable schemas and updating this mapping as it is built / modified.

But it's a very common structure to access in GraphQL schemas in general, so it's tempting to just make it a "first class" kind of thing. Thoughts?

@goto-bus-stop
Copy link
Member

I'm not sure if that would work with the current design, which encourages directly mutating the schema structs. Wouldn't users have to manually update the implementers map, or apollo-compiler to provide specific methods to modify the schema?

@xuorig
Copy link
Author

xuorig commented Jun 3, 2024

which encourages directly mutating the schema structs

Gah good point, I thought this was happening through add_type and add_document kind of APIs only. nevermind!

@goto-bus-stop
Copy link
Member

I'll work on this Invalidate idea today so we can reuse the cache between validation runs.

@xuorig
Copy link
Author

xuorig commented Jun 4, 2024

Thank you @goto-bus-stop, sounds great 🙇

goto-bus-stop added a commit that referenced this issue Jun 5, 2024
Fixes #862.

I had proposed a different approach that would let us cache the
implementers map for as long as the schema is immutable, so it could be
reused for different query validations. That approach had an issue that
made `Valid::assume_valid_ref` very, very subtle to use, so we are not
doing that right now.

This is a less ideal version but it does solve the immediate problem.
We already pass around an `OperationValidationConfig` structure and
this seems like a nice, non-invasive place to put the implementers
cache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
apollo-compiler issues/PRs pertaining to semantic analysis & validation bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants