Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start adding lazy import references to name lookup. #3475

Merged
merged 2 commits into from
Dec 12, 2023

Conversation

jonmeow
Copy link
Contributor

@jonmeow jonmeow commented Dec 8, 2023

Adds a LazyImportRef instruction. Versus CrossRef, this is intended to represent an instruction which cannot be used directly, and must be replaced when it comes up due to name lookup. The intent is to use this to avoid recursive loading of imported IR instructions.

Note, under this model, when ResolveIfLazyImportRef is called, it essentially needs to load both inst and type information to a sufficient point where any further attempts would hit name lookup again. That will probably be complex, and the current implementation is just touching the surface of the issue. I was heading down this route because it would mean we have a limited number of points that need to consider whether they're going to talk about a LazyImportRef.

I'm considering whether CrossRef should be dropped in favor of more specific Builtin special-casing, due to the divergence of desired behaviors. This could mean dropping the builtins IR since it's not looking useful right now.

Modify NameScope to track whether the scope is associated with a load error. This is to handle cases where one or more imports failed, so we do not want to issue warnings for related scopes.

The 0-size on ValueStore comes up due to the changes to NameScope, which make it too large for the default handling. After discussion with zygoloid, the thought was we might want to try reserving a roughly correct value based on parse node counts, but the stack default wasn't buying much.

Fixes a bug where the implicit import used the package name instead of the invalid identifier.

github-merge-queue bot pushed a commit that referenced this pull request Dec 8, 2023
Noticed while working on #3475, but felt like it would be easy enough to
separate out.
Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[No action required for this comment, just musings.] Thinking about the model here, suppose we have a chain of imports:

A <- B <- C <- ...

The gold standard would be that if A adds a public declaration that B doesn't use, then the output for B doesn't change at all. That way the build system can stop rebuilding dependents at that point. What I think happens here is:

  • A rebuilds and now has one more export.
  • B rebuilds and now has one more (unused) lazy import decl.
  • C rebuilds and its lazy import decls from B now have different instruction IDs in them, because everything in B got renumbered.
  • D rebuilds but doesn't change.

So we do two extra levels of rebuilds beyond what's strictly necessary before we quiesce. I think this is completely fine for now, but might be something we'll want to take another look at later (post-1.0) to reduce transitive rebuilds. (Perhaps this will turn out to be a non-issue anyway: if we only serialize the things that actually end up being used / reachable from public names, I think that would solve the problem, and we might want to do that regardless.)

Comment on lines +393 to +394
// The previous state of name_lookup_has_load_error_, restored on pop.
bool prev_name_lookup_has_load_error;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use a backup / restore approach here, rather than always looking at the state on the top of the scope stack when querying name_lookup_has_load_error_? (I'm not opposed to this if there's a good reason, but I think maintaining this flag alongside other current-scope information like the names set seems simpler if either would work.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that'd work:

import library "broken";

class C {
  fn Foo() {
    // Top of the scope stack has no load error, but this does name lookup into the package.
    var a: i32 = y;
  }

  // This pops the scope to `C`, `C`'s name scope does not have a load error, but
  // this still shouldn't print because it's unqualified name lookup.
  var b: i32 = y;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea would be that when you push a new scope, its "name lookup has load error" flag is set to that of the enclosing scope.

toolchain/sem_ir/formatter.cpp Show resolved Hide resolved
toolchain/sem_ir/formatter.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once comments are considered; the only one I consider blocking is the missing FormatInstructionLHS / name printing for lazy imports.

@jonmeow
Copy link
Contributor Author

jonmeow commented Dec 12, 2023

[No action required for this comment, just musings.] Thinking about the model here, suppose we have a chain of imports:

A <- B <- C <- ...

The gold standard would be that if A adds a public declaration that B doesn't use, then the output for B doesn't change at all. That way the build system can stop rebuilding dependents at that point. What I think happens here is:

  • A rebuilds and now has one more export.
  • B rebuilds and now has one more (unused) lazy import decl.
  • C rebuilds and its lazy import decls from B now have different instruction IDs in them, because everything in B got renumbered.
  • D rebuilds but doesn't change.

So we do two extra levels of rebuilds beyond what's strictly necessary before we quiesce. I think this is completely fine for now, but might be something we'll want to take another look at later (post-1.0) to reduce transitive rebuilds. (Perhaps this will turn out to be a non-issue anyway: if we only serialize the things that actually end up being used / reachable from public names, I think that would solve the problem, and we might want to do that regardless.)

I'm not going to worry about this too much right now. I would expect a focus on trying to reduce transitive rebuilds, but I'd also expect transitive import probably should stop exposing names. As it is, I think it's incorrect to say that the build system can stop rebuilding dependents, as there is nothing to stop E from depending on the transitive import of A.

(*there may also be fun tricks in serialization to replace the inst id)

@jonmeow jonmeow added this pull request to the merge queue Dec 12, 2023
Merged via the queue into carbon-language:trunk with commit 032c0e0 Dec 12, 2023
6 checks passed
@jonmeow jonmeow deleted the xref branch December 12, 2023 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants