Start adding lazy import references to name lookup. #3475

jonmeow · 2023-12-08T00:23:03Z

Adds a LazyImportRef instruction. Versus CrossRef, this is intended to represent an instruction which cannot be used directly, and must be replaced when it comes up due to name lookup. The intent is to use this to avoid recursive loading of imported IR instructions.

Note, under this model, when ResolveIfLazyImportRef is called, it essentially needs to load both inst and type information to a sufficient point where any further attempts would hit name lookup again. That will probably be complex, and the current implementation is just touching the surface of the issue. I was heading down this route because it would mean we have a limited number of points that need to consider whether they're going to talk about a LazyImportRef.

I'm considering whether CrossRef should be dropped in favor of more specific Builtin special-casing, due to the divergence of desired behaviors. This could mean dropping the builtins IR since it's not looking useful right now.

Modify NameScope to track whether the scope is associated with a load error. This is to handle cases where one or more imports failed, so we do not want to issue warnings for related scopes.

The 0-size on ValueStore comes up due to the changes to NameScope, which make it too large for the default handling. After discussion with zygoloid, the thought was we might want to try reserving a roughly correct value based on parse node counts, but the stack default wasn't buying much.

Fixes a bug where the implicit import used the package name instead of the invalid identifier.

Noticed while working on #3475, but felt like it would be easy enough to separate out.

zygoloid

[No action required for this comment, just musings.] Thinking about the model here, suppose we have a chain of imports:

A <- B <- C <- ...

The gold standard would be that if A adds a public declaration that B doesn't use, then the output for B doesn't change at all. That way the build system can stop rebuilding dependents at that point. What I think happens here is:

A rebuilds and now has one more export.
B rebuilds and now has one more (unused) lazy import decl.
C rebuilds and its lazy import decls from B now have different instruction IDs in them, because everything in B got renumbered.
D rebuilds but doesn't change.

So we do two extra levels of rebuilds beyond what's strictly necessary before we quiesce. I think this is completely fine for now, but might be something we'll want to take another look at later (post-1.0) to reduce transitive rebuilds. (Perhaps this will turn out to be a non-issue anyway: if we only serialize the things that actually end up being used / reachable from public names, I think that would solve the problem, and we might want to do that regardless.)

zygoloid · 2023-12-08T22:16:53Z

toolchain/check/context.h

+ // The previous state of name_lookup_has_load_error_, restored on pop.
+ bool prev_name_lookup_has_load_error;


Why do we use a backup / restore approach here, rather than always looking at the state on the top of the scope stack when querying name_lookup_has_load_error_? (I'm not opposed to this if there's a good reason, but I think maintaining this flag alongside other current-scope information like the names set seems simpler if either would work.)

I don't think that'd work:

import library "broken"; class C { fn Foo() { // Top of the scope stack has no load error, but this does name lookup into the package. var a: i32 = y; } // This pops the scope to `C`, `C`'s name scope does not have a load error, but // this still shouldn't print because it's unqualified name lookup. var b: i32 = y; }

The idea would be that when you push a new scope, its "name lookup has load error" flag is set to that of the enclosing scope.

toolchain/sem_ir/formatter.cpp

zygoloid

LGTM once comments are considered; the only one I consider blocking is the missing FormatInstructionLHS / name printing for lazy imports.

jonmeow · 2023-12-12T00:11:32Z

[No action required for this comment, just musings.] Thinking about the model here, suppose we have a chain of imports:

A <- B <- C <- ...

The gold standard would be that if A adds a public declaration that B doesn't use, then the output for B doesn't change at all. That way the build system can stop rebuilding dependents at that point. What I think happens here is:

A rebuilds and now has one more export.

B rebuilds and now has one more (unused) lazy import decl.

C rebuilds and its lazy import decls from B now have different instruction IDs in them, because everything in B got renumbered.

D rebuilds but doesn't change.

So we do two extra levels of rebuilds beyond what's strictly necessary before we quiesce. I think this is completely fine for now, but might be something we'll want to take another look at later (post-1.0) to reduce transitive rebuilds. (Perhaps this will turn out to be a non-issue anyway: if we only serialize the things that actually end up being used / reachable from public names, I think that would solve the problem, and we might want to do that regardless.)

I'm not going to worry about this too much right now. I would expect a focus on trying to reduce transitive rebuilds, but I'd also expect transitive import probably should stop exposing names. As it is, I think it's incorrect to say that the build system can stop rebuilding dependents, as there is nothing to stop E from depending on the transitive import of A.

(*there may also be fun tricks in serialization to replace the inst id)

github-actions bot requested a review from zygoloid December 8, 2023 00:23

github-actions bot added the toolchain label Dec 8, 2023

jonmeow force-pushed the xref branch from 22449ee to b9b79e6 Compare December 8, 2023 00:26

jonmeow mentioned this pull request Dec 8, 2023

Fix LexicalLookupResult's node_id -> inst_id #3476

Merged

jonmeow force-pushed the xref branch 2 times, most recently from 219a679 to 376c1a9 Compare December 8, 2023 17:40

github-merge-queue bot pushed a commit that referenced this pull request Dec 8, 2023

Fix LexicalLookupResult's node_id -> inst_id (#3476)

18c51ea

Noticed while working on #3475, but felt like it would be easy enough to separate out.

Start adding import cross refs.

bb1a023

jonmeow force-pushed the xref branch from 376c1a9 to 6661b81 Compare December 8, 2023 17:57

zygoloid reviewed Dec 8, 2023

View reviewed changes

zygoloid approved these changes Dec 8, 2023

View reviewed changes

Addressing comments

ed55099

jonmeow force-pushed the xref branch from 6661b81 to ed55099 Compare December 11, 2023 23:58

jonmeow added this pull request to the merge queue Dec 12, 2023

Merged via the queue into carbon-language:trunk with commit 032c0e0 Dec 12, 2023
6 checks passed

jonmeow deleted the xref branch December 12, 2023 00:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start adding lazy import references to name lookup. #3475

Start adding lazy import references to name lookup. #3475

jonmeow commented Dec 8, 2023 •

edited

zygoloid left a comment

zygoloid Dec 8, 2023

jonmeow Dec 11, 2023

zygoloid Dec 12, 2023

zygoloid left a comment

jonmeow commented Dec 12, 2023 •

edited

		// The previous state of name_lookup_has_load_error_, restored on pop.
		bool prev_name_lookup_has_load_error;

Start adding lazy import references to name lookup. #3475

Start adding lazy import references to name lookup. #3475

Conversation

jonmeow commented Dec 8, 2023 • edited

zygoloid left a comment

Choose a reason for hiding this comment

zygoloid Dec 8, 2023

Choose a reason for hiding this comment

jonmeow Dec 11, 2023

Choose a reason for hiding this comment

zygoloid Dec 12, 2023

Choose a reason for hiding this comment

zygoloid left a comment

Choose a reason for hiding this comment

jonmeow commented Dec 12, 2023 • edited

jonmeow commented Dec 8, 2023 •

edited

jonmeow commented Dec 12, 2023 •

edited