Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly guard UpsilonNode unboxed store #51853

Merged
merged 2 commits into from
Nov 5, 2023
Merged

Properly guard UpsilonNode unboxed store #51853

merged 2 commits into from
Nov 5, 2023

Conversation

Keno
Copy link
Member

@Keno Keno commented Oct 25, 2023

In #51852, we are coercing a boxed Union{@NamedTuple{progress::String}, @NamedTuple{progress::Float64}}
to @NamedTuple{progress::String} via convert_julia_type. This results in a jl_cgval_t that has
a Vboxed that points to a boxed @NamedTuple{progress::Float64} but with a @NamedTuple{progress::String}
type tag that the up upsilonnode code then tries to unbox into the typed PhiC slot. This ends up treating
the Float64 as a pointer and crashing in GC. Avoid this by adding a runtime check that the converted value
is actually compatible (we already had this kind of check for the non-boxed cases) and then making the
unboxing runtime-conditional on the type of the union matching the type of the phic.

Copy link
Sponsor Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. Seems like a couple extraneous return statements added to codegen? But otherwise a good cleanup and fix.

@vtjnash vtjnash added compiler:codegen Generation of LLVM IR and native code status:merge me PR is reviewed. Merge when all tests are passing kind:don't squash Don't squash merge backport 1.10 Change should be backported to the 1.10 release labels Oct 25, 2023
src/codegen.cpp Outdated
@@ -2320,6 +2321,9 @@ static jl_cgval_t convert_julia_type(jl_codectx_t &ctx, const jl_cgval_t &v, jl_
if (v.TIndex && !jl_is_pointerfree(typ)) {
// discovered that this union-split type must actually be isboxed
if (v.Vboxed) {
if (skip) {
*skip = ctx.builder.CreateNot(emit_exactly_isa(ctx, v, (jl_datatype_t*)typ));
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't Vboxed potentially nullptr at runtime here? Maybe test that tindex==0x80 instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is. However, tindex==0x80 check isn't sufficient here either. Needs an extra case in emit_exactly_isa to check for null (in this case only).

@gbaraldi
Copy link
Member

The underlying inference bug is still present right?

@oscardssmith
Copy link
Member

why is this on the backport list? we're pretty sure what this was a big introduced a few weeks ago

@topolarity
Copy link
Member

This appears to resolve the more standalone MRE (#51852 (comment)) but not the original one in the issue:

function f(;kwargs...)
    try
        kwargs = rand((values(kwargs), (progress=1.0,)))
    catch
    end
    GC.gc()
    return kwargs
end

f(; progress="0.5")
f(; progress="0.5")
f(; progress="0.5")
f(; progress="0.5")
f(; progress="0.5")

This segfaults for me very reliably locally, with the same signature.

@Keno
Copy link
Member Author

Keno commented Oct 25, 2023

This segfaults for me very reliably locally, with the same signature.

Indeed, although it takes more than one execution for me, so ... progress? I'll take a look.

@oscardssmith
Copy link
Member

since it calls rand it only segfaults half the time.

@Keno
Copy link
Member Author

Keno commented Oct 25, 2023

Fair observation.

@Keno
Copy link
Member Author

Keno commented Oct 26, 2023

Updated MRE:

julia> function phic_type5()
           a = (;progress = "a")
           try
               vals = (a, (progress=1.0,))
               a = vals[Base.inferencebarrier(false) ? 1 : 2]
           catch
           end
           GC.gc()
           return a
       end

@Keno Keno removed the status:merge me PR is reviewed. Merge when all tests are passing label Oct 26, 2023
@Keno
Copy link
Member Author

Keno commented Oct 26, 2023

I managed to fix the MRE, but I found a test case (after fixing a related inference bug that was hiding it) for @vtjnash's comment above, so I'll need to fix that tomorrow before we can merge this.

@Keno
Copy link
Member Author

Keno commented Oct 26, 2023

I'm gonna remove the backport label here. The bug is technically present on 1.10, but we don't have the inference precision to actually see it and I think there is some risk in destabilizing things, esp, since I included that inference precision fix (so it wouldn't backport cleanly anyway).

@Keno Keno removed the backport 1.10 Change should be backported to the 1.10 release label Oct 26, 2023
@Keno
Copy link
Member Author

Keno commented Oct 26, 2023

so I'll need to fix that tomorrow before we can merge this.

I thought about it and it was easier than I thought, so I just did it now.

@topolarity
Copy link
Member

What is the inference fix here for? Is it a required part of the change?

@oscardssmith
Copy link
Member

The change is in the details of effect convergence.

@Keno
Copy link
Member Author

Keno commented Oct 26, 2023

What is the inference fix here for? Is it a required part of the change?

It's a fix to the logic of the previous phic narrowing PR. It's not required, but without it's not possible to write a test for one of the branches of the codegen fix because it's hidden by the inference issue, so I fixed that too.

@oscardssmith
Copy link
Member

CI seems very unhappy about this currently.

@oscardssmith oscardssmith added the kind:bugfix This change fixes an existing bug label Oct 26, 2023
@Keno
Copy link
Member Author

Keno commented Oct 26, 2023

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@topolarity
Copy link
Member

topolarity commented Oct 27, 2023

Another day, another MWE. It looks like the FLoops.jl failure is relevant.

Reduced to:

julia> foo() = try (a = a, b = 2) catch end
foo (generic function with 1 method)

julia> foo()
ERROR: UndefVarError: `a` not defined
Stacktrace:
 [1] foo()
   @ Main ./REPL[1]:1
 [2] top-level scope
   @ REPL[2]:1

Seems like we're incorrectly inferring that the catch block is unreachable.

We then illegally optimize away the try-catch:

julia> @code_typed optimize=false foo()
CodeInfo(
1%1 = $(Expr(:enter, #4))
2%2 = (:a, :b)::Core.Const((:a, :b))
│   %3 = Core.apply_type(Core.NamedTuple, %2)::Core.Const(NamedTuple{(:a, :b)})
│   %4 = Core.tuple(Main.a, 2)::Core.PartialStruct(Tuple{Any, Int64}, Any[Any, Core.Const(2)])
│   %5 = (%3)(%4)::NamedTuple{(:a, :b), <:Tuple{Any, Int64}}
└──      $(Expr(:leave, :(%1)))
3return %5
4 ┄      Core.Const(:($(Expr(:leave, :(%1)))))::Union{}
│        Core.Const(:($(Expr(:pop_exception, :(%1)))))::Union{}
└──      Core.Const(:(return nothing))::Union{}
) => NamedTuple{(:a, :b), <:Tuple{Any, Int64}}

julia> @code_typed optimize=true foo()
CodeInfo(
1nothing::Nothing%2 = Main.a::Any%3 = Core.tuple(%2, 2)::Tuple{Any, Int64}%4 = Core.typeof(%3)::Type{<:Tuple{Any, Int64}}%5 = Core.apply_type(Core.NamedTuple, (:a, :b), %4)::Type{NamedTuple{(:a, :b), var"#s179"}} where var"#s179"<:Tuple{Any, Int64}%6 = %new(%5, %2, 2)::NamedTuple{(:a, :b), <:Tuple{Any, Int64}}
└──      return %6
) => NamedTuple{(:a, :b), <:Tuple{Any, Int64}}

@Keno
Copy link
Member Author

Keno commented Oct 27, 2023

Ok, that should a quick fix. Looks like we're forgetting that the GlobalRef is not nothrow.

Introduce UNION_BOX_MARKER, to make it easier to grep for all the places
where this is being looked at and distinguish them from other uses of
0x80 as a constant.
if isdefined_globalref(g) && isconst(g)
return Const(ccall(:jl_get_globalref_value, Any, (Any,), g))
end
ty = ccall(:jl_get_binding_type, Any, (Any, Any), g.mod, g.name)
ty === nothing && return Any
return ty
end
abstract_eval_global(M::Module, s::Symbol) = abstract_eval_globalref(GlobalRef(M, s))
abstract_eval_global(M::Module, s::Symbol) = abstract_eval_globalref_type(GlobalRef(M, s))
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
abstract_eval_global(M::Module, s::Symbol) = abstract_eval_globalref_type(GlobalRef(M, s))
abstract_eval_global_type(M::Module, s::Symbol) = abstract_eval_globalref_type(GlobalRef(M, s))

to reflect the fact that it returns type only? I don't think this definition is being used anywhere though.

In #51852, we are coercing a boxed `Union{@NamedTuple{progress::String}, @NamedTuple{progress::Float64}}`
to `@NamedTuple{progress::String}` via convert_julia_type. This results in a jl_cgval_t that has
a Vboxed that points to a boxed `@NamedTuple{progress::Float64}` but with a `@NamedTuple{progress::String}`
type tag that the up upsilonnode code then tries to unbox into the typed PhiC slot. This ends up treating
the Float64 as a pointer and crashing in GC. Avoid this by adding a runtime check that the converted value
is actually compatible (we already had this kind of check for the non-boxed cases) and then making the
unboxing runtime-conditional on the type of the union matching the type of the phic.

Fixes #51852
Value *skip = NULL;
rval_info = convert_julia_type(ctx, rval_info, slot_type, &skip);
if (!allow_mismatch && skip) {
CreateTrap(ctx.builder);
Copy link
Member

@topolarity topolarity Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this (incorrectly) trap unconditionally trap in the new (v.Vboxed && (v.isboxed || mustbox_union)) case?

This branch seems to be dead at least when compiling the sysimage + stdlibs, but perhaps it'd be a bug if it were exercised?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try to come up with a testcase (or if you can't, you can try pkgevaling this with a compile-time error to find if there's one in the wild).

Keno added a commit that referenced this pull request Nov 11, 2023
Currently [1] it is illegal [2] in IRCode to have a GlobalRef in value
position that could potentially throw. This is because in IRCode, we
want to assign flags to every statement and if there are multiple things
with effects in a statement, we lose precision in tracking which they
apply to. However, we currently do allow this in `CodeInfo`. Now that
we're starting to make more use of flags in inference also, this is
becoming annoying (as it did for IRCode), so I would like to do this
transformation earlier. This is an attempt to do this during lowering.
It is not entirely clear that this is precisely the correct place for
it. We could alternatively consider doing it during the global resolve
pass in method.c, but that currently does not renumber SSAValues, so
doing it during the renumbering inside lowering may be easier.

N.B.: This is against #51853, because this needs some of the inference
precision improvements in that PR to avoid regressing the try/catch
elision tests (which before that PR, we were incorrectly computing
effects for statement-position GlobalRefs).

[1]
39c278b
[2]
https://github.com/JuliaLang/julia/blob/2f63cc99fb134fb4adb7f11ba86a4e2ab5adcd48/base/compiler/ssair/verify.jl#L54-L58

---------

Co-authored-by: Jeff Bezanson <[email protected]>
Co-authored-by: Oscar Smith <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code kind:bugfix This change fixes an existing bug kind:don't squash Don't squash merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants