Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious segfault related to similar with struct containing Union with Missing #38224

Closed
oschulz opened this issue Oct 29, 2020 · 4 comments · Fixed by #38444
Closed

Spurious segfault related to similar with struct containing Union with Missing #38224

oschulz opened this issue Oct 29, 2020 · 4 comments · Fixed by #38444
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@oschulz
Copy link
Contributor

oschulz commented Oct 29, 2020

Note: I'm not sure if this is already fixed on the current nightly or not (see below).

Running this on Julia v1.5.2 (Linux x86_64)

julia> struct Foo
           i::Union{Int,Missing}
       end
julia> similar([Foo(i) for i in 1:10000])

Julia segfaults:

10000-element Array{Foo,1}:
signal (11): Segmentation fault
in expression starting at none:0
jl_get_nth_field_checked at /buildworker/worker/package_linux64/build/src/datatype.c:1049
_show_default at ./show.jl:406
show_default at ./show.jl:389 [inlined]
show at ./show.jl:384
unknown function (ip: 0x7ff1d4083575)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398

So the segfault seems related to the REPL-display of the result, not the array creation itself.

System info:

julia> versioninfo()
Julia Version 1.5.2
Commit 539f3ce943 (2020-09-23 23:17 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

I can reproduce this reliably using the official Julia binary in a CentOS-7 Singularity container on an Ubuntu 20.04 host. It doesn't seem to occur when running Julia v1.5.2 natively on Ubuntu 20.04, though - but then, given the fickle nature of segfaults that doesn't mean there's no issue, of course. I also can't reproduce the segfault using Julia v1.6.0-DEV.1354, neither inside nor outside of the container. But then segfaults don't always get triggered, or course, so I don't know if that means the
underlying issue is already fixed or not. So I'm reporting this in case it's still relevant.

@vtjnash vtjnash added the bug Indicates an unexpected problem or unintended behavior label Oct 29, 2020
@vtjnash
Copy link
Sponsor Member

vtjnash commented Oct 29, 2020

Yeah, doesn't look like this TODO ever got implemented (and the computation seems wrong):

$ grep -RI zeroinit src/
src/array.c:        if (tot > 0 && (!isunboxed || hasptr || isunion)) // TODO: check for zeroinit
src/array.c:        if (tot > 0 && (!isunboxed || hasptr || isunion)) // TODO: check for zeroinit
src/array.c:    if (a->flags.ptrarray || a->flags.hasptr) { // TODO: check for zeroinit
src/array.c:    if (a->flags.ptrarray || a->flags.hasptr) { // TODO: check for zeroinit
src/dump.c:            | (dt->zeroinit << 4)
src/dump.c:    dt->zeroinit = (memflags >> 4) & 1;
src/jltypes.c:                                                    "zeroinit",
src/datatype.c:    t->zeroinit = 0;
src/datatype.c:        st->zeroinit = 0;
src/datatype.c:        st->zeroinit = w->zeroinit;
src/datatype.c:        int zeroinit = 0;
src/datatype.c:                    zeroinit = 1;
src/datatype.c:                    if (!zeroinit)
src/datatype.c:                        zeroinit = ((jl_datatype_t*)fld)->zeroinit;
src/datatype.c:                zeroinit = 1;
src/julia.h:    uint8_t zeroinit; // if one or more fields requires zero-initialization

$ ./julia -q
julia> struct Foo
                  i::Union{Int,Missing}
              end

julia> Foo.zeroinit
false

@quinnj
Copy link
Member

quinnj commented Oct 30, 2020

For the uneducated among us, what does the zeroinit property indicate? That fields should be zero initialized? So Foo in this case isn't getting zero initialized even though it should?

@dkarrasch
Copy link
Member

That's weird. If you have x = similar(...); and then x, it gets displayed correctly.

@simeonschaub
Copy link
Member

For the uneducated among us, what does the zeroinit property indicate? That fields should be zero initialized? So Foo in this case isn't getting zero initialized even though it should?

IIUC correctly, the problem here is that Foo gets stored inline, containing the data and a byte indicating the type of i. This byte can only be 0x00 or 0x01, because there are only two possible types for i. If this byte is not always set to zero, jl_get_nth_field_checked segfaults because the type byte may be invalid.

It looks like jl_compute_field_offsets has a bug, which causes it not to set zeroinit correctly. With this change, at least Foo.zeroinit is true, but the example still segfaults, probably because of all the TODOs in array.c:

diff --git a/src/datatype.c b/src/datatype.c
index 3428e7479e..2a78cd098c 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -544,6 +544,7 @@ void jl_compute_field_offsets(jl_datatype_t *st)
             if (npointers)
                 free(pointers);
         }
+        st->zeroinit = zeroinit;
     }
     // now finish deciding if this instantiation qualifies for special properties
     assert(!isbitstype || st->layout->npointers == 0); // the definition of isbits

To fix these TODOs, would there need to be an additional field in jl_array_flags_t indicating whether the fields need to be zero initialized? Or is there already another way of determining that for a specific array?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants