Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtraces are broken on OSX #7821

Closed
staticfloat opened this issue Aug 3, 2014 · 17 comments
Closed

Backtraces are broken on OSX #7821

staticfloat opened this issue Aug 3, 2014 · 17 comments
Assignees
Labels
kind:bug Indicates an unexpected problem or unintended behavior system:mac Affects only macOS

Comments

@staticfloat
Copy link
Sponsor Member

Well this is new. Backtraces appear to be broken on OSX, but only for C code. Example with my favorite call graph exploration utility @profile:

julia> @profile peakflops()
1.6386590022993734e10

julia> Profile.print()
33 task.jl; anonymous; line: 96
 32 REPL.jl; eval_user_input; line: 54
  32 profile.jl; anonymous; line: 14
   32 linalg/generic.jl; peakflops; line: 324
    5  linalg/generic.jl; peakflops; line: 69
     5 linalg/matmul.jl; gemm_wrapper!; line: 285
      5 linalg/blas.jl; gemm!; line: 526
    27 linalg/generic.jl; peakflops; line: 326
     27 random.jl; rand!; line: 129
 1  REPL.jl; eval_user_input; line: 55

julia> Profile.print(C=true)
...
33 ???; ???; line: 92035753
 33 ???; ???; line: 92041358
  33 task.jl; anonymous; line: 96
   33 ???; ???; line: 91703924
    33 ???; ???; line: 113759015
     32 REPL.jl; eval_user_input; line: 54
      32 ???; ???; line: 91725559
       32 ???; ???; line: 92069246
        32 profile.jl; anonymous; line: 14
         32 linalg/generic.jl; peakflops; line: 324
          5  linalg/generic.jl; peakflops; line: 69
           5 linalg/matmul.jl; gemm_wrapper!; line: 285
            5 linalg/blas.jl; gemm!; line: 526
             5 ???; ???; line: 131770600
              5 ???; ???; line: 133001818
               5 ???; ???; line: 133002515
                5 ???; ???; line: 134195766
                 4 ???; ???; line: 133003899
                  1 ???; ???; line: 153704009
                  1 ???; ???; line: 153704064
                  1 ???; ???; line: 153704084
...

This is running on:

julia> versioninfo()
Julia Version 0.3.0-rc1+314
Commit e8e461e* (2014-08-02 21:40 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin13.3.0)
  CPU: Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
@staticfloat staticfloat added the mac label Aug 3, 2014
@Keno
Copy link
Member

Keno commented Aug 3, 2014

Working on it.

@Keno
Copy link
Member

Keno commented Aug 3, 2014

Actually, never mind. This issue is simple. You need the .dSYM files for libjulia. Try dsymutil libjulia.dylib.

@Keno
Copy link
Member

Keno commented Aug 3, 2014

I suppose we should give you approximations to the backtrace (function names only, no line numbers), if the debug info is not available.

@staticfloat
Copy link
Sponsor Member Author

I ran dsymutil on libjulia.dylib and libjulia-debug.dylib, but that didn't fix it. I deleted the .dSYM files in case they had been there before and I didn't notice it, re-ran dsymutil, and that didn't fix it either.

@Keno
Copy link
Member

Keno commented Aug 3, 2014

Ok, will have another look.

@JeffBezanson
Copy link
Sponsor Member

Same thing on linux. We should also print nothing instead of the ??? and crazy numbers.

@Keno
Copy link
Member

Keno commented Aug 3, 2014

Ok, there is a similar issue with profiling in that I/O time is not reported properly on OS X because our unwinder fails miserably (also potentially a problem for ^C and one of the reasons we can segfault in I/O):

julia> @profile sleep(10)

julia> Profile.print(C=true)
8399 ???; ???; line: -1794431414

Just noting it down while we're talking about backtraces. I'll have a look at the other issue though.

@staticfloat
Copy link
Sponsor Member Author

While working on the tests/netload/ltests.jl stuff, I've found that I can get segfaults a pretty good chunk of the time when waiting on I/O code as well. I'll try to make some reproducible test cases and submit them as issues.

@Keno
Copy link
Member

Keno commented Aug 3, 2014

Ok, so this turns out to be mostly easy. I'll enable debug symbols in the release binaries and add a fallback mode that finds the function but bails on line numbers.

@staticfloat
Copy link
Sponsor Member Author

I should note that the debug build fares much better, but is not perfect:

$ cat test.jl
@profile peakflops()
Profile.print(C=true)
$ julia test.jl
37 ???; ???; line: 262313206
 37 ...e/src/julia/src/task.c; rec_backtrace_ctx; line: 606
  37 ???; ???; line: 262311876
   37 ???; ???; line: 262312789
    37 ...e/src/julia/src/gf.c; jl_gf_invoke; line: 1489
     37 ...e/src/julia/src/gf.c; jl_method_list_insert; line: 1121
      37 ...julia/src/builtins.c; jl_f_union; line: 846
       37 ...ulia/src/builtins.c; jl_f_top_eval; line: 406
        37 ???; ???; line: 290817769
         37 ./client.jl; _start; line: 354
          37 ./client.jl; process_options; line: 285
           37 ...rc/julia/src/gf.c; jl_gf_invoke; line: 1509
            37 ...rc/julia/src/gf.c; jl_method_list_insert; line: 1121
             37 ...ia/src/builtins.c; jl_f_union; line: 846
              37 ...ia/src/builtins.c; jl_f_top_eval; line: 406
               37 loading.jl; include_from_node1; line: 128
                37 .../julia/src/gf.c; jl_gf_invoke; line: 1489
                 37 .../julia/src/gf.c; jl_method_list_insert; line: 1121
                  37 ./boot.jl; include; line: 245
                   37 .../src/toplevel.c; jl_method_def; line: 709
                    37 ...src/toplevel.c; jl_method_def; line: 703
                     1  ...src/toplevel.c; jl_method_def; line: 682
                      1 ...ulia/src/ast.c; jl_lam_args; line: 603
                       1 ...flisp/./read.c; strtoull_0b0o; line: 24
                        1 ...lisp/./read.c; isnumtok_base; line: 73
                         1 ..././cvalues.c; fl_neg; line: 1021
                          1 ..././cvalues.c; cvalue_relocate; line: 562
                           1 ..../cvalues.c; fl_neg; line: 1021
                            1 ..../cvalues.c; cvalue_relocate; line: 562
                             1 ...isp/flisp.c; apply_cl; line: 1220
                     36 ...src/toplevel.c; jl_method_def; line: 690
                      36 ...rc/toplevel.c; jl_is_toplevel_only_expr; line: 363
                       36 ...rc/toplevel.c; jl_method_def; line: 673
                        36 ...rc/builtins.c; jl_f_union; line: 846
                         36 ...c/builtins.c; jl_f_top_eval; line: 406
                          36 profile.jl; anonymous; line: 14
                           36 .../generic.jl; peakflops; line: 324
                            8  .../generic.jl; peakflops; line: 69
                             8 ...g/matmul.jl; gemm_wrapper!; line: 285
                              8 linalg/blas.jl; gemm!; line: 526
                               8 ???; ???; line: 303835368
                                8 ???; ???; line: 305066586
                                 8 ???; ???; line: 305067283
                                  8 ???; ???; line: 306260534
                                   4 ???; ???; line: 305068667
                                    1 ???; ???; line: 325768730
                                    1 ???; ???; line: 325768756
                                    1 ???; ???; line: 325768832
                                    1 ???; ???; line: 325768852
                                   4 ???; ???; line: 305069442
                                    1 ???; ???; line: 325770121
                                    2 ???; ???; line: 325770131
                                    1 ???; ???; line: 325770157
                            28 .../generic.jl; peakflops; line: 326
                             28 random.jl; rand!; line: 129
                              2 ???; ???; line: 409812660
                              1 ???; ???; line: 409812752
                              1 ???; ???; line: 409812760
                              1 ???; ???; line: 409812777
                              1 ???; ???; line: 409812781
                              2 ???; ???; line: 409812825
                              3 ???; ???; line: 409812833
                              2 ???; ???; line: 409812841
1  ???; ???; line: 325754199
1  ???; ???; line: 325754246
1  ???; ???; line: 325754473
1  ???; ???; line: 325754590
... (many more lines)

@timholy
Copy link
Sponsor Member

timholy commented Aug 3, 2014

I noticed this on Linux ever since I upgraded to Kubuntu 14.04. It was better on 12.04.

@Keno Keno self-assigned this Aug 3, 2014
@Keno Keno closed this as completed in b26ff92 Aug 3, 2014
@timholy
Copy link
Sponsor Member

timholy commented Aug 3, 2014

It's working fine for me locally, and wow is this a nice improvement!

But very curiously it looks like this has Travis failures. It's not finding ucontext_t. Travis is running Ubuntu 12.04, so maybe the header files are different. Do we need to #include <ucontext.h>?

@Keno
Copy link
Member

Keno commented Aug 3, 2014

Fixed in the next commit.

@timholy
Copy link
Sponsor Member

timholy commented Aug 3, 2014

I should have checked. Thanks.

@staticfloat
Copy link
Sponsor Member Author

@Keno wow, this is so much better. Thank you!

Just one final note, there is still a single ??? line when running @profile peakflops():

julia> Profile.print(C=true)
30  ...sabae/src/julia/./julia; main; (unknown line)
 30 .../src/julia/src/init.c; julia_trampoline; line: 1002
  30 .../src/julia/src/task.c; jl_handle_stack_switch; line: 434
   30 .../src/julia/src/task.c; jl_switch_stack; line: 217
    30 .../src/julia/src/task.c; switch_stack; line: 209
     30 ...src/julia/src/task.c; start_task; line: 426
      30 .../julia/src/./julia.h; jl_apply; line: 975
       30 ...ulia/src/builtins.c; jl_trampoline; line: 786
        30 ...julia/src/./julia.h; jl_apply; line: 975
         30 task.jl; anonymous; line: 96
          30 ...src/julia/src/gf.c; jl_apply_generic; line: 1428
           30 ...ulia/src/./julia.h; jl_apply; line: 975
            30 ...ia/src/builtins.c; jl_trampoline; line: 786
             30 ...lia/src/./julia.h; jl_apply; line: 975
              30 ???; ???; line: 176047015
               29 REPL.jl; eval_user_input; line: 54
                29 ...a/src/builtins.c; jl_f_top_eval; line: 396
                 29 .../src/toplevel.c; jl_toplevel_eval; line: 516
...

Looking at show_backtrace(), I see there's some special handling of eval_user_input, could that have anything to do with this?

@JeffBezanson
Copy link
Sponsor Member

In the case where we print the instruction pointer, we should print ip = #### instead of line:. Handling unknown locations more gracefully will make it look much less buggy.

@Keno
Copy link
Member

Keno commented Aug 4, 2014

Yep, will do. I do think I can fix this one though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug Indicates an unexpected problem or unintended behavior system:mac Affects only macOS
Projects
None yet
Development

No branches or pull requests

4 participants