Heap snapshot #42286

Closed
wants to merge 107 commits into from

@vilterp vilterp commented Sep 16, 2021

We expose a function GC.take_heap_snapshot(stream), which writes a heap snapshot in Chrome's .heapsnapshot JSON format to the given IO stream.

This can be loaded into Chrome Devtools' snapshot viewer to explore the heap and find memory leaks.

Usage

GC.take_heap_snapshot("snapshot.heapsnapshot")

Then open the Chrome Devtools, go to the "Memory" tab, hit the "load" button, choose the file, and you should see something like:
image

Implementation:

Piggybacks on the mark phase of the garbage collector, which is already traversing all the live objects in the heap. Creates (in C++) a Node object for every object in the heap, and an Edge object for every pointer from one object to another.

Mimics Node.js/V8's heap snapshotter, which dumps into the same format: https://github.com/nodejs/node/blob/5fd7a72e1c4fbaf37d3723c4c81dce35c149dc84/deps/v8/src/profiler/heap-snapshot-generator.h#L509-L509
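
To make the node/edge bookkeeping described above concrete, here is a minimal C++ sketch. The names `Snapshot`, `record_node` and `record_edge` are illustrative assumptions, not the structs in this PR, and the real recording happens inside the GC mark loop rather than through direct calls like these.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Edge {
    std::string type;   // e.g. "property", "element", "internal"
    std::string name;   // field name, or an index rendered as text
    size_t to_node;     // index of the target node in Snapshot::nodes
};

struct Node {
    std::string type;
    std::string name;
    uintptr_t id;       // the object's address doubles as its id
    size_t self_size;
    std::vector<Edge> edges;
};

struct Snapshot {
    std::vector<Node> nodes;
    std::unordered_map<uintptr_t, size_t> index_of;  // address -> node index

    // Returns the index of the node for `addr`, creating it on first sight.
    size_t record_node(uintptr_t addr, const std::string &type,
                       const std::string &name, size_t size) {
        auto it = index_of.find(addr);
        if (it != index_of.end())
            return it->second;
        size_t idx = nodes.size();
        nodes.push_back(Node{type, name, addr, size, {}});
        index_of.emplace(addr, idx);
        return idx;
    }

    // Records a pointer from one object to another. Indices are used instead
    // of references because `nodes` may reallocate when a node is added.
    void record_edge(uintptr_t from, uintptr_t to,
                     const std::string &type, const std::string &name) {
        size_t from_idx = record_node(from, "object", "<unknown>", 0);
        size_t to_idx = record_node(to, "object", "<unknown>", 0);
        nodes[from_idx].edges.push_back(Edge{type, name, to_idx});
    }
};
```

Keying nodes by address means an object reached through several pointers is stored once, while every pointer still gets its own edge.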

TODOS

  • testing
    • perf: demonstrate that this doesn't hurt perf when disabled (what benchmarks can we use?)
    • test: invoke this somewhere in the test suite, just to make sure it doesn't segfault?
    • see if object sizes sum up to Core.live_bytes()
  • features
    • make take_gc_snapshot call GC.gc within itself
    • make wrapper function in Julia so we don't have to use ccall
    • capture types of nodes with typeof
    • field names
    • array indices
    • values of strings and symbols
    • sizes for strings and symbols
    • sizes for arrays
    • record roots
      • tasks
      • stack frames
      • get sizes for tasks and stack frames
      • items in thread local storage
      • modules
    • idea: Can we indicate which GC Generation an object is in? Can we track that when we walk the heap?
    • docs: make sure there are examples of how to call and use it

Problems:

  • not sure if edges are accurate
  • seems to hang when loading into Chrome viewer
  • deadlocks sometimes (is GC getting triggered again during the GC we invoke?)
  • may be missing some objects (TODO: make sure we're recording nodes and edges everywhere objprofile does)
  • object addresses (64 bit) may be truncated when put into JS numbers
  • sometimes see objects with duplicate property names pointing at different addresses (probably as a result of not handling struct inlining)
  • sizes for arrays still aren't exactly right... (heap snapshot: Count the array size on the _buffer_ not on the array. vilterp/julia#3)
  • a bunch of compile warnings (missing type casts?)
  • Incorrectly reporting the name/index of edges to <malloc> objects: 387::<malloc>@3735923201
    • i think it's either a Hidden or Internal edge.
  • some root objects seem to be duplicated (e.g. current_task)

Co-Authored-By: @vchuravy
Co-Authored-By: @NHDaly

@JeffBezanson JeffBezanson added the GC Garbage collector label Sep 17, 2021
Contributor Author

vilterp commented Sep 22, 2021

Would be interesting to think about how this relates to #31534 and its continuation, #33467

Member

jpsamaroo commented Sep 22, 2021

It would be great if we had a Julia-native parser and renderer for these snapshots so that we don't have to rely on a browser. Is there any intention of implementing that before this PR is merged? Or is it too complicated to be implemented directly in Julia Base/Stdlibs?

Contributor Author

vilterp commented Sep 22, 2021

@jpsamaroo the snapshots are written as JSON (we'll write some notes about the format; had to reverse-engineer it); so a separate Julia package could parse and render them somehow. cc @NHDaly

Member

NHDaly commented Sep 22, 2021 via email

Contributor Author

vilterp commented Sep 23, 2021

Looks like the Clang analyzer itself is crashing in analyzegc: https://buildkite.com/julialang/julia-master/builds/3944#30c92244-220b-4565-98eb-0f05921cd398/242-429 😦 Anyone know what's up with that?

Also: what performance benchmarks can we run on this to verify that it's not slowing down GC?

Member

maleadt commented Sep 23, 2021

Looks like the snapshots can't be processed by https://heapviz.com/ yet.

Somewhat related: once this works, it would be nice to look into generating dumps for massif too, which has a pretty powerful visualizer that supports analyzing the heap over time (https://apps.kde.org/massif-visualizer/).

Sponsor Member

vtjnash commented Sep 23, 2021

I think it might be confused that _record_gc_edge is defined as JL_NOTSAFEPOINT, but not declared as JL_NOTSAFEPOINT.

static inline void _record_gc_edge(const char *node_type, const char *edge_type,
                                    jl_value_t *a, jl_value_t *b, size_t name_or_index) JL_NOTSAFEPOINT

Devtools Heap Snapshot viewer (.heapsnapshot extension), to the given
IO stream.
"""
function take_heap_snapshot(io)
Sponsor Member

Suggested change
-function take_heap_snapshot(io)
+function take_heap_snapshot(io::IOStream)

Contributor Author

tried that earlier, but it seems not to be defined yet:

$ make
...
LoadError(at "sysimg.jl" line 3: LoadError(at "Base.jl" line 97: LoadError(at "gcutils.jl" line 119: UndefVarError(var=:IOStream))))

Contributor Author

looks like this made the builds fail, e.g. https://build.julialang.org/#/builders/63/builds/3778/steps/8/logs/stdio

either we need to get IOStream in scope somehow, or take the annotation back off

Contributor Author

taking the annotation back off until we can figure out how to get it in scope in this file… I guess we have to figure out what file IOStream is defined in and make sure it gets included before this

Member

i think this one has been resolved, yeah?

Sponsor Member

still may be good to acquire the internal lock on io here first also

Sponsor Member

@vtjnash vtjnash left a comment

(partial review)

Comment on lines 2584 to +2588
else if (flags.how == 3) {
jl_value_t *owner = jl_array_data_owner(a);
uintptr_t nptr = (1 << 2) | (bits & GC_OLD);
// TODO: Keep an eye on the edge type here, we're _pretty sure_ it's right..
gc_heap_snapshot_record_internal_edge(new_obj, owner);
Sponsor Member

yeah, it is a reshape of a different array (the "owner"), so we can draw the graph * -> a -> owner -> data -> *

Member

huh, so maybe this should just be a normal edge then? We probably need to figure out more precisely what all the edge types mean.

struct StringTable {
typedef unordered_map<string, size_t> MapType;
Sponsor Member

Suggested change
-typedef unordered_map<string, size_t> MapType;
+typedef llvm::StringMap<size_t> MapType;

(gives higher performance https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringmap-h)
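
For readers following along: as I understand it, the string table under discussion interns each distinct string once and hands back a small integer id that the rest of the snapshot can refer to. A minimal sketch of that idea, assuming only the standard library (it uses `std::unordered_map` as in the original line, not the suggested `llvm::StringMap`). The method name `find_or_create_string_id` appears elsewhere in this PR, but the body below is a guess at its behavior, not the PR's code.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative string interner; not the PR's actual StringTable.
struct StringTable {
    std::unordered_map<std::string, size_t> map;  // string -> id
    std::vector<std::string> strings;             // id -> string, in insertion order

    // Returns the existing id for `key`, or assigns the next free id.
    size_t find_or_create_string_id(const std::string &key) {
        auto it = map.find(key);
        if (it != map.end())
            return it->second;
        size_t id = strings.size();
        strings.push_back(key);
        map.emplace(key, id);
        return id;
    }
};
```

Swapping the map type as suggested would change only the lookup structure; the id assignment stays the same.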

Comment on lines +137 to +123
public:

// private:
Sponsor Member

Suggested change
public:
// private:

// private:

Comment on lines +224 to +213
} else if (type == (jl_datatype_t*)jl_malloc_tag) {
name = "<malloc>";
Sponsor Member

This is just a sentinel value for a. It cannot show up here.

Suggested change
-} else if (type == (jl_datatype_t*)jl_malloc_tag) {
-name = "<malloc>";

Member

Is that true? We saw it's checked in objprofile_print, which is what we copied it from:

julia/src/gc-debug.c

Lines 688 to 689 in e9960d5

else if (ty == jl_malloc_tag)
jl_safe_printf("#<malloc>");

@@ -2403,12 +2422,16 @@ module_binding: {
}
void *vb = jl_astaggedvalue(b);
verify_parent1("module", binding->parent, &vb, "binding_buff");
// Record the size used for the box for non-const bindings
gc_heap_snapshot_record_internal_edge(binding->parent, b);
Sponsor Member

b isn't a jl_value_t, so this should fail to compile here

Member

weirdly, it prints a warning here, not an error. We're not sure why. It's on our todo list to deal with this.

Member

@NHDaly NHDaly Sep 27, 2021

(hehe i think we were just gonna cast it to (jl_value_t*) though)

@@ -2540,6 +2563,7 @@ mark: {
if (flags.how == 1) {
void *val_buf = jl_astaggedvalue((char*)a->data - a->offset * a->elsize);
verify_parent1("array", new_obj, &val_buf, "buffer ('loc' addr is meaningless)");
gc_heap_snapshot_record_internal_edge(new_obj, jl_valueof(val_buf));
Sponsor Member

val_buf is not a jl_value_t. You can probably charge this size directly against a (e.g. with gc_heap_snapshot_record_hidden_edge, or maybe better yet, when computing the summarysize of a in record_node_to_gc_snapshot)

Comment on lines +222 to +211
} else if (type == (jl_datatype_t*)jl_buff_tag) {
name = "<buffer>";
Sponsor Member

These can only show up in very specific contexts (jl_binding_t and jl_array_t->data), so it might be better to just avoid passing those here and instead charge their size directly to their parent.

Member

Yeah, we considered doing that, too, but we figured it makes more sense to report them as "internal" edges, which (we think) tells the javascript frontend that there's some object there, but it's not a normal object and it's opaque. That seems like it's the right fit for this kind of value, too? That way we can still ask questions like "what is the size of all the array buffers," in case that was something we were interested in, i guess?

Contributor Author

vilterp commented Sep 30, 2021

@jpsamaroo I wrote a small pure-Julia parser here :) https://github.com/vilterp/HeapSnapshotParser.jl

Contributor Author

@vilterp vilterp left a comment

added the representation of stacks discussed with @vchuravy and @NHDaly; have a couple questions

snapshot->node_types.find_or_create_string_id("synthetic"),
"(stack frame)", // name
(size_t)frame, // id
1, // size
Contributor Author

@vchuravy how should I get the size of a stack frame? (ofc it's jl_gcframe_t, not the whole stack frame). frame->nroots times the size of a pointer or something? 🤔

Contributor Author

also, want to attribute the size of the stack overall to the task… how do we get the size of the entire stack? 🤔 you said it's fixed now, but it may not always be

Sponsor Member

You can get the size of the stack from when copystack. See the gc mark part for task.

I think size of the jl_gcframe_t is (frame->nroots + 2) * sizeof(void*), but you could be double counting if you also measure size of stack.

but you could be double counting if you also measure size of stack

Yeah i was gonna say the same thing. Just doing the size of the whole stack seems reasonable, yeah

Contributor Author

vilterp commented Oct 7, 2021

Also, Q for @vtjnash or @vchuravy: can we invoke GC.take_heap_snapshot somewhere in the test suite, to make sure it doesn't segfault or deadlock as people make changes to the code around it? Where would be a good place to invoke it?

gc_setmark_buf_(ptls, stkbuf, bits, ta->bufsz);
// TODO: attribute size of stack
Sponsor Member

Here you can add an internal edge to count the ta->bufsz against the task.

@@ -3065,8 +3077,10 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)
// 2.1. mark every object in the `last_remsets` and `rem_binding`
jl_gc_queue_remset(gc_cache, &sp, ptls2);
// 2.2. mark every thread local root
// TODO: treat these as roots
Sponsor Member

Suggested change
-// TODO: treat these as roots

@@ -3065,8 +3077,10 @@ static int _jl_gc_collect(jl_ptls_t ptls, jl_gc_collection_t collection)
// 2.1. mark every object in the `last_remsets` and `rem_binding`
Sponsor Member

So we could add:

gc_heap_snapshot_record_root_other(&jl_all_tls_states, "jl_all_tls_states");
for (int t_i = 0; t_i < jl_n_threads; t_i++) {
        jl_ptls_t ptls2 = jl_all_tls_states[t_i];
        gc_heap_snapshot_record_edge_other(&jl_all_tls_states, &ptls2);
        ....

Or something like that.

@@ -646,7 +646,7 @@ static int mark_reset_age = 0;

static int64_t scanned_bytes; // young bytes scanned while marking
static int64_t perm_scanned_bytes; // old bytes scanned while marking
-static int prev_sweep_full = 1;
+int prev_sweep_full = 1;
Contributor Author

@vilterp vilterp Nov 18, 2021

it seems that this is always set to 1? i.e. set to 1 here and never changed anywhere else? so it'd have no effect

no i think it's set, here:
https://github.com/vilterp/julia/blob/2b2c3ab4db8c35f4deb8a44330dc765e73e61647/src/gc.c#L3261

Jameson suggested this change. the idea is to make sure that we're only doing the GC Mark edge-reporting during the second GC, since our call to jl_gc_collect(JL_GC_FULL) does two mark+sweeps: 1. mark, 2. sweep FULL, 3. mark, 4. sweep INCREMENTAL. Step 3 will be a full mark, because all mark bits have been reset. :)

Contributor Author

oh I see, this variable was already defined here (and set elsewhere, before this commit), but you added an extern declaration to heap-snapshot.h so we could check it there… gotcha. wasn't reading the diff closely enough. nice change!

Contributor Author

vilterp commented Dec 9, 2021

Rebased onto master; appears to still work (we need more tests lol)

Contributor Author

vilterp commented Dec 9, 2021

@NHDaly seems like the rebase broke fieldpath_for_slot_helper… I'm seeing this on the backports to release-1.6 and release-1.7 too. Let's debug sometime; maybe I messed something up due to git conflicts.

signal (11): Segmentation fault: 11
in expression starting at REPL[2]:1
union_isinlinable at /Users/vilterp/code/julia/src/datatype.c:269
ijl_islayout_inline at /Users/vilterp/code/julia/src/datatype.c:301 [inlined]
ijl_stored_inline at /Users/vilterp/code/julia/src/datatype.c:308
_fieldpath_for_slot_helper at /Users/vilterp/code/julia/src/gc-heap-snapshot.cpp:294
_fieldpath_for_slot_helper at /Users/vilterp/code/julia/src/gc-heap-snapshot.cpp:295
_fieldpath_for_slot_helper at /Users/vilterp/code/julia/src/gc-heap-snapshot.cpp:295
_fieldpath_for_slot at /Users/vilterp/code/julia/src/gc-heap-snapshot.cpp:313
_gc_heap_snapshot_record_object_edge at /Users/vilterp/code/julia/src/gc-heap-snapshot.cpp:420
gc_heap_snapshot_record_object_edge at /Users/vilterp/code/julia/src/./gc-heap-snapshot.h:68 [inlined]
gc_mark_scan_obj8 at /Users/vilterp/code/julia/src/gc.c:1962 [inlined]
gc_mark_loop at /Users/vilterp/code/julia/src/gc.c:2265
_jl_gc_collect at /Users/vilterp/code/julia/src/gc.c:3094
ijl_gc_collect at /Users/vilterp/code/julia/src/gc.c:3320
jl_gc_take_heap_snapshot at /Users/vilterp/code/julia/src/gc-heap-snapshot.cpp:161
take_heap_snapshot at ./gcutils.jl:116 [inlined]

@NHDaly NHDaly assigned NHDaly and unassigned NHDaly Dec 20, 2021
@IanButterworth
Sponsor Member

Any chance this could sneak into 1.8?

Member

tkf commented Feb 3, 2022

  • object addresses (64 bit) may be truncated when put into JS numbers

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER says

The Number.MAX_SAFE_INTEGER constant represents the maximum safe integer in JavaScript (2^53 - 1).

x86-64 uses a 48-bit virtual address space and ARM uses up to 52 bits, IIUC https://developer.arm.com/documentation/101811/0101/Address-spaces-in-AArch64

So, isn't a JS number enough for the memory addresses (for now)?
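
A quick way to sanity-check this reasoning: a JS Number is an IEEE-754 double, so an address is safe if it round-trips through a double unchanged. Here is a small C++ sketch of that check. The helper name `fits_in_js_number` is made up for illustration and is not part of the PR.

```cpp
#include <cstdint>

// Returns true when `addr` converts to a double and back without changing,
// i.e. a JavaScript Number can hold this particular value exactly.
bool fits_in_js_number(uint64_t addr) {
    double d = static_cast<double>(addr);
    // Values near UINT64_MAX round up to 2^64; converting that back to
    // uint64_t would be undefined behavior, so reject it up front.
    if (d >= 18446744073709551616.0)  // 2^64
        return false;
    return static_cast<uint64_t>(d) == addr;
}
```

Every value up to 2^53 passes this check, so 48-bit and 52-bit addresses are fine. Above 2^53 only some values are exactly representable (2^53 + 1, for example, is not), which is where truncation could start to appear.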

Sponsor Member

IanButterworth commented Feb 3, 2022

Just a quick note. I just tried rebasing this on current master, and gave it a go, and it worked at first after boot, but failed with similar errors to #42286 (comment) after running some intensive code.

It might be helpful to note that I did have to use Chrome Canary, because the "load" button on the memory tab did nothing on latest chrome dev tools.

I guess the segfault might be due to type mismatches reported during build:

/Users/ian/Documents/GitHub/julia/src/gc.c:1962:49: warning: incompatible pointer types passing 'char *' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
            gc_heap_snapshot_record_object_edge(parent, *slot, slot);
                                                ^~~~~~
./gc-heap-snapshot.h:66:68: note: passing argument to parameter 'from' here
static inline void gc_heap_snapshot_record_object_edge(jl_value_t *from, jl_value_t *to, void* slot) JL_NOTSAFEPOINT {
                                                                   ^
/Users/ian/Documents/GitHub/julia/src/gc.c:1998:49: warning: incompatible pointer types passing 'char *' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
            gc_heap_snapshot_record_object_edge(parent, *slot, slot);
                                                ^~~~~~
./gc-heap-snapshot.h:66:68: note: passing argument to parameter 'from' here
static inline void gc_heap_snapshot_record_object_edge(jl_value_t *from, jl_value_t *to, void* slot) JL_NOTSAFEPOINT {
                                                                   ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2033:49: warning: incompatible pointer types passing 'char *' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
            gc_heap_snapshot_record_object_edge(parent, *slot, slot);
                                                ^~~~~~
./gc-heap-snapshot.h:66:68: note: passing argument to parameter 'from' here
static inline void gc_heap_snapshot_record_object_edge(jl_value_t *from, jl_value_t *to, void* slot) JL_NOTSAFEPOINT {
                                                                   ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2426:51: warning: incompatible pointer types passing 'jl_module_t *' (aka 'struct _jl_module_t *') to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
            gc_heap_snapshot_record_internal_edge(binding->parent, b);
                                                  ^~~~~~~~~~~~~~~
./gc-heap-snapshot.h:71:70: note: passing argument to parameter 'from' here
static inline void gc_heap_snapshot_record_internal_edge(jl_value_t *from, jl_value_t *to) JL_NOTSAFEPOINT {
                                                                     ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2426:68: warning: incompatible pointer types passing 'jl_binding_t *' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
            gc_heap_snapshot_record_internal_edge(binding->parent, b);
                                                                   ^
./gc-heap-snapshot.h:71:88: note: passing argument to parameter 'to' here
static inline void gc_heap_snapshot_record_internal_edge(jl_value_t *from, jl_value_t *to) JL_NOTSAFEPOINT {
                                                                                       ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2814:34: warning: incompatible pointer types passing '_Atomic(struct _jl_task_t *)' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
    gc_heap_snapshot_record_root(ptls2->current_task, "current task");
                                 ^~~~~~~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2816:34: warning: incompatible pointer types passing '_Atomic(struct _jl_task_t *)' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
    gc_heap_snapshot_record_root(ptls2->current_task, "root task");
                                 ^~~~~~~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2819:38: warning: incompatible pointer types passing '_Atomic(struct _jl_task_t *)' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
        gc_heap_snapshot_record_root(ptls2->current_task, "next task");
                                     ^~~~~~~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2823:38: warning: incompatible pointer types passing '_Atomic(struct _jl_task_t *)' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
        gc_heap_snapshot_record_root(ptls2->current_task, "previous task");
                                     ^~~~~~~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2827:38: warning: incompatible pointer types passing '_Atomic(struct _jl_task_t *)' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
        gc_heap_snapshot_record_root(ptls2->current_task, "previous exception");
                                     ^~~~~~~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2839:34: warning: incompatible pointer types passing 'jl_module_t *' (aka 'struct _jl_module_t *') to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
    gc_heap_snapshot_record_root(jl_main_module, "main_module");
                                 ^~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2861:42: warning: incompatible pointer types passing 'jl_typemap_entry_t *' (aka 'struct _jl_typemap_entry_t *') to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
            gc_heap_snapshot_record_root(v, "type_map");
                                         ^
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {
                                                            ^
/Users/ian/Documents/GitHub/julia/src/gc.c:2866:38: warning: incompatible pointer types passing 'jl_array_t *' to parameter of type 'jl_value_t *' (aka 'struct _jl_value_t *') [-Wincompatible-pointer-types]
        gc_heap_snapshot_record_root(jl_all_methods, "all_methods");
                                     ^~~~~~~~~~~~~~
./gc-heap-snapshot.h:51:61: note: passing argument to parameter 'root' here
static inline void gc_heap_snapshot_record_root(jl_value_t *root, char *name) JL_NOTSAFEPOINT {

Update: It didn't segfault with an MWE while investigating the issue here #42566 (comment)

Sponsor Member

vtjnash commented Feb 3, 2022

@IanButterworth there seem to be a moderate number of TODO checkboxes left, and it needs to get to a non-draft state where it is ready for review. There is only about a week until the branch, though, for it to get in.

Contributor Author

vilterp commented Feb 3, 2022

Hey @IanButterworth, glad you were able to get it to run, at least in a simple case!

@NHDaly and I have been focusing on finishing the allocation profiler (#43868) for 1.8, so we've put this on the back burner. There's the segfault we've been running into in the field path code (having to do with inlined structs), plus some other check boxes on the list.

@IanButterworth
Sponsor Member

I'm being greedy. Both are really helpful. Thanks for taking them both on!

If I can spare free time to help debug the segfault here I'll try.

Member

NHDaly commented Feb 4, 2022

Jeff told me that Christine was very interested in this too.

@chflood: if you would like to collaborate on the Heap Snapshot tool, we'd love to get some help with it! There are just so many code paths in GC and little edge cases that kept us stuck on this PR.

@IanButterworth: yay, thanks for the help! Excited to get more eyes and arms on it! 💪

Contributor Author

vilterp commented Feb 7, 2022

Hey @chflood, thanks for checking it out :) Your invocation looks fine — did your make -j succeed on the branch? 🤔 Also feel free to ping me on the Julia Slack for further debugging; happy to work with you on this!

Member

NHDaly commented Feb 11, 2022

I've added this idea to the PR description:

  • idea: Can we indicate which GC Generation an object is in? Can we track that when we walk the heap?

gc_mark_queue_obj(gc_cache, sp, ptls2->previous_exception);
gc_heap_snapshot_record_root(ptls2->current_task, "previous exception");
}
Contributor

shouldn't this be gc_heap_snapshot_record_root(ptls2->previous_exception, "previous exception");? Similar comment for the lines above.

Member

Yes, i think you're right! 👍 👍 👍 Thanks @whatsthecraic

Member

Good call. Fixed in e732d48.

@apaz-cli apaz-cli mentioned this pull request Sep 22, 2022
@vchuravy
Sponsor Member

Closed in favor of #46862
