
Minimize tracing-tracy allocations #86

Merged · 11 commits · Jan 1, 2024
Conversation

@SUPERCILEX (Contributor) commented Dec 26, 2023:

Depends on #85. Will rebase once that's in. Done.


Note that the two commits should NOT be squashed as the second one contains benchmarking code.

Turns out GitHub saves the PR history even when squashing, so never mind; these commits are messier anyway.


The one major downside of this caching is that for long-running applications which occasionally have very long field values, we have no way of reclaiming that memory. I think that's OK until someone actually runs into the issue and can tell us more about their needs. For example, maybe we offer a "clear cache" API they could call? Or maybe they need to configure a maximum cache size? Not sure, so it's probably easiest to wait unless you think this needs to be solved now (in which case I'd rather do it in a follow-up PR). TL;DR: memory usage currently scales as O(num_active_spans * max_field_value_length).
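To make the retention behavior concrete, here's a minimal standalone sketch (not the PR's actual code; `CachedBuf` and `clear_cache` are hypothetical names) of a reused buffer whose capacity grows to the largest value ever written and is never released:

```rust
// Hypothetical illustration of the cache-growth behavior described above;
// not the actual tracing-tracy implementation.
struct CachedBuf {
    buf: String, // reused across field values; capacity only ever grows
}

impl CachedBuf {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    /// Reuse the buffer for a new field value. `clear` keeps capacity,
    /// so one huge value pins that much memory for the span's lifetime.
    fn record(&mut self, value: &str) -> &str {
        self.buf.clear();
        self.buf.push_str(value);
        &self.buf
    }

    /// A possible "clear cache" escape hatch: actually release the memory.
    fn clear_cache(&mut self) {
        self.buf = String::new();
    }
}

fn main() {
    let mut cache = CachedBuf::new();
    cache.record(&"x".repeat(1 << 20)); // one 1 MiB value...
    cache.record("tiny"); // ...keeps ~1 MiB of capacity around
    assert!(cache.buf.capacity() >= 1 << 20);
    cache.clear_cache();
    assert_eq!(cache.buf.capacity(), 0);
}
```

With one such buffer per active span, total retained memory is bounded by the number of live spans times the largest field value each has seen, which is the O(num_active_spans * max_field_value_length) bound above.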


The results are in: almost double the performance!

Note that I'm testing this with #87 as I'll deal with the format! allocation once we have a path forward there.

~> tracy-capture -o ~/Desktop/ftzz4.tracy -f
Connecting to 127.0.0.1:8086...
Queue delay: 19 ns
Timer resolution: 19 ns
 471.09 Mbps / 18.8% =2501.45 Mbps | Tx: 563 MB | 1020.05 MB | 10.21 s
Frames: 2
Time span: 10.25 s
Zones: 4,033,823
Elapsed time: 9.93 s
Saving trace... done!
Trace size 282.4 MB (37.28% ratio)
~> tracy-capture -o ~/Desktop/ftzz5.tracy -f
Connecting to 127.0.0.1:8086...
Queue delay: 21 ns
Timer resolution: 21 ns
 498.87 Mbps / 18.2% =2744.67 Mbps | Tx: 366.32 MB | 722.33 MB | 6.67 s
Frames: 2
Time span: 6.69 s
Zones: 4,033,813
Elapsed time: 6.41 s
Saving trace... done!
Trace size 193.94 MB (34.32% ratio)

Before: [image]

After: [image]

Benchmarking code:

```rust
use std::cell::{Cell, RefCell, UnsafeCell};
use std::env::args_os;
use std::ffi::OsStr;

fn a() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            s.borrow_mut().push(i);
        });
    }
    TLS.with(|s| s.borrow_mut().clear());
}

fn b() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = const { RefCell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { s.try_borrow_mut().unwrap_unchecked() }.push(i);
        });
    }
    TLS.with(|s| unsafe { s.try_borrow_mut().unwrap_unchecked() }.clear());
}

fn c() {
    thread_local! {
        static TLS: Cell<Vec<usize>> = const { Cell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            let mut v = s.take();
            v.push(i);
            s.set(v);
        });
    }
    TLS.with(|s| {
        let mut v = s.take();
        v.clear();
        s.set(v);
    });
}

fn d() {
    thread_local! {
        static TLS: UnsafeCell<Vec<usize>> = const { UnsafeCell::new(Vec::new()) };
    }

    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { &mut *s.get() }.push(i);
        });
    }
    TLS.with(|s| unsafe { &mut *s.get() }.clear());
}

fn e() {
    thread_local! {
        static TLS: UnsafeCell<Vec<usize>> = UnsafeCell::new(Vec::new());
    }

    for i in 0..10000 {
        TLS.with(|s| {
            unsafe { &mut *s.get() }.push(i);
        });
    }
    TLS.with(|s| unsafe { &mut *s.get() }.clear());
}

fn f() {
    thread_local! {
        static TLS: RefCell<Vec<usize>> = RefCell::new(Vec::new());
    }

    for i in 0..10000 {
        TLS.with(|s| {
            s.borrow_mut().push(i);
        });
    }
    TLS.with(|s| s.borrow_mut().clear());
}

fn main() {
    let cmd = args_os().nth(1).unwrap();
    for _ in 0..1000 {
        if cmd == OsStr::new("refcell") {
            a();
        } else if cmd == OsStr::new("refcell_unsafe") {
            b();
        } else if cmd == OsStr::new("cell") {
            c();
        } else if cmd == OsStr::new("unsafecell") {
            d();
        } else if cmd == OsStr::new("unsafecell_non_const") {
            e();
        } else if cmd == OsStr::new("refcell_non_const") {
            f();
        } else {
            panic!();
        }
    }
}
```

Benchmarks:

```
> hyperfine -N "./foo refcell" "./foo refcell_unsafe" "./foo cell" "./foo unsafecell" "./foo unsafecell_non_const" "./foo refcell_non_const"
Benchmark 1: ./foo refcell
  Time (mean ± σ):      17.2 ms ±   0.6 ms    [User: 16.8 ms, System: 0.3 ms]
  Range (min … max):    15.0 ms …  21.3 ms    143 runs

Benchmark 2: ./foo refcell_unsafe
  Time (mean ± σ):      18.1 ms ±   0.3 ms    [User: 17.7 ms, System: 0.3 ms]
  Range (min … max):    16.6 ms …  21.0 ms    167 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 3: ./foo cell
  Time (mean ± σ):      36.0 ms ±   0.9 ms    [User: 35.9 ms, System: 0.1 ms]
  Range (min … max):    34.3 ms …  42.7 ms    81 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 4: ./foo unsafecell
  Time (mean ± σ):      15.2 ms ±   1.1 ms    [User: 14.7 ms, System: 0.3 ms]
  Range (min … max):    14.2 ms …  19.2 ms    210 runs

Benchmark 5: ./foo unsafecell_non_const
  Time (mean ± σ):      15.5 ms ±   0.3 ms    [User: 15.0 ms, System: 0.4 ms]
  Range (min … max):    15.0 ms …  16.3 ms    193 runs

Benchmark 6: ./foo refcell_non_const
  Time (mean ± σ):      25.9 ms ±   5.3 ms    [User: 25.5 ms, System: 0.3 ms]
  Range (min … max):    17.9 ms …  42.2 ms    146 runs

Summary
  './foo unsafecell' ran
    1.02 ± 0.08 times faster than './foo unsafecell_non_const'
    1.13 ± 0.09 times faster than './foo refcell'
    1.19 ± 0.09 times faster than './foo refcell_unsafe'
    1.71 ± 0.37 times faster than './foo refcell_non_const'
    2.37 ± 0.18 times faster than './foo cell'
```

Commit: …there—go back to TLS, but use the new VecCell abstraction
@SUPERCILEX SUPERCILEX changed the title Minimize TracyEventFieldVisitor allocations Minimize tracing-tracy allocations Dec 27, 2023
@SUPERCILEX (Contributor, Author):
Ok, this should be ready for review. I've updated the PR description with results.

```
fn record_bool(&mut self, field: &Field, value: bool) {
    match (value, field.name()) {
-       (true, "tracy.frame_mark") => self.frame_mark = true,
+       (_, "tracy.frame_mark") => self.frame_mark = value,
        _ => self.record_debug(field, &value),
```
@SUPERCILEX (Contributor, Author):
I changed this to set the value... was the previous behavior intentional?

@nagisa (Owner):

In principle a field name for an event can occur only once. I don’t particularly care either way, but tracy.frame_mark = false does not really have any reason to show up in the user code… so it seemed to make sense to not invoke any special logic on it? Unless I guess they want to make it dynamic somehow? 🤷

@SUPERCILEX (Contributor, Author):

> Unless I guess they want to make it dynamic somehow?

Yeah that's all I could think of. It just felt wrong to me to take in a bool and ignore it. I could see it as being confusing behavior to track down if you're expecting false to work for whatever reason.
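As a standalone sketch of the two behaviors being debated (simplified model, not the crate's real visitor type):

```rust
// Simplified standalone model of the two record_bool behaviors discussed
// above; not the actual tracing-tracy visitor.
struct Visitor {
    frame_mark: bool,
}

impl Visitor {
    // Old behavior: only `true` triggers the special logic; `false`
    // falls through and is effectively ignored.
    fn record_bool_old(&mut self, field: &str, value: bool) {
        if value && field == "tracy.frame_mark" {
            self.frame_mark = true;
        }
    }

    // New behavior: the flag always tracks the value, so a dynamic
    // `tracy.frame_mark = false` actually clears it.
    fn record_bool_new(&mut self, field: &str, value: bool) {
        if field == "tracy.frame_mark" {
            self.frame_mark = value;
        }
    }
}

fn main() {
    let mut v = Visitor { frame_mark: true };
    v.record_bool_old("tracy.frame_mark", false);
    assert!(v.frame_mark); // old: false is silently ignored

    v.record_bool_new("tracy.frame_mark", false);
    assert!(!v.frame_mark); // new: the value is honored
}
```

The surprising case is the first assertion: with the old behavior, passing `false` leaves the flag untouched, which is the confusing-to-debug scenario described above.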

tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
tracing-tracy/src/lib.rs Outdated Show resolved Hide resolved
@nagisa (Owner) left a comment:

In principle LGTM, last few nits and comments inline. Sorry for not spotting them last time around.

@nagisa nagisa merged commit 0684979 into nagisa:main Jan 1, 2024
36 checks passed
@SUPERCILEX SUPERCILEX deleted the alloc branch January 1, 2024 18:57