RFC: Avoid double-indirection of Arc<Vec<u8>> by using Arc<[u8]> directly #803

Merged

merged 2 commits into cberner:master on May 11, 2024

Conversation

@adamreichold
Contributor

adamreichold commented May 4, 2024

The second commit removes any per-access overhead in WritablePage, at the cost of some unsafe code relying on the uniqueness that was previously asserted once by calling Arc::try_unwrap.
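
To make the trade-off concrete, here is a minimal sketch using only std APIs; the helper names are invented for illustration and are not taken from the redb code:

```rust
use std::sync::Arc;

// With Arc<Vec<u8>>, uniqueness can be asserted once up front by unwrapping the Arc:
fn take_unique(buffer: Arc<Vec<u8>>) -> Vec<u8> {
    // Panics if any other Arc clone is still alive.
    Arc::try_unwrap(buffer).expect("writable page must be uniquely referenced")
}

// With Arc<[u8]>, the safe route to mutation is Arc::get_mut, which re-checks
// the reference count on every access:
fn write_byte(buffer: &mut Arc<[u8]>, index: usize, value: u8) {
    let bytes = Arc::get_mut(buffer).expect("writable page must be uniquely referenced");
    bytes[index] = value;
}
```

The unsafe code in the second commit avoids re-checking uniqueness on every access along the lines of the second function.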

@cberner
Owner

cberner commented May 8, 2024

I'm not seeing a speed-up in https://github.com/cberner/redb/blob/master/benches/lmdb_benchmark.rs

Is there a benchmark where you were able to measure a significant speed-up from this?

@adamreichold
Contributor Author

> I'm not seeing a speed-up in https://github.com/cberner/redb/blob/master/benches/lmdb_benchmark.rs
>
> Is there a benchmark where you were able to measure a significant speed-up from this?

This was not driven by benchmarks for me. I just stumbled over this while looking something else up in the code.

I would say that there is rarely, if ever, a reason to use Arc<Vec<T>> instead of Arc<[T]>, as the Arc layer already implies shared access. For redb, these pages are also never resized, so even with temporary mutable access the unused capacity of the Vec is not needed. So for me, this was strictly a simplification.

Of course, the situation is a bit more complex here, as WritablePage depends on mutable access through the Arc, so I would agree that this is not a trivial change.

So if there are benchmark changes, I would expect them to be positive, but this was not my motivation. I would sort any performance changes under "distributed fat" as I would expect increased TLB/cache pressure from the additional indirection and possible heap fragmentation.
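
As a standalone illustration of the layout difference (not code from this PR):

```rust
use std::sync::Arc;

fn main() {
    let data = vec![0u8; 4096];

    // Arc<Vec<u8>>: the Arc allocation holds a Vec header, which points at a
    // second allocation containing the bytes (two heap allocations, two pointer
    // hops per access, plus any spare capacity the Vec happens to carry).
    let nested: Arc<Vec<u8>> = Arc::new(data.clone());

    // Arc<[u8]>: one allocation holding the reference counts and the bytes
    // inline (a single pointer hop); the conversion copies the bytes and drops
    // unused capacity, which is fine for pages that are never resized.
    let flat: Arc<[u8]> = data.into();

    assert_eq!(nested.len(), flat.len());
}
```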

@cberner
Owner

cberner commented May 10, 2024

Got it, I see. The first commit looks good then, but I don't want to add the unsafe commit unless there's a really clear performance benefit. I'd like to remove the last of the unsafe code from redb, but need to get file lock support into std to replace the libc usage.

Want to remove the second commit, and I'll merge the first?

@adamreichold
Contributor Author

> Want to remove the second commit, and I'll merge the first?

I did put that into a separate commit to make it easy to drop. I added it at all because Arc::get_mut is then called for each mutable page access and I feared that this would actually impact performance. Did you already run the benchmark without the last commit? If so, I'll just drop it.

Otherwise, I'll do that, and if it has an impact I'll try to figure out something to avoid it without unsafe.

> but need to get file lock support into std to replace the libc usage.

Not sure if you want to take on another dependency, but rustix does have safe file locking: https://docs.rs/rustix/latest/rustix/fs/fn.flock.html (rustix is very widely used and similar to libc in scope. It is also already a transitive dev-dependency via tempfile. So while not obvious, I think it might make sense to replace this.)
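
For reference, a minimal sketch of what that could look like, assuming rustix is added as a dependency with its fs feature enabled; this is not how redb currently takes its file lock:

```rust
use std::fs::File;

use rustix::fs::{flock, FlockOperation};

// Advisory whole-file locking without unsafe; &File implements AsFd, so no raw
// file descriptors are involved.
fn lock_exclusive(file: &File) -> rustix::io::Result<()> {
    flock(file, FlockOperation::LockExclusive)
}

fn unlock(file: &File) -> rustix::io::Result<()> {
    flock(file, FlockOperation::Unlock)
}
```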

@adamreichold
Contributor Author

> Not sure if you want to take on another dependency, but rustix does have safe file locking: https://docs.rs/rustix/latest/rustix/fs/fn.flock.html (rustix is very widely used and similar to libc in scope. It is also already a transitive dev-dependency via tempfile. So while not obvious, I think it might make sense to replace this.)

Ok, I tried it, and while it does work, there is still Windows, which is out of scope for rustix, and F_BARRIERFSYNC on macOS, which it does not yet support.

@adamreichold
Contributor Author

I dropped the last commit after running lmdb_benchmark on my notebook. The results are somewhat noisy, so I think you'll want to run this again yourself to make sure. I do see small but consistent improvements for random reads; without the second commit I occasionally see small regressions for writes, though not in every run.

As I said, it is quite noisy and I am not sure what to make of it. (My hardware, especially the disk, is decidedly consumer-grade.)

@adamreichold
Contributor Author

Ah, I found a way to at least avoid the option dance without unsafe code, at the cost of an otherwise unnecessary reference-count increment/decrement pair. Pushed that as a separate commit.
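
For context, a generic sketch of the "option dance" and the clone-based alternative; the type and field names here are invented and may not match the actual commit:

```rust
use std::sync::Arc;

// Invented types for illustration; not the redb code.
struct PageWithOption {
    data: Option<Arc<[u8]>>, // Option exists only so the Arc can be moved out
}

impl PageWithOption {
    // The "option dance": temporarily move the value out of the field, leaving
    // None behind, and put it back (or consume the struct) later.
    fn take_data(&mut self) -> Arc<[u8]> {
        self.data.take().expect("data already taken")
    }
}

struct PageWithoutOption {
    data: Arc<[u8]>, // no Option needed
}

impl PageWithoutOption {
    // Alternative: clone the Arc (one atomic increment) so the field can stay
    // populated; the matching decrement happens when the other handle drops.
    fn data(&self) -> Arc<[u8]> {
        self.data.clone()
    }
}
```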

@adamreichold
Contributor Author

adamreichold commented May 10, 2024

Ok, I also reran the benchmark after changing it to run from a ramdisk to take my shitty reasonably-priced disk out of the picture and tease out the CPU overhead of managing the cache.

With that I am now reasonably confident that this does not introduce undue overhead and yields small but consistent improvements:

| benchmark | PR | master |
|---|---|---|
| bulk load | 2673ms | 2791ms |
| individual writes | 3ms | 3ms |
| batch writes | 1291ms | 1299ms |
| len() | 0ms | 0ms |
| random reads | 1137ms | 1335ms |
| random reads | 1026ms | 1203ms |
| random range reads | 2864ms | 3080ms |
| random range reads | 2861ms | 3069ms |
| random reads (4 threads) | 275ms | 321ms |
| random reads (8 threads) | 171ms | 193ms |
| random reads (16 threads) | 148ms | 157ms |
| random reads (32 threads) | 131ms | 137ms |
| removals | 2000ms | 2069ms |

cberner merged commit 94e90d5 into cberner:master on May 11, 2024
3 checks passed
@cberner
Owner

cberner commented May 11, 2024

Merged. Thanks!

adamreichold deleted the remove-double-indirection branch on May 11, 2024 at 18:22