RFC: Avoid double-indirection of Arc<Vec<u8>> by using Arc<[u8]> directly #803

Merged

merged 2 commits into cberner:master on May 11, 2024

Conversation

@adamreichold
Contributor

adamreichold commented May 4, 2024

The second commit removes any per-access overhead in WritablePage, at the cost of some unsafe code relying on the uniqueness that was previously asserted once by calling Arc::try_unwrap.
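
To make the trade-off concrete, here is a minimal sketch using only std APIs; the helper names are invented for illustration and are not taken from the redb code:

```rust
use std::sync::Arc;

// With Arc<Vec<u8>>, uniqueness can be asserted once up front by unwrapping the Arc:
fn take_unique(buffer: Arc<Vec<u8>>) -> Vec<u8> {
    // Panics if any other Arc clone is still alive.
    Arc::try_unwrap(buffer).expect("writable page must be uniquely referenced")
}

// With Arc<[u8]>, the safe route to mutation is Arc::get_mut, which re-checks
// the reference count on every access:
fn write_byte(buffer: &mut Arc<[u8]>, index: usize, value: u8) {
    let bytes = Arc::get_mut(buffer).expect("writable page must be uniquely referenced");
    bytes[index] = value;
}
```

The unsafe code in the second commit avoids re-checking uniqueness on every access along the lines of the second function.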

@cberner
Owner

cberner commented May 8, 2024

I'm not seeing a speed-up in https://github.com/cberner/redb/blob/master/benches/lmdb_benchmark.rs

Is there a benchmark where you were able to measure a significant speed-up from this?

@adamreichold
Contributor Author

> I'm not seeing a speed-up in https://github.com/cberner/redb/blob/master/benches/lmdb_benchmark.rs
>
> Is there a benchmark where you were able to measure a significant speed-up from this?

This was not driven by benchmarks for me. I just stumbled over this while looking something else up in the code.

I would say that there is rarely, if ever, a reason to use Arc<Vec<T>> instead of Arc<[T]>, as the Arc layer already implies shared access. For redb, these pages are also never resized, so even with temporary mutable access the unused capacity of the Vec is not needed. So for me, this was strictly a simplification.

Of course, the situation is a bit more complex here, as WritablePage depends on mutable access through the Arc, so I would agree that this is not a trivial change.

So if there are benchmark changes, I would expect them to be positive, but this was not my motivation. I would sort any performance changes under "distributed fat" as I would expect increased TLB/cache pressure from the additional indirection and possible heap fragmentation.
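
As a standalone illustration of the layout difference (not code from this PR):

```rust
use std::sync::Arc;

fn main() {
    let data = vec![0u8; 4096];

    // Arc<Vec<u8>>: the Arc allocation holds a Vec header, which points at a
    // second allocation containing the bytes (two heap allocations, two pointer
    // hops per access, plus any spare capacity the Vec happens to carry).
    let nested: Arc<Vec<u8>> = Arc::new(data.clone());

    // Arc<[u8]>: one allocation holding the reference counts and the bytes
    // inline (a single pointer hop); the conversion copies the bytes and drops
    // unused capacity, which is fine for pages that are never resized.
    let flat: Arc<[u8]> = data.into();

    assert_eq!(nested.len(), flat.len());
}
```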

@cberner
Owner

cberner commented May 10, 2024

Got it, I see. The first commit looks good then, but I don't want to add the unsafe commit unless there's a really clear performance benefit. I'd like to remove the last of the unsafe code from redb, but need to get file lock support into std to replace the libc usage.

Want to remove the second commit, and I'll merge the first?

@adamreichold
Contributor Author

> Want to remove the second commit, and I'll merge the first?

I did put that into a separate commit to make it easy to drop. I added it at all because Arc::get_mut is then called for each mutable page access and I feared that this would actually impact performance. Did you already run the benchmark without the last commit? If so, I'll just drop it.

Otherwise, I'll do that, and if it has an impact I'll try to figure out something to avoid it without unsafe.

> but need to get file lock support into std to replace the libc usage.

Not sure if you want to take on another dependency, but rustix does have safe file locking: https://docs.rs/rustix/latest/rustix/fs/fn.flock.html (rustix is very widely used and similar to libc in scope. It is also already a transitive dev-dependency via tempfile. So while not obvious, I think it might make sense to replace this.)
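
For reference, a minimal sketch of what that could look like, assuming rustix is added as a dependency with its fs feature enabled; this is not how redb currently takes its file lock:

```rust
use std::fs::File;

use rustix::fs::{flock, FlockOperation};

// Advisory whole-file locking without unsafe; &File implements AsFd, so no raw
// file descriptors are involved.
fn lock_exclusive(file: &File) -> rustix::io::Result<()> {
    flock(file, FlockOperation::LockExclusive)
}

fn unlock(file: &File) -> rustix::io::Result<()> {
    flock(file, FlockOperation::Unlock)
}
```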

@adamreichold
Contributor Author

> Not sure if you want to take on another dependency, but rustix does have safe file locking: https://docs.rs/rustix/latest/rustix/fs/fn.flock.html (rustix is very widely used and similar to libc in scope. It is also already a transitive dev-dependency via tempfile. So while not obvious, I think it might make sense to replace this.)

Ok, I tried it, and while it does work, there is still Windows, which is out of scope for rustix, and F_BARRIERFSYNC on macOS, which it does not yet support.

@adamreichold
Contributor Author

I dropped the last commit after running lmdb_benchmark on my notebook. The results are somewhat noisy, so I think you'll want to run this again yourself to make sure. I do see small but consistent improvements for random reads; without the second commit I occasionally see small regressions for writes, though not in every run.

As I said, it is quite noisy and I am not sure what to make of it. (My hardware, especially the disk, is decidedly consumer-grade.)

@adamreichold
Contributor Author

Ah, I found a way to at least avoid the option dance without unsafe code, at the cost of an otherwise unnecessary reference-count increment/decrement pair. Pushed that as a separate commit.
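
For context, a generic sketch of the "option dance" and the clone-based alternative; the type and field names here are invented and may not match the actual commit:

```rust
use std::sync::Arc;

// Invented types for illustration; not the redb code.
struct PageWithOption {
    data: Option<Arc<[u8]>>, // Option exists only so the Arc can be moved out
}

impl PageWithOption {
    // The "option dance": temporarily move the value out of the field, leaving
    // None behind, and put it back (or consume the struct) later.
    fn take_data(&mut self) -> Arc<[u8]> {
        self.data.take().expect("data already taken")
    }
}

struct PageWithoutOption {
    data: Arc<[u8]>, // no Option needed
}

impl PageWithoutOption {
    // Alternative: clone the Arc (one atomic increment) so the field can stay
    // populated; the matching decrement happens when the other handle drops.
    fn data(&self) -> Arc<[u8]> {
        self.data.clone()
    }
}
```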

@adamreichold
Contributor Author

adamreichold commented May 10, 2024

Ok, I also reran the benchmark after changing it to run from a ramdisk to take my shitty reasonably-priced disk out of the picture and tease out the CPU overhead of managing the cache.

With that I am now reasonably confident that this does not introduce undue overhead and yields small but consistent improvements:

| benchmark | PR | master |
|---|---|---|
| bulk load | 2673ms | 2791ms |
| individual writes | 3ms | 3ms |
| batch writes | 1291ms | 1299ms |
| len() | 0ms | 0ms |
| random reads | 1137ms | 1335ms |
| random reads | 1026ms | 1203ms |
| random range reads | 2864ms | 3080ms |
| random range reads | 2861ms | 3069ms |
| random reads (4 threads) | 275ms | 321ms |
| random reads (8 threads) | 171ms | 193ms |
| random reads (16 threads) | 148ms | 157ms |
| random reads (32 threads) | 131ms | 137ms |
| removals | 2000ms | 2069ms |

cberner merged commit 94e90d5 into cberner:master on May 11, 2024
3 checks passed
@cberner
Owner

cberner commented May 11, 2024

Merged. Thanks!

adamreichold deleted the remove-double-indirection branch on May 11, 2024 at 18:22