bulk::decompress_to_buffer is 40 times slower than stream::copy_decode #291

Firestar99 opened this issue Jul 14, 2024 · 1 comment

Firestar99 commented Jul 14, 2024

Background: I use zstd to decompress "block compressed images" (BCn) which have additionally been compressed with zstd before being written to disk, yielding a 33% size reduction. I have 331 images, all of them 2048x2048 pixels in size and exactly 4MiB large when decompressed, each compressed individually without a dictionary. Some have high variance and some have very regular patterns.

When I run my application, I first load the entire binary blob containing all the images from file into memory, then decompress each image from a &[u8] slice of that buffer into a fixed-size &mut [u8] slice allocated beforehand, so neither IO nor allocation should affect the results. For profiling I'm using the profiling crate with the puffin backend (everything in release mode, of course) and puffin_http to send the profiling results to the external puffin_viewer. Tested on a Ryzen 9 6900HS.

Using bulk::decompress_to_buffer on a single thread takes about 52.8s in total, of which 51.2s are spent in this method:

#[profiling::function]
fn decode_bcn_zstd_into(&self, src: &[u8], dst: &mut [u8]) -> io::Result<()> {
	let written = zstd::bulk::decompress_to_buffer(src, dst)?;
	assert_eq!(written, dst.len(), "all bytes written");
	Ok(())
}

But if I switch to stream::copy_decode, it only takes 2.9s, of which 1.3s are spent on decompression:

#[profiling::function]
fn decode_bcn_zstd_into(&self, src: &[u8], mut dst: &mut [u8]) -> io::Result<()> {
	// Write for &mut [u8] advances the slice as bytes are written,
	// so an empty remainder means the buffer was filled exactly.
	zstd::stream::copy_decode(src, &mut dst)?;
	assert_eq!(0, dst.len(), "all bytes written");
	Ok(())
}

Just looking at total time spent decompressing, that's a 39.3x speedup! I would honestly have expected the bulk API to be faster in this case, as it's specifically made for slices with all the data already present in memory. Any idea what could cause the speed difference?
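In case anyone wants to reproduce this without my codebase, here's a rough, self-contained sketch of the comparison. The zero-filled buffer is just a hypothetical stand-in for one 4MiB image, so absolute timings (and possibly the gap itself) will differ from the real BCn data:

use std::io;
use std::time::Instant;

fn main() -> io::Result<()> {
	// Hypothetical stand-in for one 4MiB decompressed image.
	let original = vec![0u8; 4 << 20];
	let compressed = zstd::bulk::compress(&original, 0)?;
	let mut dst = vec![0u8; original.len()];

	let t = Instant::now();
	let written = zstd::bulk::decompress_to_buffer(&compressed, &mut dst[..])?;
	assert_eq!(written, dst.len());
	println!("bulk::decompress_to_buffer: {:?}", t.elapsed());

	let t = Instant::now();
	zstd::stream::copy_decode(&compressed[..], &mut dst[..])?;
	println!("stream::copy_decode: {:?}", t.elapsed());
	Ok(())
}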

Firestar99 changed the title from "bulk::decompress is 40 times slower than stream::copy_decode" to "bulk::decompress_to_buffer is 40 times slower than stream::copy_decode" on Jul 14, 2024
gyscos (Owner) commented Jul 23, 2024

Hi, and thanks for the report!

This is indeed quite surprising!
Note that the bulk API is intended to reuse a Compressor (or Decompressor) between calls; the module-level functions create a new (De)compressor every time. That said, zstd::stream::copy_decode also creates a new context on every call, so it shouldn't be that different...
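As a rough, untested sketch of what I mean (the changed signature of decode_bcn_zstd_into is just for illustration):

use std::io;
use zstd::bulk::Decompressor;

fn decode_bcn_zstd_into(
	decompressor: &mut Decompressor<'_>,
	src: &[u8],
	dst: &mut [u8],
) -> io::Result<()> {
	// Reuse the same decompression context for every image
	// instead of creating a fresh one per call.
	let written = decompressor.decompress_to_buffer(src, dst)?;
	assert_eq!(written, dst.len(), "all bytes written");
	Ok(())
}

You'd create the context once with let mut decompressor = Decompressor::new()?; and pass it to every call.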
