
GzipSource's RealBufferedSource buffers too many bytes #1160

Closed
okiolover opened this issue Aug 23, 2022 · 5 comments

@okiolover

I am streaming a document in which only one section is gzip-compressed:

val mixedSource: BufferedSource = /* the streamed document */
mixedSource.use {
  mixedSource.skip(headerLength) // Process the (never compressed) header.
  val gzipSource = GzipSource(mixedSource).buffer()
  gzipSource.skip(unzippedLength) // Process now-inflated bytes.
  // Also, probably have to skip the gzip footer bytes...
  // Missing the first bytes of the footer in the GzipSource's BufferedSource's buffer!
  mixedSource.skip(footerLength) // Process the (never compressed) footer.
}

After streaming the gzipped section, my next section is missing its first few bytes. This is because GzipSource wraps my mixedSource in a new RealBufferedSource, and that internal RealBufferedSource ends up buffering more bytes than the gzip stream actually needs. My next section's first few bytes end up stranded in the GzipSource's RealBufferedSource's Buffer, with no obvious way to get them back into my mixedSource to continue streaming and processing.
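
For reference, here's a minimal in-memory reproduction of what I'm seeing (the Buffer-backed document and the string contents are just for the example; my real document is streamed):

import okio.Buffer
import okio.GzipSink
import okio.GzipSource
import okio.buffer

fun main() {
  // Build a document: a gzip-compressed body followed by a plain footer.
  val body = "compressed body"
  val document = Buffer()
  GzipSink(document).buffer().use { it.writeUtf8(body) }
  document.writeUtf8("FOOTER")

  // Inflate exactly the body through GzipSource.
  val gzipSource = GzipSource(document).buffer()
  gzipSource.skip(body.length.toLong())

  // The footer is gone from the original source: GzipSource's internal
  // RealBufferedSource read ahead (everything fits in one internal read here)
  // and is now holding those bytes.
  println(document.size) // Prints 0: the 6 footer bytes are stranded in the internal buffer.
}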

Would it be wrong for GzipSource to accept a BufferedSource, so it wouldn't have to create its own, or is there any other way to get my bytes back?

@JakeWharton
Collaborator

Do you have the length of the gzipped section? OkHttp has a special Source that restricts the number of bytes that can be read, which is how it ensures GzipSource doesn't read too many bytes when talking on an HTTP connection.

Side note: I could have sworn that I upstreamed the length-limiting source into Okio but it seems like I never got around to it.

@okiolover
Author

Yes, I know the compressed length and the uncompressed length of the gzipped section.

Oh, nice. I'll try looking again at OkHttp for an example...

@okiolover
Author

Thanks. I made a simple Source wrapper to feed to the GzipSource, so the GzipSource's internal BufferedSource can't buffer up more bytes than the compressed section contains:

import okio.Buffer
import okio.ForwardingSource
import okio.Source

// Caps how many bytes downstream readers (like GzipSource's internal buffer) may consume.
class ByteLimitedSource(delegate: Source, private var bytesRemaining: Long) :
  ForwardingSource(delegate) {
  override fun read(sink: Buffer, byteCount: Long): Long {
    if (bytesRemaining == 0L) return -1L // Limit reached: report exhaustion.
    return super.read(sink, minOf(bytesRemaining, byteCount)).also {
      check(it != -1L) // The delegate must not run out before the limit does.
      bytesRemaining -= it
    }
  }
}

val mixedSource: BufferedSource = /* the streamed document */
mixedSource.use {
  mixedSource.skip(headerLength) // Process the (never compressed) header.
  val gzipSource = GzipSource(ByteLimitedSource(mixedSource, zippedLength)).buffer()
  gzipSource.skip(unzippedLength) // Process now-inflated bytes.
  mixedSource.skip(footerLength) // Process the (never compressed) footer.
}

It feels unfortunate that I need to know the compressed length beforehand, since GzipSource itself knows about gzip's footer/termination bytes.

I also feel like this is surprising behavior (I had originally assumed the GzipSource's source field would just be a .buffer() call on my Source input, so they would share the same Buffer).

But, in my case, I do know the length, so this works.
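
For completeness, here's an end-to-end check of the workaround against an in-memory document (again, the names and contents are just illustrative):

import okio.Buffer
import okio.GzipSink
import okio.GzipSource
import okio.buffer

fun main() {
  // Build a document: a gzip-compressed body followed by a plain footer.
  val document = Buffer()
  GzipSink(document).buffer().use { it.writeUtf8("compressed body") }
  val zippedLength = document.size // Compressed length, including the gzip trailer.
  document.writeUtf8("FOOTER")

  // The limit stops GzipSource's internal buffer at the end of the compressed section.
  val gzipSource = GzipSource(ByteLimitedSource(document, zippedLength)).buffer()
  println(gzipSource.readUtf8()) // "compressed body"

  // The footer is still on the original source.
  println(document.readUtf8()) // "FOOTER"
}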

@okiolover
Author

I'll close this out because I don't see a consistent way for GzipSource to handle this for the user without an awkward API that takes a BufferedSource and promises to use it and share its Buffer.

Thanks for the help.
