
GzipSource's RealBufferedSource buffers too many bytes #1160

Closed
okiolover opened this issue Aug 23, 2022 · 5 comments

@okiolover

I am streaming a document in which only one section is gzip-compressed:

val mixedSource: BufferedSource = /* the streamed document */
mixedSource.use {
  mixedSource.skip(headerLength) // Process the (never compressed) header.
  val gzipSource = GzipSource(mixedSource).buffer()
  gzipSource.skip(unzippedLength) // Process now-inflated bytes.
  // Also, probably have to skip the gzip footer bytes...
  // Missing the first bytes of the footer in the GzipSource's BufferedSource's buffer!
  mixedSource.skip(footerLength) // Process the (never compressed) footer.
}

After streaming the gzipped section, my next section is missing its first few bytes. This is because GzipSource wraps my mixedSource in a new RealBufferedSource, and that internal RealBufferedSource ends up buffering more bytes than the gzip stream actually needs. My next section's first few bytes end up stranded in the GzipSource's RealBufferedSource's Buffer, with no obvious way to get them back into my mixedSource to continue streaming and processing.
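
For reference, here's a minimal in-memory reproduction of what I'm seeing (the Buffer-backed document and the string contents are just for the example; my real document is streamed):

import okio.Buffer
import okio.GzipSink
import okio.GzipSource
import okio.buffer

fun main() {
  // Build a document: a gzip-compressed body followed by a plain footer.
  val body = "compressed body"
  val document = Buffer()
  GzipSink(document).buffer().use { it.writeUtf8(body) }
  document.writeUtf8("FOOTER")

  // Inflate exactly the body through GzipSource.
  val gzipSource = GzipSource(document).buffer()
  gzipSource.skip(body.length.toLong())

  // The footer is gone from the original source: GzipSource's internal
  // RealBufferedSource read ahead (everything fits in one internal read here)
  // and is now holding those bytes.
  println(document.size) // Prints 0: the 6 footer bytes are stranded in the internal buffer.
}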

Would it be wrong for GzipSource to accept a BufferedSource, so it wouldn't have to create its own, or is there any other way to get my bytes back?

@JakeWharton
Collaborator

Do you have the length of the gzipped section? OkHttp has a special Source that restricts the number of bytes that can be read, which is how it ensures GzipSource doesn't read too many bytes when talking on an HTTP connection.

Side note: I could have sworn that I upstreamed the length-limiting source into Okio but it seems like I never got around to it.

@okiolover
Author

Yes, I know the compressed length and the uncompressed length of the gzipped section.

Oh, nice. I'll try looking again at OkHttp for an example...

@okiolover
Author

Thanks. I made a simple Source wrapper to feed to the GzipSource, so the GzipSource's internal BufferedSource can't buffer up more bytes than the compressed section contains:

import okio.Buffer
import okio.ForwardingSource
import okio.Source

// Caps how many bytes downstream readers (like GzipSource's internal buffer) may consume.
class ByteLimitedSource(delegate: Source, private var bytesRemaining: Long) :
  ForwardingSource(delegate) {
  override fun read(sink: Buffer, byteCount: Long): Long {
    if (bytesRemaining == 0L) return -1L // Limit reached: report exhaustion.
    return super.read(sink, minOf(bytesRemaining, byteCount)).also {
      check(it != -1L) // The delegate must not run out before the limit does.
      bytesRemaining -= it
    }
  }
}

val mixedSource: BufferedSource = /* the streamed document */
mixedSource.use {
  mixedSource.skip(headerLength) // Process the (never compressed) header.
  val gzipSource = GzipSource(ByteLimitedSource(mixedSource, zippedLength)).buffer()
  gzipSource.skip(unzippedLength) // Process now-inflated bytes.
  mixedSource.skip(footerLength) // Process the (never compressed) footer.
}

It feels unfortunate that I need to know the compressed length beforehand, since GzipSource itself knows about gzip's footer/termination bytes.

I also feel like this is surprising behavior (I had originally assumed the GzipSource's source field would just be a .buffer() call on my Source input, so they would share the same Buffer).

But, in my case, I do know the length, so this works.
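
For completeness, here's an end-to-end check of the workaround against an in-memory document (again, the names and contents are just illustrative):

import okio.Buffer
import okio.GzipSink
import okio.GzipSource
import okio.buffer

fun main() {
  // Build a document: a gzip-compressed body followed by a plain footer.
  val document = Buffer()
  GzipSink(document).buffer().use { it.writeUtf8("compressed body") }
  val zippedLength = document.size // Compressed length, including the gzip trailer.
  document.writeUtf8("FOOTER")

  // The limit stops GzipSource's internal buffer at the end of the compressed section.
  val gzipSource = GzipSource(ByteLimitedSource(document, zippedLength)).buffer()
  println(gzipSource.readUtf8()) // "compressed body"

  // The footer is still on the original source.
  println(document.readUtf8()) // "FOOTER"
}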

@okiolover
Author

I'll close this out because I don't see a consistent way for GzipSource to handle this for the user without an awkward API that takes a BufferedSource and promises to use it and share its Buffer.

Thanks for the help.
