Releases: iipc/jwarc

v0.30.0: Release 0.30.0

28 Jun 07:36
@ato ato

New features

  • WarcReader and WarcParser gained a lenient parsing mode which:
    • permits ASCII control characters in header field names and values
    • allows lines to end with LF instead of CRLF
    • permits multi-digit WARC minor versions like "0.18"
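
To illustrate the difference the lenient mode makes on the version line, here is a minimal, self-contained sketch. It does not use jwarc's parser; the class name `LenientVersionLine` and both regular expressions are hypothetical, and the strict form (single-digit minor version, CRLF terminator) is an assumption inferred from the list above.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not jwarc's actual parser.
class LenientVersionLine {
    // Assumed strict form: single-digit major and minor version, CRLF terminator.
    private static final Pattern STRICT = Pattern.compile("WARC/[0-9]\\.[0-9]\r\n");
    // Assumed lenient form: multi-digit minor version allowed, bare LF accepted.
    private static final Pattern LENIENT = Pattern.compile("WARC/[0-9]\\.[0-9]+\r?\n");

    // Returns true if the whole version line matches the selected grammar.
    static boolean accepts(String versionLine, boolean lenient) {
        return (lenient ? LENIENT : STRICT).matcher(versionLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("WARC/1.1\r\n", false));
        System.out.println(accepts("WARC/0.18\r\n", false));
        System.out.println(accepts("WARC/0.18\n", true));
    }
}
```

In this sketch a line such as "WARC/0.18" followed by a bare LF is rejected by the strict pattern and accepted by the lenient one.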

v0.29.0: Release 0.29.0

14 Feb 04:43
@ato ato

New features

  • Added MediaType.parseLeniently() and .isValid()

Changes

  • Message.contentType() and other methods that internally call it now use the lenient MediaType parser instead of throwing IllegalArgumentException #83

v0.28.6: Release 0.28.6

09 Feb 07:15
@ato ato

Bugs fixed

  • Improved compatibility with ARC variants (version-block length off by one, v2 version-block, spurious linefeeds) #82
  • WarcParser: Context in parse error messages incorrectly used the parser (file) position instead of the buffer position

v0.28.5: Release 0.28.5

13 Dec 05:34
@ato ato

Bugs fixed

  • Fixed ClosedChannelException when reading a WarcRevisit body after closing a previous one, caused by reuse of an empty MessageBody. #80

v0.28.4: Release 0.28.4

13 Dec 05:33
@ato ato

Bugs fixed

  • CDX formatting now percent-encodes spaces, newlines and null characters in all string fields. This is non-standard but at least prevents us from outputting invalid CDX lines.
  • CdxRequestEncoder now handles requests with an invalid content-type header
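
The percent-encoding described in the first fix can be sketched as follows. This is a hypothetical helper, not the code jwarc uses: the name `CdxFieldEscaper` is invented for illustration, and treating a carriage return as a newline is an assumption. Every other character passes through unchanged, so a field can no longer be split across CDX columns or lines.

```java
// Hypothetical illustration only; this is not jwarc's CDX formatter.
class CdxFieldEscaper {
    // Percent-encodes the characters that would break a space-delimited,
    // line-oriented CDX record; all other characters are copied as-is.
    static String escape(String field) {
        StringBuilder out = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            switch (c) {
                case ' ':  out.append("%20"); break;
                case '\n': out.append("%0A"); break;
                case '\r': out.append("%0D"); break; // assumption: CR counts as a newline
                case '\0': out.append("%00"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("text/html; charset=utf-8"));
    }
}
```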

v0.28.3: Release 0.28.3

28 Sep 00:09
@ato ato

Bugs fixed:

  • Fixed multithreading issue on GzipChannel write header #79

v0.28.2: Release 0.28.2

15 Sep 07:18
@ato ato

Changes:

  • HttpRequest and HttpResponse in lenient mode now recover when parsing the Content-Length header throws NumberFormatException
  • WarcParser now tries to leniently parse ARC records containing corrupt dates
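
The Content-Length recovery in the first change follows a common pattern, sketched below under stated assumptions. The class `LenientContentLength` is hypothetical, and treating an unparseable value as absent is a guess at reasonable behaviour; the notes above do not say what jwarc falls back to.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; this is not jwarc's HTTP parser.
class LenientContentLength {
    // Strict mode rethrows the NumberFormatException; lenient mode
    // (assumption) treats a malformed value as if the header were missing.
    static OptionalLong parse(String headerValue, boolean lenient) {
        try {
            return OptionalLong.of(Long.parseLong(headerValue.trim()));
        } catch (NumberFormatException e) {
            if (!lenient) {
                throw e;
            }
            return OptionalLong.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1024", true));
        System.out.println(parse("12 bytes", true));
    }
}
```

A caller in lenient mode can then read the body until the end of the record instead of relying on a declared length.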

v0.28.1: Release 0.28.1

02 Aug 07:22
@ato ato

Bugs fixed:

  • Fixed output truncation with the CDX CLI tool due to OutputStreamWriter buffer not being flushed or closed before exit
  • CdxWriter.process(files, useAbsolutePaths) ignored useAbsolutePaths=false and always output absolute paths
  • CdxRequestEncoder: Improved pywb compatibility for non-ASCII characters in URL-encoded request bodies
  • CdxRequestEncoder: Fixed URLDecoder exception for large request bodies or those including invalid percent encoding
  • WarcWriter.fetch: Fixed bug where maxTime limit accidentally used the value of maxLength option instead
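
The output truncation in the first fix above is a general hazard with `OutputStreamWriter`, which buffers encoded bytes until it is flushed or closed. The sketch below is not the CDX tool's code: the class `CdxOutputSketch` is hypothetical and it writes to an in-memory stream rather than standard output. It shows one way to guarantee the flush, using try-with-resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the jwarc CDX CLI.
class CdxOutputSketch {
    // Writes each line followed by LF and returns everything that reached the sink.
    static String writeAll(String... lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes the writer, which flushes its internal buffer.
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(writeAll("line one", "line two"));
    }
}
```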

v0.28.0: Release 0.28.0

27 Jul 06:45
@ato ato

New features:

  • Added fetch options to WarcWriter.fetch and fetch tool: maxTime, maxLength, readTimeout, userAgent
  • Added fetch tool option --output-file

Bugs fixed:

  • Fixed missing response.http().body().size() value when response is truncated and WarcReader.calculateBlockDigest() is enabled

v0.27.1: Release 0.27.1

26 Jul 07:56
@ato ato

Bugs fixed:

  • Lenient HTTP parser now accepts folded header lines that use LF instead of CRLF
  • Fixed bug where bogus ARC MIME field could be prepended to the length field
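
For the folded-header fix above, the following sketch shows what unfolding looks like when both CRLF and bare LF are accepted as line endings. It is a hypothetical helper, not jwarc's lenient HTTP parser: the name `HeaderUnfolder` is invented, and joining continuation lines with a single space is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not jwarc's HTTP parser.
class HeaderUnfolder {
    // Splits a raw header block on CRLF or bare LF and merges continuation
    // lines (those starting with a space or tab) into the preceding header.
    static List<String> unfold(String rawHeaders) {
        List<String> result = new ArrayList<>();
        for (String line : rawHeaders.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation && !result.isEmpty()) {
                int last = result.size() - 1;
                result.set(last, result.get(last) + " " + line.trim());
            } else {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unfold("X-Long: first\n second\r\nHost: example.org\r\n\r\n"));
    }
}
```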