Releases: iipc/jwarc
v0.30.0: Release 0.30.0
New features
- WarcReader and WarcParser gained a lenient parsing mode which:
  - permits ASCII control characters in header field names and values
  - allows lines to end with LF instead of CRLF
  - permits multi-digit WARC minor versions like "0.18"
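
The difference between strict and lenient handling of the version line can be illustrated with a minimal standalone sketch. It does not use jwarc's parser: the class name `WarcVersionCheck` and both regular expressions are hypothetical, and the sketch assumes that strict mode accepts only a single-digit minor version terminated by CRLF.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not jwarc's actual parser.
final class WarcVersionCheck {
    // Assumption: strict mode wants one digit after the dot and a CRLF terminator.
    private static final Pattern STRICT = Pattern.compile("WARC/[0-9]\\.[0-9]\\r\\n");
    // Lenient mode: multi-digit minor version, and a bare LF is tolerated.
    private static final Pattern LENIENT = Pattern.compile("WARC/[0-9]\\.[0-9]+\\r?\\n");

    // Returns true if the whole version line is acceptable in the given mode.
    static boolean accepts(String versionLine, boolean lenient) {
        return (lenient ? LENIENT : STRICT).matcher(versionLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("WARC/1.0\r\n", false));
        System.out.println(accepts("WARC/0.18\n", false));
        System.out.println(accepts("WARC/0.18\n", true));
    }
}
```

In this sketch, a line such as `WARC/0.18` ending in a bare LF is rejected by the strict pattern and accepted by the lenient one.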
v0.29.0: Release 0.29.0
New features
- Added MediaType.parseLeniently() and .isValid()
Changes
- Message.contentType() and other methods that internally call it now use the lenient MediaType parser instead of throwing IllegalArgumentException #83
v0.28.6: Release 0.28.6
Bugs fixed
- Improved compatibility with ARC variants (version-block length off by one, v2 version-block, spurious linefeeds) #82
- WarcParser: Context in parse error messages was incorrectly using the parser (file) position instead of buffer position
v0.28.5: Release 0.28.5
Bugs fixed
- Fixed ClosedChannelException when reading a WarcRevisit body after closing a previous one due to reuse of empty MessageBody. #80
v0.28.4: Release 0.28.4
Bugs fixed
- CDX formatting now percent-encodes spaces, newlines and null characters in all string fields. This is non-standard but at least prevents us from outputting invalid CDX lines.
- CdxRequestEncoder now handles requests with an invalid content-type header
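
The field escaping described in the CDX formatting fix above might look roughly like the following. This is a sketch under assumptions rather than jwarc's code: the class `CdxFieldEscaper` is hypothetical, and treating a carriage return as a "newline" is a guess that goes beyond what the release note states.

```java
// Hypothetical illustration only; this is not jwarc's actual CDX formatter.
final class CdxFieldEscaper {
    // Percent-encodes the characters that would otherwise break a
    // space-delimited, line-oriented CDX record.
    static String escape(String field) {
        StringBuilder out = new StringBuilder(field.length());
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            switch (c) {
                case ' ':  out.append("%20"); break; // field separator
                case '\n': out.append("%0A"); break; // record separator
                case '\r': out.append("%0D"); break; // assumption: CR is escaped too
                case '\0': out.append("%00"); break; // null character
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("text/html; charset=utf-8"));
    }
}
```

Because every space inside a field is encoded, splitting an output line on spaces still yields the expected number of fields.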
v0.28.3: Release 0.28.3
v0.28.2: Release 0.28.2
Changes:
- HttpRequest and HttpResponse in lenient mode now recover when parsing the Content-Length header throws NumberFormatException
- WarcParser now tries to leniently parse ARC records containing corrupt dates
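
The Content-Length recovery noted above can be sketched as a small helper that treats an unparseable value as "length unknown" in lenient mode. The class `ContentLengthParser` and the choice of `OptionalLong` as the return type are assumptions made for illustration; the release note does not say what jwarc falls back to.

```java
import java.util.OptionalLong;

// Hypothetical illustration only; this is not jwarc's actual HTTP parser.
final class ContentLengthParser {
    // Returns the parsed length, or empty if the value is malformed and
    // lenient mode is on. In strict mode the exception propagates.
    static OptionalLong parse(String value, boolean lenient) {
        try {
            return OptionalLong.of(Long.parseLong(value.trim()));
        } catch (NumberFormatException e) {
            if (lenient) {
                return OptionalLong.empty(); // recover: caller treats length as unknown
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("1024", false));
        System.out.println(parse("12abc", true));
    }
}
```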
v0.28.1: Release 0.28.1
Bugs fixed:
- Fixed output truncation with the CDX CLI tool due to OutputStreamWriter buffer not being flushed or closed before exit
- CdxWriter.process(files, useAbsolutePaths) ignored useAbsolutePaths=false and always output absolute paths
- CdxRequestEncoder: Improved pywb compatibility for non-ASCII characters in URL-encoded request bodies
- CdxRequestEncoder: Fixed URLDecoder exception for large request bodies or those including invalid percent encoding
- WarcWriter.fetch: Fixed bug where maxTime limit accidentally used the value of maxLength option instead
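
The output truncation fix above concerns a general Java pitfall: an `OutputStreamWriter` buffers encoded bytes, so anything still in the buffer is lost if the writer is never flushed or closed. A minimal sketch of the usual remedy, try-with-resources, follows. The class `FlushOnExit` is hypothetical and an in-memory stream stands in for standard output; how the CDX tool itself was fixed is not stated in the note.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not the jwarc CDX tool.
final class FlushOnExit {
    // Writes each line followed by LF and returns everything that reached the sink.
    static String writeLines(String... lines) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes the writer, which flushes its buffer.
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(writeLines("first line", "second line"));
    }
}
```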
v0.28.0: Release 0.28.0
New features:
- Added fetch options to WarcWriter.fetch and fetch tool: maxTime, maxLength, readTimeout, userAgent
- Added fetch tool option --output-file
Bugs fixed:
- Fixed missing response.http().body().size() value when response is truncated and WarcReader.calculateBlockDigest() is enabled
v0.27.1: Release 0.27.1
Bugs fixed:
- Lenient HTTP parser now accepts folded header lines that use LF instead of CRLF
- Fixed bug where bogus ARC MIME field could be prepended to the length field
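
Header folding with mixed line endings, as in the lenient HTTP parser fix above, can be illustrated with a short standalone sketch. It is not jwarc's parser: the class `HeaderUnfolder` is hypothetical, and joining a continuation line to the previous field with a single space is an assumption about how unfolding is done.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not jwarc's actual HTTP parser.
final class HeaderUnfolder {
    // Splits a header block on CRLF or bare LF and merges folded
    // continuation lines (those starting with a space or tab) into the
    // preceding field.
    static List<String> unfold(String headerBlock) {
        List<String> fields = new ArrayList<>();
        for (String line : headerBlock.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation && !fields.isEmpty()) {
                int last = fields.size() - 1;
                fields.set(last, fields.get(last) + " " + line.trim());
            } else {
                fields.add(line);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(unfold("X-Note: first part\n second part\r\nHost: example.org\r\n"));
    }
}
```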