Decode main loop improvements

- Rearrange main decoding loops to handle chunks of 32 bytes at a time, then 4 bytes at a time, so that `decode_suffix` need only handle 0-4 bytes, which simplifies its code. Moderate speed gains of around 5-10%.
- Improve error precision. `InvalidLength` now carries a `usize` length indicating how many valid symbols were found when that count is invalid. Before, it just did `input % 4 == `, which was harder to reason about, as there might be padding etc. `DecoderReader` now also precisely reports the appropriate `InvalidByte` if an earlier block of decoding found padding that was valid in that context, but more padding was found later, rendering that earlier padding invalid.
- Tidy up decode tests. There were some duplicated scenarios, and certain aspects are now tested in more detail.
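
To illustrate the error-precision change, here is a minimal sketch of how calling code might consume the new `InvalidLength(usize)` payload. The enum below is a local stand-in that copies only the variants and `Display` messages visible in this commit's diff; it is not the crate's full definition, and the `explain` helper is hypothetical.

```rust
use std::fmt;

/// Local stand-in mirroring the variants shown in this commit's diff.
/// Assumption: the real enum has further variants that are omitted here.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// Offset and value of the offending byte.
    InvalidByte(usize, u8),
    /// Number of valid base64 symbols found, where that count is invalid.
    InvalidLength(usize),
    /// Offset and value of a last symbol with nonzero discarded bits.
    InvalidLastSymbol(usize, u8),
}

impl fmt::Display for DecodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match *self {
            Self::InvalidByte(index, byte) => {
                write!(f, "Invalid symbol {}, offset {}.", byte, index)
            }
            Self::InvalidLength(len) => write!(f, "Invalid input length: {}", len),
            Self::InvalidLastSymbol(index, byte) => {
                write!(f, "Invalid last symbol {}, offset {}.", byte, index)
            }
        }
    }
}

/// Hypothetical helper: turn an error into a message for a log or UI.
pub fn explain(err: &DecodeError) -> String {
    match err {
        // The payload is new in this commit; the variant used to be a unit variant.
        DecodeError::InvalidLength(len) => {
            format!("input held {} valid base64 symbols, which cannot be decoded", len)
        }
        // The other variants fall back to their Display text.
        other => other.to_string(),
    }
}
```

Code that previously matched on the unit variant `DecodeError::InvalidLength` has to be updated to bind or ignore the payload, for example `DecodeError::InvalidLength(_)`.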
marshallpierce · Mar 1, 2024 · a8a60f4
1 parent a25be06
commit a8a60f4

Showing 6 changed files with 466 additions and 710 deletions.
diff --git a/src/decode.rs b/src/decode.rs
@@ -9,18 +9,20 @@ use std::error;
 #[derive(Clone, Debug, PartialEq, Eq)]
 pub enum DecodeError {
  /// An invalid byte was found in the input. The offset and offending byte are provided.
- /// Padding characters (`=`) interspersed in the encoded form will be treated as invalid bytes.
+ ///
+ /// Padding characters (`=`) interspersed in the encoded form are invalid, as they may only
+ /// be present as the last 0-2 bytes of input.
+ ///
+ /// This error may also indicate that extraneous trailing input bytes are present, causing
+ /// otherwise valid padding to no longer be the last bytes of input.
  InvalidByte(usize, u8),
- /// The length of the input is invalid.
- /// A typical cause of this is stray trailing whitespace or other separator bytes.
- /// In the case where excess trailing bytes have produced an invalid length *and* the last byte
- /// is also an invalid base64 symbol (as would be the case for whitespace, etc), `InvalidByte`
- /// will be emitted instead of `InvalidLength` to make the issue easier to debug.
- InvalidLength,
+ /// The length of the input, as measured in valid base64 symbols, is invalid.
+ /// There must be 2-4 symbols in the last input quad.
+ InvalidLength(usize),
  /// The last non-padding input symbol's encoded 6 bits have nonzero bits that will be discarded.
  /// This is indicative of corrupted or truncated Base64.
- /// Unlike `InvalidByte`, which reports symbols that aren't in the alphabet, this error is for
- /// symbols that are in the alphabet but represent nonsensical encodings.
+ /// Unlike [DecodeError::InvalidByte], which reports symbols that aren't in the alphabet,
+ /// this error is for symbols that are in the alphabet but represent nonsensical encodings.
  InvalidLastSymbol(usize, u8),
  /// The nature of the padding was not as configured: absent or incorrect when it must be
  /// canonical, or present when it must be absent, etc.
@@ -30,8 +32,10 @@ pub enum DecodeError {
 impl fmt::Display for DecodeError {
  fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
  match *self {
- Self::InvalidByte(index, byte) => write!(f, "Invalid byte {}, offset {}.", byte, index),
- Self::InvalidLength => write!(f, "Encoded text cannot have a 6-bit remainder."),
+ Self::InvalidByte(index, byte) => {
+ write!(f, "Invalid symbol {}, offset {}.", byte, index)
+ }
+ Self::InvalidLength(len) => write!(f, "Invalid input length: {}", len),
  Self::InvalidLastSymbol(index, byte) => {
  write!(f, "Invalid last symbol {}, offset {}.", byte, index)
  }