Long header folding adds additional spaces #1525

Closed
blaaat opened this issue Aug 23, 2018 · 28 comments · Fixed by #1840
Comments

@blaaat

blaaat commented Aug 23, 2018

When sending a long header without spaces, the header is folded incorrectly.

Headers should fold with a CRLF, but after folding the CRLF is replaced with a LF and a space is introduced:

$encoded = trim(chunk_split($str, static::STD_LINE_LENGTH, static::$LE));

This is correct; but

$encoded = str_replace(static::$LE, "\n", trim($encoded));
$encoded = preg_replace('/^(.*)$/m', ' \\1', $encoded);

corrupts the folding; order should be different.

@Synchro
Member

Synchro commented Aug 24, 2018

I'm not clear what you mean. Headers are folded by adding line breaks and white space at the start of continuation lines, so this is doing:

  1. Split text into chunks separated by the standard line break;
  2. Normalise all the line breaks into LF (because PCRE's m modifier used in the next step expects them);
  3. Replace each line within the string with itself prepended with a space.

For example, if STD_LINE_LENGTH was 10, you would expect this line:

01234567890123456789012

to be folded as:

0123456789
 012345678
 9012

If you don't add spaces, it's not folding, so what are you proposing to do instead?
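
For reference, here is a minimal standalone sketch of those three steps. It assumes a line length of 10 and CRLF as the line ending; the constants `LINE_LENGTH` and `LE` and the function name are illustrative stand-ins for `static::STD_LINE_LENGTH` and `static::$LE`, not PHPMailer's actual code. Run literally, the quoted statements prepend the space to every line, including the first, and each chunk keeps its full 10 characters.

```php
<?php
// Illustrative stand-ins for PHPMailer's static::STD_LINE_LENGTH and static::$LE (assumed values).
const LINE_LENGTH = 10;
const LE = "\r\n";

// Hypothetical wrapper around the three statements quoted in the issue.
function foldLikeIssueSnippet(string $str): string
{
    // 1. Split the text into chunks separated by the standard line break.
    $encoded = trim(chunk_split($str, LINE_LENGTH, LE));
    // 2. Normalise the line breaks to LF for the multi-line regex below.
    $encoded = str_replace(LE, "\n", trim($encoded));
    // 3. Prepend a space to every line (the first line included).
    return preg_replace('/^(.*)$/m', ' \\1', $encoded);
}

echo json_encode(foldLikeIssueSnippet('01234567890123456789012')), "\n";
// prints " 0123456789\n 0123456789\n 012" (shown JSON-escaped, with quotes)
```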

@blaaat
Author

blaaat commented Aug 24, 2018

I'm not really sure what's correct according to the RFC (which describes that folding should only happen before WSP).

But in the example given:

For example, the header field:

Subject: This is a test

can be represented as:

Subject: This
[space]is a test

No additional space is introduced (there is no space after This).

I tested with both Apple Mail (which displays the additional spaces in the subject) and the mailparse library (which also outputs the spaces).

When folding without introducing additional spaces (just CRLF) output is as expected in both clients.

Also when using Apple Mail or the Gmail web client to send a long header, folding occurs without additional spaces.

Since it's not absolutely clear that spaces should be introduced while folding, I would propose to match the interpretation of other clients/senders.

@Synchro
Member

Synchro commented Aug 24, 2018

I see what you mean - but also I can't see a way to preserve spaces in the original header line that would not result in breaking lines that should not have spaces - folding/unfolding should be lossless, regardless of whether lines contain spaces or not. DKIM signatures can be several KB long and often have no "higher-level syntactic breaks" within the permitted line length, so not adding spaces while folding would result in corrupt headers, and not removing spaces when unfolding would render them useless. There may be some other mechanism at work that's not covered by RFC 5322 section 2.2.3 though - here's the DKIM header from the notification email I received about your last post:

DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=github.com;
	s=pf2014; t=1535117995;
	bh=NT47w3kIw5+mSWWtDSkXfP3EGQxF5yPpGM7JCNqy2hA=;
	h=Date:From:Reply-To:To:Cc:In-Reply-To:References:Subject:List-ID:
	 List-Archive:List-Post:List-Unsubscribe:From;
	b=TmKwEcjr8shmKDF2cJS1C7KF/CYyQxkWBLVZpkyiZwF8+rEetHlF/ZGM40bmqS6VA
	 yUbHhgjdXhLXzoVgrnv4FU0jiAzTlUEz1EKYy0MKFPhbXc9kihdith1k/Y+8pLADFU
	 ERZ1a11i/cgJxbZmKC2ugXRXY/+EarbgUgokZsV8=

This has two levels of folding! The DKIM-Signature header itself is folded on "higher-level syntactic breaks" (the spaces between DKIM params) and uses a single tab for folding, however, the longer params (h and b) have an additional layer of folding applied which is like I described, using a space (e.g. that List-Archive starts with a tab followed by a space) - and if that space was preserved when unfolding, the DKIM signature would be wrong. As far as I'm aware, the RFCs don't care what you use for folding (it's all FWS), but I'm not aware of which RFC applies to this idea of nested folding.

@blaaat
Author

blaaat commented Aug 25, 2018

Complex! To give the counter example; List-Unsubscribe headers may contain long URLs which should not contain spaces in the middle of the url after unfolding.

I noticed Apple Mail even adds a new line directly after the : (and the header name/value separator space is on the next line)

Subject:
[space here]AaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaaAaaaaaaaaaaaaaaaaa

I don't know what's the correct solution here either.

@JelleSFS

JelleSFS commented Sep 6, 2018

> I see what you mean - but also I can't see a way to preserve spaces in the original header line that would not result in breaking lines that should not have spaces - folding/unfolding should be lossless, regardless of whether lines contain spaces or not.

I think this way of folding is perfectly fine for critical headers like DKIM, but causes issues for other kinds of headers.

Maybe there should be two kinds of folding methods? One for the critical headers, where additional spaces are needed to guarantee proper working, and another without additional spaces.

I'd like to add another example of the breaking and adding spacing causing issues:
I created issue #1469 a few months ago about spaces and linebreaks causing issues with filenames of attachments.

@JelleSFS

Yesterday I saw a new case in which the additional space causes issues:

Because the subjects of the messages my system sends out can contain UTF-8 characters which didn't seem to be escaped correctly, I base64-encode the subjects of all messages (RFC2047:2).

$mailer->Subject = "=?UTF-8?B?" . base64_encode($this->subject) . "?=";

Yesterday's case: one of our clients got the literal base64-encoded string as the subject because there was an additional space which their mail client didn't delete.

Subject: =?UTF-8?B?RU1EIFdlcmtib246IE5hdHVyYWxpcyBCaW9kaXZlcnNpdHkgQ2VudGVyXzA4OTJfMT ctMDktMjAxOA==?=

@Synchro
Member

Synchro commented Sep 27, 2018

@JelleSFS Don't do that - PHPMailer knows how to encode UTF-8 in headers, so when you do that it will get double-encoded.

@JelleSFS

I'll try again without, but had some issues with some characters before. Maybe I have made some mistakes during implementation.

But it still leaves the space issue. Without the space, it probably would've been decoded properly. (We've sent thousands of messages with the described technique and have only seen a handful of errors.)

@eKrajnak

I can confirm the problem with folding. The value of the List-Unsubscribe header contains a space after the 76th character, as confirmed with Gmail's show source option. The unsubscribe link is unusable. PHPMailer 6.0.7

@pnoeric

pnoeric commented Sep 1, 2019

Heh, I'm having exactly the same problem as @eKrajnak -- I'm adding a simple "List-Unsubscribe" header to my outgoing message. The header is 93 characters or so. At approximately character 72, PHP Mailer is inserting a single space. Obviously this breaks things. I can confirm this by viewing the raw headers in gmail. And of course Gmail doesn't show unsubscribe link at all. Ugh.

How can I help solve this? I don't know much about the RFCs but I can code... is this an issue where we can fold a long header with a CRLF instead of space? Or something? ;-)

@Synchro
Member

Synchro commented Sep 1, 2019

The act of folding is to break a line at the line limit by inserting a line break (usually CRLF) and one or more whitespace characters. Servers and clients know that there should be no whitespace at the start of any header line unless it's been folded, so they can detect it and apply the reverse operation: remove the leading whitespace, remove the line break. As you can see, this should be lossless as the operations are symmetric, however, there is a further complication. What I just said is about the format of the message itself; folding can also occur independently at the SMTP level, invoking folding on top of folding. The problem then is that you can't tell which level did the folding, and it's also possible to use different whitespace characters at each level, so you could end up with:

header-name: value\r\n
\t rest of value

The RFCs suggest folding at word boundaries (e.g. where there are already spaces), but when you have things like long addresses or DKIM signatures that have no such "natural breaks", there is no choice but to break in the middle of originally unbroken text.

What's not clear to me (and I suspect at the root of the problem) is what you do with natural spaces since you can't distinguish them from folding whitespace. If you have a line like:

Header: this is a long line that I'm going to wrap

that gets folded like this:

Header: this is a long line that I'm
 going to wrap

when you unfold it, you end up with:

Header: this is a long line that I'mgoing to wrap

So you say, "aha, so we should add a space when unfolding!" But that doesn't work with lines that should not have spaces added because it breaks them. What's also unclear is whether any trailing space should be preserved after folding, as that might help with this.

Any input about how to handle these conflicting situations? I suspect the solution is distributed across the small print of several RFCs...

@blaaat
Author

blaaat commented Sep 1, 2019

> The act of folding is to break a line at the line limit by inserting a line break (usually CRLF) and one or more whitespace characters.

In rfc5322:2.2.3 (and rfc2822:2.2.3) the text mentions:

> The general rule is that wherever this specification allows for folding white
> space (not simply WSP characters), a CRLF may be inserted before any
> WSP.

I don't read this as saying that adding one or more additional whitespace characters is allowed here, just that the general rule is that a CRLF is inserted before any existing whitespace. As I mentioned earlier, other mail clients break at any point if there is no suitable whitespace to break on.

> What's also unclear is whether any trailing space should be preserved after folding, as that might help with this.

I think so; since you can add the CRLF before any whitespace.
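
A minimal sketch of that reading of the rule, assuming an already-unfolded header line and a 78-character limit. `foldAtWhitespace` is a hypothetical helper, not a PHPMailer function. It only ever inserts a CRLF in front of whitespace that is already present, so a value with no whitespace is left over-long rather than altered.

```php
<?php
// Hypothetical helper: fold by inserting CRLF before EXISTING whitespace only.
// Assumes $header is a single unfolded header line without a trailing CRLF.
function foldAtWhitespace(string $header, int $maxLineLength = 78): string
{
    // Longest prefix within the limit that ends in a non-space and is followed by WSP.
    $pattern = '/^(.{0,' . ($maxLineLength - 1) . '}[^ \t])([ \t].*)$/s';
    $lines = [];
    $rest = $header;
    while (strlen($rest) > $maxLineLength && preg_match($pattern, $rest, $m)) {
        $lines[] = $m[1]; // text before the fold point
        $rest = $m[2];    // continuation, still starting with its original whitespace
    }
    // Whatever remains (possibly still too long if it has no whitespace) is the last line.
    $lines[] = $rest;
    return implode("\r\n", $lines);
}

echo json_encode(foldAtWhitespace("Header: this is a long line that I'm going to wrap", 36)), "\n";
// prints "Header: this is a long line that I'm\r\n going to wrap" (shown JSON-escaped, with quotes)
```

With a value that contains no spaces at all, the only fold point this sketch finds is the space after the colon, which gives the `Subject:` followed by a continuation line seen in the Apple Mail example earlier in the thread.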

@blaaat
Author

blaaat commented Sep 1, 2019

> This has two levels of folding! The DKIM-Signature header itself is folded on "higher-level syntactic breaks" (the spaces between DKIM params) and uses a single tab for folding, however, the longer params (h and b) have an additional layer of folding applied which is like I described, using a space (e.g. that List-Archive starts with a tab followed by a space) - and if that space was preserved when unfolding, the DKIM signature would be wrong.

Are you sure this space is not used to calculate the DKIM signature? According to the List-Unsubscribe RFC; leading whitespace is allowed and all examples include leading white space.

Maybe the DKIM signatures are currently generated wrongly; possibly related: #1563 #1469 #1406 #1352

@Synchro
Member

Synchro commented Sep 1, 2019

That DKIM example is nothing to do with the format of the list-unsubscribe header? That h param for DKIM is a : delimited list of normalised header names, and it does allow for FWS either side of the delimiter. Anyway, I was just using that example to show the two-level folding going on - I assume I was wrong about the addition of that space breaking DKIM? I guess it depends on whether the space was present in the original signature, or whether it was added by folding applied after calculation of the signature, though the "relaxed" header canonicalization involves unfolding everything first.

It's all very painful!

@blaaat
Author

blaaat commented Sep 1, 2019

I wasn't talking about the DKIM fold, but about your remark that the DKIM signature would be invalid if unfolding re-introduced the space that exists in the List-Unsubscribe header sent by GitHub. My guess would be that adding a space where it doesn't exist breaks DKIM, and that the unfolded GitHub List-Unsubscribe header contains a comma+space separator.

@pnoeric

pnoeric commented Sep 1, 2019

FWIW this is my full and complete List-Unsubscribe header:

List-Unsubscribe: <https://example.com/e_dis.php?e_id=1148423&e_et=1&e_rc=5241d9b6fd6ff1f4c9c523d5d3245bb2>

(I can optionally add a space and a comma and a second way to unsubscribe, but it isn't required; what I show above is valid.)

Try it with phpmailer and gmail, you'll see how a space is dropped in....

@pnoeric

pnoeric commented Sep 1, 2019

BTW I tried it adding that header to a test mail sent with Swift Mailer and it works: no space is inserted, gmail is happy. Perhaps one way to solve this problem would be just to see what they're doing-- ? (Sorry if that's not cool to say! Just trying to help identify a quick solution.)

@eKrajnak

eKrajnak commented Sep 1, 2019

I'm not going to discuss where the problem lies (RFC vs. Gmail vs. PHPMailer), but to provide a simple solution for @pnoeric I have made a simple patch.

Then you just need:

$mail->ListUnsubscribe = '<https://example.com/verylongurl>';

phpmailer-unsubscribe-patch.zip

@pnoeric

pnoeric commented Sep 2, 2019

Thank you!

@Synchro
Member

Synchro commented Sep 2, 2019

I've pushed a revision of DKIM header canonicalisation. There were a couple of likely sources of this problem:

  1. It combined unfolding and whitespace collapse into a single operation; this was not quite correct, and collapsing was also done later.
  2. Collapsing whitespace should collapse everything to a single space - but this would only happen if 2 or more whitespace chars happened in sequence, so single tabs were left unconverted.

I also added tests that cover long headers (like the List-Unsubscribe example mentioned above), runs of spaces, trailing space.

Please give it a go and tell me if it improves your outcomes!
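
For readers following along, the two points in the list above can be sketched as follows. This is an illustration of the "relaxed" header canonicalization steps from RFC 6376 section 3.4.2, kept as separate operations; `relaxedCanonicalizeHeader` is a hypothetical name and this is not the code from the pushed revision. It assumes one header passed as `name: value` without its terminating CRLF.

```php
<?php
// Hypothetical sketch of DKIM "relaxed" header canonicalization (RFC 6376, 3.4.2).
// Assumes $header contains a colon and has no terminating CRLF.
function relaxedCanonicalizeHeader(string $header): string
{
    // Step 1: unfold. Remove each CRLF that is followed by a space or tab,
    // leaving that whitespace in place.
    $unfolded = preg_replace('/\r\n(?=[ \t])/', '', $header);
    [$name, $value] = explode(':', $unfolded, 2);
    // Step 2: collapse every run of ONE or more spaces/tabs to a single space,
    // so that a lone tab is converted too.
    $value = preg_replace('/[ \t]+/', ' ', $value);
    // Lowercase the name and drop whitespace around the colon and at the end of the value.
    return strtolower(trim($name, " \t")) . ':' . trim($value, " \t");
}

echo relaxedCanonicalizeHeader("Subject:\tfoo\r\n \tbar  "), "\n";
// prints subject:foo bar
```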

@blaaat
Author

blaaat commented Sep 2, 2019

Thanks. I'm not using DKIM to sign my messages, so I can't tell if this fixes the issues.

My understanding of the issue was that DKIM signatures were rejected because the receiving end would unfold headers differently.

Your tests for headers look good, but the same needs to apply to the normal headers being sent out. The current master still adds a space after each CRLF; the RFC does not say that these spaces are required or even allowed.

Please test with:
$m->Subject = str_repeat('a', 300);

and send to any mail client.

I guess it comes down to the above, plus the fact that folding inside the DKIM header and the unfolding assumed by the receiver should match each other for DKIM signatures to validate.

@Synchro
Member

Synchro commented Sep 2, 2019

This appears to be a well-known can of worms... This article describes exactly this problem - which is that RFC5322 simply doesn't allow for headers that lack spaces to break on! The only reasonable (but cumbersome) way I've seen to work around this is to use RFC2047 encoding (which will effectively convert an unwrappable line into a wrappable one) – but I've never seen that used in the wild for DKIM.

The DKIM example header above sneaks through 5322 by adding spaces in the h parameter value list, and also avoids other problems by only using a short DKIM key - if you're using a 2048-bit or bigger key (which you should), this is more likely to run into problems.

I've also realised that my definition of folding isn't quite correct - it's not that folded lines get CRLF<space> added, it's that CRLF is inserted before an existing space, and that space is preserved when unfolding. If the header lines contain spaces, the outcome is the same - but if they don't, trouble ensues.

This also means the DKIM unfolding change I just made isn't quite right.

Gotta love those RFCs...
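
The corrected definition can be illustrated with a short sketch. `unfoldHeader` is a hypothetical helper, not PHPMailer's code, and follows the RFC 5322 rule of removing only a CRLF that is immediately followed by whitespace. When the fold was made before an existing space the round trip is lossless; when CRLF plus a new space was inserted into unbroken text, the space stays in the unfolded value.

```php
<?php
// Hypothetical helper: RFC 5322 unfolding removes a CRLF only when it is
// immediately followed by a space or tab, and keeps that whitespace.
function unfoldHeader(string $folded): string
{
    return preg_replace('/\r\n(?=[ \t])/', '', $folded);
}

// Folded before an existing space: unfolding restores the original exactly.
var_dump(unfoldHeader("Header: this is a long line that I'm\r\n going to wrap")
    === "Header: this is a long line that I'm going to wrap"); // bool(true)

// Folded inside unbroken text by inserting CRLF + space: the space survives.
var_dump(unfoldHeader("Subject: aaa\r\n aaa")); // string(16) "Subject: aaa aaa"
```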

@blaaat
Author

blaaat commented Sep 2, 2019

Interesting read! Thanks for sharing.

> which is that RFC5322 simply doesn't allow for headers that lack spaces to break on!

RFC2822:

> folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks.

RFC 2119 defines the SHOULD keyword to mean that the restriction may be ignored in some cases. A header without any higher-level breaks (spaces, commas or something else?) appears to be one such case. Otherwise the MUST keyword would have been used in RFC2822.

In any case, I think PHPMailer should do what is supported by the majority of e-mail clients.

@Synchro
Member

Synchro commented Sep 2, 2019

The problem with that SHOULD "get-out" is what exactly should you do in that situation? It appears to be undefined - if you add spaces at arbitrary points, they will be retained when unfolding (which is going to be done by clients, not you as the sender), which breaks stuff, especially DKIM.

> In any case, I think PHPMailer should do what is supported by the majority of e-mail clients.

So what is that, exactly? It seems to me there are only two choices: add spaces when folding headers lacking breaks, or don't. I've still not found a good explanation for the nested folding that DKIM example uses.

The big problem is when you get very long header fields (e.g. DKIM with 4096-bit keys or long URLs) that can exceed even the 998-char limit, you must fold at an arbitrary point - there is no other choice.

I think the RFC2047 approach is workable, doesn't break anything and should work everywhere, but it's quite ugly, and I also know that it will be painful to do in PHPMailer, even though it already has functions for encoding headers that way...

@blaaat
Author

blaaat commented Sep 3, 2019

> The problem with that SHOULD "get-out" is what exactly should you do in that situation? It appears to be undefined

I can't find a clear definition in the RFCs either.

> So what is that, exactly? It seems to me there are only two choices: add spaces when folding headers lacking breaks, or don't.

Gmail and Apple Mail both fold without adding additional spaces, and both keep PHPMailer's added spaces when unfolding. Mailparse unfolds the same way. I haven't tested other mail clients yet. IMO folding/unfolding should be lossless (so no extra spaces).

Both Gmail and Apple Mail fold without RFC2047 encoding. As this is apparently supported in the wild, I wouldn't worry too much about encoding this in RFC2047.

As for DKIM; I've no experience or opinion about this; but viewed some of my inbox. Lots of variations with the second level fold. (one space, two spaces and no space at all).
GitHub itself sends mail with only a first-level tab fold (the "A third-party OAuth application has been added to your account" mails), which validates, but thread notifications use tab + space. Maybe just try the easy solution (a one-level fold only?) and see if this validates at MTAs?

@Synchro
Member

Synchro commented Sep 3, 2019

> Gmail and Apple mail both fold without adding additional spaces

You can't fold unbroken text without adding spaces, or it's not folding and can't be undone. For example, if you fold this:

Subject: aaaaaa

as

Subject: aaa
aaa

it doesn't look like a folded line, more like a corrupted header, and clients won't know what to do with it.

I just did a very useful test, sending a message from Apple Mail to gmail with a subject line of 1,386 'a's. Here's what happened:

Subject: =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa?=
 =?us-ascii?Q?aaaaaaaaaaaaaa?=

This doesn't kick in until you hit the 998 char limit - it certainly doesn't happen when the length hits 76 chars (which is what the above lines are wrapped to, exactly as per RFC). As far as I can see, this is exemplary behaviour.

Going the other way, sending from gmail with the same subject line is a rather different story:

Subject: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 aaaaaaaa

Gmail is doing two things wrong. Firstly, the subject line has been truncated, the value part has been chopped at 997 chars - but they didn't take the length of the Subject: label into account, giving a total length of 1,007 chars. That then got force-folded at 999 chars by inserting an extra space and a line break before the leftover aaaaaaaa. When displayed in Apple Mail, the subject line does indeed show the inserted space, as expected:

[Screenshot: the subject line as displayed in Apple Mail]

(you have no idea how wide I had to make the window to see that!)

This is pretty broken, but then it's gmail, so no great surprise there.

So then I thought I'd see how SwiftMailer deals with it, and... it doesn't. It makes no attempt to do anything at all, and simply allows a header to exceed 998 chars, like this:

Subject: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

In other words, having long header lines "work" in SwiftMailer is entirely down to forgiveness by mail servers and clients.

So, it seems that RFC2047 is the way to go... I was having a think about that too - there's a similar situation in message bodies: When Apple Mail encounters a message body with lines longer than 998 chars it switches the CTE from the default 8bit to quoted-printable so that it can insert line breaks without altering the content. I think that's a great idea so I copied it in PHPMailer a while back. Using RFC2047 encoding for headers is actually something that PHPMailer does automatically already, but only if a header contains 8-bit characters, where it's needed to encode UTF-8 (or other 8-bit charsets) correctly - I'd not thought of using it as a way to solve excessive line length though.
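
A rough sketch of the idea, not PHPMailer's implementation: Q-encode a long ASCII value into several RFC 2047 encoded-words and fold between them, so every fold sits at whitespace that a decoder ignores between adjacent encoded-words. The function name `qEncodeFolded`, the `us-ascii` charset label and the 76-character limit are assumptions modelled on the Apple Mail output above, and the sketch assumes a short header name and a single-byte ASCII value.

```php
<?php
// Hypothetical sketch: turn an unbreakable ASCII value into foldable RFC 2047 Q-encoded words.
function qEncodeFolded(string $name, string $value, int $maxLineLength = 76): string
{
    if ($value === '') {
        return $name . ': ';
    }
    $prefix = '=?us-ascii?Q?';
    $suffix = '?=';
    $overhead = strlen($prefix) + strlen($suffix);
    // The first line also carries "Name: "; continuation lines carry one leading space.
    $room = $maxLineLength - strlen($name) - 2 - $overhead;
    $words = [];
    $current = '';
    foreach (str_split($value) as $char) {
        // Keep a conservative safe set literal; hex-escape everything else as =XX.
        $token = preg_match('~^[A-Za-z0-9!*+/-]$~', $char) ? $char : sprintf('=%02X', ord($char));
        if ($current !== '' && strlen($current) + strlen($token) > $room) {
            $words[] = $prefix . $current . $suffix;
            $current = '';
            $room = $maxLineLength - 1 - $overhead;
        }
        $current .= $token; // tokens are added whole, so an =XX escape is never split
    }
    $words[] = $prefix . $current . $suffix;
    // Fold between encoded-words: CRLF followed by a single space.
    return $name . ': ' . implode("\r\n ", $words);
}

echo qEncodeFolded('Subject', str_repeat('a', 300)), "\r\n";
```

Because the whitespace between adjacent encoded-words is dropped on decoding, the displayed subject contains no extra spaces even though every physical line stays within the limit.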

@blaaat
Author

blaaat commented Sep 3, 2019

Nice work! Apple Mail must have changed this since Aug. 2018, when I tested it! Seems like the solution 👍

Synchro pushed a commit that referenced this issue Sep 25, 2019
* Always Q-encode headers exceeding maximum length

Previously, headers exceeding the maximum line length without
any special characters were only folded. This led to problems
with long filenames (#1469) and long headers in general (#1525).

Now, long headers are always Q-encoded (and still folded).

* Use ASCII as Q-encoding charset if applicable

Previously, headers were Q-encoded using the message
charset, e.g. UTF-8. This is excessive for ASCII
values, as it requires a unicode engine.

Now, we use ASCII if we only find 7-bit characters.

* Separate header encoding from encoding selection

* Use ASCII for B-encoding as well

* Refactor max line length calculation

Previously, we calculated the maximum
line length for header encoding both
for B- and Q-encoding, even though
they share the same limits.

Now, we calculate these once for both.
@Synchro
Member

Synchro commented Sep 25, 2019

@blaaat This should be fixed in master now - can you give it a try please?

5 participants