-
-
Notifications
You must be signed in to change notification settings - Fork 9.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long header folding adds additional spaces #1525
Comments
I'm not clear what you mean. Headers are folded by adding line breaks and white space at the start of continuation lines, so this is doing:
For example, if
to be folded as:
If you don't add spaces, it's not folding, so what are you proposing to do instead? |
I'm not really sure what's correct according to the RFC (which describes that folding should only happen before WSP). But in the example given:
No additional space is introduced (there is no space after This). I tested with both Apple mail (which displays the additional spaces in subject) as the mailparse libraries (which also outputs the spaces). When folding without introducing additional spaces (just CRLF) output is as expected in both clients. Also when using Apple Mail or the Gmail web client to send a long header, folding occurs without additional spaces. Since it's not absolutely clear that spaces should be introduced while folding, I would propose to match the interpretation of other clients/senders. |
I see what you mean - but also I can't see a way to preserve spaces in the original header line that would not result in breaking lines that should not have spaces - folding/unfolding should be lossless, regardless of whether lines contain spaces or not. DKIM signatures can be several KB long and often have no "higher-level syntactic breaks" within the permitted line length, so not adding spaces while folding would result in corrupt headers, and not removing spaces when unfolding would render them useless. There may be some other mechanism at work that's not covered by 5322:2.3.3 though - here's the DKIM header from the notification email I received about your last post:
This has two levels of folding! The |
Complex! To give the counter example; List-Unsubscribe headers may contain long URLs which should not contain spaces in the middle of the url after unfolding. I noticed Apple Mail even adds a new iine directly after the : (and the header name/value separator space is on the next line)
I don't know what's the correct solution here either. |
I think the way of folding is perfectly fine for critical headers like DKIM, but causes issues for other kind of headers. Maybe there should be 2 kind of folding-methods? One for the critical headers where additional spaces are needed to guarantee proper working. The other one without additional spaces. I'd like to add another example of the breaking and adding spacing causing issues: |
Yesterday I saw a new case in which the additional space causes issues: Because the subjects of the messages my system sends out can contain UTF-8 characters which didn't seem to be escaped correctly, I base64-encode the subjects of all messages (RFC2047:2).
Yesterdays case: one of our clients got the literal base64-encoded string as subject because there was an additional space which his mailclient didn't delete.
|
@JelleSFS Don't do that - PHPMailer knows how to encode UTF-8 in headers, so when you do that it will get double-encoded. |
I'll try again without, but had some issues with some characters before. Maybe I have made some mistakes during implementation. But it still leaves the space-issue. Without the space, it probably would've been decoded properly. (We've sent thousands of messages with the described technique and have only seen a handful errors) |
I can confirm problem with folding. Value of header option List-Unsubscribe contains space after 76 character. Confirmed by Gmail show source option. Unsubscribe link is unusable. PHPMailer 6.0.7 |
Heh, I'm having exactly the same problem as @eKrajnak -- I'm adding a simple "List-Unsubscribe" header to my outgoing message. The header is 93 characters or so. At approximately character 72, PHP Mailer is inserting a single space. Obviously this breaks things. I can confirm this by viewing the raw headers in gmail. And of course Gmail doesn't show unsubscribe link at all. Ugh. How can I help solve this? I don't know much about the RFCs but I can code... is this an issue where we can fold a long header with a CRLF instead of space? Or something? ;-) |
The act of folding is to break a line at the line limit by inserting a line break (usually CRLF) and one or more whitespace characters. Servers and clients know that there should be no whitespace at the start of any header line unless it's been folded, so they can detect it and apply the reverse operation: remove the leading whitespace, remove the line break. As you can see, this should be lossless as the operations are symmetric, however, there is a further complication. What I just said is about the format of the message itself; folding can also occur independently at the SMTP level, invoking folding on top of folding. The problem then is that you can't tell which level did the folding, and it's also possible to use different whitespace characters at each level, so you could end up with:
The RFCs suggest folding at word boundaries (e.g. where there are already spaces), but when you have things like long addresses or DKIM signatures that have no such "natural breaks", there is no choice but to break in the middle of originally unbroken text. What's not clear to me (and I suspect at the root of the problem) is what you do with natural spaces since you can't distinguish them from folding whitespace. If you have a line like:
that gets folded like this:
when you unfold it, you end up with:
So you say, "aha, so we should add a space when unfolding!" But that doesn't work with lines that should not have spaces added because it breaks them. What's also unclear is whether any trailing space should be preserved after folding, as that might help with this. Any input about how to handle these conflicting situations? I suspect the solution is distributed across the the small print of several RFCs... |
In rfc5322:2.2.3 (and rfc2822:2.2.3) the text mentions:
I don't read adding one or more additional whitespace characters is allowed he. Just that the general rule is that a CRLF is inserted before any existing whitespace. As I mentioned earlier; other mail clients break at any point if there is no suitable whitespace to break on.
I think so; since you can add the CRLF before any whitespace. |
Are you sure this space is not used to calculate the DKIM signature? According to the List-Unsubscribe RFC; leading whitespace is allowed and all examples include leading white space. Maybe the DKIM signatures are currently generated wrongly, possible related: #1563 #1469 #1406 #1352 |
That DKIM example is nothing to do with the format of the list-unsubscribe header? That It's all very painful! |
I wasn't taking about the DKIM fold, but that you mentioned that the DKIM signature would be invalid when unfolding would re-introduce the space that exists in the List-Unsubscribe header sent by Github. My guess would be that adding a space when it doesn't exists breaks DKIM and that the unfolded GitHub List-unsubscribe header contains a comma+space separator. |
FWIW this is my full and complete List-Unsubscribe header:
(I can optionally add a space and a comma and a second way to unsubscribe, but it isn't required; what I show above is valid.) Try it with phpmailer and gmail, you'll see how a space is dropped in.... |
BTW I tried it adding that header to a test mail sent with Swift Mailer and it works: no space is inserted, gmail is happy. Perhaps one way to solve this problem would be just to see what they're doing-- ? (Sorry if that's not cool to say! Just trying to help identify a quick solution.) |
I'm not gonna discuss, where is problem (RFC vs. Gmail vs. PHPMailer). But to provide simple solution for @pnoeric I have made simple patch. Then you just need:
|
Thank you! |
I've pushed a revision of DKIM header canonicalisation. There were a couple of likely sources of this problem:
I also added tests that cover long headers (like the List-Unsubscribe example mentioned above), runs of spaces, trailing space. Please give it a go and tell me if it improves your outcomes! |
Thanks. I'm not using DKIM to sign my messages, so I can't tell if this fixes the issues. My understanding of the issue was that DKIM signatures were rejected because the receiving end would unfold headers differently. Your tests for headers looks good, but the same needs to apply to the normal headers being sent out. The current master still adds spaces after each CRLF, the RFC does not mention that these spaces are required or even allowed. Please test with: and send to any mail client. I guess the above; and how folding inside the DKIM header (and assumed unfolding) should match each other to validate DKIM signatures. |
This appears to be a well-known can of worms... This article describes exactly this problem - which is that RFC5322 simply doesn't allow for headers that lack spaces to break on! The only reasonable (but cumbersome) way I've seen to work around this is to use RFC2047 encoding (which will effectively convert an unwrappable line into a wrappable one) – but I've never seen that used in the wild for DKIM. The DKIM example header above sneaks through 5322 by adding spaces in the I've also realised that my definition of folding isn't quite correct - it's not that folded lines get This also means the DKIM unfolding change I just made isn't quite right. Gotta love those RFCs... |
Interesting read! Thanks for sharing.
RFC2822:
rfc2119 defines that the SHOULD keyword that indicates this restriction may be ignored in some cases. A header without any higher level breaks (spaces, commas or something else?) appears to be one. Otherwise a MUST keyword should have been used in RFC2822 In any case, I think PHPMailer should do what is supported by the majority of e-mail clients. |
The problem with that
So what is that, exactly? It seems to me there are only two choices: add spaces when folding headers lacking breaks, or don't. I've still not found a good explanation for the nested folding that DKIM example uses. The big problem is when you get very long header fields (e.g. DKIM with 4096-bit keys or long URLs) that can exceed even the 998-char limit, you must fold at an arbitrary point - there is no other choice. I think the RFC2047 approach is workable, doesn't break anything and should work everywhere, but it's quite ugly, and I also know that it will be painful to do in PHPMailer, even though it already has functions for encoding headers that way... |
I can't find a clear definition in the RFC's either.
Gmail and Apple mail both fold without adding additional spaces and add PHPMailer's spaces during unfolding. Mailparse unfolds the same. I haven't tested other mail clients yet. IMO folding/unfolding should be lossless (so no extra spaces) Both Gmail and Apple mail fold without RFC2047 encoding. As this is apparently supported in the wild I wound't worry too much about encoding this in RFC2047. As for DKIM; I've no experience or opinion about this; but viewed some of my inbox. Lots of variations with the second level fold. (one space, two spaces and no space at all). |
You can't fold unbroken text without adding spaces or its not folding, and can't be undone. For example if you fold this:
as
it doesn't look like a folded line, more like a corrupted header, and clients won't know what to do with it. I just did a very useful test, sending a message from Apple Mail to gmail with a subject line of 1,386 'a's. Here's what happened:
This doesn't kick in until you hit the 998 char limit - it certainly doesn't happen when the length hits 76 chars (which is what the above lines are wrapped to, exactly as per RFC). As far as I can see, this is exemplary behaviour. Going the other way, sending from gmail with the same subject line is a rather different story:
Gmail is doing two things wrong. Firstly, the subject line has been truncated, the value part has been chopped at 997 chars - but they didn't take the length of the (you have no idea how wide I had to make the window to see that!) This is pretty broken, but then it's gmail, so no great surprise there. So then I thought I'd see how SwiftMailer deals with it, and... it doesn't. It makes no attempt to do anything at all, and simply allows a header to exceed 998 chars, like this:
In other words, having long header lines "work" in SwiftMailer is entirely down to forgiveness by mail servers and clients. So, it seems that RFC2047 is the way to go... I was having a think about that too - there's a similar situation in message bodies: When Apple Mail encounters a message body with lines longer than 998 chars it switches the CTE from the default |
Nice work! Apple mail must have changed this since aug. 2018 when I tested this! Seems like the solution 👍 |
* Always Q-encode headers exceeding maximum length Previously, headers exceeding the maximum line length without any special characters were only folded. This lead to problems with long filenames (#1469) and long headers in general (#1525). Now, long headers are always Q-encoded (and still folded). * Use ASCII as Q-encoding charset if applicable Previously, headers were Q-encoded using the message charset, e.g. UTF-8. This is excessive for ASCII values, as it requires a unicode engine. Now, we use ASCII if we only find 7-bit characters. * Separate header encoding from encoding selection * Use ASCII for B-encoding as well * Refactor max line length calculation Previously, we calculated the maximum line length for header encoding both for B- and Q-encoding, even though they share the same limits. Now, we calculate these once for both.
@blaaat This should be fixed in master now - can you give it a try please? |
If sending a long header without spaces, header folding is incorrectly.
Headers should fold with a CRLF, but after folding the CRLF is replaced with a LF and a space is introduced:
This is correct; but
corrupts the folding; order should be different.
The text was updated successfully, but these errors were encountered: