ampersand (&) in HTML corrupts word document file #1500
Comments
I've had a quick look and the HTML normalization seems a little too simplistic as well. Instead of EDIT
It is normal that the following fails, as this is not valid HTML: Html::addHtml($section, '&', false, false); The following works fine: Html::addHtml($section, '&amp;', false, false); As for the bug with the addText function, indeed, the text should be escaped when writing to XML, but I don't think the Element/Text class is the place to do this, as that would break the RTF writer. It should be escaped in the Word2007 writer instead.
When the DOM element is created in PHP, &amp; is converted to just &. I am not sure what the final document formats look like; I've not researched that before, so I am not sure what breaking changes there are between output formats. Could you link me to the function that you'd like me to move the fix to? I'm happy to do that if you'd like. I'm also not precious about the fix, should you want to move it to where you think it is most appropriate. Thanks for your time on this.
Hi, got the same error here. The docx file is corrupt in both cases.
After looking for more information, found that use
Oh wow, that's a bit strange. I wonder why a user would not escape the output; are there any side-effects?
@silverbackdan Indeed, ideally, the setting should be set to
I confirm this bug.
I realise this is an old bug, but I spent a couple of hours chasing it today so thought I'd resurface it. If I send '&amp;' in the HTML it should go all the way through to Word, but it gets converted back to a '&', and in a Word doc that corrupts it. I tried finding how the conversion happens, but can't work it out. I can see the '&amp;' in the HTML at line 82 of Html.php, but in the call to parseNode it is suddenly a '&'. setOutputEscapingEnabled(true); does solve the issue though.
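A likely explanation for the conversion described in the comments above, sketched under the assumption that the HTML is parsed with PHP's DOM extension (the thread mentions a DOM element being created, but the exact parsing code is not shown here): an XML parser decodes &amp; to a bare & when it builds text nodes, so the text has to be escaped again before it is written into the document XML. The variable names are illustrative only.

```php
<?php
// Parse a fragment that contains a correctly escaped ampersand.
$dom = new DOMDocument();
$dom->loadXML('<body>this &amp; that</body>');

// The parser decodes the entity, so the text node holds a bare '&'.
$text = $dom->documentElement->textContent;

// Writing that text into XML again requires re-escaping it.
$escaped = htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $text, PHP_EOL;
echo $escaped, PHP_EOL;
```

This would be consistent with the report that setOutputEscapingEnabled(true) resolves the problem, since that setting makes the writer escape text on output.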
This is:
Expected Behavior
Correct generation of a word file
Current Behavior
A word file is generated but cannot be opened because it is corrupted.
Failure Information
When adding HTML, if there is an ampersand (just an & character) the output document is corrupt. The ampersand has to be converted to its HTML-encoded form, &amp;
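The failure can be illustrated without PHPWord. A docx file is a package of XML parts, and a bare & makes an XML part ill-formed, which would explain why Word refuses to open the file. The following is a minimal sketch using only PHP's DOM extension; the helper isWellFormedXml and the `<t>` element are hypothetical stand-ins, not part of PHPWord or of the real document schema.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns true if the string parses as well-formed XML.
function isWellFormedXml(string $xml): bool
{
    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);
    libxml_clear_errors();
    return $ok;
}

$raw = 'this & that';

// Unescaped text placed directly inside an element: not well-formed.
$bad = '<t>' . $raw . '</t>';

// The same text escaped for XML before being written: well-formed.
$good = '<t>' . htmlspecialchars($raw, ENT_QUOTES | ENT_XML1, 'UTF-8') . '</t>';

var_dump(isWellFormedXml($bad));
var_dump(isWellFormedXml($good));
```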
How to Reproduce
Context
EDIT:
My mistake - reproduction can be with either
OR
HTML character codes such as &quot;
work fine