phpword v0.13.1 corrupting docx file after saving #1121

Open
Zxurian opened this issue Aug 14, 2017 · 10 comments

Zxurian commented Aug 14, 2017

(This was all working with v0.12. However, we upgraded our server to PHP 7 and, as a result, had to update phpword to v0.13. Since we use a later Zend Framework, I also had to fork and modify the phpword repo so that it requires zendframework/zendframework: ^2 instead of specifically zendframework/zendframework: 2.4.)

I have a .docx file that I've created that has several ${xxx} phpword placeholders in it. Since the upgrade to phpword v0.13, saving the files corrupts them. Opening the document fails with a complaint about a mismatched tag (XML below), and even after correcting the mismatch, the document's formatting is still broken compared with the original.

Code for template variable swapping:

        $template = new PhpWord\TemplateProcessor($this->templatePath);
        foreach ($this->transposeVars() as $key => $value) {
            $template->setValue($key, $value);
        }
        $template->saveAs($fileName);

Original document.xml

<wps:txbx><w:txbxContent><w:p w:rsidR="00B47627" w:rsidRPr="00905ACC" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="52"/><w:szCs w:val="52"/></w:rPr></w:pPr><w:r w:rsidRPr="00905ACC"><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="52"/><w:szCs w:val="52"/></w:rPr><w:t>{{$pub</w:t></w:r><w:r w:rsidR="00643EFB"><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="52"/><w:szCs w:val="52"/></w:rPr><w:t xml:space="preserve">               </w:t></w:r><w:r w:rsidRPr="00905ACC"><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="52"/><w:szCs w:val="52"/></w:rPr><w:t>}}</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Company</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:ind w:right="45"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Division</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My 
Address</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Address 2</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRPr="004167CD" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr></w:p></w:txbxContent></wps:txbx>

Corrupted document.xml after saving (notice that the opening <wps:txbx> tag is now closed by a mismatched </v:textbox>, and that, for some reason, text from later in the XML is repeated and concatenated into the first run):

<wps:txbx><w:txbxContent><w:p w:rsidR="00B47627" w:rsidRPr="00905ACC" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="52"/><w:szCs w:val="52"/></w:rPr></w:pPr><w:r w:rsidRPr="00905ACC"><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="52"/><w:szCs w:val="52"/></w:rPr><w:t>{{$pub               }}My CompanyMy DivisionMy AddressMy Address 2{{$pub               }}</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Company</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:ind w:right="45"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Division</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Address</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr><w:r><w:rPr><w:rFonts w:ascii="Open Sans" 
w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr><w:t>My Address 2</w:t></w:r></w:p><w:p w:rsidR="00B47627" w:rsidRPr="004167CD" w:rsidRDefault="00B47627" w:rsidP="00B47627"><w:pPr><w:spacing w:after="0"/><w:rPr><w:rFonts w:ascii="Open Sans" w:hAnsi="Open Sans" w:cs="Open Sans"/><w:sz w:val="20"/><w:szCs w:val="20"/></w:rPr></w:pPr></w:p></w:txbxContent></v:textbox>

VickG commented Aug 30, 2017

+1, I'm having the same issue with
"phpoffice/phpword": "^0.13.0",

VickG commented Sep 8, 2017

I have the exact same problem.

I load a docx file and I do NOT do ANY modifications to it. I just load it and save it again, and it gets corrupted. Original file opens in Word just fine, but after running it through PHPWord and saving it, it gets corrupted.

I am hoping to get this resolved as it's affecting me. Is there any way I can help or buy you some beer for fixing it? :)

FBnil commented Oct 4, 2017

@VickG Are you also using the TemplateProcessor? If not, that would imply the problem comes from another part of PHPWord (like $phpWord = new Word2007();). Can you create a small document that shows the problem and share it so I can take a look at it?

@Zxurian The string you posted does not contain any ${var}, only a {{$pub }}, so nothing I pass to setValue() matches it. The way you use it, setValue() uses str_replace(), and in my test the XML string is left untouched, even after saving and re-reading it:

// $ORG: presumably the "Original document.xml" string posted above.
$templateProcessor->tempDocumentMainPart = $ORG;
$templateProcessor->setValue("key", "val");
$templateProcessor->setValue('{{$pub}}', "val"); // Just in case...
$templateProcessor->saveAs(storage_path('app/public/bad.docx'));
// Re-open the saved file and compare its main part with the original.
$t = new OpenTemplateProcessor(storage_path('app/public/bad.docx'));
$NEW = $t->tempDocumentMainPart;
return ['org' => $ORG, 'new' => $NEW, 'result' => ($ORG == $NEW)];

Can you also share a small document to test with?
N.B.: tempDocumentMainPart and the other properties are not private in OpenTemplateProcessor(); otherwise it is the same as TemplateProcessor().
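
To illustrate the point about plain string replacement, here is a minimal sketch. replaceMacro() is a hypothetical stand-in written for this example, not PHPWord's actual setValue() implementation; it only assumes the ${key} macro syntax mentioned in the issue.

```php
<?php
// Hypothetical stand-in for a str_replace()-based setValue(); NOT PHPWord's code.
function replaceMacro(string $xml, string $key, string $value): string
{
    // Only the exact literal ${key} is replaced; everything else is untouched.
    return str_replace('${' . $key . '}', $value, $xml);
}

// Simplified, made-up fragment: one literal {{$pub }} and one ${name} macro.
$xml = '<w:t>{{$pub }}</w:t><w:t>${name}</w:t>';
$out = replaceMacro($xml, 'name', 'Alice');

echo $out, PHP_EOL;
```

In this sketch the {{$pub }} text passes through unchanged, so a replacement step of this kind would not by itself explain the corruption.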

Zxurian commented Oct 4, 2017

@FBnil The {{$pub }} is not intended to be replaced via phpword and should be taken as straight text.

The original document XML is just a section that didn't have any phpword placeholders to be replaced, but you can see in the XML that phpword still modified the contents.

Zxurian commented Oct 4, 2017

I'll see if I can strip the original document back to a small section that I can share out to test with.

FBnil commented Oct 4, 2017

I suspect fixBrokenMacros() with the following code:

$fixedDocumentPart = preg_replace_callback(
    '|\$[^{]*\{[^}]*\}|U',
    function ($match) {
        return strip_tags($match[0]);
    },
    $fixedDocumentPart
);

It actually matches any { that comes an arbitrary distance after $, thus:

The apples were $2 each and I bought {several} of them.

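Would match $2 each and I bought {several}, and strip_tags() would then remove every tag inside that match. When the $ and the { sit in different runs, all the text in between gets merged into one run, with any markup in between gone...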
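
One way this pattern could produce the kind of corruption reported above can be sketched as follows. The fragment is a hypothetical, heavily simplified one: it assumes the same literal text appears once inside a <wps:txbx> and once inside a <v:textbox>, which the posted excerpt does not show.

```php
<?php
// Pattern quoted above from fixBrokenMacros().
$pattern = '|\$[^{]*\{[^}]*\}|U';

// Hypothetical fragment (an assumption for this sketch): the same text
// is stored in two differently named containers.
$xml = '<wps:txbx><w:t>{{$pub }}</w:t><w:t>My Company</w:t></wps:txbx>'
     . '<v:textbox><w:t>{{$pub }}</w:t><w:t>My Company</w:t></v:textbox>';

// Same callback as in the quoted code: strip all tags inside each match.
$fixed = preg_replace_callback(
    $pattern,
    function ($match) {
        return strip_tags($match[0]);
    },
    $xml
);

// The match runs from the first "$" to the first "}" after the next "{",
// so the </wps:txbx> and <v:textbox> tags in between are removed.
echo $fixed, PHP_EOL;
```

In this sketch the output starts with <wps:txbx> but ends with </v:textbox>, and the first run contains the concatenated text of everything the match swallowed.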
The reason the pattern is so permissive is that one often types ${} first and then comes back to write the name inside, and that ends up being saved as several separate runs:

<w:r>
  <w:rPr>
  </w:rPr>
  <w:t>${</w:t>
</w:r>
<w:r>
  <w:rPr>
  </w:rPr>
  <w:t>cool</w:t>
</w:r>
<w:r>
  <w:rPr>
  </w:rPr>
  <w:t>}</w:t>
</w:r>

for the macro ${cool} (as saved by LibreOffice, with the run properties left empty here). Stripping the tags makes the macro visible for matching again. We could make the pattern accept a gap only when the gap starts with a tag, which would limit the false positives, and thus the file corruption, somewhat.
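
The intended repair can be sketched on a compact version of the three runs above (the empty w:rPr elements are left out for brevity; this is a simplification, not output from a real document):

```php
<?php
// Pattern quoted earlier in this thread from fixBrokenMacros().
$pattern = '|\$[^{]*\{[^}]*\}|U';

// The macro ${cool} split across three runs.
$xml = '<w:r><w:t>${</w:t></w:r>'
     . '<w:r><w:t>cool</w:t></w:r>'
     . '<w:r><w:t>}</w:t></w:r>';

// Strip the tags inside the matched span so the macro becomes contiguous.
$fixed = preg_replace_callback(
    $pattern,
    function ($match) {
        return strip_tags($match[0]);
    },
    $xml
);

echo $fixed, PHP_EOL; // <w:r><w:t>${cool}</w:t></w:r>
```

After this step a literal search for ${cool} can find the macro again.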

Not sure if something like this would work:

\$(?:<[^{]*)?\{<(?:[^}]+)\}

FBnil commented Oct 4, 2017

@Zxurian @VickG can you edit the fixBrokenMacros() function in TemplateProcessor.php and change the regexp like so:

$fixedDocumentPart = preg_replace_callback(
    '|\$(?:<[^{]*)?\{[^}]*\}|U',
    function ($match) {
        return strip_tags($match[0]);
    },
    $fixedDocumentPart
);

It passes the unit tests... but will it fix your problem?
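
A quick way to compare the two patterns is sketched below. stripTagsInMatches() is a hypothetical helper written for this example, and both inputs are made-up fragments; it says nothing about how the change behaves on real documents.

```php
<?php
// Both patterns are quoted from this thread.
$oldPattern = '|\$[^{]*\{[^}]*\}|U';
$newPattern = '|\$(?:<[^{]*)?\{[^}]*\}|U';

// Hypothetical helper: apply one pattern with the strip_tags() callback.
function stripTagsInMatches(string $pattern, string $xml): string
{
    return preg_replace_callback(
        $pattern,
        function ($match) {
            return strip_tags($match[0]);
        },
        $xml
    );
}

// A macro split across three runs, which should be repaired.
$splitMacro = '<w:r><w:t>${</w:t></w:r><w:r><w:t>cool</w:t></w:r><w:r><w:t>}</w:t></w:r>';

// Ordinary text with an unrelated "$" and "{", which should be left alone.
$plainText = '<w:r><w:t>The apples were $2 each</w:t></w:r>'
           . '<w:r><w:t> and I bought {several} of them.</w:t></w:r>';

$macroFixed = stripTagsInMatches($newPattern, $splitMacro);
$plainNew   = stripTagsInMatches($newPattern, $plainText);
$plainOld   = stripTagsInMatches($oldPattern, $plainText);

echo $macroFixed, PHP_EOL; // the split macro is joined into ${cool}
echo $plainNew, PHP_EOL;   // unchanged with the new pattern
echo $plainOld, PHP_EOL;   // the two runs are merged with the old pattern
```

The new pattern only tolerates a gap between $ and { when that gap starts with a tag, which is why the ordinary text is no longer matched in this sketch.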

FBnil commented Oct 18, 2017

@Zxurian @VickG I bit the bullet and rewrote the whole fixBrokenMacros part. Can you test whether your documents still get corrupted after open/save with that new TemplateProcessor.php file?

Zxurian commented Nov 1, 2017

@FBnil okay, finally had some time to test this. I merged your fixBrokenMacros into 0beeb27 and tested it against the documents that were giving issues previously. Now all of them are having template variables swapped without issue.

I'm sure I'm too small of a sample size, but the specific issue I was having with tags being broken/malformed is no longer happening for me. Thanks!

filmo commented Dec 4, 2018

Hi, is this fix going to get merged into master? I'm having the same problem processing documents via templating.
