
deleteBlock / replaceBlock doesn't work properly #341

Open
timothe opened this issue Aug 20, 2014 · 29 comments

Comments

@timothe

timothe commented Aug 20, 2014

Hi,

PHPWord is a terrific solution for my needs. Unfortunately I'm not able to use the "deleteBlock" feature since PHPWord doesn't detect the blocks in my files.
I asked a question on Stack Overflow: https://stackoverflow.com/questions/25402045/regexp-in-a-word-xml-why-does-it-not-match
Basically, I worked out that the regexp itself doesn't match the block in the docx.
Any help would be greatly appreciated.

Cheers



@STiTCHi

STiTCHi commented Sep 30, 2014

I have the same issue. Has anyone worked it out?

@timothe
Author

timothe commented Sep 30, 2014

On my side I just created different templates... no other choice.
On Stack Overflow they pointed out that regular expressions are not the best option for parsing XML... I guess that's accurate :)

@STiTCHi

STiTCHi commented Sep 30, 2014

Thanks for the reply. Unfortunately, I don't think I'm going to be able to create different templates, as the client has specific requirements.

The strange bit is that if I copy the whole template from 'Sample_23_TemplateBlock.docx' and paste it at the end of my template, the DELETEME block gets removed.

But if I paste it in the middle of my template, the DELETEME block isn't removed.

So I guess I will have to look at other ways to parse the XML to remove the blocks. Any help in the right direction would be great, thanks.

@chc88
Contributor

chc88 commented Nov 12, 2014

I have found a fix for the cloneBlock regex/function which seems to fix this problem in my numerous cases.

/**
 * Clone a block
 *
 * @param string $blockname
 * @param integer $clones
 * @param boolean $replace
 * @return string|null
 */
public function cloneBlock($blockname, $clones = 1, $replace = true)
{
    $xmlBlock = null;
    preg_match(
        '/(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is',
        $this->documentXML,
        $matches
    );
    if (isset($matches[2])) {
        $xmlBlock = $matches[2];
        $cloned = array();
        for ($i = 1; $i <= $clones; $i++) {
            $cloned[] = preg_replace('/\${(.*?)}/','${$1_'.$i.'}', $xmlBlock);
        }
        if ($replace) {
            $this->documentXML = str_replace(
                $matches[1] . $matches[2] . $matches[3],
                implode('', $cloned),
                $this->documentXML
            );
        }
    }

    return $xmlBlock;
}

@brad-jones

I was still having issues with the regular expression; I just couldn't get it to match my template.
Here is my version of the method that uses SimpleXML to find the start and end tags, which for me at least seems to be much more robust. It also incorporates chc88's _1, _2, _3, etc. suffixes for cloned variables.

    /**
     * Clone a block
     *
     * @param string $blockname
     * @param integer $clones
     * @param boolean $replace
     * @return string|null
     */
    public function cloneBlock($blockname, $clones = 1, $replace = true)
    {
        // Parse the XML
        $xml = new \SimpleXMLElement($this->documentXML);

        // Find the starting and ending tags
        $startNode = false; $endNode = false;
        foreach ($xml->xpath('//w:t') as $node)
        {
            if (strpos($node, '${'.$blockname.'}') !== false)
            {
                $startNode = $node;
                continue;
            }

            if (strpos($node, '${/'.$blockname.'}') !== false)
            {
                $endNode = $node;
                break;
            }
        }

        // Make sure we found the tags
        if ($startNode === false || $endNode === false)
        {
            return null;
        }

        // Find the parent <w:p> node for the start tag
        $node = $startNode; $startNode = null;
        while (is_null($startNode))
        {
            $node = $node->xpath('..')[0];

            if ($node->getName() == 'p')
            {
                $startNode = $node;
            }
        }

        // Find the parent <w:p> node for the end tag
        $node = $endNode; $endNode = null;
        while (is_null($endNode))
        {
            $node = $node->xpath('..')[0];

            if ($node->getName() == 'p')
            {
                $endNode = $node;
            }
        }

        /*
         * NOTE: Because SimpleXML reduces empty tags to "self-closing" tags.
         * We need to replace the original XML with the version of XML as
         * SimpleXML sees it. The following example should show the issue
         * we are facing.
         * 
         * This is the XML that my document contained originally.
         * 
         * ```xml
         *  <w:p>
         *      <w:pPr>
         *          <w:pStyle w:val="TextBody"/>
         *          <w:rPr></w:rPr>
         *      </w:pPr>
         *      <w:r>
         *          <w:rPr></w:rPr>
         *          <w:t>${CLONEME}</w:t>
         *      </w:r>
         *  </w:p>
         * ```
         * 
         * This is the XML that SimpleXML returns from asXml().
         * 
         * ```xml
         *  <w:p>
         *      <w:pPr>
         *          <w:pStyle w:val="TextBody"/>
         *          <w:rPr/>
         *      </w:pPr>
         *      <w:r>
         *          <w:rPr/>
         *          <w:t>${CLONEME}</w:t>
         *      </w:r>
         *  </w:p>
         * ```
         */

        $this->documentXML = $xml->asXml();

        // Find the xml in between the tags
        $xmlBlock = null;
        preg_match
        (
            '/'.preg_quote($startNode->asXml(), '/').'(.*?)'.preg_quote($endNode->asXml(), '/').'/is',
            $this->documentXML,
            $matches
        );

        if (isset($matches[1]))
        {
            $xmlBlock = $matches[1];

            $cloned = array();

            for ($i = 1; $i <= $clones; $i++)
            {
                $cloned[] = preg_replace('/\${(.*?)}/','${$1_'.$i.'}', $xmlBlock);
            }

            if ($replace)
            {
                $this->documentXML = str_replace
                (
                    $matches[0],
                    implode('', $cloned),
                    $this->documentXML
                );
            }
        }

        return $xmlBlock;
    }
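A variation on the same idea, for readers who want to avoid the final regex step entirely: the sketch below locates the start and end paragraphs with DOM and XPath and removes them together with everything in between. It is a minimal illustration under assumptions, not PHPWord code: `deleteBlockWithDom` is a hypothetical helper name, the markers are assumed to sit in two different paragraphs that share the same parent, and inline blocks (both markers in one paragraph) are left untouched.

```php
<?php
// Hypothetical helper, not part of PHPWord: deletes a block using DOM lookups only.
// Assumes the start and end markers live in two sibling <w:p> paragraphs.
function deleteBlockWithDom(string $documentXml, string $blockname): string
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($documentXml)) {
        return $documentXml; // leave unparseable input untouched
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    // textContent joins the text of all runs, so a marker that was split
    // across several <w:r> elements is still found.
    $start = null;
    $end = null;
    foreach ($xpath->query('//w:p') as $paragraph) {
        $text = $paragraph->textContent;
        if ($start === null && strpos($text, '${' . $blockname . '}') !== false) {
            $start = $paragraph;
        } elseif ($start !== null && strpos($text, '${/' . $blockname . '}') !== false) {
            $end = $paragraph;
            break;
        }
    }

    // Give up unless both markers were found under the same parent node.
    if ($start === null || $end === null
        || !$start->parentNode->isSameNode($end->parentNode)) {
        return $documentXml;
    }

    // Remove the start paragraph, the end paragraph and every sibling in between.
    $node = $start;
    while ($node !== null) {
        $next = $node->nextSibling;
        $isEnd = $node->isSameNode($end);
        $node->parentNode->removeChild($node);
        if ($isEnd) {
            break;
        }
        $node = $next;
    }

    return $dom->saveXML();
}
```

Like the SimpleXML version above, this re-serialises the document, so empty elements may come back in self-closing form.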

@chc88
Contributor

chc88 commented Nov 21, 2014

Hi Brad,

Thanks for your input, I'll give it a test-run as well. I was just running into some issues with my version as well.

I'll report back!

Regards

@chc88
Contributor

chc88 commented Dec 10, 2014

@brad-jones

I have tested your function in a series of documents.
It seems to be working very well!

Maybe put it in a Pull-Request for the project?

Regards!

@brad-jones

I have actually taken it a little further. Check out: https://github.com/phpgearbox/pdf

chervaliery added a commit to chervaliery/PHPWord that referenced this issue Jun 1, 2015
Implements the fonction gave by brad-jones on the issue :
PHPOffice#341 (comment)
@judgej

judgej commented Aug 18, 2016

OMG - this thing uses regex to parse XML!?! I was searching for why my cloneBlock() suddenly stopped working after adding some formatting inside the block, and came across this issue. XML and REGEX? Makes the hairs on my neck stand up.

And looking at my document source, yes, it is the REGEX that no longer matches. MS Word puts in a few additional elements of formatting, and the block tags can no longer be found using the REGEX built into cloneBlock(). I'm also going to assume this is never going to be fixed :-(
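To make the failure mode concrete, here is a small sketch of a placeholder that has been split across several runs. The markup in `$splitXml` is an assumed example of what a word processor can produce, not taken from any document in this thread, and both helper functions are hypothetical. The marker no longer exists as one contiguous string in the raw XML, so any pattern that expects `${CLONEME}` verbatim cannot match, even though the visible text is unchanged.

```php
<?php
// Assumed example markup: a placeholder that has been split into three runs.
$splitXml = '<w:p><w:r><w:t>${</w:t></w:r>'
    . '<w:proofErr w:type="spellStart"/><w:r><w:t>CLONEME</w:t></w:r>'
    . '<w:proofErr w:type="spellEnd"/><w:r><w:t>}</w:t></w:r></w:p>';

// Hypothetical helper: is the marker present as one contiguous string in the raw XML?
function markerInRawXml(string $xml, string $blockname): bool
{
    return strpos($xml, '${' . $blockname . '}') !== false;
}

// Hypothetical helper: is the marker present once all tags are stripped?
function markerInVisibleText(string $xml, string $blockname): bool
{
    return strpos(strip_tags($xml), '${' . $blockname . '}') !== false;
}

var_dump(markerInRawXml($splitXml, 'CLONEME'));      // bool(false)
var_dump(markerInVisibleText($splitXml, 'CLONEME')); // bool(true)
```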

Edit: Raised my issue separately, with some analysis:

#867

@samdark

samdark commented Jul 31, 2017

Duplicate of #316

@klado

klado commented Feb 9, 2018

I'm still having an issue where cloneBlock and replaceBlock don't find anything (a regex tester shows catastrophic backtracking).

brad-jones's solution solves my problems

czosel pushed a commit to adfinis-forks/PHPWord that referenced this issue Apr 23, 2018
Implements the fonction gave by brad-jones on the issue :
PHPOffice#341 (comment)
@michakpl

michakpl commented Mar 28, 2019

@klado you have probably solved this already, but for other users who find this issue in the future, I found a solution. The preg_match call used here is subject to a limit in PHP, and when the document is long it may find only the last block, not the one at the beginning of the document.

You have to increase that limit in the php.ini file; for bigger files, even the limit suggested in my solution may be too small:

pcre.backtrack_limit = 53001337
pcre.recursion_limit = 53001337
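A hedged sketch of how one might confirm that the limit is the actual cause before editing php.ini: when PCRE gives up, `preg_match` returns `false` rather than `0`, and `preg_last_error()` reports why. The helper `matchBlock` below is hypothetical (it is not a PHPWord method) and reuses the block pattern posted earlier in this thread. `pcre.backtrack_limit` can also be raised at runtime with `ini_set`. If the PCRE JIT is enabled, the reported error may be `PREG_JIT_STACKLIMIT_ERROR` instead of `PREG_BACKTRACK_LIMIT_ERROR`.

```php
<?php
// Hypothetical helper, not a PHPWord method: runs the block pattern from this
// thread and reports whether PCRE failed instead of simply not matching.
function matchBlock(string $xml, string $blockname, ?int $backtrackLimit = null): array
{
    if ($backtrackLimit !== null) {
        // Same setting as in php.ini, changed for the current request only.
        ini_set('pcre.backtrack_limit', (string) $backtrackLimit);
    }

    $name = preg_quote($blockname, '/');
    $result = preg_match(
        '/(<w:p.*>\$\{' . $name . '\}<\/w:.*?p>)(.*)(<w:p.*\$\{\/' . $name . '\}<\/w:.*?p>)/is',
        $xml,
        $matches
    );

    return [
        'found'  => $result === 1,
        // false means PCRE aborted; 0 would mean "no match".
        'failed' => $result === false,
        // Compare with PREG_BACKTRACK_LIMIT_ERROR or PREG_JIT_STACKLIMIT_ERROR.
        'error'  => preg_last_error(),
        'inner'  => $result === 1 ? $matches[2] : null,
    ];
}
```

If `failed` is true and `error` equals `PREG_BACKTRACK_LIMIT_ERROR`, raising the limit as described above is likely to help; if `found` is simply false with no error, the pattern does not fit the document and a higher limit will not change that.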

PS
deleteBlock does not work with LibreOffice documents; the output file is corrupted.

@Gadeoli

Gadeoli commented Aug 16, 2019

Hi,

On version 0.16.0 I have problems with replaceBlock and deleteBlock (the template gets broken).

For now I have just changed the replaceBlock method.

public function replaceBlock($blockname, $replacement)
    {
        preg_match(
            //removed (<\?xml.*) from beginning regex
            '/(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is',
            $this->tempDocumentMainPart,
            $matches
        );

        if (isset($matches[3])) {
            $this->tempDocumentMainPart = str_replace(
                $matches[2] . $matches[3] . $matches[4],
                $replacement,
                $this->tempDocumentMainPart
            );
        }

        //added to remove the start block sign
        $this->setValue($blockname, '');
    }

@dva-re

dva-re commented Nov 24, 2019

deleteBlock does not work with LibreOffice documents; the output file is corrupted.

Instead of deleteBlock we can use cloneBlock('block_name', 0);
In my case it works as a delete and does not corrupt the output file.

@tbl0605

tbl0605 commented Dec 4, 2019

Hi,

On version 0.16.0 I have problems with replaceBlock and deleteBlock (the template gets broken).

For now I have just changed the replaceBlock method.

public function replaceBlock($blockname, $replacement)
    {
        preg_match(
            //removed (<\?xml.*) from beginning regex
            '/(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is',
            $this->tempDocumentMainPart,
            $matches
        );

        if (isset($matches[3])) {
            $this->tempDocumentMainPart = str_replace(
                $matches[2] . $matches[3] . $matches[4],
                $replacement,
                $this->tempDocumentMainPart
            );
        }

        //added to remove the start block sign
        $this->setValue($blockname, '');
    }

Your code is wrong. Since you removed (<\?xml.*) from the beginning of the regex, you'll get the PHP notice "Undefined offset: 4", because all occurrences of $matches[2], $matches[3], $matches[4] should now be $matches[1], $matches[2], $matches[3] respectively.
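For reference, a sketch of the renumbering described above, written as a standalone function so it can be tried outside the library. `replaceBlockInXml` is a hypothetical name, and the `preg_quote` call is an addition that was not in the posted method. With the leading `(<\?xml.*)` group gone, the pattern has exactly three capture groups: the start paragraph, the block content and the end paragraph.

```php
<?php
// Hypothetical standalone variant of the posted replaceBlock, with the
// capture-group indices shifted down by one to match the shortened pattern.
function replaceBlockInXml(string $xml, string $blockname, string $replacement): string
{
    $name = preg_quote($blockname, '/');
    preg_match(
        '/(<w:p.*>\$\{' . $name . '\}<\/w:.*?p>)(.*)(<w:p.*\$\{\/' . $name . '\}<\/w:.*?p>)/is',
        $xml,
        $matches
    );

    // Group 3 is the last group now, so it is the one to test for.
    if (!isset($matches[3])) {
        return $xml;
    }

    // 1 = start paragraph, 2 = block content, 3 = end paragraph.
    return str_replace(
        $matches[1] . $matches[2] . $matches[3],
        $replacement,
        $xml
    );
}
```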

@mavykins

mavykins commented Dec 6, 2019

If this helps anyone, I managed to get it to work by doing the following.

    public function replaceBlock($blockname, $replacement)
    {
        $this->tempDocumentMainPart = preg_replace(
            '/(\${' . $blockname . '})(.*)(\${\/' . $blockname . '})/is',
            $replacement,
            $this->tempDocumentMainPart
        );

        $this->setValue($blockname, '');
    }

It might have been the version of Word I was saving it in, but my text was using <w:t> rather than <w:p>; the above seems to catch all eventualities.

@C4r1st

C4r1st commented Jul 14, 2020

public function replaceBlock($blockname, $replacement)
    {
        // get all content
        $data = $this->tempDocumentMainPart;

        // searching the block's opening tag
        preg_match(
            '/(?>(<w:p\s(?:(?!<w:p\s).)*?|<w:p>(?:(?!<w:p>).)*?)\${' . $blockname . '}.*?<\/w:p>)/is',
            $data,
            $start,
            PREG_OFFSET_CAPTURE
        );

        // block not found
        if (empty($start)) {
            return $data;
        }

        $start_offset = $start[0][1];

        // document content before block's opening tag
        $header = substr($this->tempDocumentMainPart, 0, $start_offset);

        // searching the block's closing tag
        preg_match(
            '/(?>(<w:p\s(?:(?!<w:p\s).)*?|<w:p>(?:(?!<w:p>).)*?)\${\/' . $blockname . '}.*?<\/w:p>)/is',
            $data,
            $end,
            PREG_OFFSET_CAPTURE,
            $start_offset
        );

        // block not found
        if (empty($end)) {
            return $data;
        }

        // document content after block's closing tag
        $footer = substr($this->tempDocumentMainPart, $end[0][1] + strlen($end[0][0]));

        // combining results with replacement string
        $this->tempDocumentMainPart = $header . $replacement . $footer;

    }

@salimat

salimat commented Aug 3, 2020

C4r1st's solution solved my problem.

@liborm85
Contributor

liborm85 commented Jan 3, 2021

I use this simple modification:

  public function replaceBlock($blockname, $replacement) {
    $this->tempDocumentMainPart = preg_replace(
      '/(\${' . $blockname . '})(.*?)(\${\/' . $blockname . '})/is',
      $replacement,
      $this->tempDocumentMainPart
    );
  }
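A small standalone sketch of what the lazy `(.*?)` buys compared with the greedy `(.*)` in the earlier variant, for templates where the same block name appears more than once. `replaceInlineBlocks` is a hypothetical function, not part of PHPWord. Two things in it are additions of mine rather than part of the snippet above: `preg_quote` on the block name, and `preg_replace_callback`, used so that a replacement containing `$1` or `\1` is inserted literally instead of being read as a back-reference.

```php
<?php
// Hypothetical standalone variant of the snippet above, working on a plain string.
// $greedy = true reproduces the earlier (.*) behaviour for comparison.
function replaceInlineBlocks(
    string $xml,
    string $blockname,
    string $replacement,
    bool $greedy = false
): string {
    $name = preg_quote($blockname, '/');
    $inner = $greedy ? '(.*)' : '(.*?)';
    $pattern = '/\$\{' . $name . '\}' . $inner . '\$\{\/' . $name . '\}/is';

    // A callback keeps "$1" or "\1" inside $replacement literal.
    return preg_replace_callback(
        $pattern,
        function (array $match) use ($replacement) {
            return $replacement;
        },
        $xml
    );
}

$template = '<w:t>${b}one${/b}</w:t><w:t>keep</w:t><w:t>${b}two${/b}</w:t>';

echo replaceInlineBlocks($template, 'b', 'X'), "\n";
// <w:t>X</w:t><w:t>keep</w:t><w:t>X</w:t>

echo replaceInlineBlocks($template, 'b', 'X', true), "\n";
// <w:t>X</w:t>
```

The greedy form swallows everything from the first opening marker to the last closing marker, including the text in between that should have been kept.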

@FrancoUd
Copy link

FrancoUd commented May 3, 2021

liborm85's works for me!

@mbahnmiller

Any progress getting a fix into the codebase?

@weetgeen

Thanks liborm85, your regex worked. I changed cloneBlock to:

`

public function cloneBlock($blockname, $clones = 1, $replace = true, $indexVariables = false, $variableReplacements = null)
{
    $xmlBlock = null;
    $matches = array();
    preg_match(
        '/(\${' . $blockname . '})(.*?)(\${\/' . $blockname . '})/is',
        $this->tempDocumentMainPart,
        $matches
    );

    if (isset($matches[3])) {
        $xmlBlock = $matches[2];
        if ($indexVariables) {
            $cloned = $this->indexClonedVariables($clones, $xmlBlock);
        } elseif ($variableReplacements !== null && is_array($variableReplacements)) {
            $cloned = $this->replaceClonedVariables($variableReplacements, $xmlBlock);
        } else {
            $cloned = array();
            for ($i = 1; $i <= $clones; $i++) {
                $cloned[] = $xmlBlock;
            }
        }

        if ($replace) {
            $this->tempDocumentMainPart = str_replace(
                $matches[1] . $matches[2] . $matches[3],
                implode('', $cloned),
                $this->tempDocumentMainPart
            );
        }
    }

    return $xmlBlock;
}

`

@Lutifya

Lutifya commented Dec 1, 2021

liborm85's solution works for me too, as does weetgeen's.

@Dawid-Ohia

Dawid-Ohia commented Aug 31, 2022

deleteBlock does not work with LibreOffice documents; the output file is corrupted.

Instead of deleteBlock we can use cloneBlock('block_name', 0); In my case it works as a delete and does not corrupt the output file.

@dva-re's solution works for me with the current version (as of today, 0.18.3).

@idem84

idem84 commented Nov 14, 2023

The solution from liborm85 is the best, because it works with inline blocks like:

${outDateBlock}Date ${outDate}${/outDateBlock}

@kamleshwebtech

I have downloaded the latest version of PHPWord, but this issue, open since 2014, still persists. It's weird.

@kamleshwebtech

kamleshwebtech commented Dec 9, 2023

The solution from liborm85 is the best, because it works with inline blocks like:

${outDateBlock}Date ${outDate}${/outDateBlock}

That did not work for me.

The text in the 'helloWorld.docx' file is:

${block_name}This block content will be replaced${/block_name}

And in the PHP test file:

<?php
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('helloWorld.docx');

$templateProcessor->replaceBlock('block_name', 'This is the replacement text.');

$templateProcessor->saveAs('helloWorld-new.docx');

Any other suggestions? Please share, thanks.

@kamleshwebtech

Thanks a lot @liborm85, your solution made my day. It worked like a charm and saved my life :) :)

@idem84

idem84 commented May 29, 2024

That did not work for me.

I use the code to delete a block, not to replace one.

@liborm85's solution doesn't work perfectly for deleting a block: after the block is deleted an empty line remains, and that line should be deleted too.

To delete the block together with the empty line, use:
$this->cloneBlock($blockname, 0, true, true);
