Block clone suddenly stopped working after applying formatting #867

judgej · 2016-08-18T16:05:26Z

I had a simple block that covered an entire page, and was cloned N times right at the start before substitution variables were filled in with data:

${PAGE}
...content with ${variables}...
<page break>
${/PAGE}

Opening it as a template, cloning, putting in data and saving all worked great.

Then I tried adding some formatting to the source document, starting by setting the entire document to Arial, and the ${PAGE} tags are no longer recognised at all.

Looking in the template source, the lines with the tags had font formatting applied, so I removed that (using the ctrl-space technique that has been around since MS Word 2.0). This gives me a source now without the formatting, that looks like this (saved from MS Word):

<?xml ...?>
    <w:document ...>
        <w:body>
            <w:p ...>
                <w:r>
                    <w:t>${PAGE}</w:t>
                </w:r>
            </w:p>
            ...content...
            <w:r w:rsidR="00D212DB">
                <w:lastRenderedPageBreak/>
                <w:t>${/PAGE}</w:t>
            </w:r>
    ...

judgej · 2016-08-18T16:13:17Z

The cloneBlock() method of the template processor is looking for this:

'/(<\?xml.*)(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is'

That's crudely looking for a <w:p> element wrapping around the block name tag. It should not work, because MS Word wraps the tag in a <w:r><w:t> pair instead. Except it did match before I applied the formatting, so it seems removing formatting from a line leaves additional layers of elements in.

judgej · 2016-08-18T16:22:45Z

So I guess what needs to happen is a search for the clone tags in the XML structure, then walking up the tree to the <w:p> elements, then duplicating those elements and everything within them assuming they have a common parent (which should be <w:body>).
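The tree-walking approach described above could look roughly like the following. This is a minimal sketch using PHP's DOM extension, not PHPWord's implementation: `findBlockParagraph()` and `cloneBlockDom()` are hypothetical names, and it assumes each marker sits whole inside a single <w:t> (Word can split text across runs), that both marker paragraphs share a parent, and that the marker paragraphs themselves can be dropped along with anything else they contain.

```php
<?php
// Sketch only: these helpers are hypothetical and not part of PHPWord.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/** Find the nearest <w:p> ancestor of a <w:t> whose text is exactly $marker. */
function findBlockParagraph(DOMXPath $xpath, string $marker): ?DOMElement
{
    // Assumes $marker contains no double quote (true for ${NAME} style tags).
    $query = '//w:t[normalize-space(.) = "' . $marker . '"]/ancestor::w:p[1]';
    $node = $xpath->query($query)->item(0);

    return $node instanceof DOMElement ? $node : null;
}

/** Return the XML with the block between ${NAME} and ${/NAME} present $clones times. */
function cloneBlockDom(string $xml, string $blockName, int $clones): string
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', W_NS);

    $open = findBlockParagraph($xpath, '${' . $blockName . '}');
    $close = findBlockParagraph($xpath, '${/' . $blockName . '}');
    if ($open === null || $close === null
        || !$open->parentNode->isSameNode($close->parentNode)) {
        throw new RuntimeException('Block markers not found under a common parent');
    }

    // Collect every sibling between the two marker paragraphs.
    $inner = [];
    for ($n = $open->nextSibling; $n !== null && !$n->isSameNode($close); $n = $n->nextSibling) {
        $inner[] = $n;
    }
    if ($n === null) {
        throw new RuntimeException('Closing marker does not follow the opening marker');
    }

    // The original content counts as the first copy; add $clones - 1 deep copies.
    for ($i = 1; $i < $clones; $i++) {
        foreach ($inner as $node) {
            $close->parentNode->insertBefore($node->cloneNode(true), $close);
        }
    }

    // Drop the marker paragraphs themselves.
    $open->parentNode->removeChild($open);
    $close->parentNode->removeChild($close);

    return $dom->saveXML();
}
```

Because the markers are located by their text and then resolved to the enclosing <w:p>, any extra <w:r>/<w:t> layers that Word adds around them do not matter here.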

Of course NONE of that can be done with a REGEX. The document source is structured XML and needs to be treated as such.

Here is the source of my document that is failing to match the ${PAGE} tags:

judgej · 2016-08-18T16:54:56Z

Trying this preg match offline to see if I can see the problem, it seems that it is this at the start that is causing it to not match:

(<\?xml.*)

That has me totally baffled, because that does match everything when used on its own '/(<\?xml.*)/is'. I guess it is being too "greedy" and eating up the whole XML and not leaving anything for the remaining rules to match. Replacing that part of the REGEX with (>) fixes it for me temporarily (I'm a NASTY DIRTY HACK!) but I'm going to have to revisit this fix.
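One way to narrow this down is to check whether the pattern genuinely fails to match or whether PCRE gives up part-way. It is only an assumption that the latter is happening here (for example the backtracking limit being reached on a large document), but in that case preg_match() returns false rather than 0 and preg_last_error() says why. A small sketch, with `explainPregMatch()` as a hypothetical helper name:

```php
<?php
// Hypothetical helper: report why preg_match() did or did not match.
function explainPregMatch(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);

    if ($result === 1) {
        return 'match';
    }
    if ($result === 0) {
        return 'no match';
    }

    // $result is false: PCRE stopped with an error instead of finishing the search.
    switch (preg_last_error()) {
        case PREG_BACKTRACK_LIMIT_ERROR:
            return 'backtrack limit reached (see pcre.backtrack_limit)';
        case PREG_RECURSION_LIMIT_ERROR:
            return 'recursion limit reached (see pcre.recursion_limit)';
        case PREG_JIT_STACKLIMIT_ERROR:
            return 'JIT stack limit reached';
        default:
            return 'PCRE error code ' . preg_last_error();
    }
}
```

Running this with the cloneBlock() pattern against the real document XML would show which of the cases applies.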

judgej · 2016-08-19T15:12:36Z

Also, I am finding that the regex replacements that setValue() uses can be extremely slow. For a 100 page document, it is taking about 0.25 seconds to replace about 20 substitution variables on each page in the document, so that 100 page document is taking a good 15 seconds to generate. I do think there has to be something faster - perhaps just a plain string replace?

Looking at it, a REGEX replace is performed if only replacing some of the fields, otherwise a string replace is done. Ouch

Edit: I just tried a string replace, using strpos() and substr(), which worked, but took about the same amount of time as the preg_replace(). I guess it involves less complex processing, but more data to move around: the XML string for the 100 pages is about 8 MB, so chopping that into smaller strings around the search text in order to replace it is not fast, and probably uses more memory. This is what I was trying, just for information:

    protected function setValueForPart($search, $replace, $documentPartXML, $limit)
    {
        // Note: we can't use the same function for both cases here, because of performance considerations.
        if (self::MAXIMUM_REPLACEMENTS_DEFAULT === $limit) {
            return str_replace($search, $replace, $documentPartXML);
        } else {
            if (is_string($documentPartXML)) {
                // Resume each search after the previous replacement, so text that
                // was just inserted is never rescanned or matched again.
                $offset = 0;

                for ($i = 1; $i <= $limit; $i++) {
                    $pos = strpos($documentPartXML, $search, $offset);

                    if ($pos === false) {
                        return $documentPartXML;
                    }

                    // Chop the string before and after the search text, then join it back
                    // together around the replace text. Still slow for 8M documents!
                    $documentPartXML = substr($documentPartXML, 0, $pos)
                        . $replace
                        . substr($documentPartXML, $pos + strlen($search));

                    $offset = $pos + strlen($replace);
                }

                return $documentPartXML;
            }

            $regExpEscaper = new RegExp();
            return preg_replace($regExpEscaper->escape($search), $replace, $documentPartXML, $limit);
        }
    }
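A possible alternative for the limited case is to split the string once and join it back, instead of rebuilding the whole 8 MB string for every occurrence. This is only a sketch under that assumption, with `replaceFirstN()` as a hypothetical helper name; whether it is actually faster on a real document would need measuring.

```php
<?php
// Hypothetical helper, not PHPWord's implementation: replace at most $limit
// occurrences of $search, scanning the subject a single time.
function replaceFirstN(string $search, string $replace, string $subject, int $limit): string
{
    if ($search === '' || $limit === 0) {
        return $subject; // nothing to look for, or nothing allowed to change
    }
    if ($limit < 0) {
        return str_replace($search, $replace, $subject); // negative means "no limit"
    }

    // explode() with a limit of N + 1 cuts at the first N occurrences only;
    // the last element keeps the untouched remainder of the string.
    $parts = explode($search, $subject, $limit + 1);

    return implode($replace, $parts);
}

echo replaceFirstN('${x}', 'V', 'a${x}b${x}c${x}', 2), "\n"; // prints aVbVc${x}
```

Text inserted by a replacement is never searched again, which matches how preg_replace() applies its limit.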

FBnil · 2017-10-08T19:09:53Z

@judgej About the <?xml matching: it hooks there so that the pattern matches only once (preg_match() otherwise would match multiple times). The xml line is on a separate \n line, thus . does not match that (/s for single line fixes that). So write '/(<\?xml[\s\S]*) instead. I'm planning to remove the regexp for blocks altogether.
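For reference, the newline behaviour of `.` can be checked in isolation. This is a small sketch with a made-up two-line subject, not the real document: without the s modifier `.` stops at "\n", while either the s modifier or [\s\S] lets the match cross lines. Note that the cloneBlock() pattern quoted earlier in this thread already ends in /is.

```php
<?php
// Made-up subject: an XML declaration on its own line, followed by the root element.
$subject = "<?xml version=\"1.0\"?>\n<w:document>";

$plain     = preg_match('/<\?xml.*<w:document>/', $subject);       // . does not cross "\n"
$dotAll    = preg_match('/<\?xml.*<w:document>/s', $subject);      // s modifier: . matches "\n"
$charClass = preg_match('/<\?xml[\s\S]*<w:document>/', $subject);  // [\s\S] matches any character

var_dump($plain, $dotAll, $charClass); // int(0), int(1), int(1)
```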

But the solution for your block cloning problem is to delete the closing tag, press enter, and write it again. This will ensure it being again inside 1 <w:p>.

JackEllis · 2017-10-23T00:40:51Z

@FBnil Big fan of you right now, your code fixed the clone block for me!

heroyan · 2018-12-24T07:36:42Z

I encountered this problem too.

judgej mentioned this issue Aug 18, 2016: deleteBlock / replaceBlock doesn't work properly #341 (Open)

czosel mentioned this issue Apr 23, 2018: Use XML API instead of string replacement in cloneBlock and replaceBlock #1354 (Open, 2 tasks)
