Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block clone suddenly stopped working after applying formatting #867

Open
judgej opened this issue Aug 18, 2016 · 7 comments
Open

Block clone suddenly stopped working after applying formatting #867

judgej opened this issue Aug 18, 2016 · 7 comments

Comments

@judgej
Copy link

judgej commented Aug 18, 2016

I had a simple block that covered an entire page, and was cloned N times right at the start before substitution variables were filled in with data:

${PAGE}
...content with ${variables}...
<page break>
${/PAGE}

Opening as a template, cloning, putting in data and saving working great.

Then I tried adding some formatting to the source document, starting by setting the entire document to Arial, and the ${PAGE} tags are no longer recognised at all.

Looking in the template source, the lines with the tags had font formatting applied, so I removed that (using the ctrl-space technique that has been around since MS Word 2.0). This gives me a source now without the formatting, that looks like this (saved from MS Word):

<?xml ...?>
    <w:document ...>
        <w:body>
            <w:p ...>
                <w:r>
                    <w:t>${PAGE}</w:t>
                </w:r>
            </w:p>
            ...content...
            <w:r w:rsidR="00D212DB">
                <w:lastRenderedPageBreak/>
                <w:t>${/PAGE}</w:t>
            </w:r>
    ...

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@judgej
Copy link
Author

judgej commented Aug 18, 2016

The cloneBlock() method of the template processor is looking for this:

'/(<\?xml.*)(<w:p.*>\${' . $blockname . '}<\/w:.*?p>)(.*)(<w:p.*\${\/' . $blockname . '}<\/w:.*?p>)/is'

That's crudely looking for a <w:p> element wrapping around the block name tag. It does not work because MS Word wraps it in a <w:r><w:t> pair instead. Except it did match, so it seems removing formatting from a line leaves additional layers of elements in.

@judgej
Copy link
Author

judgej commented Aug 18, 2016

So I guess what needs to happen is a search for the clone tags in the XML structure, then walking up the tree to the <w:p> elements, then duplicating those elements and everything within them assuming they have a common parent (which should be <w:body>).

Of course NONE of that can be done with a REGEX. The document source is structured XML and needs to be treated as such.

Here is the source of my document that is failing to match the ${PAGE} tags:

clipboard01

@judgej
Copy link
Author

judgej commented Aug 18, 2016

Trying this preg match offline to see if I can see the problem, it seems that it is this at the start that is causing it to not match:

(<\?xml.*)

That has me totally baffled, because that does match everything when used on its own '/(<\?xml.*)/is'. I guess it is being too "greedy" and eating up the whole XML and not leaving anything for the remaining rules to match. Replacing that part of the REGEX with (>) fixes it for me temporarily (I'm a NASTY DIRTY HACK!) but I'm going to have to revisit this fix.

@judgej
Copy link
Author

judgej commented Aug 19, 2016

Also finding the regex replacements that setValue() uses can be extremely slow. For a 100 page document, it is taking about 0.25 seconds to replace about 20 substitution variables on each page in the document, so that 100 page document is taking a good 15 seconds to generate. I do think there has to be something faster - perhaps just a plain string replace?

Looking at it, a REGEX replace is performed if only replacing some of the fields, otherwise a string replace is done. Ouch

Edit: I just tried a string replace, using strpos() and substr(), which worked, but took about the same amount of time as the preg_replace(). I guess it involves less complex processing, but then more data to move around (the XML string for the 100 pages is about 8Mbyte, so chopping that into smaller strings around the insert text to replacement it, is not fast, and probably uses more memory). This is what I was trying, just for information:

    protected function setValueForPart($search, $replace, $documentPartXML, $limit)
    {
        // Note: we can't use the same function for both cases here, because of performance considerations.
        if (self::MAXIMUM_REPLACEMENTS_DEFAULT === $limit) {
            return str_replace($search, $replace, $documentPartXML);
        } else {
            if (is_string($documentPartXML)) {
                for($i = 1; $i <= $limit; $i++) {
                    $pos = strpos($documentPartXML, $search);

                    if ($pos === false) {
                        return $documentPartXML;
                    }

                    // Chop the string before and after the search text, then join it back
                    // together around the replace text. Still slow for 8M documents!
                    $documentPartXML = substr($documentPartXML, 0, $pos)
                        . $replace .
                        substr($documentPartXML, $pos + strlen($search));
                }

                return $documentPartXML;
            }

            $regExpEscaper = new RegExp();
            return preg_replace($regExpEscaper->escape($search), $replace, $documentPartXML, $limit);
        }
    }

@FBnil
Copy link

FBnil commented Oct 8, 2017

@judgej About the <xml matching, it hooks there to match only once (preg_match() otherwise would match multiple times). The xml line is in a separate \n line, thus . does not match that (/s for single line fixes that). So Write '/(<\?xml[\s\S]*) instead. I'm planning to remove the regexp for blocks out all together.

But the solution for your block cloning problem is to delete the closing tag, press enter, and write it again. This will ensure it being again inside 1 <w:p>.

@JackEllis
Copy link

@FBnil Big fan of you right now, your code fixed the clone block for me!

@heroyan
Copy link

heroyan commented Dec 24, 2018

I enconter this problem

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

4 participants