Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Templating tables #1086

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Templating tables #1086

wants to merge 2 commits into from

Conversation

ejn
Copy link

@ejn ejn commented Jun 27, 2017

Additional functions for working with tables in TemplateProcessor

  • insert a new table in a block containing a ${placeholder}
  • delete a table row containing a ${placeholder}
  • delete a table containing a ${placeholder}

(note split into two commits due to common changes in 02508f2 with Shared/Converter with #1084)

- utility functions to parse lengths given in absolute CSS units (in. px,
mm, cm, pc)
- insertion of a new table at a placeholder
- deletion of a table containing a placeholder
- deletion of a table row containing a placeholder
@FBnil
Copy link

FBnil commented Oct 6, 2017

I like the findXmlBlockStart() functions, and that means changing a few bits in my code to accommodate for them. I do not like the naming convention for insertTable(), it is more like replaceBlock("blockname", makeTable($array)). If we follow naming like setValue(), then it would be setTable() with is shorter than replaceBlockWithATable().
The logic of your code will fit fine with my singleton blocks ${blockname/}.
Why is there no replaceTable() and use deleteTable() based on that function (replace with empty)
I also have functions for Segments, which basically is what the findContainingXmlBlockForMacro() does. Keeping it as a generic word "deleteSegment()", is clearer. What I mean to say is:
Is it not better to add a single function makeTable() that spews out XML, and use the existing functions like replaceBlock() to replace the block with a table. (at the moment you use replaceXmlBlock(), which would be OK if you made replaceBlock() point to replaceXmlBlock() or else there is so much extra code duplication....)
What I mean is, why are you not brave enough to:

    public function replaceBlock($blockname, $replacement)
    {
      return replaceXmlBlock($blockname, $replacement); , # $blockType default is 'w:p'
    }

Also in codestyle: replaceXmlBlock($blockname, $replacement); is better for the documentation later than your replaceXmlBlock($macro, $block) really, what is "macro"? I don't use macro's I use blocknames, and that looks like the second parameter, $block, which is actually the replacement block.

Can you describe me the logic why inserting tables is needed beyond, say, cloneRow() ? CloneRow keeps the table formatting, so you can have nice tables. I did not see any formatting code in your functions, so you get a table, but just the basic one. You can just open a document with lots of blocks with tables with different sizes, then just getBlock("tablewith4columns"), and replaceBlock() it in your main document, then use cloneRow() to get the right amount of rows and you are done, no need for this patch.

Do not feel bad if this comes out harshly. (not meant to get you down, only to make the final code better). If I did not care about your code getting into 0.14.0 I would not even post.

Can people reading this tell if they need templates where the amount of rows is not known, and this would be the only solution? (I saw questions about cloneColumn() in the field, so I suppose it is needed)

@ejn
Copy link
Author

ejn commented Oct 6, 2017

Regarding the logic of why inserting tables is needed - I actually wrote this a couple of years ago so the details are a little hazy now, but I think we had a single placeholder that should be replaced either with a table or with a "No data available" type line, depending on conditions. The alternative of having two placeholders and deleting the unused one was unattractive as (i) we weren't the ones editing the templates so we'd have had to explain it and (ii) it was liable to end up with extra stuff in the template that wouldn't have got removed appropriately. You're right that the insertTable-functionality is a little basic, but maybe someone else can extend it with more formatting possibilities.

As for the rest of your comments / code review: Constructive criticism is good! I'm happy to do some reworking if it helps get this into the next release. I'll look at the details when I get a bit more time - I assume there's no great rush, or is a 0.14.0 imminent?

@FBnil
Copy link

FBnil commented Oct 6, 2017

@ejn: No great rush, I assume old dev's will come back during the holiday season to do some work. We do have @troosan with access to the dev branch, but I want to make it very light for him (as he, like us?, does not want to be a maintainer). And I want to merge all new functionality and fixes into one clear way of doing things.
For example: Searching for an xml tag (start and end of a <w:p>) there are 3 ways to do that:

  1. Using a regexp to fetch the block:

^(<\?xml[\s\S]+)(<w:p[ >].*?\${' . $blockname . '}.*?<\/w:p>)(.+?)(<w:p[ >].*?\${\/' . $blockname . '}.*?<\/w:p>)

We see that the 0.13.0 branch has a bad regexp that does not work anymore (here listed is a working one.. but it's a bit messy)

  1. Using str(r)pos() to find to the left and right of the ${value}. This is what cloneRow() and other *Row commands use. Not optimal because it searches for a MSOffice tag and if it fails, it searches for a OpenOffice tag.

  2. xpath

See for example the findBlock() https://github.com/webadev/PHPWord/blob/d825ff78196546b0f4dd83a2090914659ed803d1/src/PhpWord/TemplateProcessor.php

Although that commit also is not touching the Block commands, leaving the faulty regexp in place, it does fix the inline block problem, I'll explain:

This is a proper block. Each block has it's own paragraph <w:p>:

${myblock}
text
${/myblock}

This is an inline block:
Lorem ${inline} ipsum ${/inline} vitae dicta sunt.

And so, if the block and the /block have the same <w:p>, this is an inline block and needs to be treated differently, which that before mentioned xpath solution actually fixed.

And then we have the block/ singleton which, like <hr/> is a stand-alone block (see my branch on how that works) but instead of writing:

${block}
* ${bullet}
${/block}

Then use cloneBlock("block",n) to clone the bullet a few times and setValue("bullet#n",$v) to fill the bullet, do:

* ${bullet/}

And get the same result with cloneBlock("bullet/",n) and setValue("bullet/#n",$v)

ps: or exactly what you wanted it for:
if table data then $t->replaceBlockWithTable("table/", $arr_data, $style) else $t->setValue("table/", "no data found");

How do you feel about using xpath() instead of strpos() ? (I have not done any benchmarking, my guess is that it is slower, as each function call you need to rebuild the xpath table, or do everything in xpath, and perform no direct string insertions in the tempDocumentMainPart string; all or nothing). Personally, I think using xpath is the correct way, but am still more inclined to use strpos (with an optimization to search for <w:p and then check the +1 position for a space or a >). But again: need to benchmark that. Because all the *Row() commands to search for a span could benefit from xpath if you want to do serious table editing (span vertically or horizontally, split a cell in half, etc). Then again, current running code does not use it, so it's double work.

I said something that was not true: The tables you inject can later be transformed with xslt, for example making transformXml() publicly accessible, so you could make a table, add the styling by transforming it and then inject it back into tempDocumentMainPart. (so if you need to choose, the third table position() = 3, and keep a mental check on transforming the correct one... which works, but is a pita if you want different type of tables)

more on tables: http:https://officeopenxml.com/WPtableGrid.php

Once these search functions are created (find-given-tag-to-the left/right ), we make the rest of the functions use these functions.

Right. I'm making a list of your functions (from the create Image branch) into my part (and add insert image from string), then merge your table stuff as I envision it (a little bit more lightweight than it currently is), then try to make them use the same base functions for less filesize, and propose that as a functional design (so function names exposed to the outside are 'logical' or at least, similar). Then you can shoot at it (and other people wanting to give their opinion, you are most welcome, as you are end-users of this)

#316

@FBnil
Copy link

FBnil commented Oct 7, 2017

@ejn Ok here is a working copy of the TemplateProcessor.php. As you can see, I removed all your tag finding functions and replaced them with what I was using. This makes the amount of code much smaller. I also added a little to your testcases, and I will be incorporating your $zip->getFromName('word/document.xml') in favour of using OpenTemplateProcessor, as Travis complains about "eval". (And scrutinizer complains about having multiple classes in a single file, or using a require to load it in...). I did not do any reduction work on the images code, just the table code. (and here and there cut the line length to have less warnings about lines >120 characters).

pastebin
This pastebin is not a drop-in replacement for TemplateProcessor.php v0.13.0 because it uses a few new files from @ejn (for image embedding).

Tell me what you think.

@FBnil
Copy link

FBnil commented Oct 8, 2017

Ok, addressing the fact that we could do it in another way, let's leverage the Writer and IOFactory to make a colored table. This is pure 0.13.0

# create a PhpWord object, where we write a table to
$phpWord = new \PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
$table = $section->addTable();
$table->addRow();
# let's go wild and throw some styling into it
$table->addCell(4500, array('vMerge' => 'restart', 'bgColor'=>"FFFF00"))->addText('dummy');
$table->addCell(4500)->addText('Test', ['alignment' => Jc::CENTER, 'italic' => true, 'underline' => 'dash', 'fgColor' => 'yellow', 'color' => '996699']);

# Now we use the IOFactory to print our document to an XML string
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$fullxml = $objWriter->getWriterPart('Document')->write();

#Great, we have an empty document with only a table, so we extract the table only
$tablexml = preg_replace('/^[\s\S]*(<w:tbl\b.*<\/w:tbl>).*/', '$1', $fullxml);

# Ok, now we use a TemplateProcessor to inject our fancy table
$file_dir = storage_path('mydocument.docx');
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor($file_dir);
# in our document we have a ${TABLEPLACEHOLDER} and ${/TABLEPLACEHOLDER} on separate lines
$templateProcessor->replaceBlock('TABLEPLACEHOLDER', $tablexml);
# Save the document
$templateProcessor->saveAs(storage_path('app/public/x.docx'));
# use laravel to serve the newly created document
return response()->download(storage_path('app/public/x.docx'))->deleteFileAfterSend(true);

Do you see the potential for this v/s your tables? Can't you make a new version of your tables and:

$table = makeTable('PLACEHOLDER');
$table->addRow();
$table->done();

that returns a subclass of PhpWord() with an extra function done() which will extract the table and replace it in the placeholder? No need for extra const TABLE_TEMPLATE, or the like.
You can even add your $table->addTable() that will add it without styling (but with table width), wrapping around addRow and Add Cell ?
I am not sure if your extra code for twips is required, but addRow($width,$style) does not do the translation. Although $width="102pt" does actually work in LibreOffice. More reading is required to know.

@FBnil
Copy link

FBnil commented Oct 8, 2017

@ejn ... and you can use the same with addImage(), so IMAGE_TEMPLATE is not needed anymore (but you still retain the file injection code into the zip).

edit: in fact, adding these to an internal array would keep them open until it is time to save/as or force them with a $templateProcessor->write();
In fact, all we would need is a ->write($placeholder, 'w:tbl', $table), because from $table (or any other element) you can $table->getPhpWord().

Which method do you think is better?

  1. Adding helper function so you do:
$table = $templateProcessor->newTable();
$table->addRow(); # etc
$templateProcessor->setBlock('MYTABLE',  $templateProcessor->toXml($table));

Where toXML() looks at the class and guesses w:tbl for us (but also accepts a second parameter for override).
And newTable() does the return (new \PhpOffice\PhpWord\PhpWord())->addSection()->addTable() for us. (and new* is of course generated procedurally by an array).

  1. Only a writeback class
    So you need to make a PhpWord() object yourself, populate it, and then
$templateProcessor->setBlock('MYTABLE', $phpWord_table);

where setBlock() sees that the second parameter is not a string but an object and calls toXml() (mentioned before)

  1. Combine both
$table = $templateProcessor->newTable();
$formatted_text = $templateProcessor->newText();
# do somehing with $table
$templateProcessor->setBlock('MYTABLE', $table);
$templateProcessor->setValue('NAME', $formatted);

The special case for setValue(), where it behaves like setBlock that cuts at w:p, but now cutting at w:r
underwater this still uses *Segment() functions to do a precise cut, while still allowing multiple formatted text.

things to check: RAM considerations
Coding toXML(). And please reply... this feels like a monologue. 0.14.0 is going to be great!

@ejn
Copy link
Author

ejn commented Oct 10, 2017

@FBnil Sorry for leaving you to talk to yourself for so long! I'll try and respond to all your main points.

I love the idea of generating the blocks using the "main" PHPWord stuff and then just putting them in as XML: Certainly a much cleaner and more consistent strategy than having a bunch of pre-prepared sprintf templates. Equally XPath would be the "clean" way to do the template manipulation - but on the other hand I reckon you're right that that would be slower and more memory-intensive, and possibly less flexible than the regexes.

Personally I like the hybrid approach to the API from your last post - for those that want full control of whatever blocks they're putting in they can generate them and then insert them, but keep the current simple API (single method call) for basic use-cases. For the image stuff where I wanted to allow the size of the image to be controlled from the template directly then this would not be possible with the "create object and then insert" approach as first the additional parameters must be read from the placeholder. (This functionality was to allow the template maintainer more flexibility without requiring that the code be changed). Oh, and yes to more uniform method names (setValue, setBlock, setImage, setTable, etc., even where many of these defer as much of the XML manipulation as possible to a common method such as setBlock). I also thought that it may be cleaner to put some of the Zip plumbing for adding additional files into a separate helper class, as this is secondary to the template processing but adds quite a lot of additional lines of code which clutter up the TemplateProcessor

In terms of memory usage I think it would be better to transform the objects to XML immediately upon adding to the template rather than first recording them in an array; (i) this would probably need less memory and (ii) more importantly it allows e.g. adding a table with placeholders for images and then inserting images into them. I can't see a big advantage in deferring the serialization, or what is your idea behind this?

The conversion of the column widths to Twips is possibly not required - I suspect I probably got this from studying what Word does (the whole images and tables was done by a combination of reading the theory and just deconstructing some docx files): It's quite possible that any valid units will work.

Having had a look at your Pastebin it looks pretty good to me for reducing the amount of code. If you want to run with your ideas of generating PHPWord-objects for the blocks then I can try and support you (even if just by testing - unfortunately I've not got much time just now to work on this) and once there's a "new, improved" version I'll happily close my pull requests with reference to the better version.

@FBnil
Copy link

FBnil commented Oct 10, 2017

I LOVE the twipcode and want that in the product. (now that I tried it, it really is something I would miss, even if I do not think in inches). Images is a juggernautic task that you tackled well. I can think and play with tables and come up with something simple but effective (in a week or so, really need some days worth of sleep).
The delay would be, for example, for table indexes, where you create your text, images, and then go back and fill the first page. But I am sleepy at the moment, so I will reply when I'm not half asleep.

@n-osennij
Copy link

So why it not merged?

@n-osennij
Copy link

n-osennij commented Jan 19, 2019

@troosan Do you have any information (about merge this pull request)?

@troosan troosan added this to the v0.17.0 milestone Jan 19, 2019
@troosan
Copy link
Contributor

troosan commented Jan 19, 2019

I'll try to merge this in next release. It needs a bit of work because lots of things will need merging

@troosan troosan removed this from the v0.17.0 milestone Aug 8, 2019
@PowerKiKi PowerKiKi changed the base branch from develop to master November 16, 2022 21:11
@Progi1984 Progi1984 force-pushed the master branch 3 times, most recently from 2d9f999 to e458249 Compare August 30, 2023 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants