Using table tag in HTML Reader produces no output #324

Closed
EK1771 opened this issue Aug 1, 2014 · 15 comments

EK1771 commented Aug 1, 2014

Sample Code:

$phpWord = new \PhpOffice\PhpWord\PhpWord();

$section = $phpWord->addSection();

$html = '<table><tr><td>test</td></tr></table>';

\PhpOffice\PhpWord\Shared\Html::addHtml($section, $html);

$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$objWriter->save('test.docx');

Expected Output:

Table with one cell containing the word "test".

Actual Output:

Blank

From stepping through the code quickly, the issue seems to be caused by the following if condition in parseChildNodes():

if ($element instanceof AbstractContainer) {
    self::parseNode($cNode, $element, $styles, $data);
}

Commenting out the if condition then allows for the sample code above to produce the expected output.



hregis commented Sep 24, 2014

Hello,
I have the same problem on the develop branch. Do you have a solution?
Thank you

EK1771 commented Sep 25, 2014

@hregis I do, in src/PhpWord/Shared/Html.php, parseChildNodes():

private static function parseChildNodes($node, $element, $styles, $data)
{
    if ($node->nodeName != 'li') {
        $cNodes = $node->childNodes;
        if (count($cNodes) > 0) {
            foreach ($cNodes as $cNode) {
                if ($element instanceof AbstractContainer) {
                    self::parseNode($cNode, $element, $styles, $data);
                }
            }
        }
    }
}

Change this to:

private static function parseChildNodes($node, $element, $styles, $data)
{
    if ($node->nodeName != 'li') {
        $cNodes = $node->childNodes;
        if (count($cNodes) > 0) {
            foreach ($cNodes as $cNode) {
//                if ($element instanceof AbstractContainer) {
                    self::parseNode($cNode, $element, $styles, $data);
//                }
            }
        }
    }
}

Also, could this issue please be labelled as a Bug? It's definitely not a Question.

hregis commented Sep 25, 2014

@EK1771 Thank you, but I get this error with the develop branch:
Fatal error: Call to undefined method PhpOffice\PhpWord\Element\Table::addText() in /PhpWord/Shared/Html.php on line 239
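
The error above is what one would expect if the removed instanceof check had been the only thing stopping text from being added to a Table. A minimal sketch of that situation follows. The classes are stand-ins written for this example, not the real PhpWord classes; the only assumption taken from the thread is that Table has no addText() method, as the error message reports. It also assumes PHP 7 or later, where a call to an undefined method throws a catchable \Error.

```php
<?php
// Stand-in classes for illustration only; these are NOT the real PhpWord classes.
abstract class AbstractContainer
{
    public $texts = array();

    public function addText($text)
    {
        $this->texts[] = $text;
    }
}

class Section extends AbstractContainer
{
}

// Assumption: like the Table in the error message, this class has no addText().
class Table
{
    public $rows = 0;

    public function addRow()
    {
        $this->rows++;
    }
}

// Guarded version: only touches elements that really are containers.
function parseTextGuarded($element, $text)
{
    if ($element instanceof AbstractContainer) {
        $element->addText($text);
        return true;
    }
    return false; // non-containers are skipped instead of crashing
}

// Unguarded version: what happens once the instanceof check is commented out.
function parseTextUnguarded($element, $text)
{
    $element->addText($text); // throws \Error on PHP 7+ if the method is missing
}

$section = new Section();
$table = new Table();

var_dump(parseTextGuarded($section, 'test')); // container: text is added
var_dump(parseTextGuarded($table, 'test'));   // not a container: skipped

try {
    parseTextUnguarded($table, 'test');
} catch (\Error $e) {
    echo get_class($e), ': ', $e->getMessage(), "\n";
}
```

The guarded call silently skips the table, which matches the blank output in the original report; the unguarded call fails in the same way as the fatal error quoted above.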

mogilvie commented Nov 9, 2014

I had the same issue where tables were not being parsed from HTML to DOM.

The problem is that the HTML elements <tbody>, <tr> and <td> do not map to containers as defined by the AbstractContainer class. Because of that, the parseChildNodes method doesn't check them for child elements.

@EK1771's solution removes the check against AbstractContainer, but that also causes every element to be checked for children, even those that are not containers.

There are a couple of steps to fix this.

  1. Insert a new node into the HTML mapping table to catch <tbody> elements.

/PhpWord/Shared/Html.php:parseNode()

        // Node mapping table
        $nodes = array(
                              // $method        $node   $element    $styles     $data   $argument1      $argument2
            'p'         => array('Paragraph',   $node,  $element,   $styles,    null,   null,           null),
            'h1'        => array('Heading',     null,   $element,   $styles,    null,   'Heading1',     null),
            'h2'        => array('Heading',     null,   $element,   $styles,    null,   'Heading2',     null),
            'h3'        => array('Heading',     null,   $element,   $styles,    null,   'Heading3',     null),
            'h4'        => array('Heading',     null,   $element,   $styles,    null,   'Heading4',     null),
            'h5'        => array('Heading',     null,   $element,   $styles,    null,   'Heading5',     null),
            'h6'        => array('Heading',     null,   $element,   $styles,    null,   'Heading6',     null),
            '#text'     => array('Text',        $node,  $element,   $styles,    null,    null,          null),
            'span'      => array('Span',        $node,  null,       $styles,    null,    null,          null), //to catch inline span style changes
            'strong'    => array('Property',    null,   null,       $styles,    null,   'bold',         true),
            'em'        => array('Property',    null,   null,       $styles,    null,   'italic',       true),
            'sup'       => array('Property',    null,   null,       $styles,    null,   'superScript',  true),
            'sub'       => array('Property',    null,   null,       $styles,    null,   'subScript',    true),
            'table'     => array('Table',       $node,  $element,   $styles,    null,   'addTable',     true),
            'tbody'     => array('Table',       $node,  $element,   $styles,    null,   'skipTbody',    true), //added to catch tbody in html.
            'tr'        => array('Table',       $node,  $element,   $styles,    null,   'addRow',       true),
            'td'        => array('Table',       $node,  $element,   $styles,    null,   'addCell',      true),
            'ul'        => array('List',        null,   null,       $styles,    $data,  3,              null),
            'ol'        => array('List',        null,   null,       $styles,    $data,  7,              null),
            'li'        => array('ListItem',    $node,  $element,   $styles,    $data,  null,           null),
        );

  2. Modify the parseChildNodes method. Define a list of HTML table elements that contain child elements, and add an if check against the node's nodeName. Let all other node types carry on to the original instanceof check.
    private static function parseChildNodes($node, $element, $styles, $data)
    {
        if ($node->nodeName != 'li') {
            $cNodes = $node->childNodes;
            if (count($cNodes) > 0) {
                foreach ($cNodes as $cNode) {              
                    // Added to get tables to work                    
                    $htmlContainers = array(
                        'tbody',
                        'tr',
                        'td',
                    );
                    if (in_array( $cNode->nodeName, $htmlContainers ) ) {                        
                        self::parseNode($cNode, $element, $styles, $data);
                    }                              
                    // All other containers as defined in AbstractContainer
                    if ($element instanceof AbstractContainer) {                        
                        self::parseNode($cNode, $element, $styles, $data);
                    }
                }
            }
        }
    }
  3. Modify the parseTable method. The DOM writer adds columns and rows to the Table element directly, so you need to add a switch or a series of if checks against $argument1 from the node mapping table.
    private static function parseTable($node, $element, &$styles, $argument1)
    {     
        switch ($argument1) {
            case 'addTable':                        
                $styles['paragraph'] = self::parseInlineStyle($node, $styles['paragraph']); 
                $newElement = $element->addTable('table', array('width' => 90));
                break;
            case 'skipTbody':                        
                $newElement = $element;
                break;
            case 'addRow':                        
                $newElement = $element->addRow();
                break;
            case 'addCell':                        
                $newElement = $element->addCell(1750);
                break;
        }

        // $attributes = $node->attributes;
        // if ($attributes->getNamedItem('width') !== null) {
            // $newElement->setWidth($attributes->getNamedItem('width')->value);
        // }

        // if ($attributes->getNamedItem('height') !== null) {
            // $newElement->setHeight($attributes->getNamedItem('height')->value);
        // }
        // if ($attributes->getNamedItem('width') !== null) {
            // $newElement=$element->addCell($width=$attributes->getNamedItem('width')->value);
        // }

        return $newElement;
    }

This works for me; I hope it helps others. I'm sure there is a more elegant solution that could be incorporated into the develop branch.

It needs to be expanded to deal with <thead> and other HTML table elements.

Mark
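
The idea behind the three steps (wrapper tags are skipped, rows and cells are dispatched by tag name) can be tried without PhpWord. The sketch below is only an illustration under stated assumptions: it collects cell text into a plain array instead of calling addRow()/addCell(), the helper names collectRows() and tableToRows() are made up for this example, and it needs PHP's DOM extension. It also treats <thead> and <tfoot> the same way as <tbody>, which is one possible way to do the expansion mentioned above.

```php
<?php
// Illustrative sketch only: collects table cells into a plain array instead of
// calling PhpWord. collectRows() and tableToRows() are hypothetical helper names.

function collectRows(DOMNode $node, array &$rows)
{
    // Wrapper tags that hold rows but produce no output of their own.
    $wrappers = array('thead', 'tbody', 'tfoot');

    foreach ($node->childNodes as $child) {
        if (in_array($child->nodeName, $wrappers, true)) {
            collectRows($child, $rows); // skip the wrapper, keep descending
        } elseif ($child->nodeName === 'tr') {
            $cells = array();
            foreach ($child->childNodes as $cell) {
                if ($cell->nodeName === 'td' || $cell->nodeName === 'th') {
                    $cells[] = trim($cell->textContent);
                }
            }
            $rows[] = $cells;
        }
        // Anything else (whitespace text nodes, captions, ...) is ignored here.
    }
}

function tableToRows($html)
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $rows = array();
    // Note: nested tables are not handled specially in this sketch.
    foreach ($dom->getElementsByTagName('table') as $table) {
        collectRows($table, $rows);
    }
    return $rows;
}

$html = '<table><thead><tr><th>A</th><th>B</th></tr></thead>'
      . '<tbody><tr><td>1</td><td>2</td></tr></tbody></table>';
print_r(tableToRows($html));
```

In the real Html class, the branches would call the Table, Row and Cell methods rather than fill an array, but the dispatch on nodeName is the same.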

@matteomoretti

Any news? The bug still occurs.

hari-web commented Sep 1, 2015

Using \PhpOffice\PhpWord\Shared\Html::addHtml($section, $html), we can convert HTML to Word. Can we set alignment options for this output (such as align right/left/both)?

@garethellis36

@mogilvie Are you able to share your complete, working Html class with the modifications? I'm trying it myself, but when I apply it to an updated sample (as below), I get an error when it tries to write because one of the objects is null.

Html class

//node mapping table
            'table'     => array('Table',       $node,  $element,   $styles,    null,   'addTable',     true),
            'thead'     => array('Table',       $node,  $element,   $styles,    null,   'skipThead',    true),
            'tbody'     => array('Table',       $node,  $element,   $styles,    null,   'skipTbody',    true),
            'tr'        => array('Table',       $node,  $element,   $styles,    null,   'addRow',       true),
            'td'        => array('Table',       $node,  $element,   $styles,    null,   'addCell',      true),
            'th'        => array('Table',       $node,  $element,   $styles,    null,   'addCell',      true),

    /**
     * Parse child nodes.
     *
     * @param \DOMNode $node
     * @param \PhpOffice\PhpWord\Element\AbstractContainer $element
     * @param array $styles
     * @param array $data
     * @return void
     */
    private static function parseChildNodes($node, $element, $styles, $data)
    {
        if ('li' != $node->nodeName) {
            $cNodes = $node->childNodes;
            if (count($cNodes) > 0) {
                foreach ($cNodes as $cNode) {
                    // Added to get tables to work
                    $htmlContainers = array(
                        'thead',
                        'tbody',
                        'tr',
                        'td',
                        'th',
                    );
                    if (in_array($cNode->nodeName, $htmlContainers)) {
                        self::parseNode($cNode, $element, $styles, $data);
                    }
                    if ($element instanceof AbstractContainer) {
                        self::parseNode($cNode, $element, $styles, $data);
                    }
                }
            }
        }
    }

    private static function parseTable($node, $element, &$styles, $argument1)
    {
        switch ($argument1) {
            case 'addTable':
                $styles['paragraph'] = self::parseInlineStyle($node, $styles['paragraph']);
                $newElement = $element->addTable('table', array('width' => 90));
                break;
            case 'skipThead':
            case 'skipTbody':
                $newElement = $element;
                break;
            case 'addRow':
                $newElement = $element->addRow();
                break;
            case 'addCell':
                $newElement = $element->addCell(1750);
                break;
        }

Sample:

$html .= '<table><thead><tr><th>Header of column 1</th><th>Header of column 2</th></tr></thead>';
$html .= '<tbody><tr><td>Row 1 for column 1</td><td>Row 1 for column 2</td></tr></tbody></table>';

mogilvie commented Feb 5, 2016

I'll post the full class tonight, but in the meantime: is the error caused by the switch statement not having any code to execute for case 'skipThead'?

@garethellis36

@mogilvie I doubt that. It should fall through to the 'skipTbody' case because there's no break statement. I don't fully understand how this works, but I assumed that thead would need to be handled in the same way as tbody.
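
For anyone unsure about the fall-through behaviour described here, a small standalone check follows. It is a sketch, not code from Html.php: tableAction() is a hypothetical helper that only mirrors the shape of the switch in parseTable() and returns a label instead of touching any PhpWord element.

```php
<?php
// Hypothetical helper mirroring the shape of the switch in parseTable().
function tableAction(string $argument1): string
{
    switch ($argument1) {
        case 'skipThead': // no break: execution falls through to the next case
        case 'skipTbody':
            $action = 'reuse parent element';
            break;
        case 'addRow':
            $action = 'add row';
            break;
        case 'addCell':
            $action = 'add cell';
            break;
        default:
            $action = 'unknown';
    }
    return $action;
}

echo tableAction('skipThead'), "\n"; // handled by the 'skipTbody' branch
echo tableAction('skipTbody'), "\n";
```

Both calls take the same branch, so an empty case 'skipThead': is not by itself a reason for a null element.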

mogilvie commented Feb 5, 2016

@garethellis36 Agreed re skipThead. Is the parseTable function returning $newElement? It's cut off in the sample Html class.

mogilvie commented Feb 5, 2016

Html.php uploaded as a text file. It was working as of Jan 2015.

Html.txt

@surindersinghva

Thank you, this is very helpful. The embedded table now prints like a charm. I am still stuck with an embedded nested list, if you can help. There are two issues:

  1. A list item is not printed if a <strong> tag is used.
  2. A nested list is not printed, with or without <strong> tags.

Here's what I have:

<p>The following list has all the information.</p><ol><li><strong>Item 1</strong><ol><li><strong>Nested Item 1</strong></li><li>Nested Item 2</li></ol></li></ol><p>List ends here.</p>

Thank you

arrabal commented Apr 22, 2016

@mogilvie Your Html.php works OK, thanks for your work. I had to comment out line 61:

$dom->save('/var/www/vhosts/specshaper.com/DOM.xml'); //@todo Delete Debug

However, the table is generated without borders, even if I set a big "border" value in the <table> tag.

Will these fixes be integrated in the main branch?

arivanbastos commented Nov 14, 2016

This issue still occurs. The @garethellis36 version doesn't work for me (PHPWord version 0.13), so I did some work on improving the Html class.
My version is able to convert the following HTML:

<table style="width: 50%; border: 6px #0000FF solid;">
    <thead>
        <tr style="background-color: #FF0000; text-align: center; color: #FFFFFF; font-weight: bold; ">
             <th>a</th>
             <th>b</th>
             <th>c</th>
        </tr>
    </thead>
    <tbody>
        <tr><td>1</td><td colspan="2">2</td></tr>
        <tr><td>4</td><td>5</td><td>6</td></tr>
    </tbody>
</table>

For more details, see: https://stackoverflow.com/questions/29275140/html-reader-from-phpword-doest-work-with-tables/40600565#40600565

danilocarva9 commented Aug 22, 2019

How do I set cellspacing and cellpadding on an HTML table? It's not working :(
