List item values are missing from docx file while convert as HTML file #1462

Open
vschavala opened this issue Sep 10, 2018 · 25 comments

@vschavala

/* Here is my code */

// Load the .docx file, then write it back out as HTML.
$PHPWordLoad = \PhpOffice\PhpWord\IOFactory::load($file);

$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($PHPWordLoad, 'HTML');

$tmpfname = public_path('doczipfiles/temp.html');

// Save with the writer created above ($htmlWriter was never defined).
$objWriter->save($tmpfname);

@akosipau

Bumping this. I encountered the same issue as well.

@evomer

evomer commented Oct 29, 2018

same here

@vineetagarwal1981

I also faced the same issue and found that "ListItemRun.php" is missing from the path "src\PhpWord\Writer\HTML\Element", which is what causes it.

I added the file and made the changes. I now get the list values, but the bullet icons are still missing. I am currently trying to fix that. In the meantime, if anyone wants the file, please let me know.

@lubobill1990

+1

@bozzit

bozzit commented Nov 13, 2018

@vineetagarwal1981 I could use the ListItemRun.php file if you could make it available, please. Just the data without the bullets is sufficient for my needs. Thanks.

@vineetagarwal1981

@bozzit Below is the file that you require. Place this file at the PATH "src\PhpWord\Writer\HTML\Element"

Let me know if it's working for you

ListItemRun.zip

@lubobill1990

@vschavala If you just want to convert to HTML, I find it's better to use more specialized tools.
This is my solution, and it works better than PHPWord: https://gist.github.com/lubobill1990/701df4becce20af43e9122a26dc52a05

The main purpose of PHPWord is to compose Word documents with PHP, not to convert between formats.

@bozzit

bozzit commented Nov 15, 2018

Hi, yes, thank you. If I have time I will attempt to make it output unordered lists instead of the text within <p> elements.

At least I'm not losing the text within the lists by adding this file.

@kristl78

kristl78 commented Mar 28, 2019

Same here.

If I generate a Word document like here, I get this file structure:

  • _rels
  • theme
  • document.xml
  • fontTable.xml
  • footnotes.xml
  • numbering.xml
  • settings.xml
  • styles.xml
  • webSettings.xml

With HTML-generated lists I get this:

  • _rels
  • theme
  • endnotes.xml
  • fontTable.xml
  • footer1.xml
  • header1.xml
  • settings.xml
  • styles.xml
  • stylesWithEffects.xml
  • webSettings.xml

As you can see, there is no numbering.xml. Also, if you try to use LibreOffice to generate a PDF, all lists are empty.

@lubobill1990

@kristl78 #1462 (comment)
I'm using this solution and it works well. Hope it can help.

@kristl78

kristl78 commented Apr 5, 2019

@lubobill1990 Thank you, but unfortunately this is not enough.

@tikumo

tikumo commented Aug 1, 2019

I've used the solution from @vineetagarwal1981 but modified it a bit.
The list items were not parsed as li tags, which I need for my project.

public function write()
{
    if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
        return '';
    }
    $content = '<li>';
    $content .= $this->element->getElement(0)->getText();
    $content .= '</li>';
    return $content;
}

    @PhoenixRising2015

    I just installed PHPWord using Composer (6/8/2020). I also see the above problem (loss of list text) when attempting to convert a .docx to .html.
    The version of PHPWord I installed did have the ListItemRun.php file mentioned above in the proper directory. However, I still had the error.
    I also tried copying the ListItemRun.php provided by @vineetagarwal1981 above into the Element directory, overwriting the installed copy, and that generated several exceptions. Therefore I backed that change out.
    Has there been any resolution on how to convert .docx lists to HTML without losing the text?

    @PhoenixRising2015

    @tikumo Hi, could you provide a few more specifics? For example, in which PHP file did you place this code?

    @Hector1567XD

    Hello @PhoenixRising2015!
    I have the same problem. Did you find a solution?

    @bozzit

    bozzit commented May 1, 2021

    @Hector1567XD

    Just create a file called "ListItemRun.php" in the path "src\PhpWord\Writer\HTML\Element" with that code in it, or look further up in this thread: there is a link to a ZIP file with "ListItemRun.php" in it.

    @ryanzzeng

    Same here:
    I have the following numbered list in a docx file:

    1. a
    2. b
    3. c
    

    After converting to an HTML file, the numbers are gone:

     a
     b
     c
    

    @Lurtz963

    Lurtz963 commented Feb 9, 2022

    For anyone having this issue, the solution by @tikumo works:

    public function write()
    {
        if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
            return '';
        }
        $content = '';
        $content .= '<ul><li>';
        $content .= $this->element->getElement(0)->getText();
        $content .= '</li></ul>';
        $content .= "\n";
        return $content;
    }
    

    Replace the function write() in src\PhpWord\Writer\HTML\Element\ListItemRun.php with the code above and it will transform any ListItemRun into an li element. However, as far as I know there is no way to create the parent ul for the lists, so as a temporary solution I modified the function to make every list item a separate list. If anyone has a solution for generating the ul elements, please let me know.
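
One possible way around the one-list-per-item output, offered as a minimal sketch rather than a tested fix: post-process the generated HTML so that neighbouring single-item lists are merged into one. The helper name `mergeAdjacentLists` is hypothetical and not part of PHPWord; it uses only plain PHP and assumes the writer's output is available as a string. Note that it will also merge two genuinely separate lists if nothing but whitespace sits between them.

```php
<?php

/**
 * Merge back-to-back <ul> blocks into a single list.
 *
 * Hypothetical helper, not part of PHPWord. It treats "</ul>", optional
 * whitespace, "<ul>" as the seam between two single-item lists and
 * replaces that seam with a newline.
 */
function mergeAdjacentLists(string $html): string
{
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace('#</ul>\s*<ul>#', "\n", $html) ?? $html;
}

// Example call:
// mergeAdjacentLists("<ul><li>a</li></ul>\n<ul><li>b</li></ul>\n");
// returns "<ul><li>a</li>\n<li>b</li></ul>\n"
```

This keeps the patched write() untouched and moves the grouping into a separate step, which is easy to drop once the writer can emit the parent element itself.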

    @bozzit

    bozzit commented Feb 9, 2022

    What I ended up doing is:
    I modified phpoffice/phpword/vendor/phpoffice/phpword/src/PhpWord/Writer/HTML/Element/ListItemRun.php

    protected function writeOpening()
    {
        $content = sprintf(
            '<li data-depth="%s" data-liststyle="%s" data-numId="%s">',
            $this->element->getDepth(),
            $this->element->getListFormat($this->element->getDepth()),
            $this->element->getListId()
        );

        return $content;
    }
    

    Then I created my own writer that extends AbstractWriter:

    class MyHtmlWriter extends AbstractWriter implements WriterInterface
    {
    .
    .
    .
    
        /**
         * Get content
         *
         * @return string
         */
    
        public function getContent()
        {
            $content = $this->getWriterPart('Body')->write();
    
            $lines = explode(PHP_EOL, $content);
     
            $newcontent = '';
            foreach ($lines as $line)
            {
                if (preg_match('/( |^)<li data-depth/', $line))
                {
                    /*
                     * use the data-depth, data-liststyle and data-numId to add <ul> </ul> <ol></ol>
                     * where needed
                     */
                }
                else
                {
                    $newcontent .= $line;
                }
            }
    
            $content = $newcontent;
    .
    .
    .
            return $content;
        }
    

    Hope this points you in the right direction, @Lurtz963.

    @Lurtz963

    Lurtz963 commented Feb 9, 2022

    @bozzit I tried this solution, but data-depth is always 0 and the rest of the attributes are empty.

    @bozzit

    bozzit commented Feb 9, 2022

    My bad, I forgot that I had to implement some of those methods for the other attributes. Also, 0 is normal for depth if you don't have nested lists: the top-level list is always 0.

    index 6e48a69..ed83162 100644
    --- a/3rdparty/phpoffice/phpword/vendor/phpoffice/phpword/src/PhpWord/Element/ListItemRun.php
    +++ b/3rdparty/phpoffice/phpword/vendor/phpoffice/phpword/src/PhpWord/Element/ListItemRun.php
    @@ -73,6 +73,24 @@ class ListItemRun extends TextRun
             return $this->style;
         }
     
    +    public function getListFormat($depth)
    +    {
    +        if (isset($this->style->bulletListType[$depth]->format))
    +        {
    +            return $this->style->bulletListType[$depth]->format;
    +        }
    +        else
    +        {
    +            return 'bullet';
    +        }
    +
    +    }
    +
    +    public function getListId()
    +    {
    +        return $this->style->numId;
    +    }
    +
    

    @Lurtz963

    Lurtz963 commented Feb 9, 2022

    After a bit of a struggle I was able to implement a similar solution, @bozzit. For some reason I couldn't use a custom writer (it throws an error saying it is not a valid writer), so I modified the HTML writer instead. I'm leaving the files here in case someone wants to use them or make a better version.
    ListItemRun.php goes in phpoffice/phpword/src/PhpWord/Writer/HTML/Element
    and HTML.php goes in phpoffice/phpword/src/PhpWord/Writer

    files.zip

    @CaptBarbarossa

    CaptBarbarossa commented Apr 15, 2022

    @Lurtz963 Thanks! Your code helped me. I just added a loop to the write() function in ListItemRun.php:

    public function write()
    {
        if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
            return '';
        }
        $content = '';
        $content .= sprintf(
            '<li data-depth="%s" data-liststyle="%s" data-numId="%s">',
            $this->element->getDepth(),
            $this->getListFormat($this->element->getDepth()),
            $this->getListId()
        );

        // Concatenate the text of every child element, not only the first one.
        $size_content = $this->element->countElements();
        for ($i = 0; $i < $size_content; $i++) {
            $content .= $this->element->getElement($i)->getText();
        }

        $content .= '</li>';
        $content .= "\n";
        return $content;
    }
    

    @EvanShaw

    It's been 6 years and ListItemRun.php is still not implemented. Pretty crazy.

    In any case, I took @CaptBarbarossa's code and extended it to handle all types of elements in the li, since there is no guarantee that an li only contains text:

    <?php
    
    namespace PhpOffice\PhpWord\Writer\HTML\Element;
    
    /**
     * ListItemRun element HTML writer
     *
     * @since 0.10.0
     */
    class ListItemRun extends TextRun
    {
        public function write()
        {
            if (!$this->element instanceof \PhpOffice\PhpWord\Element\ListItemRun) {
                return '';
            }
            $content = '';
            $content .= sprintf(
                '<li data-depth="%s" data-liststyle="%s" data-numId="%s">',
                $this->element->getDepth(),
                $this->getListFormat($this->element->getDepth()),
                $this->getListId()
            );
    
            $namespace = 'PhpOffice\\PhpWord\\Writer\\HTML\\Element';
            $container = $this->element;
    
            $elements = $container->getElements();
            foreach ($elements as $element) {
                $elementClass = get_class($element);
                $writerClass = str_replace('PhpOffice\\PhpWord\\Element', $namespace, $elementClass);
                if (class_exists($writerClass)) {
                    /** @var \PhpOffice\PhpWord\Writer\HTML\Element\AbstractElement $writer Type hint */
                    $writer = new $writerClass($this->parentWriter, $element, true);
                    $content .= $writer->write();
                }
            }
    
            $content .= '</li>';
            $content .= "\n";
            return $content;
        }
    
        public function getListFormat($depth)
        {
            return $this->element->getStyle()->getNumStyle();
        }
    
        public function getListId()
        {
            return $this->element->getStyle()->getNumId();
        }
    }
    

    The true as the last argument to new $writerClass($this->parentWriter, $element, true); prevents text from being wrapped in <p> tags so that everything inside the li is displayed inline.

    If you're installing this package with Composer (like I am), you can use the post-install-cmd hook in your composer.json file to copy this file into ./vendor/phpoffice/phpword/src/PhpWord/Writer/HTML/Element/ListItemRun.php every time the package is installed.
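
A minimal sketch of the copy step described above, assuming the patched file is kept somewhere in your own project. The function name `installPatchedWriter` and the `patches/` directory are hypothetical; only the target path under vendor/ comes from this thread.

```php
<?php

/**
 * Copy a patched ListItemRun.php over the copy inside vendor/.
 *
 * Hypothetical helper: $patchedFile is wherever you keep the patched
 * writer, $vendorDir is the project's Composer vendor directory.
 * Returns true if the file was copied, false otherwise.
 */
function installPatchedWriter(string $patchedFile, string $vendorDir): bool
{
    $target = $vendorDir
        . '/phpoffice/phpword/src/PhpWord/Writer/HTML/Element/ListItemRun.php';

    // Do nothing if the patch is missing or PHPWord is not installed.
    if (!is_file($patchedFile) || !is_dir(dirname($target))) {
        return false;
    }

    return copy($patchedFile, $target);
}

// Example call from a script in the project root (paths are assumptions):
// installPatchedWriter(__DIR__ . '/patches/ListItemRun.php', __DIR__ . '/vendor');
```

Saved as, say, tools/patch-phpword.php with the example call enabled and the paths adjusted, it could be listed under `scripts` / `post-install-cmd` in composer.json as `"@php tools/patch-phpword.php"`.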

    @ana582kess

    ana582kess commented May 24, 2024

    I have used this fix and it really did help to add li tags to the list items. However, there is still no ul/ol tag. Is there a way to determine which one is needed, and what is the best way of adding it?
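
One way this could be approached, as a minimal sketch under stated assumptions: post-process the writer's output line by line and use the data-depth and data-liststyle attributes to open and close the containers. The function name `wrapListItems` is hypothetical. It assumes each list item sits on its own line in the form `<li data-depth="N" data-liststyle="..." ...>...</li>`, and that data-liststyle holds a Word numbering format such as `bullet` or `decimal`, as in @bozzit's getListFormat() above. If your data-liststyle values differ, the ul/ol mapping needs adjusting. It does not look at data-numId, so two different lists that follow each other directly are treated as one.

```php
<?php

/**
 * Wrap runs of <li data-depth="..."> lines in <ul>/<ol> containers.
 *
 * Hypothetical helper, not part of PHPWord. Assumption: a data-liststyle of
 * "bullet" (or empty) means an unordered list, anything else an ordered one.
 */
function wrapListItems(string $html): string
{
    $out = [];
    $stack = []; // list tags currently open; count($stack) - 1 is the depth

    // Close open lists (and their last item) until only $levels remain.
    $closeTo = function (int $levels) use (&$out, &$stack): void {
        while (count($stack) > $levels) {
            $out[] = '</li>';
            $out[] = '</' . array_pop($stack) . '>';
        }
    };

    $pattern = '#^\s*<li data-depth="(\d+)" data-liststyle="([^"]*)"[^>]*>.*</li>\s*$#';

    foreach (explode("\n", $html) as $line) {
        if (!preg_match($pattern, $line, $m)) {
            // Not a list item: any open list ends here.
            $closeTo(0);
            $out[] = $line;
            continue;
        }

        $depth = (int) $m[1];
        $tag = ($m[2] === 'bullet' || $m[2] === '') ? 'ul' : 'ol';

        $closeTo($depth + 1);
        if (count($stack) === $depth + 1) {
            $out[] = '</li>'; // finish the previous item at this depth
        }
        while (count($stack) < $depth + 1) {
            $stack[] = $tag;
            $out[] = '<' . $tag . '>';
        }

        // Leave the <li> open so a nested list can be placed inside it.
        $out[] = substr(rtrim($line), 0, -strlen('</li>'));
    }

    $closeTo(0);

    return implode("\n", $out);
}
```

Nested lists end up inside the parent li, which is what HTML expects. The function could be called on the body string in a custom getContent(), or on the saved file's contents after the HTML writer has run.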

