Impossible to read ODT file previously saved by PHPWord as ODText #2493

Closed
ligoo opened this issue Oct 28, 2023 · 1 comment · Fixed by #2507

ligoo commented Oct 28, 2023

Describe the Bug

Reading an ODT file that was previously saved by PHPWord returns an empty array of elements.

Steps to Reproduce

<?php
require __DIR__ . '/vendor/autoload.php';

// Save
$phpWord = new \PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
$section->addText('days');
$section->addText('monday');
$section->addText('tuesday');
$writer = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'ODText');
$writer->save('example.odt');

// Load
$reader = \PhpOffice\PhpWord\IOFactory::createReader('ODText');
$document = $reader->load('example.odt');

var_dump($document); // the original report used Laravel's dd(); the idea is simply to inspect $document

Expected Behavior

$document->getSection(0)->elements should be an array of PhpOffice\PhpWord\Element objects.

Current Behavior

$document->getSection(0)->elements is an empty array ([]).

Leads

The library correctly reads the following content, produced by LibreOffice when saving a document as an .odt file:

 <office:body>
  <office:text>
   <text:p text:style-name="P1">days</text:p>
   <text:p text:style-name="P1">monday</text:p>
   <text:p text:style-name="P1">tuesday</text:p>
  </office:text>
 </office:body>

but it cannot read the following content, produced by the library itself when saving as .odt:

 <office:body>
  <office:text>
   <text:section text:style-name="Sect1" text:name="Section1">
    <text:p text:style-name="P1"/>
    <text:p text:style-name="Standard">days</text:p>
    <text:p text:style-name="Standard">monday</text:p>
    <text:p text:style-name="Standard">tuesday</text:p>
   </text:section>
  </office:text>
 </office:body>
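
To make the difference between the two fragments concrete, here is a minimal sketch using only PHP's bundled DOM extension. It is not PHPWord's reader code: `wrapBody` and `paragraphTexts` are hypothetical helpers, and the assumption (suggested by the two fragments above, not confirmed from the library source) is that a walk limited to the direct `text:p` children of `office:text` finds nothing once the paragraphs are wrapped in `text:section`.

```php
<?php
// Sketch only: NOT PHPWord's implementation. Helper names are hypothetical.
const NS_OFFICE = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';
const NS_TEXT = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

/** Wrap a body fragment in a root element that declares the ODF namespaces. */
function wrapBody(string $inner): string
{
    return '<office:document-content xmlns:office="' . NS_OFFICE . '" xmlns:text="' . NS_TEXT . '">'
        . '<office:body><office:text>' . $inner . '</office:text></office:body>'
        . '</office:document-content>';
}

/** Collect non-empty paragraph texts, optionally descending into text:section. */
function paragraphTexts(string $xml, bool $descendIntoSections): array
{
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('office', NS_OFFICE);
    $xpath->registerNamespace('text', NS_TEXT);

    // Direct children only, or direct children plus paragraphs inside a section.
    $query = $descendIntoSections
        ? '//office:text/text:p | //office:text/text:section/text:p'
        : '//office:text/text:p';

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        if ($node->textContent !== '') { // skip the empty <text:p/> placeholder
            $texts[] = $node->textContent;
        }
    }
    return $texts;
}

// LibreOffice-style body: paragraphs sit directly under office:text.
$flat = wrapBody(
    '<text:p text:style-name="P1">days</text:p>'
    . '<text:p text:style-name="P1">monday</text:p>'
    . '<text:p text:style-name="P1">tuesday</text:p>'
);

// PHPWord-style body: paragraphs are wrapped in text:section.
$nested = wrapBody(
    '<text:section text:style-name="Sect1" text:name="Section1">'
    . '<text:p text:style-name="P1"/>'
    . '<text:p text:style-name="Standard">days</text:p>'
    . '<text:p text:style-name="Standard">monday</text:p>'
    . '<text:p text:style-name="Standard">tuesday</text:p>'
    . '</text:section>'
);

echo json_encode(paragraphTexts($flat, false)), PHP_EOL;   // ["days","monday","tuesday"]
echo json_encode(paragraphTexts($nested, false)), PHP_EOL; // []
echo json_encode(paragraphTexts($nested, true)), PHP_EOL;  // ["days","monday","tuesday"]
```

The second call reproduces the symptom from this report (an empty result for a valid file), and the third shows that descending one level into `text:section` is enough to recover the text.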

Context

Please fill in your environment information:

  • PHP Version: 8.2
  • PHPWord Version: 1.1.0

oleibman commented Nov 3, 2023

FWIW, the document produced by save appears to be valid, so the problem does not lie with Writer/ODText. Reader/ODText, on the other hand, has some gaps. At a minimum, Reader/ODText/Content has no support for the text:section and text:span tags, and it needs both.

oleibman added a commit to oleibman/PHPWord that referenced this issue Nov 12, 2023
Fix PHPOffice#2493. There is much that the ODT Reader ignores. This change adds support for the `text:section`, `text:span`, `text:s`, and `text:tab` tags, thereby handling multiple sections, text runs, tab characters, and multiple spaces. There will still be many omissions (e.g. styles and tables), but you will now often be able to access the text content of valid ODT documents. The issue contrasts a simple file created by LibreOffice on its own with a similar file created by PhpWord. Both are unit-tested.

A `getText` method is added to TextRun to facilitate testing (and can be useful on its own). It will return the concatenated texts of all elements of the text run.
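
As an illustration of what handling these tags involves, the sketch below flattens one ODF paragraph into a plain string with PHP's bundled DOM extension. It is not the code from the commit: `flattenOdfText` is a hypothetical helper. It relies on the OpenDocument conventions that `text:s` stands for one or more spaces (the count is given by its optional `text:c` attribute), that `text:tab` stands for a tab character, and that `text:span` simply wraps further inline content.

```php
<?php
// Sketch only: illustrates the ODF tag semantics, not PHPWord's implementation.
const ODF_TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0';

/** Hypothetical helper: concatenate the text of a paragraph, expanding text:s and text:tab. */
function flattenOdfText(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            // Literal character data.
            $out .= $child->nodeValue;
        } elseif ($child instanceof DOMElement && $child->namespaceURI === ODF_TEXT_NS) {
            switch ($child->localName) {
                case 's':
                    // text:c holds the number of spaces; a missing attribute means one.
                    $count = $child->getAttributeNS(ODF_TEXT_NS, 'c');
                    $out .= str_repeat(' ', $count === '' ? 1 : max(1, (int) $count));
                    break;
                case 'tab':
                    $out .= "\t";
                    break;
                default:
                    // text:span and other inline wrappers: recurse into their content.
                    $out .= flattenOdfText($child);
            }
        }
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<text:p xmlns:text="' . ODF_TEXT_NS . '">'
    . 'a<text:s text:c="3"/>b<text:tab/><text:span>c<text:s/>d</text:span>'
    . '</text:p>'
);

var_dump(flattenOdfText($dom->documentElement) === "a   b\tc d"); // bool(true)
```

Concatenating the pieces in document order is the same idea as the `getText` method described above, applied here to raw XML rather than to PHPWord elements.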
oleibman added a commit to oleibman/PHPWord that referenced this issue Nov 22, 2023
Progi1984 pushed a commit to oleibman/PHPWord that referenced this issue Nov 30, 2023