ODText Reader : Improve Section Reader #2507

Merged
merged 1 commit into PHPOffice:master from oleibman:word2493b
Nov 30, 2023

Conversation

oleibman
Contributor

Fix #2493. The ODT Reader ignores much of a document's content. This change adds support for the `text:section`, `text:span`, `text:s`, and `text:tab` tags, so the reader now handles multiple sections, text runs, tab characters, and multiple spaces. Many omissions remain (e.g. styles and tables), but you will now often be able to access the text content of valid ODT documents. The issue suggests variations in a simple file created by LibreOffice on its own, and a similar file created by PhpWord. Both are unit-tested.
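To illustrate what handling these tags involves, here is a minimal sketch in Python. It is not the PhpWord reader code: the `flatten` helper and the `SAMPLE` fragment are hypothetical, and the only thing taken from the ODF format is that `text:s` stands for a run of spaces (counted by `text:c`, defaulting to one) and `text:tab` stands for a tab character.

```python
import xml.etree.ElementTree as ET

# ODF "text" namespace, as used in content.xml of an .odt file
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def flatten(node):
    """Return the visible text of an ODF text node (hypothetical helper)."""
    parts = [node.text or ""]
    for child in node:
        if child.tag == f"{{{TEXT_NS}}}s":
            # text:s stands for text:c consecutive spaces (one if text:c is absent)
            parts.append(" " * int(child.get(f"{{{TEXT_NS}}}c", "1")))
        elif child.tag == f"{{{TEXT_NS}}}tab":
            # text:tab stands for a single tab character
            parts.append("\t")
        else:
            # text:span and other containers: recurse into their content
            parts.append(flatten(child))
        # text that follows the child element inside the same parent
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical paragraph mixing a space run, a tab, and a span
SAMPLE = (
    f'<text:p xmlns:text="{TEXT_NS}">'
    'one<text:s text:c="3"/>two<text:tab/><text:span>three</text:span> four'
    "</text:p>"
)

paragraph = ET.fromstring(SAMPLE)
print(repr(flatten(paragraph)))  # 'one   two\tthree four'
```

A reader that skips `text:s` and `text:tab` would return `onetwothree four` for the same paragraph, which is the kind of loss this change addresses.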

A `getText` method is added to `TextRun` to facilitate testing (it can also be useful on its own). It returns the concatenated text of all elements of the text run.
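The behaviour described for `getText` can be modelled in a few lines. This is a language-neutral sketch in Python, not the PhpWord implementation: the `Text` and `TextRun` classes below are simplified stand-ins, and it assumes every element in the run carries text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Text:
    """Simplified stand-in for a text element (assumption, not the PhpWord class)."""
    text: str

    def get_text(self) -> str:
        return self.text


@dataclass
class TextRun:
    """Simplified stand-in for a text run holding an ordered list of elements."""
    elements: List[Text] = field(default_factory=list)

    def add_text(self, text: str) -> Text:
        element = Text(text)
        self.elements.append(element)
        return element

    def get_text(self) -> str:
        # Concatenate the text of every element, in document order
        return "".join(element.get_text() for element in self.elements)


run = TextRun()
run.add_text("Hello, ")
run.add_text("world")
print(run.get_text())  # Hello, world
```

In a unit test this lets an assertion compare a whole run against one expected string instead of inspecting each element separately.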

Checklist:

  • I have run composer run-script check --timeout=0 and no errors were reported
  • The new code is covered by unit tests (check build/coverage for coverage report)
  • I have updated the documentation to describe the changes

coveralls commented Nov 22, 2023

Coverage Status

coverage: 95.532% (+0.04%) from 95.492%
when pulling bdcd104 on oleibman:word2493b
into b0e1e41 on PHPOffice:master.

@Progi1984 Progi1984 added this to the 1.2.0 milestone Nov 24, 2023
@Progi1984 Progi1984 self-requested a review November 30, 2023 07:39
@Progi1984 Progi1984 changed the title Improve ODText Content Reader ODText Reader : Improve Section Reader Nov 30, 2023
@Progi1984 Progi1984 merged commit e76b701 into PHPOffice:master Nov 30, 2023
13 checks passed
@Progi1984 Progi1984 deleted the word2493b branch November 30, 2023 11:22
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Impossible to read ODT file previously saved by PHPWord as ODText
3 participants