Word2007 Reader: Title not recognized in localized Word files #2422

Open
rasteiner opened this issue Jul 12, 2023 · 1 comment

@rasteiner

Describe the Bug

Currently, when reading docx files, the reader recognizes a title by matching the styleId attribute of a paragraph instead of the actual "name" of the linked style.

Unfortunately, Word automatically localizes the styleId. For example, with an English version of Word installed, the styleId for an H1 title is always "Heading1"; if the same Word file is re-saved in Italian Word, that style is automatically saved as "Titolo1".
The actual style name (<w:name> in styles.xml), however, is always "heading 1", no matter the language of Word. (Yes: ironically, Word translates the hidden ID and doesn't translate the name that is actually shown.)

Steps to Reproduce

  1. Find a foreign-language docx file. One way is to go to Google Advanced Search, select a different language (like German), and search for something while limiting the file type to docx, e.g. the query "Beispiel filetype:docx". Download some files and check whether one actually uses heading styles.
    Also make sure the heading isn't numbered as well (or you'll run into Word2007 Reader: Title not recognized when it's a list item #2421).
    DON'T SAVE YOUR FILE, or your copy of Word will translate the IDs!
  2. Run this code:
<?php

use PhpOffice\PhpWord\Reader\Word2007;

require __DIR__ . '/vendor/autoload.php';

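// Load the localized document with the Word2007 reader.
$reader = new Word2007();
$phpWord = $reader->load('Calcagno.docx');

// Report whether the reader recognized any heading as a title.
if ($phpWord->getTitles()->countItems()) {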
    echo "Titles found\n";
} else {
    echo "No titles found\n";
}

Expected Behavior

Output:

Titles found

Current Behavior

No titles found

Context

PhpOffice\PhpWord\Reader\Word2007\AbstractPart::getHeadingDepth() currently relies on pattern matching against $paragraphStyle['styleName']. However, this isn't the real name of the style but rather its ID. A fix would be to load the style by its ID and then match on its real name.
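
To illustrate the idea, here is a minimal standalone sketch of resolving a styleId to its real name and deriving the heading depth from that. It is not PHPWord code: the helpers buildStyleNameMap(), headingDepthFromStyleName() and readStylesXml() are hypothetical names made up for this example, and it assumes the styles live in word/styles.xml and use the transitional WordprocessingML namespace.

```php
<?php

declare(strict_types=1);

// Assumption: transitional OOXML namespace (strict-mode files use a different URI).
const WORDML_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/**
 * Hypothetical helper: map each w:styleId in styles.xml to its w:name value.
 *
 * @return array<string, string> styleId => style name
 */
function buildStyleNameMap(string $stylesXml): array
{
    $dom = new DOMDocument();
    if ($dom->loadXML($stylesXml) === false) {
        throw new RuntimeException('styles.xml is not well-formed XML');
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', WORDML_NS);

    $map = [];
    foreach ($xpath->query('//w:style[@w:styleId]') as $style) {
        // The name is stored as <w:name w:val="..."/> inside the style element.
        $nameAttr = $xpath->query('w:name/@w:val', $style)->item(0);
        if ($nameAttr !== null) {
            $map[$style->getAttributeNS(WORDML_NS, 'styleId')] = $nameAttr->nodeValue;
        }
    }

    return $map;
}

/**
 * Hypothetical helper: return the depth for names like "heading 1", else null.
 */
function headingDepthFromStyleName(string $styleName): ?int
{
    if (preg_match('/^heading\s+(\d+)$/i', trim($styleName), $matches) === 1) {
        return (int) $matches[1];
    }

    return null;
}

/**
 * Hypothetical helper: read word/styles.xml from a docx (a zip archive).
 */
function readStylesXml(string $docxPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }

    $xml = $zip->getFromName('word/styles.xml');
    $zip->close();

    if ($xml === false) {
        throw new RuntimeException("No word/styles.xml in $docxPath");
    }

    return $xml;
}
```

With something like this, a paragraph whose styleId is "Titolo1" could be checked via `headingDepthFromStyleName(buildStyleNameMap(readStylesXml('Calcagno.docx'))['Titolo1'] ?? '')`, which matches on the name rather than on the localized ID.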

Please fill in your environment information:

  • PHP Version: 8.2
  • PHPWord Version: 1.1.0
@aadityapatil350

Is this issue still open?
