"MsDoc" reader fails to open and/or correctly process MS Word 97-2003 (*.doc) files #1318
Comments
Hey @Progi1984, any luck with this? I am seeing similar behavior. The reader seems unable to handle a pretty standard Word 97 doc with no special formatting. Instead I get broken, fragmented text, and/or some sections are missing entirely. I got much better results from a simple fread-style function, but that was only useful for plain-text extraction; it gives no style or formatting data, unfortunately.
bad
Any updates on this? I am trying to convert a .doc file to PDF. The conversion runs, but in the PDF part of the text is cut off and the italics are gone. `use PhpOffice\PhpWord\IOFactory; use PhpOffice\PhpWord\Settings; Settings::setPdfRendererName(Settings::PDF_RENDERER_DOMPDF); $phpWord = IOFactory::load('TEST2.doc', 'MsDoc');`
This is:
Expected Behavior
The MS Word 97-2003 document (*.doc) would be correctly opened and correctly processed by
$phpWord = IOFactory::load($c_file_name, 'MsDoc'); // this line causes error
styles would be internally set in MsDoc.php in generatePhpWord() method:
Current Behavior
Errors, which differ from one run or file to the next:
Notice: Uninitialized string offset: 327680 (or some other wildly large number)
Error traced to `getInt2d()` and/or `getInt1d()` in vendor\phpoffice\phpword\src\PhpWord\Reader\MsDoc.php (line 2317)
or
Fatal error: Uncaught PhpOffice\PhpWord\Exception\Exception: Could not open resources/resources/n_466.doc for reading! File does not exist, or it is not readable. in D:\xxx\xxx\vendor\phpoffice\phpword\src\PhpWord\Shared\OLERead.php:78
or
Notice: Undefined property: stdClass::$styleSection
traced to vendor\phpoffice\phpword\src\PhpWord\Reader\MsDoc.php generatePhpWord()
or, when it manages to convert some test file, the layout is completely wrong:
no styles, line breaks in the wrong places, parts of words missing, and the table not reproduced.
The elements recognized by the following snippet are all of type Text; paragraphs are not recognized, and a simple table is not recognized at all.
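To illustrate the `Uninitialized string offset` notice: it is what PHP emits when a string is indexed past its end, so it would appear if an offset read from the file is used without checking it against the buffer length. Below is a minimal sketch of a bounds-checked 16-bit little-endian read. `readUInt16LE` is a hypothetical helper, not part of PhpWord, and it assumes the reader works on the file contents as a binary string indexed by byte offset.

```php
<?php
/**
 * Hypothetical helper (not part of PhpWord): read an unsigned 16-bit
 * little-endian integer, failing loudly instead of emitting a notice.
 */
function readUInt16LE(string $data, int $pos): int
{
    // Reject any offset that would read outside the buffer.
    if ($pos < 0 || $pos + 2 > strlen($data)) {
        throw new OutOfBoundsException(
            sprintf('Offset %d is outside a %d-byte buffer', $pos, strlen($data))
        );
    }
    // 'v' = unsigned 16-bit integer, little-endian byte order.
    return unpack('v', substr($data, $pos, 2))[1];
}
```

With a check like this, an offset such as 327680 into a much smaller stream would surface as an exception naming the bad offset, which is easier to trace than a notice followed by garbage output.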
Failure Information
I tried every kind of MS Word 97-2003 document I could produce (saved from MS Word 2007 and from MS Word 365). I tried downloaded files (e.g. n_466.doc and d466.doc), and I also created new files manually in both versions of MS Word available to me (2007 and 365) and saved them as *.doc.
The provided set-up (see below) works OK with the same documents saved as .docx files (different reader class).
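One way to separate path or file-type problems from reader problems is a small pre-flight check before calling `IOFactory::load()`. The sketch below is an assumption-laden helper, not part of PhpWord: `checkDocFile` is a hypothetical name, and it only verifies that the path is readable and that the file starts with the 8-byte OLE2 compound-file signature that binary .doc files use. It may also help with the `Could not open resources/resources/n_466.doc` error, where the doubled `resources/` segment could point to a path being prefixed twice.

```php
<?php
/**
 * Hypothetical pre-flight check (not part of PhpWord): returns null when the
 * path points at a readable OLE2 compound file, otherwise a short reason.
 */
function checkDocFile(string $path): ?string
{
    if (!is_file($path) || !is_readable($path)) {
        return 'not found or not readable: ' . $path;
    }
    // Every OLE2 container (.doc, .xls, .ppt) starts with these 8 bytes.
    $signature = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
    // Read only the first 8 bytes of the file.
    $header = file_get_contents($path, false, null, 0, 8);
    if ($header !== $signature) {
        return 'not an OLE2 compound file (a renamed .docx starts with "PK")';
    }
    return null;
}
```

If this returns null and the `MsDoc` reader still fails, the file itself is a genuine OLE2 container and the problem is more likely in the parsing code than in the path or the file type.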
test_documents.zip
Version, copied from the composer.json:
"phpoffice/phpword": "^0.14.0",
or from composer.lock:
"name": "phpoffice/phpword",
"version": "v0.14.0",
"source": {
"type": "git",
"url": "https://github.com/PHPOffice/PHPWord.git",
"reference": "b614497ae6dd44280be1c2dda56772198bcd25ae"
},
How to Reproduce
This is a part of Symfony 4 project.
Service class:
Controller class:
Sample implementation of twig template
Context