How Read Doc Or DocX #2106
Comments
@lucaswhob Something like this?

    $objReader = \PhpOffice\PhpWord\IOFactory::createReader('Word2007');
    $phpWord = $objReader->load('my/file.docx'); // instance of \PhpOffice\PhpWord\PhpWord
    $text = '';
    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $element) {
            if ($element instanceof \PhpOffice\PhpWord\Element\Text) {
                $text .= $element->getText();
            }
            // and so on for other element types (see src/PhpWord/Element)
        }
    }
@gisostallenberg The reader documentation of
This was just a simple example. Sections also seem to contain TextRuns (these are containers), which in turn contain sub-elements.

    <?php
    use PhpOffice\PhpWord\Element\AbstractContainer;
    use PhpOffice\PhpWord\Element\Text;
    use PhpOffice\PhpWord\IOFactory as WordIOFactory;

    require_once __DIR__.'/vendor/autoload.php';

    $objReader = WordIOFactory::createReader('Word2007');
    $phpWord = $objReader->load('file.docx'); // instance of \PhpOffice\PhpWord\PhpWord
    $text = '';

    // Recursively collect the text of an element and everything nested in it.
    function getWordText($element) {
        $result = '';
        if ($element instanceof AbstractContainer) {
            // Containers (e.g. TextRun) hold child elements; descend into each one.
            foreach ($element->getElements() as $child) {
                $result .= getWordText($child);
            }
        } elseif ($element instanceof Text) {
            $result .= $element->getText();
        }
        // and so on for other element types (see src/PhpWord/Element)
        return $result;
    }

    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $element) {
            $text .= getWordText($element);
        }
    }

    echo $text;
Might I suggest a small improvement to the recursive method, since it has the opportunity to miss text from several object types.
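As one possible sketch of such an improvement (not necessarily what the commenter had in mind), the walker can check for methods instead of specific classes, so that any element exposing `getElements()`, `getRows()` or `getText()` contributes its text. The function name `collectText` is hypothetical, and the handling of non-string `getText()` results is an assumption based on elements such as titles, which may return a nested object.

```php
<?php
// Sketch only: gathers text from any element that exposes
// getElements(), getRows() or getText(), whatever its class.
function collectText($element): string
{
    $result = '';

    if (method_exists($element, 'getElements')) {
        // Containers: sections, text runs, table cells, ...
        foreach ($element->getElements() as $child) {
            $result .= collectText($child);
        }
    } elseif (method_exists($element, 'getRows')) {
        // Tables are not containers; walk rows and cells explicitly.
        foreach ($element->getRows() as $row) {
            foreach ($row->getCells() as $cell) {
                $result .= collectText($cell);
            }
        }
    } elseif (method_exists($element, 'getText')) {
        $text = $element->getText();
        if (is_object($text)) {
            // Assumption: some elements return a nested element here.
            $result .= collectText($text);
        } elseif (is_array($text)) {
            // Assumption: keep only the plain string parts of array results.
            $result .= implode('', array_filter($text, 'is_string'));
        } else {
            $result .= (string) $text;
        }
    }

    return $result;
}
```

With this version, each section can be passed in directly, for example `$text .= collectText($section);` inside the `getSections()` loop.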
Sorry for hijacking the topic, but I have a related question. I am also walking the document object tree with a recursive implementation. I am trying to extract a "table of contents", so I am looking for

The XML looks like this:

    <w:p xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordml" w:rsidP="02051CF4" w14:paraId="4E47C1E7" wp14:textId="5ECEFD8F">
      <w:pPr>
        <w:pStyle w:val="Title"/>
        <w:rPr>
          <w:rFonts w:ascii="Calibri Light" w:hAnsi="Calibri Light" w:eastAsia="" w:cs=""/>
          <w:sz w:val="56"/>
          <w:szCs w:val="56"/>
        </w:rPr>
      </w:pPr>
      <w:bookmarkStart w:name="_GoBack" w:id="0"/>
      <w:bookmarkEnd w:id="0"/>
      <w:r w:rsidR="7A933B85">
        <w:rPr/>
        <w:t xml:space="preserve">The </w:t>
      </w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r w:rsidR="7A933B85">
        <w:rPr/>
        <w:t>document</w:t>
      </w:r>
      <w:proofErr w:type="spellEnd"/>
      <w:r w:rsidR="7A933B85">
        <w:rPr/>
        <w:t xml:space="preserve"> title</w:t>
      </w:r>
    </w:p>

Do you have any suggestions?
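If the object tree does not expose the paragraph style you need, one fallback that bypasses PHPWord entirely is to query `word/document.xml` with PHP's DOM extension. The sketch below assumes the standard WordprocessingML namespace is declared on the document root (the paragraph quoted above omits it); the function name `paragraphsWithStyle` and the file name are hypothetical.

```php
<?php
// Standard WordprocessingML main namespace, bound to the "w" prefix.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/** Returns the text of every paragraph whose w:pStyle value equals $styleId. */
function paragraphsWithStyle(string $documentXml, string $styleId): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', W_NS);

    $texts = [];
    // Assumption: style IDs contain no double quotes, so interpolation is safe.
    $query = sprintf('//w:p[w:pPr/w:pStyle/@w:val="%s"]', $styleId);
    foreach ($xpath->query($query) as $paragraph) {
        $text = '';
        // Concatenate all runs; spell-check markers and bookmarks carry no text.
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        $texts[] = $text;
    }

    return $texts;
}

// Usage (hypothetical file name):
// $zip = new ZipArchive();
// if ($zip->open('file.docx') === true) {
//     $xml = $zip->getFromName('word/document.xml');
//     $zip->close();
//     print_r(paragraphsWithStyle($xml, 'Title'));
// }
```

Passing a heading style ID instead of `Title` would collect those paragraphs the same way, although the exact style IDs depend on the document.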
Thank you for showing how to extract the content of a docx file. Could you please also show me how to extract the content of a doc file?
Yes, I am trying to convert a file with the doc extension to text. The examples given convert a docx file to text. How can we do the same for a doc file?
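For legacy `.doc` files, PHPWord ships a separate reader named `MsDoc`. A minimal sketch, assuming the file path shown (hypothetical) and assuming the reader can parse your particular file, which may not hold for every `.doc` document:

```php
<?php
use PhpOffice\PhpWord\Element\AbstractContainer;
use PhpOffice\PhpWord\Element\Text;
use PhpOffice\PhpWord\IOFactory;

require_once __DIR__ . '/vendor/autoload.php';

// Same recursive idea as the docx example earlier in the thread.
function docElementText($element): string
{
    $result = '';
    if ($element instanceof AbstractContainer) {
        foreach ($element->getElements() as $child) {
            $result .= docElementText($child);
        }
    } elseif ($element instanceof Text) {
        $result .= $element->getText();
    }
    return $result;
}

// 'MsDoc' selects the reader for the binary .doc format.
$reader = IOFactory::createReader('MsDoc');
$phpWord = $reader->load('my/file.doc'); // hypothetical path

$text = '';
foreach ($phpWord->getSections() as $section) {
    $text .= docElementText($section);
}

echo $text;
```

Only the reader name changes compared with the docx examples; the returned object is walked the same way.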
A method like
But how can I do it with my code? Or can you give me some code that does it?
Hello,
Thank you for the best word-processing library.
I need to read a Docx file and extract:
1. Text
2. All images
3. All links with their titles
Please help and guide me in reading a Docx file.
I have read the documentation and all your examples, but I could not find an example of reading elements and sections.
Please help me.
Thanks
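For the images and links part of the request, a rough sketch along the same lines: walk the element tree and pick out `Link` and `Image` elements. The function name, array shapes, file path and output file names are all hypothetical, and tables are not descended into here.

```php
<?php
use PhpOffice\PhpWord\Element\Image;
use PhpOffice\PhpWord\Element\Link;
use PhpOffice\PhpWord\IOFactory;

require_once __DIR__ . '/vendor/autoload.php';

// Recursively gather links (title + URL) and image elements.
function collectLinksAndImages($element, array &$links, array &$images): void
{
    if ($element instanceof Link) {
        $links[] = ['title' => $element->getText(), 'url' => $element->getSource()];
    } elseif ($element instanceof Image) {
        $images[] = $element;
    } elseif (method_exists($element, 'getElements')) {
        foreach ($element->getElements() as $child) {
            collectLinksAndImages($child, $links, $images);
        }
    }
}

$phpWord = IOFactory::load('my/file.docx'); // hypothetical path; reader defaults to Word2007

$links = [];
$images = [];
foreach ($phpWord->getSections() as $section) {
    collectLinksAndImages($section, $links, $images);
}

print_r($links);

foreach ($images as $i => $image) {
    // Request the base64 form and decode it to get the raw image bytes.
    $data = base64_decode($image->getImageStringData(true));
    file_put_contents("image_{$i}." . $image->getImageExtension(), $data);
}
```

The text itself can be collected with the recursive function shown in the comments.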