Number of words in document #1089
Comments
I'm doing it this way: https://gist.github.com/Anexo/106c25dc4a99843936562ab71c5eef18 But I cannot get the content under TextRun... Any ideas?
@mr-pack another idea.
This example is woefully incomplete, though. Do not use it to count words. This is because the TextRuns do not actually need a space at the beginning to separate them from the previous text, so extra markers need to be added to remove the artificial spaces between words. Something like:
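As a rough, self-contained sketch of that marker idea (not the snippet the comment originally pointed to): the tree below is built from plain arrays rather than PHPWord objects, and the node layout, `NO_SPACE_MARKER`, `extractMarked()` and `countTreeWords()` are all hypothetical names chosen for illustration.

```php
<?php
// Hypothetical marker: a control character assumed never to occur in the text.
const NO_SPACE_MARKER = "\x1F";

/**
 * Recursively collect text from a toy tree.
 * Node shape (an assumption for this sketch):
 *   ['type' => 'text', 'text' => string]
 *   ['type' => 'run'|'section', 'children' => array of nodes]
 */
function extractMarked(array $node): string
{
    if ($node['type'] === 'text') {
        return $node['text'];
    }
    $isRun = $node['type'] === 'run';
    $out = '';
    foreach ($node['children'] as $child) {
        // Every child gets a separating space; inside a run that space is
        // artificial, so it is tagged with the marker for later removal.
        $out .= ' ' . ($isRun ? NO_SPACE_MARKER : '') . extractMarked($child);
    }
    return $out;
}

function countTreeWords(array $tree): int
{
    $text = extractMarked($tree);
    // Remove each artificial space together with its marker.
    $text = str_replace(' ' . NO_SPACE_MARKER, '', $text);
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return count($words);
}

$doc = ['type' => 'section', 'children' => [
    ['type' => 'run', 'children' => [
        ['type' => 'text', 'text' => 'Hel'],
        ['type' => 'text', 'text' => 'lo wor'],
        ['type' => 'text', 'text' => 'ld'],
    ]],
    ['type' => 'text', 'text' => 'again'],
]];

echo countTreeWords($doc), "\n"; // prints 3
```

Without the marker step, the same tree would be read as "Hel lo wor ld again" and counted as 5 words; real spaces inside a fragment (as in "lo wor") are left untouched.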
@FBnil "You do not need to know it is a TextRun": the problem is with the Title class, whose getText() may return a TextRun instead of a string:

class Title extends AbstractElement
{
    // [...]
    /**
     * Create a new Title Element
     *
     * @param string|TextRun $text
     * @param int $depth
     */
    public function __construct($text, $depth = 1)
    {
        if (is_string($text)) {
            $this->text = CommonText::toUTF8($text);
        } elseif ($text instanceof TextRun) { // <-- THIS IS THE PROBLEM
            $this->text = $text; // <-- THIS IS THE PROBLEM
        }
        // [...]
    }
    /**
     * Get Title Text content
     *
     * @return string
     */
    public function getText()
    {
        return $this->text;
    }
    // [...]
}

Here's my solution, which extends yours:

function ExtractText($obj, $nested = 0) {
    $txt = "";
    if (method_exists($obj, 'getSections')) {
        foreach ($obj->getSections() as $section) {
            $txt .= " " . ExtractText($section, $nested + 1);
        }
    } else if (method_exists($obj, 'getElements')) {
        foreach ($obj->getElements() as $element) {
            $txt .= " " . ExtractText($element, $nested + 1);
        }
    } else if (method_exists($obj, 'getText')) {
        // --------------------------------------------------------------
        // THIS IS THE DIFFERENT BLOCK
        $extracted = $obj->getText();
        if (is_string($extracted) === true) {
            $txt .= $extracted;
        } else {
            $txt .= " " . ExtractText($extracted, $nested + 1);
        }
        // --------------------------------------------------------------
    } else if (method_exists($obj, 'getRows')) {
        foreach ($obj->getRows() as $row) {
            $txt .= " " . ExtractText($row, $nested + 1);
        }
    } else if (method_exists($obj, 'getCells')) {
        foreach ($obj->getCells() as $cell) {
            $txt .= " " . ExtractText($cell, $nested + 1);
        }
    } else if (get_class($obj) != 'PhpOffice\PhpWord\Element\TextBreak') {
        $txt .= "(" . get_class($obj) . ")"; # unknown object, you need to add it
    }
    return $txt;
}
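$text = ExtractText($phpWord->load($filename));
// Assumption: the character stripped here is a non-breaking space (U+00A0);
// stripping regular spaces would merge all words into one.
$text = str_replace("\u{00A0}", "", $text);
$text = str_replace('•', "", $text);
// PREG_SPLIT_NO_EMPTY drops the empty strings caused by leading or trailing whitespace.
$textArray = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
$numberWords = count($textArray);

Hope this helps! :-)
Francesco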
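A note on the final counting step: ExtractText prepends a space to each child's text, so the extracted string normally starts with whitespace, and a plain `preg_split('/\s+/', ...)` then returns empty strings that get counted as words. A minimal sketch of a count that guards against this; `countWords()` is a hypothetical helper, not part of PHPWord:

```php
<?php
/**
 * Count whitespace-separated words, ignoring leading and trailing whitespace.
 */
function countWords(string $text): int
{
    // PREG_SPLIT_NO_EMPTY discards empty pieces; fall back to [] if the split fails.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    return count($words);
}

$extracted = '  Hello   world ';
echo count(preg_split('/\s+/', $extracted)), "\n"; // prints 4: two empty strings are counted
echo countWords($extracted), "\n";                 // prints 2
```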
What is the best and fastest way
to count the number of words in a document
(.doc, .docx, .pdf, etc.)?
Thanks.