Reading a docx with <h?> headings containing anchors: the anchors just get removed. #1792

Closed
bozzit opened this issue Dec 31, 2019 · 1 comment · Fixed by #2433

bozzit commented Dec 31, 2019

Describe the Bug

When reading a Word document that contains headings with anchors (hyperlinks) in them, getContent() returns the heading without the anchor or the anchor text.

Steps to Reproduce

A Word document containing (sample docx attached):

Biographies <- h1
Regular Anchor Aaaa bozz <- p /* "bozz" is a hyperlink to http:https://www.xyz.com */
AAAAA Bozz, Anchor in Heading <- h2 /* "Bozz" is a hyperlink to http:https://www.xyz.com */
On January 31st, 2019, Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus hendrerit pellentesque nisl. Vivamus lobortis enim consequat egestas suscipit. In convallis metus id erat eleifend consectetur. Donec tincidunt, dui quis congue sollicitudin, metus arcu mattis erat, sed rutrum eros odio quis ex. Vestibulum sit amet viverra est. Nullam ultrices commodo metus vel iaculis. Fusce nec blandit leo. Curabitur id lacinia libero. Etiam nunc arcu, pharetra sit amet felis non, congue bibendum magna. Duis semper nec metus ac vehicula.

Please provide a code sample that reproduces the issue.

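<?php
require_once 'bootstrap.php';

// Load the sample document and render it with the HTML writer.
$phpWord = \PhpOffice\PhpWord\IOFactory::load('test.docx');
$htmlWriter = new \PhpOffice\PhpWord\Writer\HTML($phpWord);
$content = $htmlWriter->getContent();

// The h2 in the output is missing its hyperlink and the link text.
echo $content;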
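
To make the symptom easier to spot than by reading the raw output, the HTML can be checked for links inside headings. The following is only a minimal sketch using PHP's DOM extension: the helper name `headingLinks()` and the sample HTML string are illustrative assumptions, not actual PHPWord output. In the reproduction above, `$content` would be passed in instead of `$sample`.

```php
<?php
// Hypothetical helper: map each heading's text to the hrefs of the links inside it.
function headingLinks(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $heading) {
        $links = [];
        foreach ($xpath->query('.//a[@href]', $heading) as $a) {
            $links[] = $a->getAttribute('href');
        }
        $result[trim($heading->textContent)] = $links;
    }
    return $result;
}

// Hand-written sample standing in for $htmlWriter->getContent() (an assumption).
$sample = '<html><body><h1>Biographies</h1>'
    . '<h2>AAAAA <a href="https://www.xyz.com">Bozz</a>, Anchor in Heading</h2>'
    . '</body></html>';

$found = headingLinks($sample);
print_r($found);
```

A heading that maps to an empty list although the source document has a hyperlink in it would indicate the behavior described in this issue.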

Expected Behavior

Biographies

Regular Anchor Aaaa bozz

AAAAA **Bozz**, Anchor in Heading

On January 31st, 2019, Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus hendrerit pellentesque nisl. Vivamus lobortis enim consequat egestas suscipit. In convallis metus id erat eleifend consectetur. Donec tincidunt, dui quis congue sollicitudin, metus arcu mattis erat, sed rutrum eros odio quis ex. Vestibulum sit amet viverra est. Nullam ultrices commodo metus vel iaculis. Fusce nec blandit leo. Curabitur id lacinia libero. Etiam nunc arcu, pharetra sit amet felis non, congue bibendum magna. Duis semper nec metus ac vehicula.

Current Behavior

Biographies

Regular Anchor Aaaa bozz

AAAAA , Anchor in Heading

On January 31st, 2019, Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus hendrerit pellentesque nisl. Vivamus lobortis enim consequat egestas suscipit. In convallis metus id erat eleifend consectetur. Donec tincidunt, dui quis congue sollicitudin, metus arcu mattis erat, sed rutrum eros odio quis ex. Vestibulum sit amet viverra est. Nullam ultrices commodo metus vel iaculis. Fusce nec blandit leo. Curabitur id lacinia libero. Etiam nunc arcu, pharetra sit amet felis non, congue bibendum magna. Duis semper nec metus ac vehicula.

Context

Please fill in your environment information:

  • PHP 7.1.33
  • PHPWord version: 0.17.0 (also tried dev-develop)
  • Sample file: test.docx (attached)

Thanks for any insight, workaround or nudge in the right direction.


bozzit commented Sep 23, 2020

Hi, I think the following will fix the issue, in case this helps anyone.

PHPWord/src/PhpWord/Reader/Word2007/AbstractPart.php

Line 151:

} elseif ($headingDepth !== null) {
    // Heading or Title
    $textContent = null;
    // Headings can contain hyperlinks, so the following line was changed.
    // Previously: $nodes = $xmlReader->getElements('w:r', $domNode);
    $nodes = $xmlReader->getElements('w:r|w:hyperlink', $domNode);
    if ($nodes->length === 1) {
        $textContent = htmlspecialchars($xmlReader->getValue('w:t', $nodes->item(0)), ENT_QUOTES, 'UTF-8');
    } else {
        $textContent = new TextRun($paragraphStyle);
        foreach ($nodes as $node) {
            $this->readRun($xmlReader, $node, $textContent, $docPart, $paragraphStyle);
        }
    }
    $parent->addTitle($textContent, $headingDepth);
} else {

If the fix makes sense, it would be great if it could be merged in.
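
To illustrate why widening the XPath from `w:r` to `w:r|w:hyperlink` matters, here is a small standalone sketch using plain `DOMXPath` rather than PHPWord's own reader. The XML fragment is hand-written to mimic the heading from the sample document (a real `w:hyperlink` would also carry a relationship id, omitted here), and `collectText()` is a hypothetical helper introduced only for this demonstration.

```php
<?php
// WordprocessingML main namespace, bound to the conventional "w" prefix.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hand-written heading paragraph: plain run, hyperlink wrapping a run, plain run.
$xml = '<w:p xmlns:w="' . W_NS . '">'
    . '<w:r><w:t xml:space="preserve">AAAAA </w:t></w:r>'
    . '<w:hyperlink><w:r><w:t>Bozz</w:t></w:r></w:hyperlink>'
    . '<w:r><w:t>, Anchor in Heading</w:t></w:r>'
    . '</w:p>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('w', W_NS);
$paragraph = $doc->documentElement;

// Hypothetical helper: concatenate the w:t text found under each node matched by $query.
function collectText(DOMXPath $xpath, DOMNode $context, string $query): string
{
    $text = '';
    foreach ($xpath->query($query, $context) as $node) {
        foreach ($xpath->query('.//w:t', $node) as $t) {
            $text .= $t->textContent;
        }
    }
    return $text;
}

// Direct child runs only: the run nested inside w:hyperlink is skipped.
$runsOnly = collectText($xpath, $paragraph, 'w:r');
// Union query: the hyperlink element is matched as well, in document order.
$runsAndLinks = collectText($xpath, $paragraph, 'w:r|w:hyperlink');

echo $runsOnly, PHP_EOL;      // AAAAA , Anchor in Heading
echo $runsAndLinks, PHP_EOL;  // AAAAA Bozz, Anchor in Heading
```

The first result matches the "Current Behavior" heading above: `w:r` selects only direct child runs of the paragraph, so text wrapped in a hyperlink never reaches the title.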

Thanks
