HTML generated from a Docx is way too big #2203

Open
Mouke opened this issue Mar 9, 2022 · 0 comments

Describe the Bug

I use PHPWord (and PHPSpreadsheet) to convert Word/Excel files (and their OpenOffice equivalents) to PDF, by first converting them to HTML and then running DomPDF. When a file contains pictures, the size of the rendered HTML explodes: for instance, a 1MB .docx file becomes a 36MB HTML string. (The PDF conversion then brings it back down to 26MB, which is still far too much.) After dumping the HTML, my guess is that the base64 encoding of the pictures is what inflates it.
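
One way to check the base64 guess is to measure how much of the HTML string sits inside data URIs. The following is a minimal sketch using only core PHP functions. The helper name `measureDataUriBytes` is hypothetical and not part of PHPWord, and it assumes the images are embedded as `data:<mime>;base64,<payload>` URIs.

```php
<?php
/**
 * Hypothetical helper (not part of PHPWord): reports how many bytes of an
 * HTML string are occupied by base64 data URIs such as
 * <img src="data:image/png;base64,...">.
 */
function measureDataUriBytes(string $html): array
{
    $total = strlen($html);

    // Possessive quantifier (++) avoids backtracking on very long payloads.
    $pattern = '~data:[a-z0-9.+/-]+;base64,[A-Za-z0-9+/=]++~i';
    $count = preg_match_all($pattern, $html, $matches);
    if (!is_int($count)) {
        throw new RuntimeException('preg_match_all failed, code ' . preg_last_error());
    }

    // Sum the byte length of every matched data URI.
    $dataUriBytes = array_sum(array_map('strlen', $matches[0]));

    return [
        'totalBytes'   => $total,
        'dataUriBytes' => $dataUriBytes,
        'count'        => $count,
        'share'        => $total > 0 ? $dataUriBytes / $total : 0.0,
    ];
}
```

Calling `measureDataUriBytes($html)` on the string returned by `getContent()` would show whether the `share` value is close to 1, which would support the guess that embedded images dominate the output.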

Steps to Reproduce

Using this file:
test-long.docx

<?php
require __DIR__ . '/vendor/autoload.php';

$path = 'PATH_TO_FILE';

// IOFactory::load() expects a file path, not the file's contents.
$phpWord = \PhpOffice\PhpWord\IOFactory::load($path, 'Word2007');
$htmlWriter = new \PhpOffice\PhpWord\Writer\HTML($phpWord);
$html = $htmlWriter->getContent();
// Size of the generated HTML in bytes.
echo strlen($html);

Expected Behavior

I would expect the output to be more compact. I understand that the conversion may produce a larger file, but in this case it is more than 10x larger.
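
For scale, here is a rough sketch of how much base64 encoding alone inflates binary data. Standard padded base64 turns every 3 input bytes into 4 output characters. The function name `base64EncodedLength` is illustrative, and reading "1MB" as 1 MiB of pure image data is an assumption made for the arithmetic.

```php
<?php
/**
 * Length in bytes of the padded base64 encoding of $rawBytes bytes of input:
 * each group of up to 3 input bytes becomes 4 output characters.
 */
function base64EncodedLength(int $rawBytes): int
{
    return 4 * intdiv($rawBytes + 2, 3);
}

$imageBytes = 1024 * 1024; // assumption: 1 MiB of raw image data
$encoded = base64EncodedLength($imageBytes);

// Print the raw size, the encoded size, and the growth factor.
printf("%d bytes -> %d base64 characters (x%.2f)\n", $imageBytes, $encoded, $encoded / $imageBytes);
```

The encoding step adds roughly one third to the raw size. A .docx is a ZIP container, so the embedded media can be larger than the file itself, but a 36x increase would seem hard to explain by base64 overhead alone.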

Context

Please fill in your environment information:

  • PHP 7.4.28 (cli) (built: Mar 3 2022 09:59:56) ( NTS )
    Copyright (c) The PHP Group
    Zend Engine v3.4.0, Copyright (c) Zend Technologies
  • PHPWord Version 0.18.2
  • Server is a dockerized Ubuntu based on the php:7.4-fpm image.

Best regards,
