Feature request: Support EMF image #1480

Open
bakkan opened this issue Sep 29, 2018 · 19 comments

@bakkan

bakkan commented Sep 29, 2018

This is:

  • [√] a bug report
  • [√] a feature request

Expected Behavior

Support EMF images.

Failure Information

Loading the document throws PhpOffice\PhpWord\Exception\InvalidImageException.
Exception message:
Invalid image: zip:///Users/xxx/Downloads/xxxx.docx#word/media/image.emf
#0 /works/shared/laravel/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php(149): PhpOffice\PhpWord\Element\Image->checkImage()
#1 [internal function]: PhpOffice\PhpWord\Element\Image->__construct('zip:///Users/hu...', NULL, false, 'Picture 18')

How to Reproduce

Load a .docx file that contains EMF images.
For background on the format, see: https://fileinfo.com/extension/emf

<?php
use PhpOffice\PhpWord\IOFactory;

require 'vendor/autoload.php';

$file = '/path/to/file.docx';
// Throws InvalidImageException when the document contains an EMF image
$phpWord = IOFactory::load($file);
$sections = $phpWord->getSections();
foreach ($sections as $section) {
    $elements = $section->getElements();
    foreach ($elements as $element) {
        // do something else...
    }
}

Context

  • PHP version: PHP 7.1.16
  • PHPWord version: 0.15.0
@bakkan
Author

bakkan commented Sep 29, 2018

PHPWord uses the getimagesize() function to get image info, and getimagesize() doesn't support the EMF format. 😂😂
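To see the failure in isolation, here is a small sketch (buildEmfHeader() is a hypothetical helper, not part of PHPWord). It builds the 88-byte EmfMetafileHeader layout from the MS-EMF specification with only Type, Size, Bounds, RecordSignature and Version filled in, then passes it to getimagesizefromstring(), which returns false because none of PHP's IMAGETYPE_* types covers EMF.

```php
<?php
// Hypothetical helper: build a minimal 88-byte EMF header (EmfMetafileHeader).
// Only the fields needed to identify the file and its bounds are filled in.
function buildEmfHeader(int $left, int $top, int $right, int $bottom): string
{
    return pack('VV', 1, 88)                       // Type = EMR_HEADER, Size = 88
        . pack('V4', $left, $top, $right, $bottom) // Bounds (RectL), little-endian
        . str_repeat("\0", 16)                     // Frame (unused here)
        . pack('V', 0x464D4520)                    // RecordSignature, the bytes " EMF"
        . pack('V', 0x00010000)                    // Version
        . str_repeat("\0", 88 - 48);               // remaining header fields
}

$emf = buildEmfHeader(0, 0, 99, 49);

// getimagesizefromstring() has no EMF handler, so it cannot identify the data.
$info = @getimagesizefromstring($emf);
var_dump($info); // bool(false)
```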

@mynukeviet

I'm using PHPWord dev-master and see this error:

[Mon, 08 Apr 2019 09:26:50 +0700] [127.0.0.1] [Error(1): Uncaught exception 'PhpOffice\PhpWord\Exception\InvalidImageException' with message 'Invalid image: zip:///opt/lampp/temp/php4WlqvI#word/media/image1.emf' in /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php:418
Stack trace:
#0 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php(149): PhpOffice\PhpWord\Element\Image->checkImage()
#1 [internal function]: PhpOffice\PhpWord\Element\Image->__construct('zip:///opt/lamp...')
#2 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(145): ReflectionClass->newInstanceArgs(Array)
#3 [internal function]: PhpOffice\PhpWord\Element\AbstractContainer->addElement('Image', 'zip:///opt/lamp...')
#4 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(112): call_user_func_array(Array] [FILE: /vendor/phpoffice/phpword/src/PhpWord/Element/Image.php] [LINE: 418]

@derKroisi

Any news on this issue? Will this be addressed sooner or later?

@ThomazPom

ThomazPom commented May 7, 2022

I encountered this error just now. I guess the EMF format is becoming more common in modern .docx files.

@RomMad

RomMad commented Oct 3, 2022

Same problem for me today. Any news on this issue?

@gurpreetbhatoa

There isn't any support for .emf files, but there is a workaround:

  • Change the extension of your .docx template to .zip
  • Unzip it into a directory
  • Search the XML files for the .emf file causing the issue (I used VSCode's Find in Folder)
  • Save the .emf file as a .jpeg
  • Update the XML file that references the .emf file so it points to the .jpeg instead
  • Compress the contents of the directory again
  • Change the extension back from .zip to .docx and try again.
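If you'd rather not unzip by hand, the search step can be scripted. A minimal sketch, assuming the images are referenced from word/_rels/document.xml.rels (headers and footers have their own .rels parts, which it ignores); findEmfTargets() is a hypothetical helper, not a PHPWord API:

```php
<?php
// Hypothetical helper: return [relationship Id => Target] for every
// relationship in a .rels XML string whose target is an .emf file.
function findEmfTargets(string $relsXml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($relsXml);

    $found = [];
    foreach ($doc->getElementsByTagName('Relationship') as $rel) {
        $target = $rel->getAttribute('Target');
        if (strtolower(pathinfo($target, PATHINFO_EXTENSION)) === 'emf') {
            $found[$rel->getAttribute('Id')] = $target;
        }
    }
    return $found;
}

// Usage (requires ext-zip): read the relationships part straight from the .docx,
// no renaming or unzipping needed.
// $zip = new ZipArchive();
// if ($zip->open('/path/to/template.docx') === true) {
//     print_r(findEmfTargets($zip->getFromName('word/_rels/document.xml.rels')));
//     $zip->close();
// }
```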

@ThomazPom

ThomazPom commented Oct 14, 2022

Workaround in code:
PHPWord includes template processing for this. TemplateProcessor edits the document XML directly, so it never has to load the EMF images.

include 'vendor/autoload.php';
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('test2.docx');
$templateProcessor->setValue('name', 'myvar');
$templateProcessor->saveAs('./xx.docx');

https://phpword.readthedocs.io/en/latest/templates-processing.html
https://stackoverflow.com/a/53039632/4693790

You can avoid template processing altogether, since all you need is to replace the .emf references.

You could write a prepareDocxReplaceEMF($docxPath) function that performs all of the following steps on a .docx file before working with PHPWord.
Renaming .docx to .zip is not needed.

Use PHP ZipArchive to extract "YOURDOC.docx/word/_rels/document.xml.rels"
https://www.php.net/manual/en/ziparchive.extractto.php

Replace the EMF references in the file
https://stackoverflow.com/a/69155428/4693790

Use PHP ZipArchive to add document.xml.rels back to the archive
https://www.php.net/manual/en/ziparchive.addfile.php

Use PHP ZipArchive to extract the EMF file
https://www.php.net/manual/en/ziparchive.extractto.php

Use ImageMagick to convert the EMF file
https://imagemagick.org/script/formats.php
https://www.php.net/manual/fr/book.imagick.php

Use PHP ZipArchive to add the JPEG file back
https://www.php.net/manual/en/ziparchive.addfile.php
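Putting those steps together, a rough sketch of the prepareDocxReplaceEMF($docxPath) idea could look like this. Assumptions: only word/_rels/document.xml.rels is rewritten, relationship targets are relative to word/, and the EMF conversion is passed in as a callable so any converter can be plugged in. The helper names rewriteEmfTargets() and ensurePngContentType() are hypothetical. It converts to PNG rather than JPEG to avoid JPEG compression artifacts, which means [Content_Types].xml may need a png entry.

```php
<?php
// Point every internal .emf relationship at a .png with the same base name.
// Returns [updated XML, list of original .emf targets].
function rewriteEmfTargets(string $relsXml): array
{
    $doc = new DOMDocument();
    $doc->loadXML($relsXml);
    $emfTargets = [];
    foreach ($doc->getElementsByTagName('Relationship') as $rel) {
        $target = $rel->getAttribute('Target');
        if ($rel->getAttribute('TargetMode') === 'External' || !preg_match('/\.emf$/i', $target)) {
            continue; // not an embedded EMF image
        }
        $emfTargets[] = $target;
        $rel->setAttribute('Target', substr($target, 0, -4) . '.png');
    }
    return [$doc->saveXML(), array_values(array_unique($emfTargets))];
}

// Make sure [Content_Types].xml declares a content type for the png extension.
function ensurePngContentType(string $contentTypesXml): string
{
    $doc = new DOMDocument();
    $doc->loadXML($contentTypesXml);
    foreach ($doc->getElementsByTagName('Default') as $default) {
        if (strtolower($default->getAttribute('Extension')) === 'png') {
            return $contentTypesXml; // already declared
        }
    }
    $root = $doc->documentElement;
    $png = $doc->createElementNS($root->namespaceURI, 'Default');
    $png->setAttribute('Extension', 'png');
    $png->setAttribute('ContentType', 'image/png');
    $root->appendChild($png);
    return $doc->saveXML();
}

// Convert the EMF images of a .docx in place; returns how many were replaced.
function prepareDocxReplaceEMF(string $docxPath, callable $emfToPng): int
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }
    $relsPath = 'word/_rels/document.xml.rels';
    $rels = $zip->getFromName($relsPath);
    if ($rels === false) {
        $zip->close();
        return 0;
    }
    [$relsXml, $emfTargets] = rewriteEmfTargets($rels);
    if ($emfTargets) {
        foreach ($emfTargets as $target) {
            $emfPath = 'word/' . $target; // assumes a target relative to word/
            $emf = $zip->getFromName($emfPath);
            if ($emf !== false) {
                $zip->addFromString(substr($emfPath, 0, -4) . '.png', $emfToPng($emf));
                $zip->deleteName($emfPath);
            }
        }
        $zip->addFromString($relsPath, $relsXml);
        $typesPath = '[Content_Types].xml';
        $zip->addFromString($typesPath, ensurePngContentType((string) $zip->getFromName($typesPath)));
    }
    $zip->close();
    return count($emfTargets);
}

// Possible converter. Assumption: an ImageMagick build that can read EMF,
// which ImageMagick's format list documents as Windows-only.
// $toPng = function (string $emf): string {
//     $im = new Imagick();
//     $im->readImageBlob($emf);
//     $im->setImageFormat('png');
//     return $im->getImageBlob();
// };
// prepareDocxReplaceEMF('/path/to/file.docx', $toPng);
```

Passing the converter in keeps the archive handling independent of where the conversion happens (Imagick, an external tool, or a web service).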

@user3470

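A workaround that worked for me. Note that it replaces every image in the document (not only the EMF ones) with a 1x1 white placeholder PNG: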

    private function removeImageReferences($zip, $placeholderImagePath)
    {
        $relsPath = 'word/_rels/document.xml.rels';
        $relsContent = $zip->getFromName($relsPath);

        $relsXml = new \SimpleXMLElement($relsContent);
        $imagePaths = [];
        // Defined up front so it exists even when no image relationship is found
        $placeholderImageTarget = 'media/placeholder.png';

        foreach ($relsXml->Relationship as $relationship) {
            if (strpos((string) $relationship['Type'], 'image') !== false) {
                // Store the original image path (targets are relative to word/)
                $imagePaths[] = 'word/' . $relationship['Target'];

                // Point the relationship at the placeholder image instead
                $relationship['Target'] = $placeholderImageTarget;
            }
        }

        if (!$imagePaths) {
            return; // no images: leave the archive untouched
        }

        // Update the relationships file
        $zip->deleteName($relsPath);
        $zip->addFromString($relsPath, $relsXml->asXML());

        // Delete the original image files (several relationships may share one file)
        foreach (array_unique($imagePaths) as $imagePath) {
            $zip->deleteName($imagePath);
        }

        // Add the placeholder image to the zip archive
        $zip->addFile($placeholderImagePath, 'word/' . $placeholderImageTarget);
    }


    private function getPlaceholderImage()
    {
        $placeholderImagePath = 'placeholder.png';

        if (!Storage::disk('local')->exists($placeholderImagePath)) {
            $width = 1;
            $height = 1;
            $color = [255, 255, 255]; // RGB value for white color
            $image = imagecreatetruecolor($width, $height);
            $color = imagecolorallocate($image, $color[0], $color[1], $color[2]);
            imagefilledrectangle($image, 0, 0, $width - 1, $height - 1, $color);
            ob_start();
            imagepng($image);
            $imageData = ob_get_contents();
            ob_end_clean();
            Storage::disk('local')->put($placeholderImagePath, $imageData);
        }

        return storage_path('app/' . $placeholderImagePath);
    }

Then, before loading the document with PHPWord:

            $tempFilePath = tempnam(sys_get_temp_dir(), 'doc');
            file_put_contents($tempFilePath, $response->getBody()->getContents());

            $zip = new \ZipArchive();
            $placeholderImagePath = $this->getPlaceholderImage();

            if ($zip->open($tempFilePath) !== true) {
                throw new \RuntimeException('Cannot open the downloaded document as a zip archive');
            }
            $this->removeImageReferences($zip, $placeholderImagePath);
            $zip->close();

            $phpWord = IOFactory::load($tempFilePath);

@websuasive

websuasive commented May 25, 2023

Since it seems unlikely this will be fixed anytime soon, given PHP's poor support for EMF images, would it be worth catching this error and replacing the image with a placeholder "can't be found" image or message?

Then at least the library could be used for documents that contain EMF images.

@Progi1984 self-assigned this on Sep 16, 2023
@Progi1984 added this to the 1.2.0 milestone on Sep 16, 2023
@thomasb88

So, PHP's getimagesize() and getimagesizefromstring() only accept the formats listed here (the IMAGETYPE_* constants):
https://www.php.net/manual/fr/image.constants.php

EMF is not among them (and neither is SVG).

So this could be a PHP feature request, but in the meantime we could try to implement it "the PHP way" in PHPWord.

In the PHP source (ext/standard/image.c):
PHP_FUNCTION(getimagesize)
{
	php_getimagesize_from_any(INTERNAL_FUNCTION_PARAM_PASSTHRU, FROM_PATH);
}
/* }}} */

/* {{{ Get the size of an image as 4-element array */
PHP_FUNCTION(getimagesizefromstring)
{
	php_getimagesize_from_any(INTERNAL_FUNCTION_PARAM_PASSTHRU, FROM_DATA);
}

php_getimagesize_from_any() then opens the stream and calls php_getimagesize_from_stream().

To find out which kind of file it is, php_getimagesize_from_stream() calls php_getimagetype().

For each known type, it reads a fixed number of leading bytes and compares them with that format's signature.

For example, for JPEG, the first 3 bytes must be:
PHPAPI const char php_sig_jpg[3] = {(char) 0xff, (char) 0xd8, (char) 0xff};
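The same signature matching can be sketched in PHP with EMF added. This is an illustration, not PHP's actual table: the JPEG, PNG, GIF, PSD and BMP prefixes are the well-known magic numbers, and EMF is recognised, per MS-EMF, by the EMR_HEADER record type (1, little-endian) in its first 4 bytes plus the " EMF" record signature at byte offset 40. sniffImageType() is a hypothetical name.

```php
<?php
// Hypothetical sketch: guess an image type from its leading bytes,
// in the spirit of php_getimagetype() comparing fixed signatures.
function sniffImageType(string $bytes): ?string
{
    $prefixes = [
        'jpeg' => "\xFF\xD8\xFF",
        'png'  => "\x89PNG\r\n\x1A\n",
        'gif'  => 'GIF8',
        'psd'  => '8BPS',
        'bmp'  => 'BM',
    ];
    foreach ($prefixes as $type => $signature) {
        if (strncmp($bytes, $signature, strlen($signature)) === 0) {
            return $type;
        }
    }
    // EMF: Type == EMR_HEADER (0x00000001, little-endian) at offset 0
    // and RecordSignature == ENHMETA_SIGNATURE (" EMF") at offset 40.
    if (strlen($bytes) >= 44
        && substr($bytes, 0, 4) === "\x01\x00\x00\x00"
        && substr($bytes, 40, 4) === ' EMF') {
        return 'emf';
    }
    return null; // unknown format
}
```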

It then applies a type-specific handler to read the image size. For example, for PSD:

static struct gfxinfo *php_handle_psd (php_stream * stream)
{
	struct gfxinfo *result = NULL;
	unsigned char dim[8];

	if (php_stream_seek(stream, 11, SEEK_CUR))
		return NULL;

	if (php_stream_read(stream, (char*)dim, sizeof(dim)) != sizeof(dim))
		return NULL;

	result = (struct gfxinfo *) ecalloc(1, sizeof(struct gfxinfo));
	result->height   =  (((unsigned int)dim[0]) << 24) + (((unsigned int)dim[1]) << 16) + (((unsigned int)dim[2]) << 8) + ((unsigned int)dim[3]);
	result->width    =  (((unsigned int)dim[4]) << 24) + (((unsigned int)dim[5]) << 16) + (((unsigned int)dim[6]) << 8) + ((unsigned int)dim[7]);

	return result;
}
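For comparison, here is a PHP port of the PSD handler above (psdSize() is a hypothetical name). The C code has already consumed the 3-byte signature, so seeking 11 more bytes lands on byte offset 14 of the file, where the big-endian height and width follow. unpack() with 'N' reads unsigned 32-bit big-endian values, and its offset argument (PHP 7.1+) replaces the seek.

```php
<?php
// Hypothetical port of php_handle_psd(): read height and width from a PSD header.
// Returns [width, height], like the first two entries of getimagesize(), or null.
function psdSize(string $bytes): ?array
{
    // 14 header bytes (signature, version, reserved, channels) + 8 bytes of dimensions.
    if (strlen($bytes) < 22 || strncmp($bytes, '8BPS', 4) !== 0) {
        return null;
    }
    $dim = unpack('Nheight/Nwidth', $bytes, 14);
    return [$dim['width'], $dim['height']];
}
```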

Or, for BMP files:

static struct gfxinfo *php_handle_bmp (php_stream * stream)
{
	struct gfxinfo *result = NULL;
	unsigned char dim[16];
	int size;

	if (php_stream_seek(stream, 11, SEEK_CUR))
		return NULL;

	if (php_stream_read(stream, (char*)dim, sizeof(dim)) != sizeof(dim))
		return NULL;

	size   = (((unsigned int)dim[ 3]) << 24) + (((unsigned int)dim[ 2]) << 16) + (((unsigned int)dim[ 1]) << 8) + ((unsigned int) dim[ 0]);
	if (size == 12) {
		result = (struct gfxinfo *) ecalloc (1, sizeof(struct gfxinfo));
		result->width    =  (((unsigned int)dim[ 5]) << 8) + ((unsigned int) dim[ 4]);
		result->height   =  (((unsigned int)dim[ 7]) << 8) + ((unsigned int) dim[ 6]);
		result->bits     =  ((unsigned int)dim[11]);
	} else if (size > 12 && (size <= 64 || size == 108 || size == 124)) {
		result = (struct gfxinfo *) ecalloc (1, sizeof(struct gfxinfo));
		result->width    =  (((unsigned int)dim[ 7]) << 24) + (((unsigned int)dim[ 6]) << 16) + (((unsigned int)dim[ 5]) << 8) + ((unsigned int) dim[ 4]);
		result->height   =  (((unsigned int)dim[11]) << 24) + (((unsigned int)dim[10]) << 16) + (((unsigned int)dim[ 9]) << 8) + ((unsigned int) dim[ 8]);
		result->height   =  abs((int32_t)result->height);
		result->bits     =  (((unsigned int)dim[15]) <<  8) +  ((unsigned int)dim[14]);
	} else {
		return NULL;
	}

	return result;
}

So we could implement a glue layer that detects EMF from the file name extension (.emf) or from its leading bytes, and then reads the size from the header fields defined in the specification.

More precisely, from the MS-EMF specification:
"1.3.1 Metafile Structure
An EMF metafile begins with a EMR_HEADER record (section 2.3.4.2), which includes the metafile
version, its size, the resolution of the device on which the picture was created, and it ends with an
EMR_EOF record (section 2.3.4.1). Between them are records that specify the rendering of the image."

And then
"2.3.4.2 EMR_HEADER Record Types
The EMR_HEADER record is the starting point of an EMF metafile. It specifies properties of the
device on which the image in the metafile was recorded; this information in the header record makes
it possible for EMF metafiles to be independent of any specific output device.
The following are the EMR_HEADER record types.
  • EmfMetafileHeader (section 2.3.4.2.1): The original EMF header record.
  • EmfMetafileHeaderExtension1 (section 2.3.4.2.2): The header record defined in the first extension to EMF, which added support for OpenGL records and an optional internal pixel format descriptor.<62>
  • EmfMetafileHeaderExtension2 (section 2.3.4.2.3): The header record defined in the second extension to EMF, which added the capability of measuring display dimensions in micrometers.<63>
EMF metafiles SHOULD be created with an EmfMetafileHeaderExtension2 header record.
The generic structure of EMR_HEADER records is specified as follows.
...
Type (4 bytes): An unsigned integer that identifies this record type as EMR_HEADER. This value is
0x00000001
...
The value of the Size field can be used to distinguish between the different EMR_HEADER record types
listed earlier in this section. There are three possible headers:
  • The EmfMetafileHeader record. The fixed-size part of this header is 88 bytes, and it contains a Header object (section 2.2.9).
  • The EmfMetafileHeaderExtension1 record. The fixed-size part of this header is 100 bytes, and it contains a Header object and a HeaderExtension1 object (section 2.2.10).
  • The EmfMetafileHeaderExtension2 record. The fixed-size part of this header is 108 bytes, and it contains a Header object, a HeaderExtension1 object, and a HeaderExtension2 object (section 2.2.11)."

Then, in section 2.2.9 (Header object):
"Bounds (16 bytes): A RectL object ([MS-WMF] section 2.2.2.19) that specifies the rectangular
inclusive-inclusive bounds in logical units of the smallest rectangle that can be drawn around
the image stored in the metafile."

Which takes us to [MS-WMF] section 2.2.2.19:
https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-wmf/4813e7fd-52d0-4f42-965f-228c8b7488d2
"2.2.2.19 RectL Object
The RectL Object defines a rectangle.
...
Left (4 bytes): A 32-bit signed integer that defines the x coordinate, in logical coordinates, of the
upper-left corner of the rectangle.
Top (4 bytes): A 32-bit signed integer that defines the y coordinate, in logical coordinates, of the
upper-left corner of the rectangle.
Right (4 bytes): A 32-bit signed integer that defines the x coordinate, in logical coordinates, of the
lower-right corner of the rectangle.
Bottom (4 bytes): A 32-bit signed integer that defines y coordinate, in logical coordinates, of the
lower-right corner of the rectangle.
A rectangle defined with a RectL Object is filled up to— but not including—the right column and
bottom row of pixels"
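Combining sections 2.3.4.2, 2.2.9 and 2.2.2.19, the size can be read straight from the header. A minimal sketch (emfSize() is a hypothetical name): Bounds starts at byte offset 8, right after Type and Size; its four fields are signed 32-bit little-endian integers; and because section 2.2.9 calls the bounds inclusive-inclusive, width = Right - Left + 1 and height = Bottom - Top + 1. As the spec says, these are logical units, so treating them as pixels is an assumption.

```php
<?php
// Hypothetical sketch: read the Bounds RectL from an EMF EMR_HEADER record.
// Returns [width, height] in logical units, or null if this isn't an EMF header.
// Assumes 64-bit PHP for the unsigned-to-signed conversion below.
function emfSize(string $bytes): ?array
{
    if (strlen($bytes) < 88) {
        return null;
    }
    // Type must be EMR_HEADER (1); Size identifies the header variant (88, 100 or 108).
    $head = unpack('Vtype/Vsize', $bytes);
    if ($head['type'] !== 1 || !in_array($head['size'], [88, 100, 108], true)
        || substr($bytes, 40, 4) !== ' EMF') {
        return null;
    }
    // 'V' is unsigned 32-bit little-endian; RectL fields are signed.
    $signed = function (int $v): int {
        return $v >= 0x80000000 ? $v - 0x100000000 : $v;
    };
    $b = array_map($signed, unpack('Vleft/Vtop/Vright/Vbottom', $bytes, 8));
    // Bounds are inclusive-inclusive (section 2.2.9), hence the + 1.
    return [$b['right'] - $b['left'] + 1, $b['bottom'] - $b['top'] + 1];
}
```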

@thomasb88

Hi Progi1984,

I haven't had time to set up the whole environment to test against the project's standards, but I wrote a glue around getimagesize() that works in my environment.

Since the specification is a bit painful, I've copied the function below, hoping it helps you with this ticket.

/**
 * Get image size from filename (glue over PHP's getimagesize(), which doesn't handle all image types, e.g. EMF).
 *
 * First try PHP's getimagesize(); for unsupported formats, fall back to a custom parser
 * selected by the filename extension.
 *
 * @param string $filename
 * @param null|array $image_info
 *
 * @return null|array Same layout as getimagesize(): [width, height, type, 'width="..." height="..."']
 */
private function getImageSizeGlue($filename, &$image_info = null)
{
    $imageData = @getimagesize($filename, $image_info);
    if (!is_array($imageData)) {
        $source_extension = strtolower((string) pathinfo($this->source, PATHINFO_EXTENSION));
        $imageString = (string) $this->getImageString();
        // As of the EMF specification, chapter 1.3.3, data in metafile records is stored in little-endian format.
        // 'V' reads an unsigned 32-bit little-endian integer at the given byte offset.
        $readU32 = function ($offset) use ($imageString) {
            return unpack('V', $imageString, $offset)[1];
        };
        switch ($source_extension) {
            case 'emf':
                if (strlen($imageString) < 88) {
                    throw new InvalidImageException(sprintf('Invalid %s image format: too short for an EMR_HEADER record', $source_extension));
                }
                $tag_format = $readU32(0);
                if (1 !== $tag_format) {
                    throw new InvalidImageException(sprintf('Invalid %s image format: Bad EMR_HEADER tag (%08x instead of 00000001)', $source_extension, $tag_format));
                }
                // The record Size identifies the header variant (section 2.3.4.2).
                $existing_format_version = [88 => 'EmfMetafileHeader', 100 => 'EmfMetafileHeaderExtension1', 108 => 'EmfMetafileHeaderExtension2'];
                $format_version = $readU32(4);
                if (!isset($existing_format_version[$format_version]) || strlen($imageString) < $format_version) {
                    throw new InvalidImageException(sprintf('Invalid %s image format: Invalid Header Size (%d)', $source_extension, $format_version));
                }
                $record_signature = $readU32(40);
                if (0x464D4520 !== $record_signature) {
                    throw new InvalidImageException(sprintf('Invalid %s image format: Bad ENHMETA_SIGNATURE Record Signature (%08x)', $source_extension, $record_signature));
                }
                $emf_version = $readU32(44);
                if (0x00010000 !== $emf_version) {
                    throw new InvalidImageException(sprintf('Invalid %s image format: Bad Version (%08x)', $source_extension, $emf_version));
                }
                // Reserved: 2 bytes at offset 58, must be 0.
                $header_reserved = unpack('v', $imageString, 58)[1];
                if (0 !== $header_reserved) {
                    throw new InvalidImageException(sprintf('Invalid %s image format: Bad Reserved Tag (%04x)', $source_extension, $header_reserved));
                }
                if ($format_version > 88) {
                    // bOpenGL: 4 bytes at offset 96 (HeaderExtension1), must be 0 or 1.
                    $bOpenGL = $readU32(96);
                    if (!in_array($bOpenGL, [0, 1], true)) {
                        throw new InvalidImageException(sprintf('Invalid %s image format: Bad OpenGL Tag (%08x)', $source_extension, $bOpenGL));
                    }
                }
                // Bounds: RectL (Left, Top, Right, Bottom) of signed 32-bit integers at offset 8.
                // As of MS-EMF section 2.2.9, these bounds are inclusive-inclusive, hence the + 1.
                $bounds = array_map(function ($v) {
                    return $v >= 0x80000000 ? $v - 0x100000000 : $v;
                }, unpack('Vleft/Vtop/Vright/Vbottom', $imageString, 8));
                $width_in_pixels = abs($bounds['right'] - $bounds['left']) + 1;
                $height_in_pixels = abs($bounds['bottom'] - $bounds['top']) + 1;
                $image_type = self::IMAGETYPE_EMF;
                // Same order as getimagesize(): width first, then height.
                $size_string = sprintf('width="%d" height="%d"', $width_in_pixels, $height_in_pixels);
                $imageData = [$width_in_pixels, $height_in_pixels, $image_type, $size_string];
                break;
            default:
                throw new InvalidImageException(sprintf('Unsupported image format: %s from file %s', $source_extension, $this->source));
        }
    }

    return $imageData;
}

@thomasb88

But this only solves the checkImage() problem.

There is another problem in parseImage() in PhpWord/Shared/Html.php, at line 960.

@thomasb88

My bad: the image type should also be modified.

@ThomazPom
Copy link

I worked around this a year ago, and it hasn't bothered me since.
I prepare every .docx with method 2, which I described here: #1480 (comment)

@thomasb88
Copy link

Well, converting EMF to JPEG is lossy.

That's why I updated PHPWord to handle EMF images. But you're right: if you don't mind about image quality, your solution is a good workaround.

@Progi1984
Member

Does anyone have a file containing an EMF/WMF image, please?

@thomasb88

I have one, but it belongs to my customer, so I can't share it as is.

So I used the trial version of the Metafile Companion software to produce a random image, which I inserted into a random .docx file:
Docx with Emf Image for Test.docx

@thomasb88

Hope it helps.

Labels
None yet
Development

No branches or pull requests

10 participants