Extract embedded files from Office Open XML files

dhedberg/unembedx

unembedx

Utility for quick extraction of embedded files from Office Open XML (OOXML) files.

Specifically, it unzips the document, looks through any compound files in the embeddings directory, and extracts anything interesting into a series of numbered files. By default, it also attempts to determine the correct file extension for each extracted file from its content, using libmagic.
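To illustrate the idea of picking an extension from a file's content, here is a minimal sketch using only the Rust standard library. It is not the code unembedx uses: libmagic recognises far more formats, and the function names (`is_compound_file`, `sniff_extension`, `output_name`) as well as the `N.ext` naming scheme are assumptions made up for this example.

```rust
/// Returns true if `data` starts with the 8-byte signature of an
/// OLE2 compound file, the container format used for embedded objects.
fn is_compound_file(data: &[u8]) -> bool {
    data.starts_with(b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
}

/// Guesses a file extension from the leading "magic" bytes of `data`.
/// Only three well-known signatures are checked here (hypothetical helper).
fn sniff_extension(data: &[u8]) -> Option<&'static str> {
    if data.starts_with(b"%PDF-") {
        Some("pdf")
    } else if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        Some("png")
    } else if data.starts_with(b"PK\x03\x04") {
        Some("zip")
    } else {
        None
    }
}

/// Builds a numbered output name, appending an extension only when one
/// could be guessed. The naming scheme is an assumption for illustration.
fn output_name(index: usize, data: &[u8]) -> String {
    match sniff_extension(data) {
        Some(ext) => format!("{index}.{ext}"),
        None => index.to_string(),
    }
}
```

With these helpers, a payload that begins with `%PDF-` would be named with a `.pdf` suffix, while an unrecognised payload keeps a bare number as its name.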

If you're interested in something other than these embedded files, you can try running unzip on the document instead of using this utility.

The current version has only been tested by extracting PDF files from PowerPoint presentations, so it might not handle your specific use case. Bug reports and pull requests are welcome.

Building

Install Rust and run

cargo build --release

The resulting executable can be found at target/release/unembedx.

By default, compilation requires file-devel (Fedora), libmagic-dev (Debian, Ubuntu) or an equivalent package. This allows unembedx to automatically append the correct file extension to the extracted files in some cases.

To build without the file extension logic, you can instead run

cargo build --release --no-default-features

Usage

Extract all embedded files into the current directory

./unembedx some-presentation.pptx

Use --help for more information.

License

MIT
