JayBizzle/doc-to-text

Extract text from a Word Doc

This package provides a class to extract text from a Word Doc.

<?php

use Jaybizzle\DocToText\Doc;

echo Doc::getText('book.doc'); // returns the text from the doc

Requirements

Behind the scenes, this package uses antiword. You can check whether the binary is installed on your system by running this command:

which antiword

If it is installed, the command prints the path to the binary.

To install the binary you can use this command on Ubuntu or Debian:

apt-get install antiword

Installation

You can install the package via composer:

composer require jaybizzle/doc-to-text

Usage

Extracting text from a Doc is easy.

$text = (new Doc())
    ->setDoc('book.doc')
    ->text();

Or easier:

echo Doc::getText('book.doc');

By default, the package assumes that the antiword command is located at /usr/bin/antiword. If it is located elsewhere, pass its binary path to the constructor:

$text = (new Doc('/custom/path/to/antiword'))
    ->setDoc('book.doc')
    ->text();

or as the second parameter to the getText static method:

echo Doc::getText('book.doc', '/custom/path/to/antiword');
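When the binary location differs between machines, it may help to probe a few likely paths before choosing which one to pass in. The following is a minimal sketch of that idea: `findAntiword` is a hypothetical helper, not part of this package, and every candidate path other than /usr/bin/antiword is an assumption.

```php
<?php

/**
 * Return the first candidate path that points at an executable file,
 * or null when none of them does.
 *
 * Hypothetical helper for illustration; it is not part of the package.
 */
function findAntiword(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

// Candidate locations are assumptions; adjust them for your system.
$binary = findAntiword([
    '/usr/bin/antiword',
    '/usr/local/bin/antiword',
    '/opt/homebrew/bin/antiword',
]);

echo $binary === null
    ? "antiword not found\n"
    : "antiword found at {$binary}\n";
```

If a path is found, it could then be handed to the package as shown above, for example `Doc::getText('book.doc', $binary)`.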

To pass options to antiword, set them with the setOptions method:

$text = (new Doc())
    ->setDoc('table.doc')
    ->setOptions(['f', 'w 80'])
    ->text();

or as the third parameter to the getText static method:

echo Doc::getText('book.doc', null, ['f', 'w 80']);
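The option strings above presumably correspond to antiword's command-line flags: `-f` (formatted text output) and `-w 80` (a line width of 80 characters). As a rough illustration of how strings in that shape could be turned into shell arguments, here is a sketch. `buildAntiwordArgs` is a hypothetical name, and this is not the package's actual implementation, which may handle options differently.

```php
<?php

/**
 * Turn option strings such as 'f' or 'w 80' into shell-safe arguments.
 *
 * Hypothetical helper for illustration; the package's own handling may differ.
 */
function buildAntiwordArgs(array $options): string
{
    $args = [];

    foreach ($options as $option) {
        // Split 'w 80' into the flag ('w') and its optional value ('80').
        $parts = explode(' ', trim($option), 2);

        // Normalise the flag so that both 'f' and '-f' become '-f'.
        $flag = '-' . ltrim($parts[0], '-');
        $args[] = escapeshellarg($flag);

        if (isset($parts[1])) {
            $args[] = escapeshellarg(trim($parts[1]));
        }
    }

    return implode(' ', $args);
}

// Prints the quoted flags for -f and -w 80; quoting style depends on the platform.
echo buildAntiwordArgs(['f', 'w 80']), "\n";
```

Escaping each piece separately keeps a value such as a width from being interpreted by the shell, which matters if options ever come from user input.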

Change log

Please see CHANGELOG for more information about what has changed recently.

Testing

composer test

Security

If you discover any security-related issues, please email [email protected] instead of using the issue tracker.

Credits

License

The MIT License (MIT). Please see License File for more information.