Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
0 parents
commit 9b88c2c
Showing 15 changed files with 1,903 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
language: php | ||
|
||
before_install: | ||
- sudo apt-get install -y antiword | ||
|
||
php: | ||
- 7.0 | ||
- 7.1 | ||
- 7.2 | ||
|
||
env: | ||
matrix: | ||
- COMPOSER_FLAGS="--prefer-lowest" | ||
- COMPOSER_FLAGS="" | ||
|
||
before_script: | ||
- travis_retry composer self-update | ||
- travis_retry composer update ${COMPOSER_FLAGS} --no-interaction --prefer-source | ||
|
||
script: | ||
- phpunit --coverage-text --coverage-clover=coverage.clover | ||
|
||
after_script: | ||
- php vendor/bin/ocular code-coverage:upload --format=php-clover coverage.clover |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# Changelog | ||
|
||
All notable changes to `doc-to-text` will be documented in this file. | ||
|
||
## 1.0.0 - 2018-12-04 | ||
|
||
Initial release |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
# Extract text from a Word Doc | ||
|
||
[![Latest Version on Packagist](https://img.shields.io/packagist/v/jaybizzle/doc-to-text.svg?style=flat-square)](https://packagist.org/packages/jaybizzle/doc-to-text) | ||
[![Software License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](LICENSE.md) | ||
[![Build Status](https://img.shields.io/travis/jaybizzle/doc-to-text/master.svg?style=flat-square)](https://travis-ci.org/jaybizzle/doc-to-text) | ||
[![SensioLabsInsight](https://img.shields.io/sensiolabs/i/9d85e8dd-b444-4bef-a5d5-faa7f2d8d6bb.svg?style=flat-square)](https://insight.sensiolabs.com/projects/9d85e8dd-b444-4bef-a5d5-faa7f2d8d6bb) | ||
[![Quality Score](https://img.shields.io/scrutinizer/g/jaybizzle/doc-to-text.svg?style=flat-square)](https://scrutinizer-ci.com/g/jaybizzle/doc-to-text) | ||
[![Total Downloads](https://img.shields.io/packagist/dt/jaybizzle/doc-to-text.svg?style=flat-square)](https://packagist.org/packages/jaybizzle/doc-to-text) | ||
|
||
This package provides a class to extract text from a Word Doc. | ||
|
||
```php | ||
<?php | ||
|
||
use Jaybizzle\DocToText\Doc; | ||
|
||
echo Doc::getText('book.doc'); // returns the text from the doc | ||
``` | ||
|
||
## Requirements | ||
|
||
Behind the scenes this package leverages [antiword](https://en.wikipedia.org/wiki/Antiword). You can verify whether the binary is installed on your system by issuing this command: | ||
|
||
```bash | ||
which antiword | ||
``` | ||
|
||
If it is installed it will return the path to the binary. | ||
|
||
To install the binary you can use this command on Ubuntu or Debian: | ||
|
||
```bash | ||
apt-get install antiword | ||
``` | ||
|
||
## Installation | ||
|
||
You can install the package via composer: | ||
|
||
```bash | ||
composer require jaybizzle/doc-to-text | ||
``` | ||
|
||
## Usage | ||
|
||
Extracting text from a Doc is easy. | ||
|
||
```php | ||
$text = (new Doc()) | ||
->setDoc('book.doc') | ||
->text(); | ||
``` | ||
|
||
Or easier: | ||
|
||
```php | ||
echo Doc::getText('book.doc'); | ||
``` | ||
|
||
By default the package assumes that the `antiword` command is located at `/usr/bin/antiword`. | ||
If it is located elsewhere, pass its binary path to the constructor: | ||
|
||
```php | ||
$text = (new Doc('/custom/path/to/antiword')) | ||
->setDoc('book.doc') | ||
->text(); | ||
``` | ||
|
||
or as the second parameter to the `getText` static method: | ||
|
||
```php | ||
echo Doc::getText('book.doc', '/custom/path/to/antiword'); | ||
``` | ||
|
||
Sometimes you may want to use [antiword options](https://linux.die.net/man/1/antiword). To do so you can set them up using the `setOptions` method. | ||
|
||
```php | ||
$text = (new Doc()) | ||
->setDoc('table.doc') | ||
->setOptions(['layout', 'r 96']) | ||
->text() | ||
; | ||
``` | ||
|
||
or as the third parameter to the `getText` static method: | ||
|
||
```php | ||
echo Doc::getText('book.doc', null, ['layout', 'opw myP1$$Word']); | ||
``` | ||
|
||
## Change log | ||
|
||
Please see [CHANGELOG](CHANGELOG.md) for more information about what has changed recently. | ||
|
||
## Testing | ||
|
||
```bash | ||
composer test | ||
``` | ||
|
||
## Security | ||
|
||
If you discover any security related issues, please email [email protected] instead of using the issue tracker. | ||
|
||
## Credits | ||
|
||
- [Mark Beech](https://github.com/jaybizzle) | ||
- [All Contributors](../../contributors) | ||
|
||
## License | ||
|
||
The MIT License (MIT). Please see [License File](LICENSE.md) for more information. |
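Review note: the README above says the package assumes `antiword` lives at `/usr/bin/antiword` and accepts a custom path otherwise. A minimal sketch of how a caller might pick that path before constructing `Doc` is shown below. The helper name `findAntiword` and the second candidate path are assumptions for illustration and are not part of this commit.

```php
<?php

/**
 * Return the first usable antiword binary from a list of candidate paths,
 * or null when none of them is an executable file.
 * Hypothetical helper: not part of the doc-to-text package.
 */
function findAntiword(array $candidates = ['/usr/bin/antiword', '/usr/local/bin/antiword'])
{
    foreach ($candidates as $path) {
        // Accept only regular files that the current user may execute.
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

$binary = findAntiword();
echo $binary === null
    ? "antiword not found in the candidate paths\n"
    : "antiword found at {$binary}\n";
```

A non-null result could then be passed as the constructor argument or as the second parameter of `getText`, as the README describes.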
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<testsuites> | ||
<testsuite name="Test Suite" tests="10" assertions="10" errors="0" failures="0" skipped="0" time="0.073678"> | ||
<testsuite name="Jaybizzle\DocToText\Test\DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" tests="10" assertions="10" errors="0" failures="0" skipped="0" time="0.073678"> | ||
<testcase name="it_can_extract_text_from_a_doc" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="16" assertions="1" time="0.014135"/> | ||
<testcase name="it_provides_a_static_method_to_extract_text" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="26" assertions="1" time="0.010763"/> | ||
<testcase name="it_can_handle_paths_with_spaces" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="32" assertions="1" time="0.006766"/> | ||
<testcase name="it_can_handle_paths_with_single_quotes" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="40" assertions="1" time="0.007283"/> | ||
<testcase name="it_can_handle_doctotext_options_without_starting_hyphen" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="48" assertions="1" time="0.007513"/> | ||
<testcase name="it_can_handle_doctotext_options_with_starting_hyphen" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="59" assertions="1" time="0.006695"/> | ||
<testcase name="it_can_handle_doctotext_options_with_mixed_hyphen_status" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="70" assertions="1" time="0.008278"/> | ||
<testcase name="it_will_throw_an_exception_when_the_doc_is_not_found" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="81" assertions="1" time="0.000954"/> | ||
<testcase name="it_will_throw_an_exception_when_the_binary_is_not_found" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="91" assertions="1" time="0.004240"/> | ||
<testcase name="it_will_throw_an_exception_when_the_option_is_unknown" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="101" assertions="1" time="0.007051"/> | ||
</testsuite> | ||
</testsuite> | ||
</testsuites> |
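Review note: three of the test names in the report above cover options passed with, without, and with mixed leading hyphens. The package source is not visible in this part of the diff, so the following is only a sketch of one way such normalization might work. The function name `normalizeOptions` is hypothetical, and the sample values assume antiword's `-w` (width) and `-t` (text output) switches.

```php
<?php

/**
 * Prefix each option with a single hyphen unless it already starts with one.
 * Hypothetical helper: the package's real implementation may differ.
 */
function normalizeOptions(array $options)
{
    return array_map(function ($option) {
        $option = trim($option);

        // Leave options that already carry a hyphen untouched.
        return strpos($option, '-') === 0 ? $option : '-' . $option;
    }, $options);
}

// Mixed input: one option without a hyphen, one with.
print_r(normalizeOptions(['w 80', '-t']));
```

With this approach callers can write options either way, which matches the behaviour the test names suggest.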
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
{ | ||
"name": "jaybizzle/doc-to-text", | ||
"description": "Extract text from a Word Doc", | ||
"keywords": [ | ||
"jaybizzle", | ||
"doc-to-text" | ||
], | ||
"homepage": "https://github.com/jaybizzle/doc-to-text", | ||
"license": "MIT", | ||
"authors": [ | ||
{ | ||
"name": "Mark Beech", | ||
"email": "[email protected]", | ||
"homepage": "https://www.mark-beech.co.uk", | ||
"role": "Developer" | ||
} | ||
], | ||
"require": { | ||
"php" : "^7.0", | ||
"symfony/process": "^3.3|^4.0" | ||
}, | ||
"require-dev": { | ||
"phpunit/phpunit" : "^6.4|^7.0" | ||
}, | ||
"autoload": { | ||
"psr-4": { | ||
"Jaybizzle\\DocToText\\": "src" | ||
} | ||
}, | ||
"autoload-dev": { | ||
"psr-4": { | ||
"Jaybizzle\\DocToText\\Test\\": "tests" | ||
} | ||
}, | ||
"scripts": { | ||
"test": "phpunit" | ||
} | ||
} |