It begins.
JayBizzle committed Dec 5, 2018
0 parents commit 9b88c2c
Showing 15 changed files with 1,903 additions and 0 deletions.
24 changes: 24 additions & 0 deletions .travis.yml
@@ -0,0 +1,24 @@
language: php

before_install:
- sudo apt-get install -y antiword

php:
- 7.0
- 7.1
- 7.2

env:
  matrix:
    - COMPOSER_FLAGS="--prefer-lowest"
    - COMPOSER_FLAGS=""

before_script:
- travis_retry composer self-update
- travis_retry composer update ${COMPOSER_FLAGS} --no-interaction --prefer-source

script:
- phpunit --coverage-text --coverage-clover=coverage.clover

after_script:
- php vendor/bin/ocular code-coverage:upload --format=php-clover coverage.clover
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,7 @@
# Changelog

All notable changes to `doc-to-text` will be documented in this file.

## 1.0.0 - 2018-12-04

Initial release
112 changes: 112 additions & 0 deletions README.md
@@ -0,0 +1,112 @@
# Extract text from a Word Doc

[![Latest Version on Packagist](https://img.shields.io/packagist/v/jaybizzle/doc-to-text.svg?style=flat-square)](https://packagist.org/packages/jaybizzle/doc-to-text)
[![Software License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat-square)](LICENSE.md)
[![Build Status](https://img.shields.io/travis/jaybizzle/doc-to-text/master.svg?style=flat-square)](https://travis-ci.org/jaybizzle/doc-to-text)
[![SensioLabsInsight](https://img.shields.io/sensiolabs/i/9d85e8dd-b444-4bef-a5d5-faa7f2d8d6bb.svg?style=flat-square)](https://insight.sensiolabs.com/projects/9d85e8dd-b444-4bef-a5d5-faa7f2d8d6bb)
[![Quality Score](https://img.shields.io/scrutinizer/g/jaybizzle/doc-to-text.svg?style=flat-square)](https://scrutinizer-ci.com/g/jaybizzle/doc-to-text)
[![Total Downloads](https://img.shields.io/packagist/dt/jaybizzle/doc-to-text.svg?style=flat-square)](https://packagist.org/packages/jaybizzle/doc-to-text)

This package provides a class to extract text from a Word Doc.

```php
<?php

use Jaybizzle\DocToText\Doc;

echo Doc::getText('book.doc'); // returns the text from the doc
```

## Requirements

Behind the scenes this package leverages [antiword](https://en.wikipedia.org/wiki/Antiword). You can verify whether the binary is installed on your system by running this command:

```bash
which antiword
```

If it is installed, the command prints the path to the binary.

To install the binary you can use this command on Ubuntu or Debian:

```bash
apt-get install antiword
```

## Installation

You can install the package via composer:

```bash
composer require jaybizzle/doc-to-text
```

## Usage

Extracting text from a Doc is easy.

```php
$text = (new Doc())
    ->setDoc('book.doc')
    ->text();
```

Or easier:

```php
echo Doc::getText('book.doc');
```

By default the package assumes that the `antiword` binary is located at `/usr/bin/antiword`.
If it is located elsewhere, pass its path to the constructor:

```php
$text = (new Doc('/custom/path/to/antiword'))
    ->setDoc('book.doc')
    ->text();
```

or as the second parameter to the `getText` static method:

```php
echo Doc::getText('book.doc', '/custom/path/to/antiword');
```
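
If the location of `antiword` differs between machines, the path can be resolved before it is handed to the package. The following is a minimal sketch and not part of the package: `findAntiword()` and the second candidate path are hypothetical, and only the default `/usr/bin/antiword` comes from the text above.

```php
<?php

/**
 * Hypothetical helper (not part of doc-to-text): return the first
 * candidate path that is an existing, executable file, or null if
 * none of the candidates qualifies.
 */
function findAntiword(array $candidates = ['/usr/bin/antiword', '/usr/local/bin/antiword'])
{
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }

    return null;
}

$binary = findAntiword();

echo $binary === null
    ? "antiword not found in the candidate paths\n"
    : "antiword found at {$binary}\n";
```

The resolved path could then be passed to the constructor or as the second parameter of `getText`, as in the examples above.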

Sometimes you may want to use [antiword options](https://linux.die.net/man/1/antiword). To pass them, use the `setOptions` method:

```php
$text = (new Doc())
    ->setDoc('table.doc')
    ->setOptions(['layout', 'r 96'])
    ->text()
;
```
```

or as the third parameter to the `getText` static method:

```php
echo Doc::getText('book.doc', null, ['layout', 'opw myP1$$Word']);
```
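
The examples above write options without a leading hyphen (`'layout'`, `'r 96'`). One way such input could be normalized before it reaches the command line is sketched below. This is an assumption about the behaviour, not the package's actual code, and `normalizeOptions()` is a hypothetical name.

```php
<?php

/**
 * Hypothetical sketch: give every option exactly one leading hyphen,
 * whether or not the caller supplied one (or several).
 */
function normalizeOptions(array $options)
{
    return array_map(function ($option) {
        // Strip surrounding whitespace and any leading hyphens, then add one back.
        return '-' . ltrim(trim($option), '-');
    }, $options);
}

print_r(normalizeOptions(['layout', '-r 96']));
```

With this approach, callers can mix hyphenated and unhyphenated options in the same array.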

## Change log

Please see [CHANGELOG](CHANGELOG.md) for more information about what has changed recently.

## Testing

```bash
composer test
```

## Security

If you discover any security related issues, please email [email protected] instead of using the issue tracker.

## Credits

- [Mark Beech](https://github.com/jaybizzle)
- [All Contributors](../../contributors)

## License

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.
17 changes: 17 additions & 0 deletions build/report.junit.xml
@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<testsuites>
<testsuite name="Test Suite" tests="10" assertions="10" errors="0" failures="0" skipped="0" time="0.073678">
<testsuite name="Jaybizzle\DocToText\Test\DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" tests="10" assertions="10" errors="0" failures="0" skipped="0" time="0.073678">
<testcase name="it_can_extract_text_from_a_doc" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="16" assertions="1" time="0.014135"/>
<testcase name="it_provides_a_static_method_to_extract_text" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="26" assertions="1" time="0.010763"/>
<testcase name="it_can_handle_paths_with_spaces" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="32" assertions="1" time="0.006766"/>
<testcase name="it_can_handle_paths_with_single_quotes" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="40" assertions="1" time="0.007283"/>
<testcase name="it_can_handle_doctotext_options_without_starting_hyphen" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="48" assertions="1" time="0.007513"/>
<testcase name="it_can_handle_doctotext_options_with_starting_hyphen" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="59" assertions="1" time="0.006695"/>
<testcase name="it_can_handle_doctotext_options_with_mixed_hyphen_status" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="70" assertions="1" time="0.008278"/>
<testcase name="it_will_throw_an_exception_when_the_doc_is_not_found" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="81" assertions="1" time="0.000954"/>
<testcase name="it_will_throw_an_exception_when_the_binary_is_not_found" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="91" assertions="1" time="0.004240"/>
<testcase name="it_will_throw_an_exception_when_the_option_is_unknown" class="Jaybizzle\DocToText\Test\DocToTextTest" classname="Jaybizzle.DocToText.Test.DocToTextTest" file="/Users/markbeech/Desktop/doc-to-text/tests/DocToTextTest.php" line="101" assertions="1" time="0.007051"/>
</testsuite>
</testsuite>
</testsuites>
38 changes: 38 additions & 0 deletions composer.json
@@ -0,0 +1,38 @@
{
"name": "jaybizzle/doc-to-text",
"description": "Extract text from a Word Doc",
"keywords": [
"jaybizzle",
"doc-to-text"
],
"homepage": "https://github.com/jaybizzle/doc-to-text",
"license": "MIT",
"authors": [
{
"name": "Mark Beech",
"email": "[email protected]",
"homepage": "https://www.mark-beech.co.uk",
"role": "Developer"
}
],
"require": {
"php" : "^7.0",
"symfony/process": "^3.3|^4.0"
},
"require-dev": {
"phpunit/phpunit" : "^6.4|^7.0"
},
"autoload": {
"psr-4": {
"Jaybizzle\\DocToText\\": "src"
}
},
"autoload-dev": {
"psr-4": {
"Jaybizzle\\DocToText\\Test\\": "tests"
}
},
"scripts": {
"test": "phpunit"
}
}