This action uses naptha/tesseract.js to extract text from images attached to issue comments.
The extracted text is appended to the issue body.
This makes the extracted text searchable via GitHub's search box.
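The action's own source is not shown here. As a rough sketch of the first step, assuming images in an issue body appear either as Markdown images (`![alt](url)`) or as HTML `<img src="...">` tags, a hypothetical helper like `extractImageUrls` could collect the URLs to hand to tesseract.js while ignoring ordinary links:

```javascript
// Hypothetical helper (not this action's actual code): collect image URLs
// from an issue or comment body. Assumes images appear either as Markdown
// images ![alt](url) or as HTML <img src="url"> tags; plain links are ignored.
function extractImageUrls(body) {
  if (!body) return [];
  const urls = [];
  // Markdown image syntax: ![alt](url) or ![alt](url "title")
  const markdownImage = /!\[[^\]]*\]\(\s*<?([^)\s>]+)>?(?:\s+"[^"]*")?\s*\)/g;
  // HTML image tags: <img ... src="url" ...>, case-insensitive
  const htmlImage = /<img\b[^>]*\bsrc\s*=\s*["']([^"']+)["'][^>]*>/gi;
  for (const re of [markdownImage, htmlImage]) {
    for (const match of body.matchAll(re)) {
      urls.push(match[1]);
    }
  }
  // Deduplicate, keeping the order in which matches were collected
  // (Markdown images first, then HTML tags).
  return [...new Set(urls)];
}

// Usage: only the two images are returned; the plain link is skipped.
const body = [
  "Screenshot: ![error](https://example.com/error.png)",
  '<img width="200" src="https://example.com/poem.jpg">',
  "Docs: [not an image](https://example.com/readme.png)",
].join("\n");
const urls = extractImageUrls(body);
// → ["https://example.com/error.png", "https://example.com/poem.jpg"]
```

Regular expressions keep the sketch dependency-free; a real implementation might prefer a Markdown/HTML parser to handle edge cases such as reference-style images.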
Inspired by imjasonh/ideas/issues/76
Create a workflow file (e.g. .github/workflows/ocr-bot.yml; see Creating a Workflow file) with the following content:
name: "OCR Bot"
on:
  issues:
    types: [opened, edited]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: thehanimo/[email protected]
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
Done! OCR keywords will now be appended to issues that contain images, like this:
OCR Keywords
Mild Splendour of the various-vested Night! Mother of wildly-working visions! haill I watch thy gliding, while with watery light Thy weak eye glimmers through a fleecy veil; And when thou lovest thy pale orb to shroud Behind the gather’d blackness lost on high; And when thou dartest from the wind-rent cloud Thy placid lightning o’er the awaken’d sky.
Install the dependencies
npm install
Run the tests ✔️
$ npm test
PASS ./index.test.js
✓ empty comment (3 ms)
✓ links outside img tag (1 ms)
✓ extract text (1 ms)
...
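The workflow runs on both opened and edited events, so a user editing an already-processed issue triggers the bot again. A case worth covering is that the rerun should not stack a second copy of the keywords. One way to get that, sketched here with hypothetical HTML comment markers, a hypothetical `withOcrKeywords` helper, and an assumed section layout (not necessarily how this action does it), is to strip the previous bot section before appending the new one:

```javascript
// Hypothetical markers delimiting the bot-managed section of the issue body.
// HTML comments are invisible when GitHub renders the Markdown.
const START = "<!-- ocr-bot:start -->";
const END = "<!-- ocr-bot:end -->";

// Hypothetical helper: return `body` with exactly one OCR Keywords section
// built from `texts` (one OCR result string per image).
function withOcrKeywords(body, texts) {
  const original = body || "";
  // Remove a section left by a previous run so reruns replace, not append.
  const startIdx = original.indexOf(START);
  const endIdx = original.indexOf(END);
  let base = original;
  if (startIdx !== -1 && endIdx > startIdx) {
    base = original.slice(0, startIdx) + original.slice(endIdx + END.length);
  }
  base = base.trimEnd();
  // Collapse OCR line breaks and repeated spaces into single spaces.
  const keywords = texts
    .map((t) => t.replace(/\s+/g, " ").trim())
    .filter((t) => t.length > 0)
    .join(" ");
  if (keywords === "") return base; // nothing recognised: no section
  const prefix = base ? `${base}\n\n` : "";
  // Assumed layout: a heading followed by the keywords on one line.
  return `${prefix}${START}\n#### OCR Keywords\n\n${keywords}\n${END}`;
}

// Usage: applying the helper twice yields the same body as applying it once.
const first = withOcrKeywords("Bug report", ["Mild  Splendour\nof the Night"]);
const second = withOcrKeywords(first, ["Mild  Splendour\nof the Night"]);
console.log(first === second); // true
```

Marking the section explicitly keeps the user's own text untouched while letting the bot rewrite its part freely on every run.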