Converting PDFs or images into text files from PyQt with Tesseract and PyPDF2
- PyPDF2
- pytesseract
- pdf2image
- PyQt5>=5.14
Poppler is already included. (The bundled version was the latest as of September 14, 2020.)
The current GUI uses Tesseract only for image-to-text conversion, not for PDF-to-text conversion. The PDF-to-text functionality does exist in script.py, so feel free to use it if you'd like.
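As a rough illustration of the two conversion paths mentioned above, here is a minimal sketch using pytesseract and pdf2image. It is not the code from script.py: the function names `image_to_text`, `pdf_to_text`, and `join_pages`, the default `lang="eng"`, and the `dpi=300` setting are all assumptions made for this example. It also assumes Pillow is available (pytesseract depends on it) and that Poppler can be found by pdf2image.

```python
def join_pages(texts):
    """Join per-page OCR results into one string, separated by blank lines."""
    return "\n\n".join(t.strip() for t in texts)


def image_to_text(image_path, lang="eng"):
    """OCR a single image file with Tesseract (hypothetical helper)."""
    # Imported here so the module still loads without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def pdf_to_text(pdf_path, lang="eng", dpi=300):
    """Render each PDF page to an image, then OCR it (hypothetical helper)."""
    from pdf2image import convert_from_path
    import pytesseract

    # convert_from_path returns one PIL image per page; it needs Poppler.
    pages = convert_from_path(str(pdf_path), dpi=dpi)
    return join_pages(pytesseract.image_to_string(p, lang=lang) for p in pages)
```

Calling `pdf_to_text("scan.pdf")` on a scanned document (the file name is only a placeholder) would return the recognized text of all pages as one string, which can then be written to a .txt file.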
- Install Google's Tesseract OCR engine.
- Add the Tesseract installation directory to your PATH environment variable.
- git clone
- pip install -r requirements.txt
- python main.py
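Before running main.py, it can help to confirm that Tesseract is actually reachable through your environment variables. The following is a small sketch for that check, not part of this repository: `find_tesseract` and `configure_pytesseract` are hypothetical helper names, and the executable name `tesseract` is assumed to match your installation.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the Tesseract binary found on PATH, or None."""
    return shutil.which(executable)


def configure_pytesseract(tesseract_path):
    """Point pytesseract at an explicit binary when PATH lookup is not enough."""
    # Imported here so the PATH check above works without pytesseract installed.
    import pytesseract

    pytesseract.pytesseract.tesseract_cmd = tesseract_path
```

If `find_tesseract()` returns `None`, the installation directory is probably missing from PATH. In that case either fix the environment variable or pass the full path of the binary to `configure_pytesseract`.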