image_to_boxes crashing #106

tlcyr4 · 2018-03-10T05:04:12Z

When I run image_to_string, it works great, but when I run either image_to_boxes or image_to_data, I get an error message like this:

IOError: [Errno 2] No such file or directory: 'c:\users\tlcyr\appdata\local\temp\tess_kqx1fs_out.box'

with some random text in place of 'kqx1fs' each time I run it.

I have tesseract 3.05.01 installed on Windows.

bozhodimitrov · 2018-03-11T09:18:51Z

Hi @tlcyr4, can you try the new 4.x version of Tesseract for Windows?

bozhodimitrov · 2018-03-15T23:36:15Z

Please feel free to reopen if you have problems with the new 4.x version.
It would also help if you could provide a sample image for testing the problem.

trehman65 · 2018-04-01T05:52:17Z

image_to_boxes is not working for me either. I have tesseract 4.0 on macOS.

from PIL import Image
from pytesseract import pytesseract
import argparse
import cv2
import os

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
    help="path to input image to be OCR'd")
ap.add_argument("-p", "--preprocess", type=str, default="thresh",
    help="type of preprocessing to be done")
args = vars(ap.parse_args())

# load the example image and convert it to grayscale
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# check to see if we should apply thresholding to preprocess the image
if args["preprocess"] == "thresh":
    gray = cv2.threshold(gray, 0, 255,
        cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

# make a check to see if median blurring should be done to remove noise
elif args["preprocess"] == "blur":
    gray = cv2.medianBlur(gray, 3)

# write the grayscale image to disk as a temporary file so we can apply OCR to it
filename = "{}.png".format(os.getpid())
cv2.imwrite(filename, gray)

# load the image as a PIL/Pillow image, apply OCR, and then delete the temporary file
text = pytesseract.image_to_boxes(Image.open(filename))
os.remove(filename)
print(text)

Error I am getting is:

IOError: [Errno 2] No such file or directory: '/var/folders/gh/ytdtnjmx6t7dwc325f3xsky80000gn/T/tess_3zJn_y_out.box'

Tesseract version:

talha (tess *) VisionxNLTK-v2.0 $ tesseract -v
tesseract 4.00.00alpha
leptonica-1.74.4
libjpeg 9b : libpng 1.6.34 : libtiff 4.0.8 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE

The image I am using is:

bozhodimitrov · 2018-04-01T10:33:54Z

Hi @trehman65 - did you test the same options directly with tesseract itself?

trehman65 · 2018-04-01T10:35:43Z

You mean on the command line? I am sorry, I am a bit of a noob. Can you tell me the command for it?

bozhodimitrov · 2018-04-01T12:10:47Z

You can patch the pytesseract.py library temporarily on line 133 and you can print the command with:

print(' '.join(command))

In order to find the full pytesseract.py library file path, you need the following snippet of code:

import pytesseract
print(pytesseract.__file__)

trehman65 · 2018-04-01T12:27:42Z

The command printed by patching pytesseract.py is:

tesseract /var/folders/gh/ytdtnjmx6t7dsky80000gn/T/tess_UhXf0J.PNG /var/folders/gh/ytdtnjmx6t7dwc325f3xsky80000gn/T/tess_UhXf0J_out batch.nochop makebox

This command is not working.

tesseract temp.jpg out makebox

The error is:

read_params_file: Can't open makebox
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica

The following command works but it only shows the text, not the boxes.

tesseract temp.jpg out

bozhodimitrov · 2018-04-01T13:18:08Z

And what about the tesseract temp.jpg out batch.nochop makebox - what is the error of that?

trehman65 · 2018-04-01T13:38:46Z

This is the error:

talha (tess *) VisionxNLTK-v2.0 $ tesseract temp.jpg out batch.nochop makebox
read_params_file: Can't open batch.nochop
read_params_file: Can't open makebox
Tesseract Open Source OCR Engine v4.00.00alpha with Leptonica

bozhodimitrov · 2018-04-01T13:44:07Z

Thank you for the feedback. Can you report your OS version and how you installed tesseract?
Maybe this 4.00.00alpha build of tesseract is a bit problematic.

trehman65 · 2018-04-01T13:51:51Z

My OS version is macOS High Sierra version 10.13.2. I built tesseract from source code, by cloning the git repo.

qnkhuat · 2018-08-03T15:56:48Z

same issue.

qnkhuat · 2018-08-03T16:10:59Z

I'm able to run with tesseract itself but still get this error while running pytesseract

chahna107 · 2019-05-23T14:15:05Z

Is there any further update on this issue? I am having the same problem with Tesseract 4.0.

jxu · 2019-06-03T21:26:27Z

I have Tesseract 4.0.0.20190314 installed but I replaced the eng.traineddata with the one from here https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata to support Tesseract v3 and I also have a barebones tessdata folder with no other files besides eng.traineddata.
With the default tessdata folder everything works fine.

HongChow · 2019-09-17T04:01:48Z

When I run image_to_string, it works great, but when I run either image_to_boxes or image_to_data, I get an error message like this:

IOError: [Errno 2] No such file or directory: 'c:\users\tlcyr\appdata\local\temp\tess_kqx1fs_out.box'

with some random text in place of 'kqx1fs' each time I run it.

I have tesseract 3.05.01 installed on Windows.

I have the same problem with Ubuntu 18 and Tesseract 4.0.
Has anyone fixed this?

bozhodimitrov · 2019-09-17T18:54:33Z

I can't reproduce this issue. I am using the sample image from this issue and it works as expected within the official python docker container.

Tested with:
Python 3.7.4
pytesseract 0.3.0
tesseract 4.0.0 (libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0)

I tested with both image_to_boxes and image_to_data

@HongChow try to execute this command in your terminal in order to check if it works:

tesseract /test.jpg /tmp/test_output_file batch.nochop makebox

PS: It also works ok with:
Python 3.6.8 ( Ubuntu 18.04.3 LTS )
pytesseract 0.3.0
tesseract 4.0.0-beta.1 (leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0)
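
For anyone who prefers to run that terminal check from Python, here is a minimal sketch. It builds the same makebox command quoted in this thread and reports whether the `.box` file that pytesseract later tries to open was actually written. The names `build_makebox_command` and `run_makebox` are hypothetical helpers, not part of pytesseract, and the sketch assumes Python 3.7+ with a `tesseract` binary on the PATH.

```python
import os
import subprocess


def build_makebox_command(image_path, output_base, tesseract_cmd="tesseract"):
    """Build the argument list for the makebox invocation quoted in this thread."""
    return [tesseract_cmd, image_path, output_base, "batch.nochop", "makebox"]


def run_makebox(image_path, output_base, tesseract_cmd="tesseract"):
    """Run tesseract and report whether the expected .box file was written.

    Raises FileNotFoundError if the tesseract binary itself cannot be found.
    """
    command = build_makebox_command(image_path, output_base, tesseract_cmd)
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": " ".join(command),
        "returncode": result.returncode,
        # Messages such as "read_params_file: Can't open makebox" land here.
        "stderr": result.stderr.strip(),
        # pytesseract fails with "No such file or directory" when this is False.
        "box_exists": os.path.exists(output_base + ".box"),
    }
```

Calling `run_makebox("/test.jpg", "/tmp/test_output_file")` and checking `box_exists` together with `stderr` should show whether the failure comes from tesseract itself rather than from pytesseract.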

amtam0 · 2019-10-18T21:29:41Z

I had the same issue yesterday. I think it is more a Tesseract config issue.

You may need to set up the configs and tessconfigs folders under .../tesseract/share/data/

image_to_boxes() uses the batch.nochop and makebox configs. Check the link for download
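
A quick way to check for this is sketched below. It assumes your tessdata directory follows the layout of the tesseract source tree, where `batch.nochop` lives under `tessconfigs/` and `makebox` under `configs/`. The function name `missing_box_configs` is hypothetical, and the location of the tessdata directory varies by install, so you have to pass it in yourself.

```python
import os

# Relative locations inside a tessdata directory, as laid out in the
# tesseract source tree (assumption: your install keeps the same layout).
REQUIRED_CONFIGS = (
    os.path.join("tessconfigs", "batch.nochop"),
    os.path.join("configs", "makebox"),
)


def missing_box_configs(tessdata_dir):
    """Return the required config files that are absent from tessdata_dir."""
    return [
        rel for rel in REQUIRED_CONFIGS
        if not os.path.isfile(os.path.join(tessdata_dir, rel))
    ]
```

An empty list means both files are present. Anything else names the files tesseract would fail to open, which matches the "read_params_file: Can't open ..." messages reported earlier in this thread.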

bozhodimitrov · 2019-10-18T22:33:34Z

@HongChow take a look at the above ^
@hazimora33d thank you for clarifying that - I can add additional documentation about this in the README. The other option is to extract every specific option out of the tessconfigs and hard code it into pytesseract.

JoelStansbury · 2020-03-29T00:44:39Z

This fixed it for me. pytesseract.image_to_boxes(myImg, config = " -c tessedit_create_boxfile=1")

For whatever reason, my installation of tesseract 4.1.1 from conda-forge needs this argument to be set explicitly in order for the tesseract ... call to generate a .box file. Injecting this into the subprocess call feels real hacky though so it's very possible that a future update would break this work-around

EDIT:

Note the <SPACE> in front of -c and tessedit.... Those are very important

I found this setting by looking through the output of tesseract --print-parameters
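
For reuse, the workaround can be wrapped so the flag and its leading space are never forgotten. This is only a sketch: `boxes_config` and `image_to_boxes_forced` are hypothetical names, and the only pytesseract call used is `image_to_boxes` with its `config` argument, as shown above.

```python
def boxes_config(extra=""):
    """Return a config string that asks tesseract to write a .box file.

    `extra` holds any other options you already pass, e.g. "--psm 6".
    """
    flag = "-c tessedit_create_boxfile=1"
    parts = [p for p in (extra.strip(), flag) if p]
    # The leading space is kept because the comment above reports it matters.
    return " " + " ".join(parts)


def image_to_boxes_forced(image, extra=""):
    """Call pytesseract.image_to_boxes with the box-file flag always set."""
    import pytesseract  # imported lazily so boxes_config works without it
    return pytesseract.image_to_boxes(image, config=boxes_config(extra))
```

With this, `image_to_boxes_forced(myImg)` is equivalent to the call in the comment above, and `image_to_boxes_forced(myImg, "--psm 6")` appends the flag after your own options.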

bozhodimitrov · 2020-03-29T01:14:31Z

@JoelStansbury thanks for reporting the workaround.
I think that the conda-forge packages have GitHub repositories (just like pytesseract has a conda-forge repo), so we can file an issue there.
But I am not sure of the name of the conda-forge tesseract package.

JoelStansbury · 2020-03-29T06:00:41Z

@int3l No problem! Thanks for working on pytesseract!
Here is the tesseract page if you're curious https://anaconda.org/conda-forge/tesseract. I don't know enough about the cause to justify starting a new issue, just wanted to share for future victims. If I find out enough to point out a flaw I will definitely let them know

eveningkid · 2020-10-25T14:23:45Z

Same thing happened to me, running macOS 10.15.6 and tesseract 4.1.1.

@JoelStansbury workaround worked for me, thank you. Very odd!

deduble · 2021-06-11T05:19:59Z

@JoelStansbury I am reviving this issue since it still occurs with Python 3.7.3, the latest pytesseract, and tesseract 5.0.0. It wasn't privileges in my case, but your workaround fixes the problem for me as well. Were you able to find out what is causing this issue?

JoelStansbury · 2021-06-11T13:30:25Z

@deduble No, not really. This config option looks suspicious to me. Maybe it should be "tessedit_create_boxfile 1", as "tessedit_create_wordstrbox" doesn't seem to be a valid config option:
https://github.com/tesseract-ocr/tesseract/blob/7a308edcb1fc7455008b531bc2a49de583d7b171/tessdata/configs/wordstrbox

Pure speculation though. I haven't tested this at all.

bozhodimitrov closed this as completed Mar 15, 2018

DrPlanecraft mentioned this issue Jun 22, 2023

FileNotFoundError tmp file #454

Closed

bozhodimitrov mentioned this issue Aug 25, 2023

Fix default boxing config #504

Merged
