
Tesseract Using Environment Variable's Path In Windows Instead Of Bundled Path #29

Open
ryuga93 opened this issue Jul 18, 2021 · 2 comments

Comments

@ryuga93

ryuga93 commented Jul 18, 2021

Hi, in the latest version the Tesseract engine uses the path set in the environment variables instead of the bundled path. This makes it throw an error, so OCR does not work in the release version.

pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Users\\PC1\\Downloads\\Tesseract-OCR\\jpn.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'jpn\' Error opening data file C:\\Users\\PC1\\Downloads\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not set option: use_new_state_cost=F Could not set option: segment_segcost_rating=F Could not set option: enable_new_segsearch=0 Could not initialize tesseract.') 2021-07-18T20:55:47Z <Greenlet at 0x1b188676048: _process_message({'call': 14.888897478777801, 'name': 'recognize_im, <geventwebsocket.websocket.WebSocket object at 0x0)> failed with TesseractError

The bug is possibly in

return ''

where the Windows branch returns an empty string instead of a tessdata-dir argument pointing at the bundled tessdata.
Adding a proper return statement there seems to fix the problem for me.
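A simplified model helps explain the log above. Assume Tesseract looks for its language data in this order: an explicit --tessdata-dir value first, then the TESSDATA_PREFIX environment variable, then a tessdata folder next to the executable on Windows. Under that assumption, an empty config string means a user-level TESSDATA_PREFIX wins over the bundled data. The lookup order is a simplification rather than Tesseract's actual code, and the function resolve_traineddata_path and the C:\MyApp path are hypothetical. ntpath is used so the Windows-style paths behave the same on any OS.

```python
import ntpath
from typing import Mapping


def resolve_traineddata_path(tessdata_dir_arg: str, env: Mapping[str, str],
                             exe_dir: str, lang: str) -> str:
    """Hypothetical, simplified model of where Tesseract looks for <lang>.traineddata.

    Assumed lookup order (a simplification, not Tesseract's source):
      1. the --tessdata-dir value, if the config string supplied one
      2. the TESSDATA_PREFIX environment variable
      3. a "tessdata" folder next to the Tesseract executable (Windows)
    """
    if tessdata_dir_arg:                  # config contained --tessdata-dir
        base = tessdata_dir_arg
    elif env.get("TESSDATA_PREFIX"):      # a user-level install leaks in here
        base = env["TESSDATA_PREFIX"]
    else:
        base = ntpath.join(exe_dir, "tessdata")
    return ntpath.join(base, lang + ".traineddata")


if __name__ == "__main__":
    bundled_exe_dir = r"C:\MyApp\Tesseract-OCR"  # hypothetical bundle location
    user_env = {"TESSDATA_PREFIX": r"C:\Users\PC1\Downloads\Tesseract-OCR"}

    # Windows branch returns '' -> no --tessdata-dir, so the user's variable is used
    print(resolve_traineddata_path("", user_env, bundled_exe_dir, "jpn"))
    # With --tessdata-dir pointing at the bundle, the variable is ignored
    print(resolve_traineddata_path(ntpath.join(bundled_exe_dir, "tessdata"),
                                   user_env, bundled_exe_dir, "jpn"))
```

The first call yields C:\Users\PC1\Downloads\Tesseract-OCR\jpn.traineddata, the same path as in the error log, which fits the reporter's later note that the environment variable points at their own installation folder.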

@mathewthe2
Owner

Which Tesseract version are you referring to?

And what do you mean by a proper tessdata-dir path? Did you export the TESSDATA_PREFIX environment variable manually?

@ryuga93
Author

ryuga93 commented Aug 17, 2021

I have Tesseract 5.0 installed on my machine, and my environment variable is set to that installation folder because I use it for my own projects. By a proper tessdata-dir path I mean the bundled path, i.e. the Tesseract that is bundled together with the executable.

The Darwin branch has a return statement for this:

return '--tessdata-dir {}'.format(str(Path(OSX_TESSERACT_DIR, "share", "tessdata")))

so I figured the Windows branch needs its own return statement too, and added

return '--tessdata-dir {}'.format('%r'%str(Path(WIN_TESSERACT_DIR, "tessdata")))

after the line

os.rename(Path(WIN_TESSERACT_DIR, "tessdata-new"), Path(WIN_TESSERACT_DIR, "tessdata"))

With that change, my compiled build worked.
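Side by side, the fix amounts to making every bundled platform return a --tessdata-dir argument. The sketch below is a hypothetical stand-alone version (tessdata_config is not the project's actual function name, and the MyApp paths are made up); it uses PureWindowsPath and PurePosixPath so it runs on any OS. It also assumes the config string is later tokenized with shlex.split in POSIX mode. pytesseract passes its config string through shlex.split, but whether POSIX mode is used may depend on the version. Under that assumption, the '%r' wrapping keeps a path with spaces and backslashes together as one argument.

```python
import platform
import shlex
from pathlib import PurePosixPath, PureWindowsPath


def tessdata_config(system: str, win_tesseract_dir: str, osx_tesseract_dir: str) -> str:
    """Build a config string that points Tesseract at the bundled tessdata.

    Hypothetical helper; the directory layouts mirror the snippets above.
    """
    if system == "Darwin":
        tessdata = PurePosixPath(osx_tesseract_dir, "share", "tessdata")
        return "--tessdata-dir {}".format(str(tessdata))
    if system == "Windows":
        tessdata = PureWindowsPath(win_tesseract_dir, "tessdata")
        # '%r' wraps the path in single quotes (and doubles each backslash),
        # so a POSIX-mode shlex.split keeps it as one argument instead of
        # dropping the backslashes and splitting on the space.
        return "--tessdata-dir {}".format("%r" % str(tessdata))
    # No bundled Tesseract assumed on other platforms: keep the defaults.
    return ""


if __name__ == "__main__":
    win_dir = r"C:\Program Files\MyApp\Tesseract-OCR"  # hypothetical bundle location
    fixed = tessdata_config("Windows", win_dir, "")
    unquoted = "--tessdata-dir " + str(PureWindowsPath(win_dir, "tessdata"))
    print(shlex.split(fixed))     # one argument, backslashes doubled but preserved
    print(shlex.split(unquoted))  # backslashes dropped, path split at the space
    print(tessdata_config(platform.system(), win_dir, "/Applications/MyApp"))
```

Quoting only matters for paths containing spaces or backslashes; the Darwin branch gets away without it because a typical macOS bundle path has neither.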
