Tesseract OCR Language Data Configuration Error in Python Environment #537

Closed
BeHerz opened this issue Feb 25, 2024 · 4 comments

Comments

@BeHerz

BeHerz commented Feb 25, 2024

I am experiencing a problem with the Tesseract OCR setup in a Python environment. When I try to perform OCR on images using the pytesseract library, the process fails with an error about loading the German language data files.

TesseractError: (1, 'Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/deu.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the "tessdata" directory. Failed loading language 'deu'. Tesseract couldn't load any languages! Could not initialize tesseract.')

  1. Attempt to perform OCR on an image using pytesseract.image_to_string with lang='deu'.
  2. Receive an error indicating that the German language data file could not be loaded.

Expected Behavior: Tesseract OCR should load the German language data and perform OCR on the image content without any errors.

Environment: Python code generated by ChatGPT
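The error message above already names the file Tesseract tried to open, so it can be read mechanically. The following is a minimal sketch using only the standard library; `parse_tessdata_error` is a hypothetical helper name, not part of pytesseract, and the regular expressions assume the message wording shown in this report.

```python
import re
from pathlib import PurePosixPath

# Message text copied from the TesseractError reported in this issue.
ERROR_MESSAGE = (
    "Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/deu.traineddata "
    "Please make sure the TESSDATA_PREFIX environment variable is set to the "
    "\"tessdata\" directory. Failed loading language 'deu'. "
    "Tesseract couldn't load any languages! Could not initialize tesseract."
)


def parse_tessdata_error(message):
    """Hypothetical helper: pull the data file path and language out of the message.

    Assumes the wording shown above; returns None for any field not found.
    """
    path_match = re.search(r"Error opening data file (\S+\.traineddata)", message)
    lang_match = re.search(r"Failed loading language '([^']+)'", message)
    data_file = PurePosixPath(path_match.group(1)) if path_match else None
    return {
        "data_file": str(data_file) if data_file else None,
        # Directory Tesseract searched; the file was missing or unreadable there.
        "tessdata_dir": str(data_file.parent) if data_file else None,
        "language": lang_match.group(1) if lang_match else None,
    }
```

Calling `parse_tessdata_error(ERROR_MESSAGE)` gives the directory in which Tesseract looked for `deu.traineddata`, which is a reasonable first place to check for the missing file.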

@stefan6419846
Contributor

Please provide the corresponding code you are using. What OS are you using, and where are your language data files located?
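One possible way to gather these details from inside the Python environment is sketched below. It uses only the standard library; `collect_environment_info` is a hypothetical helper name, and it assumes the Tesseract executable is called `tesseract` and is on the `PATH`.

```python
import os
import platform
import shutil
import sys


def collect_environment_info():
    """Hypothetical helper: gather basic facts about the runtime environment."""
    return {
        # Operating system name, release, and architecture.
        "os": platform.platform(),
        # Version of the running Python interpreter.
        "python": sys.version.split()[0],
        # Full path of the tesseract executable, or None if it is not on PATH.
        "tesseract_binary": shutil.which("tesseract"),
        # Current value of the variable named in the error message, if set.
        "TESSDATA_PREFIX": os.environ.get("TESSDATA_PREFIX"),
    }


if __name__ == "__main__":
    for key, value in collect_environment_info().items():
        print(f"{key}: {value}")
```

Pasting the printed values into the issue would answer the OS question and show whether `TESSDATA_PREFIX` is set at all.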

@BeHerz
Author

BeHerz commented Feb 25, 2024

The device runs iOS. The Python code runs in the Python code window inside ChatGPT. I also tried it on Windows, with the same problem.

I don't know where the language data is located; it is requested by the ChatGPT code window.

[Attached images: IMG_5593, IMG_5592]

@stefan6419846
Contributor

I do not think there is much we can do about this non-standard setup. You can try digging around in the system to learn more about the OS and the installed packages, which should help you find the correct Tesseract data directory to pass as an environment variable. Nevertheless, I would recommend running the code on a proper local setup unless you are sure of what you are doing and that this is the right approach.
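The "digging around" could look roughly like the sketch below, which searches a few directories for the language file and then sets `TESSDATA_PREFIX` for the current process. Both function names are hypothetical, and the directories to search are assumptions that depend on the system; the code uses only the standard library and does not guarantee that the sandbox contains the file at all.

```python
import os
from pathlib import Path


def find_tessdata_dirs(lang, roots):
    """Hypothetical helper: return directories under `roots` holding `<lang>.traineddata`."""
    found = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # Skip search roots that do not exist on this system.
        for path in root.rglob(f"{lang}.traineddata"):
            if path.parent not in found:
                found.append(path.parent)
    return found


def point_tesseract_at(tessdata_dir):
    """Set TESSDATA_PREFIX for this process and any child processes it starts.

    The error message in this issue asks for the "tessdata" directory itself.
    """
    os.environ["TESSDATA_PREFIX"] = str(tessdata_dir)
```

For example, `find_tessdata_dirs("deu", ["/usr/share", "/usr/local/share"])` would list candidate directories on a typical Linux layout (these two paths are guesses, not confirmed for the ChatGPT environment). If the list is non-empty, passing its first entry to `point_tesseract_at` before calling `pytesseract.image_to_string` should make the setting visible to Tesseract. If the list is empty, the German language data is probably not installed and setting the variable will not help.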

@BeHerz BeHerz closed this as completed Feb 25, 2024
@BeHerz
Author

BeHerz commented Feb 25, 2024

I will try to solve it via the OpenAI Developer Community.
