Can't pass quotation mark character into tessedit_char_whitelist #501
Comments
This seems to be related to pytesseract/pytesseract/pytesseract.py Line 252 in 672ac6d
Thanks for looking into it. In case someone else finds this issue, my current workaround is:
I'm running into issues when I include a double quote (") in the tessedit_char_whitelist config option. This is most likely because pytesseract also treats " as a delimiter when it parses the config string.
I have no idea if this should be considered a bug.
I'm mostly looking for alternative solutions. I found no information in the documentation on whether you can pass a config file instead.
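The quoting problem described above can be reproduced without Tesseract. The sketch below assumes that pytesseract tokenizes the config string with Python's `shlex.split` in its default POSIX mode; that is an assumption, not something confirmed in this thread, and the quoting rules differ in non-POSIX mode. The helper name `build_whitelist_config` and the shortened whitelist are hypothetical.

```python
import shlex


def build_whitelist_config(whitelist: str) -> str:
    """Return a config string whose whitelist survives shlex.split (POSIX mode).

    Hypothetical helper, not part of pytesseract.
    """
    # shlex.quote wraps the whole argument in single quotes, so spaces and
    # double quotes inside it are kept literally when the string is split.
    return "-c " + shlex.quote("tessedit_char_whitelist=" + whitelist)


# Shortened stand-in for the whitelist in the report: note the space and the ".
whitelist = 'ABCabc0123 "'

# Naive concatenation: the space splits the argument, and the " opens a quoted
# section that is never closed.
naive = "-c tessedit_char_whitelist=" + whitelist
try:
    shlex.split(naive)
    naive_error = None
except ValueError as exc:
    naive_error = str(exc)

# Quoted version: splits back into exactly two arguments.
quoted_args = shlex.split(build_whitelist_config(whitelist))
```

If the assumption holds, the string returned by `build_whitelist_config(...)` could be passed as the `config` argument of `pytesseract.image_to_string`. Whether Tesseract itself then honours a space or a double quote inside the whitelist is a separate question that this sketch does not cover.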
charwhitelist = r'ABCDEFGHIJKLMNOPQRSTUVWZYXÅÄÖabcdefghijklmnopqrstuvwxyzåäö0123456789-()/=&%!?:;.,é ' + '\"'
Example output while trying to use this whitelist:
Smhllshjlp1922
ArfreningenSmhllshjlpklsskmpsorgnistion?
Example output without whitelist (and also expected result):
Samhällshjälp 1922
Är föreningen Samhällshjälp klasskampsorganisation?
python version: 3.10.6 run via bundled interpreter in an executable
pytesseract version: 0.3.10
tesseract version: UB Mannheim windows binary, v5.3.0.20221214