-
Notifications
You must be signed in to change notification settings - Fork 930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change Text extraction is not allowed error to warning #453
Change Text extraction is not allowed error to warning #453
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @madhurcodes,
Thanks for your work. Could you make sure that the changes are like the suggested changes in the issue:
- Set the default value for check_extractable to False.
- If check_extractable is True we throw an Error, if False we raise a warning.
- Remove the explicit arguments for check_extractable from the high_level module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Last thing: can you add a test that checks if the warning is warned and, more importantly, that the code completes without errors.
Hey as mentioned in the PR the PDFs I have which trigger this warning are copyright protected so I can't upload them here for testing. I have tested that it raises the correct warning on my machine. The code is really basic and it should be clear that there is no functional change so IMO it should not require a new test since the earlier error throwing code was also untested. |
I tried changing the content of an sample encrypted document, but apparently its hard to change the password of an encrypted file using only vim 😖 Should have looked at the issue earlier. An example pdf is posted by @Recursing. You could use that one. I also notices that the encrypted samples that are in the repo already are never used for testing, these can also be used. A lot of functionality of pdfminer.six is still untested. Having a test suite for all functionality makes it easier to add code, and to change/improve implementations in the future. That's why I'm asking. I do believe that this code works as intended 😄 If you have time to add the sample pdf from the issue, please do so. Otherwise I will find some time to do it myself and then merge this. Thanks for all the efforts so far! |
Thanks @madhurcodes |
I was using this library for parsing a large set of files and it gave an error halfway through because it encountered such a file. Users should not have to handle this manually in a library meant for parsing PDFs.
Changes the
Text extraction is not allowed
error thrown to a descriptive warning.Fixes #350
How Has This Been Tested?
I tested this on an example PDF which threw the mentioned error with the original code and gave the correct text after the change as expected. I cannot share the PDF because I don't own copyright to it.
Checklist