-
Notifications
You must be signed in to change notification settings - Fork 41
Conversation
@shahrukhx01 is attempting to deploy a commit to the Deepset Team on Vercel. A member of the Team first needs to authorize it. |
Hi @PiffPaffM FYI. please review when you’ve time. |
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/deepset-overnice/haystack-website/3XBLVdVQvE4G7Fp6mBEskKFvAzpv |
</code> | ||
<code> | ||
converter = PDFToTextOCRConverter(remove_numeric_tables=False, | ||
valid_languages=["deu","eng"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to use "deu" here or does it need to be "de"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is "deu" since we are using tesseract, should we also add a link for language codes from tesseract?
https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be cool. I was a bit confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PiffPaffM I have updated the documentation for this part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The code box for PDF looks a bit weird: https://haystack-website-git-fork-shahrukhx01-a-f7d6bb-deepset-overnice.vercel.app/usage/preprocessing#file-conversion. I think the comment is not interpreted correctly. Can you have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed it into a code comment, but where should we put the link for Tessaract language codes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest you put the link in that same comment. However in the vercel preview, it seems like the import statement is on the same line as the comment.
https://haystack-website-hv1bt4525-deepset-overnice.vercel.app/usage/preprocessing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we just merged a new branch in that moves this file from docs/latest/usage/preprocessing.mdx
to docs/latest/components/preprocessing.mdx
. Could you also make this change so that this PR can be merged?
@shahrukhx01 Thanks for the contribution to our docs! That is great! Looking forward to merging it after these minor changes. |
Hi @PiffPaffM following up on this :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
</code> | ||
<code> | ||
converter = PDFToTextOCRConverter(remove_numeric_tables=False, | ||
valid_languages=["deu","eng"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! The code box for PDF looks a bit weird: https://haystack-website-git-fork-shahrukhx01-a-f7d6bb-deepset-overnice.vercel.app/usage/preprocessing#file-conversion. I think the comment is not interpreted correctly. Can you have a look?
</code> | ||
<code> | ||
converter = PDFToTextOCRConverter(remove_numeric_tables=False, | ||
valid_languages=["deu","eng"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest you put the link in that same comment. However in the vercel preview, it seems like the import statement is on the same line as the comment.
https://haystack-website-hv1bt4525-deepset-overnice.vercel.app/usage/preprocessing
</code> | ||
<code> | ||
converter = PDFToTextOCRConverter(remove_numeric_tables=False, | ||
valid_languages=["deu","eng"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we just merged a new branch in that moves this file from docs/latest/usage/preprocessing.mdx
to docs/latest/components/preprocessing.mdx
. Could you also make this change so that this PR can be merged?
Hi @shahrukhx01, I realise that the restructuring of the repo might have made this PR a little tricky to finish off. Would it be alright if I incorporate your changes into another PR of mine (#164)? This will save the effort of rebasing / merging branches. We will of course credit your work in the new PR :) |
@brandenchan yes definitely no problem. |
Based on OCR PR