Request: Nougat OCR Integration #3294
Comments
As soon as @ggerganov tackles multi-modal (not sure, maybe he did already) I'm interested. For now: not in project scope, methinks.
I recently learned about this model and I am very interested in adding support for it. It's likely to remain low prio for the near future, but if there is a community effort, I'll be happy to support it.
Impressive results with English papers and Ebooks. Some preliminary findings on the nougat project.
1st My question:
btw the encoder layers of both the small and base Nougat models use exactly the same Swin model; the two models differ only in the underlying decoder layers [mBART]. Also, imho, everything before and after the inner loop is not worth rewriting in C, since it takes practically no time to run.
Any updates???
It would be great to have the OCR integrated into the mix. Any updates on this would be awesome.
So I assume this is still not implemented?
Request: Nougat OCR Integration
I suggest adding Nougat OCR into llama.cpp to enable the processing of scientific PDF documents.
This can act as a first step towards adding multimodal models to this project!
Implementation:
It seems that Nougat is based on standard transformer architectures (like BART and Swin Transformer), so most of the work would be in figuring out how to add the image processing.
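To give a rough idea of what the image-processing side involves, a Swin-style encoder starts by cutting the page image into small non-overlapping patches. Below is a minimal sketch in plain Python, assuming a grayscale image stored as nested lists and a patch size of 4 (the value used in the original Swin Transformer; Nougat's actual preprocessing may differ). `partition_patches` is a hypothetical helper name, not part of Nougat or llama.cpp.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def partition_patches(image: Image, patch_size: int = 4) -> List[List[float]]:
    """Split an H x W image into flattened, non-overlapping patches.

    Assumes H and W are multiples of patch_size; a real pipeline would
    resize or pad the page image first.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")

    patches = []
    # Walk the image in patch-sized steps, row-major over patches.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches
```

Each flattened patch would then be projected to an embedding vector and fed to the encoder; that projection, and everything after it, is where the model weights come in.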
Let me know what you think!
P.S.: Love this repo! I hope to add my own retrieval-pretrained transformer at some point to this repo.