OCRFusion is an integrated solution that combines multiple open-source OCR (Optical Character Recognition) models, layout analysis, and table parsing capabilities. This project unifies these functionalities into a single interface, providing a streamlined and efficient way to process and extract information from various types of documents.

peakhell/OCRIntegrator

OCRIntegrator

OCRIntegrator wraps open-source OCR models, table detection, layout recognition, and related capabilities, and exposes them as services through a unified interface. Currently only deepdoc is integrated; more services are planned.

Introduction

  1. In deepdoc, pdfplumber reads the text embedded in a PDF, while OCR recognizes text from the page image. The pdfplumber text is preferred when it is available; scanned documents, which carry no embedded text, rely entirely on OCR.
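The preference described above can be sketched as a small fallback function. This is a minimal illustration, not deepdoc's actual code: the function name `extract_page_text`, the `run_ocr` callable, and the `min_chars` threshold are all hypothetical names introduced here.

```python
from typing import Callable, Optional, Tuple


def extract_page_text(
    embedded_text: Optional[str],
    run_ocr: Callable[[], str],
    min_chars: int = 1,  # hypothetical threshold, not taken from deepdoc
) -> Tuple[str, str]:
    """Return (text, source), preferring embedded PDF text over OCR.

    `embedded_text` is whatever the PDF text layer yielded for the page,
    e.g. the result of pdfplumber's `page.extract_text()`, which can be
    None or empty for scanned pages. `run_ocr` is only called when the
    text layer is missing or too short, so OCR cost is paid only when needed.
    """
    cleaned = (embedded_text or "").strip()
    if len(cleaned) >= min_chars:
        return cleaned, "pdfplumber"
    return run_ocr(), "ocr"


if __name__ == "__main__":
    # A page with a text layer: OCR is never invoked.
    print(extract_page_text("Invoice 2024", lambda: "unused"))
    # A scanned page with no text layer: falls back to OCR.
    print(extract_page_text(None, lambda: "text recognized by OCR"))
```

Passing OCR in as a callable keeps the decision logic independent of any particular OCR engine, which also makes it easy to test without a GPU.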

🎬 Get Started

📝 Prerequisites

  • Python >= 3.11 (conda is recommended)
  • GPU with more than 6 GB of memory
  • tensorrt == 10.0.1
  • CUDA == 12.3 (other versions may work theoretically, but have not been tested)
  • pycuda == 2024.1
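To confirm that an environment matches the versions listed above, a short check along these lines may help. It is a sketch under assumptions: the helper name `check_environment` is hypothetical, and it assumes the installed distributions report exactly the version strings listed in the prerequisites. It does not check GPU memory or the CUDA toolkit version.

```python
import sys
from importlib import metadata

# Pinned versions copied from the prerequisites list.
REQUIRED = {"tensorrt": "10.0.1", "pycuda": "2024.1"}


def check_environment(required=REQUIRED, min_python=(3, 11)):
    """Return a dict mapping each requirement to True (met) or False (not met)."""
    report = {"python": sys.version_info[:2] >= min_python}
    for name, wanted in required.items():
        try:
            # Compare the installed distribution version with the pinned one.
            report[name] = metadata.version(name) == wanted
        except metadata.PackageNotFoundError:
            # The package is not installed at all.
            report[name] = False
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'OK' if ok else 'missing or version mismatch'}")
```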

Runtime Environment

  1. Install Python 3.11 (conda is recommended).
  2. Install poetry:
     curl -sSL https://install.python-poetry.org | python3 -
  3. Install dependencies using poetry: poetry install
  4. Run the project: uvicorn main:app

Running with GPU requires installing TensorRT

  1. Install TensorRT. The tensorrt-cu12 package name must match your CUDA version (cu12 corresponds to CUDA 12.x).
    pip install tensorrt==10.0.1
    pip install tensorrt-cu12==10.0.1
  2. Install pycuda
    pip install pycuda==2024.1

Below are screenshots of my environment for reference:

img.png img.png

DEMO

img.png

API Documentation

After starting the service, you can browse the API and its usage in the interactive documentation: http://localhost:8000/docs
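The endpoints can also be listed from a script. The sketch below assumes the service is a FastAPI-style application that publishes its schema at `/openapi.json` and is reachable over plain HTTP on port 8000; the README does not confirm either point. The helper names `fetch_schema` and `list_endpoints` are hypothetical.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def fetch_schema(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema (assumes the server exposes /openapi.json)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as resp:
        return json.load(resp)


def list_endpoints(schema: dict) -> list:
    """Return 'METHOD /path' strings for every operation in an OpenAPI schema."""
    lines = []
    for path, operations in sorted(schema.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS:
                lines.append(f"{method.upper()} {path}")
    return lines


if __name__ == "__main__":
    # Requires the service to be running locally.
    for line in list_endpoints(fetch_schema()):
        print(line)
```

Only the standard library is used, so the script runs in the same environment as the service without extra dependencies.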
