This is an open source online OCR service built with the Django framework.
Requirements
- Python 3.8
- Tesseract 4.1.x (installation guide: https://github.com/tesseract-ocr/tessdoc/blob/master/Home.md)
A macOS wrapper for the Tesseract API is also available at Tesseract macOS.
MacPorts
To install Tesseract run this command:
sudo port install tesseract
To install any language data, run:
sudo port install tesseract-<langcode>
The list of available langcodes can be found on the MacPorts tesseract page.
Homebrew
To install Tesseract run this command:
brew install tesseract
The tesseract directory can then be found using brew info tesseract, e.g. /usr/local/Cellar/tesseract/3.05.02/share/tessdata/.
Linux
Tesseract is available directly from many Linux distributions. The package is generally called 'tesseract' or 'tesseract-ocr'; search your distribution's repositories to find it. For example, you can install Tesseract 4.x and its developer tools on Ubuntu 18.04 (bionic) by running:
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
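Since this project targets Tesseract 4.1.x, it can help to confirm which version ended up on your PATH after installing. The sketch below runs tesseract --version and parses the first "tesseract X.Y" it finds; the helper names are hypothetical, and the output format it parses (e.g. "tesseract 4.1.1") is assumed from typical releases rather than taken from this project.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the version line printed by `tesseract --version`, e.g. "tesseract 4.1.1"
# (some builds print a leading "v", e.g. "tesseract v5.0.0-alpha").
_VERSION_RE = re.compile(r"tesseract\s+v?(\d+)\.(\d+)")


def parse_tesseract_version(output: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) parsed from `tesseract --version` output, or None."""
    match = _VERSION_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_tesseract_version() -> Optional[Tuple[int, int]]:
    """Run the tesseract binary found on PATH and parse its version (None if absent)."""
    if shutil.which("tesseract") is None:
        return None
    # Some releases print the version to stderr, so merge stderr into stdout.
    output = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    ).stdout
    return parse_tesseract_version(output)


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("tesseract not found on PATH")
    elif version != (4, 1):
        print(f"found tesseract {version[0]}.{version[1]}; this README targets 4.1.x")
    else:
        print("tesseract 4.1.x found")
```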
- Docker Engine (only needed for the Docker deployment)
Within the download, you'll find the following directories and files:
├────.gitattributes
├────.gitignore
├────app/
│ ├────__init__.py
│ ├────admin.py
│ ├────apps.py
│ ├────migrations/
│ │ ├────0001_initial.py
│ │ ├────0002_auto_20200505_0909.py
│ │ └────__init__.py
│ ├────models.py
│ ├────ocr.py
│ ├────serializers.py
│ ├────urls_api.py
│ └────views.py
├────db/
│ └────__init__.py
├────dockerfile_base
├────dockerfile_product
├────manage.py
├────media/
│ └────test/
├────nginx.conf
├────onlineocr/
│ ├────__init__.py
│ ├────asgi.py
│ ├────settings.py
│ ├────urls.py
│ └────wsgi.py
├────README.md
├────requirements.txt
├────sources.list
├────start.sh
├────static/
├────templates/
├────uwsgi/
├────uwsgi.ini
└────uwsgi_params
Before running it, you need to install Tesseract first.
git clone https://github.com/ginguocun/onlineocr.git
cd onlineocr
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 manage.py migrate
Start the development server at http://127.0.0.1:9999/:
python3 manage.py runserver 127.0.0.1:9999
To run it with Docker instead, clone the repository and build the base and product images:
git clone https://github.com/ginguocun/onlineocr.git
cd onlineocr
sudo docker build --rm -t onlineocr:base -f dockerfile_base .
sudo docker build --rm -t onlineocr:latest -f dockerfile_product .
Start the container at http://127.0.0.1:9999/ (host port 9999 is mapped to port 80 inside the container):
sudo docker run -it -p 9999:80 onlineocr:latest /bin/bash
The online API documentation is available at http://127.0.0.1:9999/docs/.
The main APIs are listed below:
- /api/ocr/
- /api/history/
- /api/register/
- /api/token_obtain_pair/
- /api/token_refresh/
POST /api/ocr/
This API takes an uploaded image (JPG or PNG) and recognizes the text in it.
The request body should be an "application/json" encoded object containing the following items:
Parameter | Description |
---|---|
image (required) | An image file; the size should be less than 2 MB. |
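A raw image cannot be carried in a plain JSON body, so the sketch below assumes /api/ocr/ also accepts multipart/form-data with the file in an "image" field, which Django REST framework views accept by default; check /docs/ to confirm. It uses only the standard library. The helper name, the Bearer header (only needed if the endpoint requires authentication), and reading "2Mb" as 2 MiB are all assumptions.

```python
import mimetypes
import urllib.request
import uuid
from typing import Optional

# Assumption: "less than 2Mb" is read as 2 MiB (2 * 1024 * 1024 bytes).
MAX_IMAGE_BYTES = 2 * 1024 * 1024


def build_ocr_request(
    base_url: str, filename: str, data: bytes, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a multipart/form-data POST to /api/ocr/ with the file in the "image" field."""
    if len(data) >= MAX_IMAGE_BYTES:
        raise ValueError("image must be smaller than 2 MB")
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # One form part: headers, blank line, raw file bytes, then the closing boundary.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if token is not None:
        # Assumption: the access token from /api/token_obtain_pair/ is sent as Bearer.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/ocr/", data=body, headers=headers, method="POST"
    )


if __name__ == "__main__":
    with open("page.png", "rb") as f:  # hypothetical local image
        req = build_ocr_request("http://127.0.0.1:9999", "page.png", f.read())
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```

If the server rejects multipart uploads, the /docs/ page should show which encodings it actually accepts.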
GET /api/history/
This is an API used to obtain the historical upload records.
The following parameters can be included as part of a URL query string.
Parameter | Description |
---|---|
page | A page number within the paginated result set. |
search | A search term. |
ordering | Which field to use when ordering the results. |
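The three query parameters can be combined freely, and any that are left out are simply omitted from the URL. A minimal sketch with the standard library follows; the helper name and the Bearer header are assumptions, and "-id" in the usage note is a hypothetical field name (a leading "-" means descending order under Django REST framework's ordering convention).

```python
import urllib.parse
import urllib.request
from typing import Dict, Optional


def build_history_request(
    base_url: str,
    page: Optional[int] = None,
    search: Optional[str] = None,
    ordering: Optional[str] = None,
    token: Optional[str] = None,
) -> urllib.request.Request:
    """Build a GET request for /api/history/, adding only the parameters that were given."""
    params = {"page": page, "search": search, "ordering": ordering}
    query = urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + "/api/history/"
    if query:
        url += "?" + query
    headers: Dict[str, str] = {}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"  # assumed Bearer scheme
    return urllib.request.Request(url, headers=headers, method="GET")
```

For example, build_history_request("http://127.0.0.1:9999", page=2, search="invoice") targets http://127.0.0.1:9999/api/history/?page=2&search=invoice.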
POST /api/register/
This is an API for user registration.
The request body should be an "application/json" encoded object containing the following items:
Parameter | Description |
---|---|
username (required) | 150 characters or fewer. Letters, digits and @/./+/-/_ only. |
email | Email address (optional). |
password (required) | Account password. |
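Checking the username rule on the client side avoids a round trip for obviously invalid names. The sketch below mirrors the rule quoted in the table using the same pattern as Django's UnicodeUsernameValidator, then builds the JSON request; the helper names are hypothetical and the server remains the authority on what it accepts.

```python
import json
import re
import urllib.request
from typing import Dict, Optional

# Letters, digits and @/./+/-/_ only (same pattern as Django's UnicodeUsernameValidator).
_USERNAME_RE = re.compile(r"^[\w.@+-]+\Z")


def is_valid_username(username: str) -> bool:
    """Apply the documented rule: 150 characters or fewer, allowed characters only."""
    return len(username) <= 150 and _USERNAME_RE.match(username) is not None


def build_register_request(
    base_url: str, username: str, password: str, email: Optional[str] = None
) -> urllib.request.Request:
    """Build a JSON POST to /api/register/; email is included only when given."""
    if not is_valid_username(username):
        raise ValueError("invalid username")
    payload: Dict[str, str] = {"username": username, "password": password}
    if email is not None:
        payload["email"] = email
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/register/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```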
POST /api/token_obtain_pair/
Takes a set of user credentials and returns an access and refresh JSON web token pair to prove the authentication of those credentials.
The request body should be an "application/json" encoded object containing the following items:
Parameter | Description |
---|---|
username (required) | |
password (required) | |
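The endpoint name and description match djangorestframework-simplejwt's TokenObtainPairView, so the sketch below assumes its usual response shape, {"refresh": "...", "access": "..."}, and its default "Bearer" header type. Neither is confirmed by this README, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a JSON POST to /api/token_obtain_pair/ with the user's credentials."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/token_obtain_pair/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_token_pair(response_body: bytes) -> Dict[str, str]:
    """Extract both tokens, assuming a JSON body of the form {"refresh": ..., "access": ...}."""
    data = json.loads(response_body)
    return {"access": data["access"], "refresh": data["refresh"]}


def auth_header(access_token: str) -> Dict[str, str]:
    """Header for authenticated calls (assumes the "Bearer" header type)."""
    return {"Authorization": f"Bearer {access_token}"}
```

The access token goes on each authenticated request; the refresh token is kept to obtain new access tokens later.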
POST /api/token_refresh/
Takes a refresh type JSON web token and returns an access type JSON web token if the refresh token is valid.
The request body should be an "application/json" encoded object containing the following items:
Parameter | Description |
---|---|
refresh (required) | A refresh token obtained from /api/token_obtain_pair/. |
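A client usually refreshes only when its access token is about to expire. JWTs carry an "exp" claim (seconds since the epoch) in their base64url-encoded payload, so the sketch below reads it locally, without verifying the signature, and builds the refresh request. The helper names and the 30-second leeway are assumptions; the response is assumed to contain at least an "access" field, as described above.

```python
import base64
import json
import time
import urllib.request
from typing import Optional


def jwt_expiry(token: str) -> int:
    """Read the "exp" claim from a JWT payload WITHOUT verifying the signature."""
    payload_segment = token.split(".")[1]
    # JWT segments are base64url without padding; restore the padding before decoding.
    padded = payload_segment + "=" * (-len(payload_segment) % 4)
    return int(json.loads(base64.urlsafe_b64decode(padded))["exp"])


def is_expired(token: str, now: Optional[float] = None, leeway: int = 30) -> bool:
    """True if the token expires within `leeway` seconds of `now` (default: current time)."""
    current = time.time() if now is None else now
    return jwt_expiry(token) <= current + leeway


def build_refresh_request(base_url: str, refresh_token: str) -> urllib.request.Request:
    """Build a JSON POST to /api/token_refresh/ carrying the refresh token."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/token_refresh/",
        data=json.dumps({"refresh": refresh_token}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Decoding without verification is fine for scheduling a refresh, but it must never be used to decide whether a token is trustworthy; only the server can do that.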