# CRAFT: Character-Region Awareness For Text Detection | Paper | Official PyTorch code
This is a C++ implementation of the CRAFT text detector, using TensorRT for accelerated inference. Compared to the official PyTorch implementation, it significantly improves text detection efficiency and simplifies deployment.

In our tests, inference on an RTX 4090 is about 12x faster than the original CRAFT-pytorch project.

In addition, I also provide a Chinese and English video subtitle detection model fine-tuned on a custom dataset, which offers higher accuracy for subtitle detection.
## Requirements

- gcc
- CUDA
- TensorRT

The environment we tested with is GCC 7.3.1 + CUDA 11.2 + TensorRT 8.5.3.1.
## Pretrained models

Download the .pth models and place them in the `pretrained` directory.

- Official pretrained model: `craft_mlt_25k.pth`
- Chinese and English subtitle detection model, fine-tuned on a custom dataset: `epoch_91.pth`
## .pth to ONNX

```shell
cd engine_generation
python torch2onnx.py --usefp16 --torch_path ../pretrained/craft_mlt_25k.pth
```
## ONNX to TRT engine

```shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/trt/lib
make
./onnx2trt ../pretrained/craft_mlt_25k_fp16.onnx ../pretrained/craft_mlt_25k_fp16_dynamic_shape.cache
```
## Make

```shell
cd src
make
cd ..
make
```
## Run demo

(1) If the input file is in image format:

```shell
./test_img <engine_path> <input_path>
```

Example:

```shell
./test_img ./pretrained/craft_mlt_25k_fp16_dynamic_shape.cache ./images/subtitle2.png
```

(2) If the input file is in YUV format:

```shell
./test_yuv <engine_path> <height> <width> <yuv_file_path>
```

Example:

```shell
./test_yuv ./pretrained/craft_mlt_25k_fp16_dynamic_shape.cache 2160 3840 ./images/test_3840x2160_nv12.yuv
```
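For the YUV path, the raw file must contain an NV12 frame: a full-resolution Y plane followed by a half-resolution interleaved UV plane, so the file size must be width × height × 3/2 bytes. A quick sanity check before invoking the demo (this helper is illustrative, not part of the repository):

```cpp
#include <cstddef>

// Expected byte size of one NV12 frame: Y plane (w*h) plus interleaved UV
// plane (w*h/2). NV12 requires even dimensions; returns 0 when invalid.
std::size_t nv12_frame_size(int width, int height) {
    if (width <= 0 || height <= 0 || width % 2 != 0 || height % 2 != 0)
        return 0;
    return static_cast<std::size_t>(width) * height * 3 / 2;
}
```

For example, the 3840x2160 demo clip above should be exactly 3840 × 2160 × 3/2 = 12441600 bytes.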
## API

The following interface can be used to integrate CRAFT into your own code.

### Initialization: load the TRT engine

```cpp
void infer_init(int height, int width, const char* engine_path, float ratio);
```

- `height`: height of the video/image
- `width`: width of the video/image
- `engine_path`: path to the engine file
- `ratio`: scaling ratio for the input image, in the range (0, 1]; typically 0.5
### Inference

(1) If the input is an RGB image:

```cpp
vector<int> infer_pipe_rgb(uint8_t *rgb);
```

- `rgb`: memory address of the RGB image, stored in planar form

Returns a vector storing, for each box in sequence, x_min, x_max, y_min, y_max.

(2) If the input is in YUV format:

```cpp
vector<int> infer_pipe(uint8_t **in_yuv, int format, int* line_size);
```

- `in_yuv`: memory addresses of the NV12 Y and UV planes

Returns a vector storing, for each box in sequence, x_min, x_max, y_min, y_max.
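Note that `infer_pipe_rgb` expects the image in planar layout (all R bytes, then all G, then all B), while most decoders produce interleaved RGBRGB... pixels. A minimal conversion sketch (this helper is an illustration, not part of the repository's API):

```cpp
#include <cstdint>
#include <vector>

// Convert interleaved RGBRGB... pixels into planar layout:
// output = [R plane][G plane][B plane], each plane width*height bytes.
std::vector<uint8_t> interleaved_to_planar(const uint8_t* rgb, int width, int height) {
    const int n = width * height;
    std::vector<uint8_t> planar(3 * n);
    for (int i = 0; i < n; ++i) {
        planar[i]         = rgb[3 * i];     // R plane
        planar[n + i]     = rgb[3 * i + 1]; // G plane
        planar[2 * n + i] = rgb[3 * i + 2]; // B plane
    }
    return planar;
}
```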
### Destruction

```cpp
void destroyObj();
```

Call this function after all inferences are completed.
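Both inference calls return a flat `vector<int>` holding four values per detected box, in the order x_min, x_max, y_min, y_max. Unpacking it into per-box records might look like this (`TextBox` and `unpack_boxes` are illustrative names, not part of the repository):

```cpp
#include <cstddef>
#include <vector>

struct TextBox {
    int x_min, x_max, y_min, y_max;
};

// Split the flat result vector (4 ints per box, in the order
// x_min, x_max, y_min, y_max) into TextBox records.
std::vector<TextBox> unpack_boxes(const std::vector<int>& flat) {
    std::vector<TextBox> boxes;
    boxes.reserve(flat.size() / 4);
    for (std::size_t i = 0; i + 3 < flat.size(); i += 4)
        boxes.push_back({flat[i], flat[i + 1], flat[i + 2], flat[i + 3]});
    return boxes;
}
```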