An official PyTorch implementation of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer" (ACM MM 2023).
Authors: Hang Guo, Tao Dai, Mingyan Zhu, Guanghao Meng, Bin Chen, Zhi Wang, Shu-Tao Xia.
This work focuses on text recognition for low-resolution images. We propose a novel knowledge distillation framework that directly adapts a text recognizer to low-resolution inputs. We hope our work can inspire more studies on one-stage low-resolution text recognition.
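For intuition, here is a minimal PyTorch sketch of the kind of objective such a distillation framework optimizes; the `teacher`/`student` callables, the choice of loss terms, and the weights `alpha`/`beta` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Sketch of one training step for HR-to-LR knowledge transfer. `teacher` is a
# frozen recognizer fed high-resolution crops; `student` is its LR-adapted copy.
# Both are assumed to return (visual_features, character_logits).
def distill_step(teacher, student, hr_images, lr_images, labels,
                 alpha=1.0, beta=0.1):
    with torch.no_grad():
        t_feats, t_logits = teacher(hr_images)      # HR teacher, no gradients
    s_feats, s_logits = student(lr_images)          # LR student, trainable

    # Standard recognition loss on the ground-truth character labels.
    rec_loss = F.cross_entropy(s_logits.flatten(0, 1), labels.flatten())
    # Align the student's deep visual features with the teacher's.
    feat_loss = F.mse_loss(s_feats, t_feats)
    # Transfer the teacher's softened predictions.
    kd_loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                       F.softmax(t_logits, dim=-1), reduction='batchmean')
    return rec_loss + alpha * feat_loss + beta * kd_loss
```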
The architecture of the proposed framework is as follows.
We refer to the student models adapted to low-resolution inputs as ABINet-LTR, MATRN-LTR, and PARSeq-LTR, respectively. As noted in the paper, because the input images of the two branches have different resolutions, we modify the convolution stride (for CNN backbones) or the patch size (for ViT backbones) to keep the deep visual features of the two branches consistent. The pretrained weights can be downloaded as follows.
Model | ABINet-LTR | MATRN-LTR | PARSeq-LTR |
---|---|---|---|
Accuracy | 72.45% | 73.27% | 78.23% |
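The toy sketch below shows why this adjustment keeps the two branches comparable: halving the first-layer patch size (or conv stride) on the low-resolution branch yields the same feature grid as the high-resolution branch. The concrete input and patch sizes here are assumptions for illustration.

```python
import torch
import torch.nn as nn

hr = torch.randn(1, 3, 32, 128)   # high-resolution teacher input
lr = torch.randn(1, 3, 16, 64)    # 2x-downsampled student input

# Patch embedding as a strided conv; the student's patch size is halved.
teacher_embed = nn.Conv2d(3, 384, kernel_size=(4, 8), stride=(4, 8))
student_embed = nn.Conv2d(3, 384, kernel_size=(2, 4), stride=(2, 4))

print(teacher_embed(hr).shape)    # torch.Size([1, 384, 8, 16])
print(student_embed(lr).shape)    # torch.Size([1, 384, 8, 16]) -- same grid
```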
Please note that the pre-trained HR teacher model is still needed for both training and testing. You can download it from the corresponding official GitHub repository, i.e., ABINet, MATRN, or PARSeq.
In this work, we use the STISR dataset TextZoom and five STR benchmarks, i.e., ICDAR2013, ICDAR2015, CUTE80, SVT, and SVTP, for model comparison. All datasets are in LMDB format and can be downloaded from the following table.
Datasets | TextZoom | IC13 | IC15 | CUTE80 | SVT | SVTP |
---|---|---|---|---|---|---|
Download Link | link | link | link | link | link | link |
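For reference, below is a minimal reader for such an LMDB dump. It assumes the key layout commonly used by STR benchmarks (`num-samples`, `image-%09d`, `label-%09d`); verify these keys against the files you download.

```python
import io
import lmdb
from PIL import Image

def iterate_lmdb(root):
    """Yield (PIL image, label) pairs from an STR-style lmdb directory."""
    env = lmdb.open(root, readonly=True, lock=False, readahead=False)
    with env.begin(write=False) as txn:
        num_samples = int(txn.get(b'num-samples'))
        for i in range(1, num_samples + 1):      # keys are 1-indexed
            img_bytes = txn.get(f'image-{i:09d}'.encode())
            label = txn.get(f'label-{i:09d}'.encode()).decode('utf-8')
            yield Image.open(io.BytesIO(img_bytes)).convert('RGB'), label
```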
We have set default hyper-parameters in `config.yaml` and `main.py`, so you can run training and testing directly after updating the dataset and pre-trained model paths.
Training: `python main.py`

Testing: `python main.py --go_test`
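For orientation, the entry point presumably wires these together along the following lines; this sketch assumes `config.yaml` is plain YAML and that `--go_test` simply toggles evaluation, and the real `main.py` may be organized differently.

```python
import argparse
import yaml

# Assumed shape of main.py: read paths from config.yaml, then branch on --go_test.
parser = argparse.ArgumentParser()
parser.add_argument('--config', default='config.yaml')
parser.add_argument('--go_test', action='store_true',
                    help='run evaluation instead of training')
args = parser.parse_args()

with open(args.config) as f:
    cfg = yaml.safe_load(f)   # e.g. dataset roots and teacher checkpoint path

if args.go_test:
    print('evaluating with config:', cfg)
else:
    print('training with config:', cfg)
```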
If you find our work helpful, please consider citing us.
@inproceedings{guo2023one,
title={One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer},
author={Guo, Hang and Dai, Tao and Zhu, Mingyan and Meng, Guanghao and Chen, Bin and Wang, Zhi and Xia, Shu-Tao},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={2189--2198},
year={2023}
}