PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution (ACMMM 2024)

Zuoyan Zhao, Hui Xue, Pengfei Fang, Shipeng Zhu

This repository provides the official PyTorch implementation of the paper. If you have any questions, feel free to contact Zuoyan Zhao ([email protected]).

[Main Paper] [Full Paper (arXiv)] [Code] [OpenReview] [Slides] [Video] [Poster]

News

[2024.10.04] The poster for this paper is now available. It will be displayed at Poster Session 3 (Oct. 31st, 4:10pm ~ 6:10pm) at posterboard P222.

[2024.08.20] 🔥🔥🔥 The code and weights of this model are now available on GitHub. [Link]

[2024.07.23] Full paper (including Supplementary Material) is now available on arXiv. [Link]

[2024.07.16] 🎉🎉🎉 This paper has been accepted by ACMMM 2024. Congratulations to myself, and thanks to every reviewer for their appreciation of this work.

[2024.06.11] Reviews of this paper have been released. Luckily, it received a score of "4 4 4" from the three reviewers.

[2024.04.15] Preprint version of this paper is now available on arXiv. [Link]

[2024.04.14] I was nominated as a reviewer for ACMMM 2024.

[2024.04.13] This paper has been submitted to ACMMM 2024. Wish me good luck.

Environment

This project is built with Python, PyTorch, CUDA and NumPy.

Other Python packages are also required; please refer to requirements.txt for details.

Datasets and Pre-trained Recognizers

Training and Testing the Model

  • According to our paper, the training phase of this model consists of an optional pre-training process and a fine-tuning process. If you want to start the pre-training process, you can use a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --pre_training

    Assuming the pre-trained weights are saved at ./ckpt/checkpoint.pth, you can start the fine-tuning process from this checkpoint with a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --resume="./ckpt/checkpoint.pth"

    Of course, the pre-training process is not mandatory; you can also train the full model directly without a pre-trained checkpoint. Before training, you should first set the value of "checkpoint" in ./config/cfg_diff_prior.json to the directory where the checkpoints of the TPEM will be saved.

    The Transformer-based recognizer used for the SFM loss can be downloaded from https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt.

  • If you want to test the pre-trained model on the easy subset of TextZoom (assuming this dataset is saved at /root/dataset/TextZoom/test/easy), you can use a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --resume="./ckpt/checkpoint.pth" --pre_training --test --test_data_dir="/root/dataset/TextZoom/test/easy"

    To test the fine-tuned model, assume the fine-tuned weights are saved at ./ckpt/checkpoint.pth and the trained TPEM at ./ckpt/TPEM_ckpt.pth. First change the value of "resume_state" in ./config/cfg_diff_prior.json to ./ckpt/TPEM_ckpt.pth. Then you can test the model on the easy subset of TextZoom with a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --resume="./ckpt/checkpoint.pth" --test --test_data_dir="/root/dataset/TextZoom/test/easy"
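The steps above require editing keys such as "checkpoint" and "resume_state" in ./config/cfg_diff_prior.json by hand. This can also be scripted; below is a minimal sketch, assuming the keys to be changed are top-level entries in a flat JSON object (the helper name is ours, not part of this repository):

```python
import json

def set_config_value(cfg_path, key, value):
    """Set one top-level key in a JSON config file and write it back."""
    with open(cfg_path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    cfg[key] = value  # adapt this lookup if the key is nested
    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)

# e.g. point the diffusion module at the trained TPEM before testing:
# set_config_value("./config/cfg_diff_prior.json", "resume_state", "./ckpt/TPEM_ckpt.pth")
```

If the config nests these keys under a sub-object, replace the single lookup with a walk down the nested dictionaries before assigning.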

Weights of Our Implemented Models

Acknowledgement

  • We inherited most of the framework from TATT, SR3 and Stripformer. Thanks for their contributions!

Recommended Papers

  • [DPMN] This is my first work on Scene Text Image Super-Resolution (STISR), which was accepted by AAAI 2023. [Paper] [Code]
  • [GSDM] An interesting work on Text Image Inpainting (TII), which was accepted by AAAI 2024. I proposed the idea of using a Structure Prediction Module and a diffusion-based Reconstruction Module to complete this task. [Paper] [Code]

Citation

@inproceedings{zhao2024pean,
  title={{PEAN}: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution},
  author={Zuoyan Zhao and Hui Xue and Pengfei Fang and Shipeng Zhu},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2024}
}
