PEAN: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution (ACMMM 2024)

Zuoyan Zhao, Hui Xue, Pengfei Fang, Shipeng Zhu

This repository provides the official PyTorch implementation of the paper. If you have any questions, feel free to contact Zuoyan Zhao ([email protected]).

[Main Paper] [Full Paper (arXiv)] [Code] [OpenReview] [Slides] [Video] [Poster]

News

[2024.10.04] The poster for this paper is now available. It will be displayed at Poster Session 3 (Oct. 31st, 4:10pm ~ 6:10pm) at posterboard P222.

[2024.08.20] 🔥🔥🔥 The code and weights of this model are now available on GitHub. [Link]

[2024.07.23] Full paper (including Supplementary Material) is now available on arXiv. [Link]

[2024.07.16] 🎉🎉🎉 This paper has been accepted by ACMMM 2024. Congratulations to myself, and thanks to every reviewer for their appreciation of this work.

[2024.06.11] Reviews of this paper have been released. Luckily, it received a score of "4 4 4" from the three reviewers.

[2024.04.15] Preprint version of this paper is now available on arXiv. [Link]

[2024.04.14] I was nominated as a reviewer for ACMMM 2024.

[2024.04.13] This paper has been submitted to ACMMM 2024. Wish me good luck.

Environment

This project is built with Python, PyTorch, CUDA and NumPy.

Other Python packages are also required; please refer to requirements.txt for details.

Datasets and Pre-trained Recognizers

Training and Testing the Model

  • According to our paper, the training phase of this model consists of an optional pre-training process and a fine-tuning process. If you want to start the pre-training process, you can use a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --pre_training

    Assuming the pre-trained weights are saved at ./ckpt/checkpoint.pth, you can start the fine-tuning process from this checkpoint with a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --resume="./ckpt/checkpoint.pth"

    Of course, the pre-training process is not mandatory; you can also train the full model directly without a pre-trained checkpoint. Before training, you should first set the value of "checkpoint" in ./config/cfg_diff_prior.json to the directory where the checkpoints of the TPEM will be saved.

    The Transformer-based recognizer used for the SFM loss can be downloaded from https://github.com/FudanVI/FudanOCR/tree/main/text-gestalt.

  • If you want to test the pre-trained model on the easy subset of TextZoom (assuming this dataset is saved at /root/dataset/TextZoom/test/easy), you can use a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --resume="./ckpt/checkpoint.pth" --pre_training --test --test_data_dir="/root/dataset/TextZoom/test/easy"

    To test the fine-tuned model, assume the fine-tuned weights are saved at ./ckpt/checkpoint.pth and the trained TPEM at ./ckpt/TPEM_ckpt.pth. First change the value of "resume_state" in ./config/cfg_diff_prior.json to ./ckpt/TPEM_ckpt.pth. Then you can test the model on the easy subset of TextZoom with a command like this:

    python main.py --batch_size="32" --mask --rec="aster" --srb="1" --resume="./ckpt/checkpoint.pth" --test --test_data_dir="/root/dataset/TextZoom/test/easy"
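The steps above require editing keys such as "checkpoint" and "resume_state" in ./config/cfg_diff_prior.json by hand. This can also be scripted; below is a minimal sketch, assuming the keys to be changed are top-level entries in a flat JSON object (the helper name is ours, not part of this repository):

```python
import json

def set_config_value(cfg_path, key, value):
    """Set one top-level key in a JSON config file and write it back."""
    with open(cfg_path, "r", encoding="utf-8") as f:
        cfg = json.load(f)
    cfg[key] = value  # adapt this lookup if the key is nested
    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)

# e.g. point the diffusion module at the trained TPEM before testing:
# set_config_value("./config/cfg_diff_prior.json", "resume_state", "./ckpt/TPEM_ckpt.pth")
```

If the config nests these keys under a sub-object, replace the single lookup with a walk down the nested dictionaries before assigning.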

Weights of Our Implemented Models

Acknowledgement

  • We inherited most of the framework from TATT, SR3 and Stripformer. Thanks for their contributions!

Recommended Papers

  • [DPMN] This is my first work on Scene Text Image Super-Resolution (STISR), which was accepted by AAAI 2023. [Paper] [Code]
  • [GSDM] An interesting work on Text Image Inpainting (TII), which was accepted by AAAI 2024. I proposed the idea of using a Structure Prediction Module and a diffusion-based Reconstruction Module to complete this task. [Paper] [Code]

Citation

@inproceedings{zhao2024pean,
  title={{PEAN}: A Diffusion-Based Prior-Enhanced Attention Network for Scene Text Image Super-Resolution},
  author={Zuoyan Zhao and Hui Xue and Pengfei Fang and Shipeng Zhu},
  booktitle={Proceedings of the ACM International Conference on Multimedia},
  year={2024}
}
