Starred repositories
A collection of papers on autoregressive models in vision.
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Official PyTorch implementation of the paper "An Edit Friendly DDPM Noise Space: Inversion and Manipulations", CVPR 2024.
Generating handwritten Chinese characters using CycleGAN
High-Resolution Image Synthesis with Latent Diffusion Models
[ECCV 2024] PowerPaint, a versatile image inpainting model that supports text-guided object inpainting, object removal, image outpainting and shape-guided object inpainting with only a single model…
Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
IMGUR5K handwriting set. It is an in-the-wild handwriting dataset containing challenging real-world handwritten samples from different writers. The dataset is shared as a set of image urls with …
UC3M License Plate detection and recognition dataset
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
A two-stage, lightweight, high-performance license plate recognition system using MTCNN and LPRNet
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
License Plate Detection and Recognition in Unconstrained Scenarios
PyTorch code for my ICDAR 2021 paper "Vision Transformer for Fast and Efficient Scene Text Recognition" (ViTSTR)
Source for NomNaTong-regular Vietnamese chữ Nôm font.
A synthetic data generator for text recognition
TAO Toolkit deep learning networks with PyTorch backend
Leverage deep learning to digitize old Vietnamese handwritten documents for historical archiving (Made with national pride in every single line of code): https://www.kaggle.com/datasets/quandang/nom…
CORD: A Consolidated Receipt Dataset for Post-OCR Parsing
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Collects the best currently available open-source table recognition models, improves their pre-processing and post-processing, and converts the models to ONNX.
A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and multi-format e-books.
A toolbox of ocr models and algorithms based on MindSpore