Stars
End-to-End Object Detection with Transformers
A Comprehensive Toolkit for High-Quality PDF Content Extraction
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
[ICCV 2023] Code base for Revisiting Scene Text Recognition: A Data Perspective
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Official implementation of "Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection" (ECCV2024 Best Paper Candidate / Oral)
A research project for text detection and recognition using PyTorch 1.2.
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o (an open-source multimodal chat model with performance approaching GPT-4o)
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
[NeurIPS 2024] The official code of "U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers"
Masked Diffusion Transformer is state of the art (SOTA) for image synthesis. (ICCV 2023)
The official code of CornerTransformer (ECCV 2022, Oral) on top of MMOCR.
Deeper Depth Prediction with Fully Convolutional Residual Networks (FCRN)
Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition
Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Implementation of popular deep learning networks with TensorRT network definition API
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
Combining MMOCR with Segment Anything & Stable Diffusion. Automatically detect, recognize and segment text instances, with several downstream tasks, e.g., Text Removal and Text Inpainting
Painter & SegGPT Series: Vision Foundation Models from BAAI
A quickstart and benchmark for PyTorch distributed training.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything