Starred repositories
Adaptive Boundary Proposal Network for Arbitrary Shape Text Detection; accepted by ICCV 2021; the paper is at: https://arxiv.org/abs/2107.12664
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Count the MACs / FLOPs of your PyTorch model.
PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Learning Generative Structure Prior for Blind Text Image Super-resolution [CVPR 2023]
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Official PyTorch implementation of `[ACMMM 2023]Relational Contrastive Learning for Scene Text Recognition`
GPT4V-level open-source multi-modal model based on Llama3-8B
Use PEFT or Full-parameter to finetune 300+ LLMs or 60+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model with performance approaching GPT-4o
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
[IJCAI 2023] An official implementation of the paper "Towards Robust Scene Text Image Super-resolution via Explicit Location Enhancement"
A series of large language models trained from scratch by developers @01-ai
DeepSeek-VL: Towards Real-World Vision-Language Understanding
[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
✨✨Latest Advances on Multimodal Large Language Models
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super-resolution, text editing, handwritten ge…
Code for Text Prior Guided Scene Text Image Super-Resolution (TIP 2023)
A toolbox of scene text super-resolution and recognition
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution (CVPR2022)
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Hand-written implementations of all the algorithms in Li Hang's book "Statistical Learning Methods" (《统计学习方法》)