Stars
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Papers, Datasets, Algorithms, SOTA for STR. Maintained long-term.
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including: BERT & GPT-2
Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
Large Language Model Text Generation Inference
A high-throughput and memory-efficient inference and serving engine for LLMs
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Materials for the Hugging Face Diffusion Models Course
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4 series: Open Multilingual Multimodal Chat LMs
UniTable: Towards a Unified Table Foundation Model
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
End-to-End Object Detection with Transformers
Open-source evaluation toolkit of large vision-language models (LVLMs), support GPT-4v, Gemini, QwenVLPlus, 50+ HF models, 20+ benchmarks
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance.
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…
An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
[CVPR 2024] Real-Time Open-Vocabulary Object Detection