On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Python 370 25 Updated May 23, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 4,346 330 Updated May 28, 2024

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

391 24 Updated Jun 20, 2024

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

Python 5,543 449 Updated Jul 11, 2024

A collection of OCR-related datasets

78 4 Updated Sep 7, 2022

Papers, datasets, algorithms, and SOTA results for STR. Maintained long-term.

93 8 Updated Feb 25, 2022

Collects and organizes OCR-related datasets and unifies their annotation format for use in experiments

Python 854 189 Updated Nov 28, 2023

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1,744 333 Updated Jul 16, 2024

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 2,784 215 Updated Jul 16, 2024

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Python 990 80 Updated Jul 9, 2024

MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation

Python 1,911 133 Updated Jul 10, 2024

Large Language Model Text Generation Inference

Python 8,403 955 Updated Jul 16, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,008 3,255 Updated Jul 16, 2024

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 8,714 752 Updated Jul 15, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 33,880 3,974 Updated Jul 16, 2024

Materials for the Hugging Face Diffusion Models Course

Jupyter Notebook 3,398 359 Updated Apr 11, 2024

Diffusion-Tryon-Trainer

Python 105 22 Updated May 8, 2024

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 24,118 4,978 Updated Jul 16, 2024

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Python 2,394 221 Updated Jul 16, 2024

GPT4V-level open-source multi-modal model based on Llama3-8B

Python 1,580 84 Updated Jul 16, 2024

GLM-4 series: Open Multilingual Multimodal Chat LMs

Python 3,738 271 Updated Jul 16, 2024

UniTable: Towards a Unified Table Foundation Model

Jupyter Notebook 286 15 Updated Jun 4, 2024

[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Python 26 3 Updated Jul 5, 2024

End-to-End Object Detection with Transformers

Python 13,124 2,379 Updated Mar 12, 2024

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks

Python 700 82 Updated Jul 16, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model approaching GPT-4V performance

Python 4,166 312 Updated Jul 12, 2024

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction…

Python 1,406 237 Updated Jul 9, 2024

An implementation of "CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model".

Python 90 12 Updated May 3, 2024

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Python 4,439 322 Updated Jul 16, 2024

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 3,969 387 Updated Jul 8, 2024