Starred repositories
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
🔥🔥🔥 Web-based Linux server management control panel. / A modern, open-source operations and management panel for Linux servers.
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Reference implementation for DPO (Direct Preference Optimization)
Elysium: Exploring Object-level Perception in Videos via MLLM
devmaxxing / videocr-PaddleOCR
Forked from apm1467/videocr. Extract hardcoded subtitles from videos using machine learning.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.
Extract hard-coded subtitles from videos and generate srt files. No third-party API required; text recognition runs locally. A deep-learning-based video subtitle extraction framework with subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files.
A Native-PyTorch Library for LLM Fine-tuning
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly …
Official repository of the aiXcoder-7B Code Large Language Model
Towards Video Text Visual Question Answering: Benchmark and Baseline
A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Official Implementation for LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models