- Peking University
- Beijing, China
- https://sikastar.github.io/
Lists (28)
3D
Agent
Base Model
CLIP
Detection
DG
Diffusion
Foundational-Models
Generative Model
GPT
job
LLM
Multi-modal
Multi-modal prompts
OOD
PEFT
Pretrain
Prompt
Segmentation
Self-supervised
Semi-supervised
T2I&T2V
Transfer Learning
Transformer
UDA
Life
Research Tools
Resources
Starred repositories
GPT4V-level open-source multi-modal model based on Llama3-8B
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
The official Python library for the OpenAI API
Implementation of paper - Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
[ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Use PEFT or full-parameter training to fine-tune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
ChatGLM3 series: Open Bilingual Chat LLMs
AI2-THOR Data Collection Tool Based On Keyboard Interaction
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[IJCV 2024] The official implementation of "Pattern-Expandable Image Copy Detection"
This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
GLM-4 series: Open Multilingual Multimodal Chat LMs
A state-of-the-art open visual language model (multimodal pretrained model)
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
Open-source and strong foundation image recognition models.
A Survey on Text-to-Video Generation/Synthesis.