- Peking University
- https://patrick-tssn.github.io
Stars
OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Long Context Transfer from Language to Vision
The open-source tool for building high-quality datasets and computer vision models
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
MINT-1T: A one trillion token multimodal interleaved dataset.
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Pressure Testing Large Video-Language Models (LVLMs): multimodal retrieval from LVLMs at any video length to measure accuracy
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
OmniTokenizer: one model and one weight for image-video joint tokenization.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model approaching GPT-4o performance.
A generative speech model for daily dialogue.
[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Pandora: Towards General World Model with Natural Language Actions and Video States
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch. Much faster than direct convolutions for large kernel sizes.