OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

C++ 6,555 2,117 Updated Jul 26, 2024

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 634 37 Updated Jul 12, 2024

Seamlessly integrate state-of-the-art transformer models into robotics stacks

Python 135 17 Updated Jul 26, 2024

Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"

Python 67 7 Updated Jul 12, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,600 98 Updated Jul 26, 2024

Long Context Transfer from Language to Vision

Python 251 12 Updated Jul 12, 2024

LLM101n: Let's build a Storyteller

25,631 1,360 Updated Jul 21, 2024

The open-source tool for building high-quality datasets and computer vision models

Python 7,934 520 Updated Jul 26, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 1,591 96 Updated Jul 26, 2024

VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

Python 14 Updated Jun 25, 2024

Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs

7 Updated May 28, 2024

MINT-1T: A one trillion token multimodal interleaved dataset.

450 6 Updated Jul 24, 2024
Python 30 2 Updated Jun 20, 2024

Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Python 345 10 Updated Jul 10, 2024

Pressure Testing Large Video-Language Models (LVLMs): multimodal retrieval from LVLMs at any video length to measure accuracy

Python 3 1 Updated Jun 21, 2024

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"

Python 738 33 Updated Jul 23, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

201 4 Updated Jun 16, 2024

OmniTokenizer: one model and one weight for image-video joint tokenization.

Python 199 4 Updated Jul 9, 2024

OpenVLA: An open-source vision-language-action model for robotic manipulation.

Python 736 85 Updated Jul 15, 2024

V-Express aims to generate a talking head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.

Python 2,093 261 Updated Jun 29, 2024

[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

Python 294 21 Updated Jul 17, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance.

Python 4,420 341 Updated Jul 26, 2024

A generative speech model for daily dialogue.

Python 28,261 3,070 Updated Jul 25, 2024

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Python 189 4 Updated Jun 28, 2024

Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Jupyter Notebook 150 14 Updated Jun 27, 2024
Python 273 7 Updated Jun 27, 2024

Pandora: Towards General World Model with Natural Language Actions and Video States

Python 440 29 Updated May 27, 2024
Python 400 20 Updated May 22, 2024

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots

Python 403 23 Updated Jul 12, 2024

Implementation of 1D, 2D, and 3D FFT convolutions in PyTorch. Much faster than direct convolutions for large kernel sizes.

Python 463 57 Updated Sep 28, 2023