Stars
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
PromptBERT: Improving BERT Sentence Embeddings with Prompts
When do we not need larger vision models?
salihmarangoz / dinov2
Forked from facebookresearch/dinov2. PyTorch code and models for the DINOv2 self-supervised learning method.
PyTorch code and models for the DINOv2 self-supervised learning method.
The repo for "Diagnosing and Re-learning for Balanced Multi-modal Learning", ECCV 2024
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
✨✨Latest Advances on Multimodal Large Language Models
MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis
A state-of-the-art-level open visual language model | multimodal pre-trained model
Starter code for working with the YouTube-8M dataset.
A collection of videos annotated with timelines where each video is divided into segments, and each segment is labelled with a short free-text description
Summary of Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
Kolmogorov-Arnold Networks (KAN) using Chebyshev polynomials instead of B-splines.
[NeurIPS'22 Spotlight] Data and code for our paper CoNT: Contrastive Neural Text Generation
Multi-granularity Correspondence Learning from Long-term Noisy Videos [ICLR 2024, Oral]
A suite for modeling video with Mamba
Official inference library for Mistral models
Train transformer language models with reinforcement learning.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
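A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at low cost, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.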