Lists (22)
awesome
backbones
captions
clustering
contrastive learning
diffusion_models
ego4d
few-shot
germany
learn
LLMs
long-tail
NCD
nlp
openset
resources
tech
time_transformers
transformers
video memory efficient
videos
work-in-progress
Stars
A method to increase the speed and lower the memory footprint of existing vision transformers.
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Easily create large video dataset from video urls
The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
Fast and memory-efficient exact attention
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Friends don't let friends make certain types of data visualization - What are they and why are they bad.
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
Code and dataset for photorealistic Codec Avatars driven from audio
An open-source NLP research library, built on PyTorch.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'
DomainBed is a suite to test domain generalization algorithms
This is the official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"
Official implementation of "Learning by Sorting: Self-supervised Learning with Group Ordering Constraints". ICCV 2023
Official implementation of "In-style: Bridging Text and Uncurated Videos with Style Transfer for Cross-modal Retrieval". ICCV 2023
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".
💫 Industrial-strength Natural Language Processing (NLP) in Python