Stars
A New Tamil Large Language Model (LLM) Based on Llama 2
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Papers and resources related to the security and privacy of LLMs 🤖
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
A curated list of different papers and datasets in various areas of audio-visual processing
GPT4V-level open-source multi-modal model based on Llama3-8B
Joint Academic Data Science Endeavour (JADE) is the largest GPU facility in the UK supporting world-leading research in machine learning (and this is the repo that powers its website)
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Awesome speech/audio LLMs, representation learning, and codec models
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Open Vocabulary Semantic Scene Sketch Understanding
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
MU-LLaMA: Music Understanding Large Language Model
An open-source framework for training large multimodal models.
The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
This repository provides code and resources for Parameter Efficient Fine-Tuning (PEFT), a technique for improving fine-tuning efficiency in natural language processing tasks.
Cross-modal background suppression for audio-visual event localization
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning)
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities