Stars
A New Tamil Large Language Model (LLM) Based on Llama 2
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Papers and resources related to the security and privacy of LLMs 🤖
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
[ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
A curated list of different papers and datasets in various areas of audio-visual processing
GPT4V-level open-source multi-modal model based on Llama3-8B
Joint Academic Data Science Endeavour (JADE) is the largest GPU facility in the UK supporting world-leading research in machine learning (and this is the repo that powers its website)
[CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Awesome speech/audio LLMs, representation learning, and codec models
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Open Vocabulary Semantic Scene Sketch Understanding
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
SwissArmyTransformer is a flexible and powerful library to develop your own Transformer variants.
MU-LLaMA: Music Understanding Large Language Model
An open-source framework for training large multimodal models.
The official repo for [TPAMI'23] "Vision Transformer with Quadrangle Attention"
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
This repository provides code and resources for Parameter Efficient Fine-Tuning (PEFT), a technique for improving fine-tuning efficiency in natural language processing tasks.
Cross-modal background suppression for audio-visual event localization
A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
A unified framework for Low-resource Audio Processing and Evaluation (SSL Pre-training and Downstream Fine-tuning)
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities