ilpoviertola

A curated list of audio-visual learning methods and datasets.

223 17 Updated Sep 11, 2024

🎥 Python and OpenCV-based scene cut/transition detection program & library.

Python 3,176 388 Updated Oct 6, 2024

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Python 1,345 85 Updated Sep 23, 2024

A linear estimator on top of CLIP to predict the aesthetic quality of pictures

Jupyter Notebook 457 20 Updated Aug 15, 2022

Video datasets

1,155 91 Updated Mar 8, 2023

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,227 50 Updated Aug 15, 2024

🤖 Machine Learning Summer School deadlines

HTML 2,649 295 Updated Sep 10, 2024

Awesome speech/audio LLMs, representation learning, and codec models

630 28 Updated Sep 24, 2024

Foundational model for human-like, expressive TTS

Python 3,784 651 Updated Jul 30, 2024

Unofficial PyTorch implementation of Idempotent Generative Network

Python 43 4 Updated Nov 19, 2023

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 2,631 250 Updated Aug 9, 2024

Efficient synchronization from sparse cues

Python 26 4 Updated Apr 25, 2024

FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, Comfy…

Jupyter Notebook 2,067 289 Updated Oct 7, 2024

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Python 286 15 Updated May 27, 2024
Python 1,750 54 Updated Jun 28, 2024
Python 8,363 490 Updated Oct 9, 2024

Implementation of Perceiver, General Perception with Iterative Attention, in PyTorch

Python 1,081 134 Updated Aug 22, 2023

An open-source framework for training large multimodal models.

Python 3,690 280 Updated Aug 31, 2024

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

Python 16 1 Updated Aug 2, 2024

[ACL'19] [PyTorch] Multimodal Transformer

Python 805 150 Updated Sep 12, 2022

Frechet Audio Distance evaluation in PyTorch

Python 34 3 Updated Jun 9, 2023

✨✨Latest Advances on Multimodal Large Language Models

12,097 774 Updated Oct 9, 2024

Implementation of Multistream Transformers in PyTorch

Python 54 3 Updated Jul 31, 2021

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

Go 92,741 7,314 Updated Oct 10, 2024

Unofficial PyTorch implementation of the paper "Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding", NeurIPS 2021.

Python 13 1 Updated Apr 24, 2024

Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding

Python 42 9 Updated Sep 30, 2024

A (nearly) no-CSS, fast, minimalist Jekyll theme.

HTML 1,096 546 Updated Aug 5, 2024

Track and predict the energy consumption and carbon footprint of training deep learning models.

Python 374 27 Updated Sep 20, 2024

nanoGPT turned into a chat model

Python 61 11 Updated Aug 30, 2023

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Python 20,725 2,112 Updated Jul 18, 2024