Tampere University
Finland
ilpoviertola.github.io
in/ilpo-viertola
Stars
A curated list of audio-visual learning methods and datasets.
🎥 Python and OpenCV-based scene cut/transition detection program & library.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
A linear estimator on top of CLIP to predict the aesthetic quality of pictures
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
🤖 Machine Learning Summer School deadlines
Awesome speech/audio LLMs, representation learning, and codec models
Foundational model for human-like, expressive TTS
Unofficial PyTorch implementation of Idempotent Generative Network
PyTorch code and models for V-JEPA self-supervised learning from video.
FLUX, Stable Diffusion, SDXL, SD3, LoRA, Fine Tuning, DreamBooth, Training, Automatic1111, Forge WebUI, SwarmUI, DeepFake, TTS, Animation, Text To Video, Tutorials, Guides, Lectures, Courses, Comfy…
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Implementation of Perceiver, General Perception with Iterative Attention, in PyTorch
An open-source framework for training large multimodal models.
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
[ACL'19] [PyTorch] Multimodal Transformer
Frechet Audio Distance evaluation in PyTorch
✨✨ Latest Advances on Multimodal Large Language Models
Implementation of Multistream Transformers in PyTorch
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
Unofficial PyTorch implementation of the paper "Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding", NeurIPS 2021.
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding
A (nearly) no-CSS, fast, minimalist Jekyll theme.
Track and predict the energy consumption and carbon footprint of training deep learning models.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…