Stars
Cross-platform audio/video downloader
Tauri & ReactJS boilerplate for a modern desktop application. Neither a project nor a substitute for my Tauri video tutorials.
Rapidly scaffold out a new tauri app project.
A Tauri Python sidecar example, using PyInstaller.
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…
Pure LuaJIT FFI socket bindings for Unix and Windows
Learning to cut end-to-end pretrained modules
This is the official repository for our ECCV 2022 paper titled, "The Anatomy of Video Editing: A Dataset and Benchmark Suite for AI-Assisted Video Editing"
Robust Speech Recognition via Large-Scale Weak Supervision
Deploying a React App (created using create-react-app) to GitHub Pages
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
Open Source API and interchange format for editorial timeline information.
A high-throughput and memory-efficient inference and serving engine for LLMs
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
Transcription with speaker diarization pipeline
A python package to analyze and compare voices with deep learning
Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO
Papers, code and datasets about deep learning and multi-modal learning for video analysis
This repository contains various models targeting multimodal representation learning and multimodal fusion for downstream tasks such as multimodal sentiment analysis.
🎥➡️📝 Hermes: Blazing-fast video transcription powered by AI gods! Transcribe 6.5 minutes of video in just 1 second using Groq's LPU. Choose your transcription deity: MLX Whisper (local), Groq (spee…
An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
Transcribe and summarize videos using Whisper and LLMs on the Apple MLX framework
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Caesium is an image compression software that helps you store, send and share digital pictures, supporting JPG, PNG, WebP and TIFF formats. You can quickly reduce the file size (and resolution, if …
Distribute and run LLMs with a single file.
"EasyRec: Simple yet Effective Language Model for Recommendation"
A plotting tool that outputs Line Rider maps, so you can watch a man on a sled scoot down your loss curves. 🎿