Stars
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Implementation of the RWKV language model in pure WebGPU/Rust.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Fine-Tuning of a multi-language transformer model on Nvidia GPUs.
Following master Karpathy with a GPT-2 implementation and training, writing lots of comments because I have the memory of a goldfish
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
A Framework of Small-scale Large Multimodal Models
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
A base64 encoder/decoder with gzip or deflate abilities.
Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.
MobiLlama: Small Language Model tailored for edge devices
This plugin embeds a video player. When you upload a video, it appears in the content as a player rather than as a link. Accepted video formats: MP4, WEBM, MOV, OGV.
Lumina-T2X is a unified framework for Text to Any Modality Generation
Examples of Pages using WebM files with Encrypted Media Extensions
A simple MP3 and AAC Decoder (not only) for Arduino based on libhelix
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
ESP32 library that generates composite video signal for PAL, SECAM and NTSC.
Build Android apps without any Java, entirely in C and Make
Combining ViT and GPT-2 for image captioning. Trained on MS-COCO. The model was implemented mostly from scratch.