Stars
Codebase for Instruction Following without Instruction Tuning
[Nature Communications] The official code for "Towards Building Multilingual Language Model for Medicine"
Vision-Augmented Retrieval and Generation (VARAG) - Vision first RAG Engine
[EMNLP 2024] MatchTime: Towards Automatic Soccer Game Commentary Generation
Handwritten Text Recognition and Character Detection
Computer vision course project: an image-text retrieval system based on Chinese-CLIP
LOTUS: The semantic query engine - process data with LMs as easily as writing pandas code
Generates an audiobook with chapters and ebook metadata using Calibre and XTTS from Coqui TTS, with optional voice cloning and support for multiple languages
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
Code for the SpeechTokenizer presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Speech, Language, Audio, Music Processing with Large Language Model
The official repository of the Omni-MATH benchmark.
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
A visual and transparent alternative to open-source ChatGPT O1
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Official implementation of "MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling"
A user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. The series, nicknamed “面壁小钢炮” (roughly, "ModelBest's little steel cannon"), focuses on achieving exceptional performance on edge devices.
Effective and efficient hour-scale long video understanding model
Free and open-source map hosting solution with custom styles for websites and apps, using OpenStreetMap data
Official PyTorch implementation of "Expressive Whole-Body 3D Gaussian Avatar", ECCV 2024.