Stars
State-of-the-Art zero-shot voice conversion & singing voice conversion with in-context learning
Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Awesome Japan Open Data - A list and summary of Japanese open data resources
Notebooks and learning materials for building, training, and deploying machine learning models with SageMaker
Various AI scripts. Mostly Stable Diffusion stuff.
An open-source RAG-based tool for chatting with your documents.
[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, D…
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
Find parts of long text or data, allowing for some changes/typos.
A collection of papers and datasets on few-shot object detection for computer vision.
Open-source observability for your LLM application, based on OpenTelemetry
Japanese LLM Summary - Overview of Japanese LLMs
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
Stable Diffusion web UI
An extremely fast Python package and project manager, written in Rust.
MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition
litagin02 / Style-Bert-VITS2
Forked from fishaudio/Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
OCR, layout analysis, reading order, table recognition in 90+ languages
Official PyTorch implementation of SegFormer