- Shenzhen, China
Starred repositories
Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO and PaddlePaddle. (PaddleOCR models converted to run inference with ONNXRuntime, which is very fast.)
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment
Reference-aware automatic speech evaluation toolkit
Find parts of long text or data, allowing for some changes/typos.
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
a research paper for generative cartoon interpolation
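Streamer-Sales (Sales Champion): a sales-streamer LLM 🛒🎁 that, given a product's features, presents the product in a way designed to spark users' desire to buy. 🚀⭐ Includes a detailed data-generation pipeline❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that queries the web for real-time information 🌐, and ASR (speech-to-text) 🎙️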
An open-source project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. The primary algorithms utilized include the Segment Anything Model (SAM) fo…
EfficientViT is a new family of vision models for efficient high-resolution vision.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
[CVPR24 Oral] Official repository for RALF: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
code for ACL2024-main: BatchEval: Towards Human-like Text Evaluation
DynamiCrafter that works natively with ComfyUI's nodes, optimizations, and more.
GLM-4 series: Open Multilingual Multimodal Chat LMs
GPT4V-level open-source multi-modal model based on Llama3-8B
A copy of ComfyUI_IPAdapter_plus with only the node names changed, so that it can coexist with the v1 version of ComfyUI_IPAdapter_plus.
[ECCV2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
A generative speech model for daily dialogue.
A deep-dive on the entire history of deep-learning
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
llama3 implementation one matrix multiplication at a time