Lists (26)
agent
bookmark
censorship
coding
computer vision
cookbook
designer
evaluation and testing llm
graph generator
image generation
improve performance
keyboard
llm interface
medical
mobile
nursing home
Production resources
prompt
Quantization of LLM
rag
simulation
study
training and fine tuning
useful non llm tool
voice
web scraper
Stars
Never forget the resource that helps to close that sales call! Power a real-time speech-to-text agent with retrieval augmented generation based on webscraped customer use-cases.
A unified codebase for finetuning (full, lora) large multimodal models, supporting llava-1.5, qwen-vl, llava-interleave, llava-next-video, phi3-v etc.
Implementation of Paint-with-words with Stable Diffusion: a method from eDiff-I that lets you generate an image from a text-labeled segmentation map.
Large Model Proxy is designed to make it easy to run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources. It listens on a dedicated port for ea…
EfficientViT is a new family of vision models for efficient high-resolution vision.
An open source implementation of CLIP.
UniTable: Towards a Unified Table Foundation Model
Implementation of Nougat Neural Optical Understanding for Academic Documents
Code for Text2Performer. Paper: Text2Performer: Text-Driven Human Video Generation
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Claude Plus is an advanced AI-powered development assistant that combines the capabilities of Anthropic's Claude AI with a suite of development tools.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
🎙️ Speak with AI - Run locally using ollama or OpenAI - XTTS or OpenAI Speech or ElevenLabs
An app that blurs faces in realtime using VisionCamera, Skia and MLKit 😷
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
Multi-Aspect Vision Language Pretraining - CVPR2024
[Arxiv-2024] CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
A collection of resources on applications of multi-modal learning in medical imaging.
"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks
Agent benchmark for medical diagnosis
Interact with your documents using the power of GPT, 100% privately, no data leaks