Stars
Contextual Object Detection with Multimodal Large Language Models
A UI-Focused Agent for Windows OS Interaction.
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Scenic: A Jax Library for Computer Vision Research and Beyond
SVIT: Scaling up Visual Instruction Tuning
Data release for the ImageInWords (IIW) paper.
List of references and online resources related to data science, machine learning and deep learning.
Universal LLM Deployment Engine with ML Compilation
Using pre-trained Diffusion models as priors for inference tasks
Generative Diffusion Prior for Unified Image Restoration and Enhancement (CVPR2023)
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
🧬 Generative modeling of regulatory DNA sequences with diffusion probabilistic models 💨
Medical Image Segmentation with Diffusion Model
v objective diffusion inference code for PyTorch.
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (PRG)
Dataset introduced in PlotQA: Reasoning over Scientific Plots
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
A lightweight, scalable, and general framework for visual question answering research
A collection of resources on applications of multi-modal learning in medical imaging.
Pythonic wrappers for Cider/CiderD evaluation metrics. Provides CIDEr as well as CIDEr-D (CIDEr Defended) which is more robust to gaming effects. We also add the possibility to replace the original…
Let's build better datasets, together!
MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
ChatArena (or Chat Arena) is a set of Multi-Agent Language Game Environments for LLMs. The goal is to develop the communication and collaboration capabilities of AIs.
Tutel MoE: An Optimized Mixture-of-Experts Implementation
Mixture-of-Experts for Large Vision-Language Models