Ferret: Refer and Ground Anything Anywhere at Any Granularity

An end-to-end MLLM that accepts any form of referring and grounds anything in response.

Overview

Diagram of Ferret Model.

Key Contributions:

Ferret Model - Hybrid Region Representation + Spatial-aware Visual Sampler enable fine-grained and open-vocabulary referring and grounding in an MLLM.
GRIT Dataset (~1.1M) - A large-scale, hierarchical, robust ground-and-refer instruction-tuning dataset.
Ferret-Bench - A multimodal evaluation benchmark that jointly requires Referring/Grounding, Semantics, Knowledge, and Reasoning.
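
The list above names a hybrid region representation without spelling out its form. As a rough illustration only, the sketch below assumes that "hybrid" means pairing discrete, quantized coordinates with a binary region mask from which a visual sampler could later extract continuous features. The names `HybridRegion` and `make_hybrid_region`, the bin count of 1000, and the 8x8 mask grid are all hypothetical and are not taken from the Ferret codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HybridRegion:
    """Hypothetical container, not Ferret's actual data structure."""
    coords: Tuple[int, int, int, int]  # discrete part: quantized box corners
    mask: List[List[int]]              # binary mask a sampler could pool features from


def make_hybrid_region(box, image_w, image_h, num_bins=1000, mask_size=8):
    """Build a toy hybrid representation for a box given in pixel units.

    Assumption: coordinates are quantized into `num_bins` bins per axis, and
    the region is rasterized onto a coarse `mask_size` x `mask_size` grid.
    """
    x1, y1, x2, y2 = box

    def quantize(value, extent):
        # Map a pixel coordinate to an integer bin in [0, num_bins - 1].
        return min(num_bins - 1, max(0, int(value / extent * num_bins)))

    coords = (
        quantize(x1, image_w),
        quantize(y1, image_h),
        quantize(x2, image_w),
        quantize(y2, image_h),
    )

    # A grid cell belongs to the region when its center lies inside the box.
    mask = []
    for row in range(mask_size):
        cy = (row + 0.5) * image_h / mask_size
        mask_row = []
        for col in range(mask_size):
            cx = (col + 0.5) * image_w / mask_size
            inside = x1 <= cx <= x2 and y1 <= cy <= y2
            mask_row.append(1 if inside else 0)
        mask.append(mask_row)

    return HybridRegion(coords=coords, mask=mask)


region = make_hybrid_region((100, 50, 300, 250), image_w=400, image_h=400)
print(region.coords)
```

A free-form region (a scribble or polygon) would fit the same container by changing only how the mask is rasterized, which is one plausible reason to keep a mask alongside the coordinates.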
