Merve Noyan's picture

Merve Noyan PRO

merve

·

AI & ML interests

VLMs, vision & co

Articles

Llama can now see and run on your device - welcome Llama 3.2

Preference Optimization for Vision Language Models

Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models

PaliGemma – Google's Cutting-Edge Open Vision Language Model

Vision Language Models Explained

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Deploy MusicGen in no time with Inference Endpoints

Open-Source Text Generation & LLM Ecosystem at Hugging Face

Jupyter X Hugging Face

Using Machine Learning to Aid Survivors and Race through Time

Introducing Skops

Announcing the Hugging Face Fellowship Program

Showcase Your Projects in Spaces using Gradio

Hosting your Models and Datasets on Hugging Face Spaces using Streamlit

Organizations

Posts 68

Post

205

Another great week in open ML!
Here's a small recap 🫰🏻

Model releases
⏯️ Video Language Models
AI at Meta released Vision-CAIR/LongVU_Qwen2_7B, a new state-of-the-art long video LM model based on DINOv2, SigLIP, Qwen2 and Llama 3.2

💬 Small language models
Hugging Face released HuggingFaceTB/SmolLM2-1.7B, a family of new smol language models with Apache 2.0 license that come in sizes 135M, 360M and 1.7B, along with datasets.
Meta released facebook/MobileLLM-1B, a new family of on-device LLMs of sizes 125M, 350M and 600M

🖼️ Image Generation
Stability AI released stabilityai/stable-diffusion-3.5-medium, a 2B model with commercially permissive license

🖼️💬Any-to-Any
gpt-omni/mini-omni2 is closest reproduction to GPT-4o, a new LLM that can take image-text-audio input and output speech is released!

Dataset releases
🖼️ Spawning/PD12M, a new captioning dataset of 12.4 million examples generated using Florence-2

Post

4427

Hugging Face Hub Python library now comes with easy inference for vision language models! ✨

$ pip install huggingface_hub 🤗

Collections 33

spaces 104

Running on Zero

OWLSAM

State-of-the-art open-vocabulary image segmentation ⚡️

No application file

Sam2.1

SuperPoint

Running on CPU Upgrade

Gradio Tgi

Vision Papers

OWLSAM2

models 88

merve/google-ckpts

Updated 10 days ago

merve/google-tokenizers

Updated 10 days ago

merve/siglip-so400m-patch16-256-i18n

Updated 10 days ago • 27 • 8

merve/idefics3-llama-vqav2

merve/idefics3llama-vqav2

Updated Sep 11 • 8

merve/flux-dreambooth-lora

Updated Aug 16 • 1

merve/trained-flux-lora-lego

Text-to-Image • Updated Aug 16 • 22 • • 1

merve/flux-lego-lora-dreambooth

Text-to-Image • Updated Aug 16 • 1k • • 13

merve/sam2-hiera-large

Mask Generation • Updated Aug 2 • 890k • 2

merve/sam2-hiera-base-plus

Mask Generation • Updated Aug 2 • 96

datasets 26

merve/model-test-inputs

Updated 10 days ago • 48

merve/vqav2-small

Viewer • Updated Aug 8 • 21.4k • 868 • 7

merve/SGinW

Preview • Updated Jul 11 • 139

merve/pascal-voc

Viewer • Updated Jul 6 • 336k • 245

merve/YouCook2

Viewer • Updated May 28 • 2k • 53

merve/faiss_embeddings

Updated Jan 25 • 12

merve/pokemon-ds-embeddings

Viewer • Updated Jan 10 • 833 • 53 • 4

merve/tr-h4-norobots

Updated Jan 7 • 65 • 10

merve/lego_sets_latest

Viewer • Updated Jan 6 • 61 • 155 • 2

merve/ai-tube-dummy

Updated Dec 1, 2023 • 49