- ShanghaiTech University, Shanghai
- https://scholar.google.com/citations?user=j_8OPwwAAAAJ&hl=en

Starred repositories:
Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" (ICML 2024)
[arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Source code for InBedder, an instruction-following text embedder
Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
Mixture-of-Experts for Large Vision-Language Models
Anole: An Open, Autoregressive and Native Multimodal Model for Interleaved Image-Text Generation
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Code for paper: DivideMix: Learning with Noisy Labels as Semi-supervised Learning
[AAAI 2024] Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
[CVPR 2024] Code for the paper "Towards Learning a Generalist Model for Embodied Navigation"
Lumina-T2X is a unified framework for Text to Any Modality Generation
🔥🔥🔥 Official Codebase of "DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation"
This is the official repository for Retrieval Augmented Visual Question Answering
[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
[IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Strong and Open Vision Language Assistant for Mobile Devices