Starred repositories of SifengHe

Collection of awesome resources on image-to-image translation.

1,185 122 Updated Oct 22, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

406 12 Updated Jun 18, 2024

Survey on Data-centric Large Language Models

65 Updated Jul 8, 2024

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight) / When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)

Python 392 18 Updated Nov 11, 2024

A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

Python 2,930 175 Updated Nov 15, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,243 2,176 Updated Aug 9, 2024

[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion

Python 251 17 Updated Nov 12, 2024

Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)

Jupyter Notebook 107 12 Updated Apr 8, 2024

Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)

Python 107 13 Updated Oct 1, 2024

My journey during 10 weeks of building FiftyOne plugins

18 3 Updated Nov 12, 2023

[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI

Python 489 28 Updated Jan 17, 2024

ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering

Python 1,252 54 Updated Oct 23, 2024

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Python 4,441 960 Updated Nov 12, 2024

LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images

Jupyter Notebook 28 1 Updated Nov 30, 2023

A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"

1,193 58 Updated Mar 14, 2024

🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".

Jupyter Notebook 478 35 Updated Oct 30, 2023
Python 134 24 Updated Oct 30, 2023

A Chinese-language introductory tutorial for LangChain

7,462 597 Updated Aug 11, 2024

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,682 102 Updated Aug 29, 2023

General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX

1,728 96 Updated Nov 15, 2023

[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.

Python 790 54 Updated Apr 28, 2023

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Python 6,745 685 Updated Aug 12, 2024

🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".

Jupyter Notebook 430 36 Updated Jan 19, 2024

MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022

Python 548 59 Updated Dec 13, 2022

VideoLLM: Modeling Video Sequence with Large Language Models

154 3 Updated Aug 18, 2023

[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"

Jupyter Notebook 13 1 Updated Jun 11, 2023

Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds

Python 1,519 104 Updated Jul 22, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,387 401 Updated Aug 19, 2024

Relate Anything Model is capable of taking an image as input and utilizing SAM to identify the corresponding mask within the image.

Python 441 20 Updated Jul 4, 2023