- Apple
- Beijing
- https://www.linkedin.com/in/sifeng-he-969230134/
Starred repositories
Collection of awesome resources on image-to-image translation.
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Survey on Data-centric Large Language Models
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data (NeurIPS 2023 Spotlight); When Does Perceptual Alignment Benefit Vision Representations? (NeurIPS 2024)
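As a sense of what the DreamSim metric looks like in use, here is a minimal sketch, assuming the pip-installable `dreamsim` package from that repo and two local image files (package entry point and paths are assumptions, not taken from this list):

```python
# Hedged sketch: score perceptual similarity between two images with DreamSim.
# The `dreamsim` loader and the image paths below are assumptions.
import torch
from PIL import Image
from dreamsim import dreamsim

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = dreamsim(pretrained=True, device=device)

img_a = preprocess(Image.open("a.png")).to(device)
img_b = preprocess(Image.open("b.png")).to(device)

# Lower distance = more perceptually similar under the learned metric.
distance = model(img_a, img_b)
print(float(distance))
```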
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ 🍸 🍹 🍷 Provides higher-quality, richer, and more easily "digestible" data for large models!
Open-Sora: Democratizing Efficient Video Production for All
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
My journey during 10 weeks of building FiftyOne plugins
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
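For context on how such weights are typically consumed, a hedged sketch of zero-shot classification via `open_clip`, under the assumption that it ships MetaCLIP weights behind a `metaclip_400m` pretrained tag (the tag name and image path are assumptions):

```python
# Hedged sketch: zero-shot classification with MetaCLIP-style weights through
# open_clip. The 'metaclip_400m' tag and "cat.jpg" path are assumptions.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32-quickgelu", pretrained="metaclip_400m")
tokenizer = open_clip.get_tokenizer("ViT-B-32-quickgelu")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(["a photo of a cat", "a photo of a dog"])

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    # Cosine-similarity logits over the candidate captions.
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)
print(probs)
```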
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
🧀 Code and models for the ICML 2023 paper "Grounding Language Models to Images for Multimodal Inputs and Outputs".
A Chinese-language getting-started tutorial for LangChain
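Since LangChain's Python API has shifted considerably across versions, a heavily hedged sketch against the classic 0.0.x-era chain interface of the kind such tutorials cover (the model choice and prompt are assumptions):

```python
# Hedged sketch: a prompt-template chain in classic (0.0.x-era) LangChain.
# Requires OPENAI_API_KEY in the environment; newer versions use a
# different module layout, so treat this as illustrative only.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["topic"],
    template="Give a one-sentence summary of {topic}.")

chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run(topic="image-to-image translation"))
```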
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
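A minimal sketch of the open-set detection flow that repo documents, assuming its `groundingdino.util.inference` helpers and the distributed config/checkpoint layout (the paths, prompt, and thresholds below are assumptions):

```python
# Hedged sketch: text-prompted object detection with Grounding DINO's
# inference helpers. Config/checkpoint paths and the image are assumptions.
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth")
image_source, image = load_image("street.jpg")

# Free-text prompt; categories are separated by " . " in the README examples.
boxes, logits, phrases = predict(
    model=model, image=image,
    caption="person . bicycle . traffic light",
    box_threshold=0.35, text_threshold=0.25)

annotated = annotate(image_source=image_source, boxes=boxes,
                     logits=logits, phrases=phrases)
cv2.imwrite("annotated.jpg", annotated)
```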
🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".
MultiMAE: Multi-modal Multi-task Masked Autoencoders, ECCV 2022
VideoLLM: Modeling Video Sequence with Large Language Models
[ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding"
Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Relate Anything Model takes an image as input and uses SAM to identify the corresponding masks within the image.
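PerSAM and Relate Anything above both build on the prompt-driven predictor from the original `segment_anything` package. A minimal sketch of that base interface, with the checkpoint path, image file, and click coordinates as assumptions:

```python
# Hedged sketch: point-prompted segmentation with the base segment_anything
# predictor that the SAM-derived repos above extend. Checkpoint path,
# image file, and click location are assumptions.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SamPredictor expects an HxWx3 uint8 RGB array.
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click (label 1) prompts SAM for candidate masks.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True)
print(masks.shape, scores)
```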