Organizations

@microsoft

Matryoshka Multimodal Models

Python 67 4 Updated Aug 22, 2024
Python 31 1 Updated Jun 17, 2024

[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Python 112 2 Updated Aug 23, 2024

Reaching LLaMA2 Performance with 0.1M Dollars

Python 955 77 Updated Jul 23, 2024

Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.

Python 604 41 Updated Jul 26, 2024

[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,025 82 Updated Aug 8, 2024
Python 556 27 Updated Feb 15, 2024
Python 336 13 Updated Jul 29, 2024

Streamline the fine-tuning process for multimodal models: PaliGemma, Florence-2, Phi-3.5 Vision

Python 1,228 89 Updated Sep 12, 2024

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Python 363 15 Updated Apr 8, 2024

Browse the web with GPT-4V and Vimium

Python 2,594 197 Updated Aug 10, 2024

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,626 126 Updated Feb 22, 2024

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

JavaScript 951 86 Updated Jan 31, 2024

Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,093 85 Updated Aug 19, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 26,701 3,911 Updated Sep 14, 2024

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Python 1,282 132 Updated Oct 5, 2023

Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023]

Python 84 16 Updated Apr 30, 2024

[ECCV 2024] Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"

Python 2,254 107 Updated Jul 19, 2024

Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models"

Python 370 9 Updated Mar 25, 2024

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Python 4,308 382 Updated Aug 19, 2024

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Python 5,188 325 Updated Jul 21, 2024

[ICCV 2023] Official implementation of the paper "A Simple Framework for Open-Vocabulary Segmentation and Detection"

Python 637 39 Updated Jan 22, 2024

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"

Python 1,421 82 Updated Jan 23, 2024

Code base for MinD-Vis

Python 743 91 Updated May 24, 2023

Focal-Unet: Unet-like Focal Modulation for Medical Image Segmentation

Python 39 8 Updated May 27, 2023

This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO

Python 64 10 Updated Mar 10, 2023

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.

Python 1,969 206 Updated Aug 15, 2024

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"

Python 696 51 Updated Mar 20, 2024

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"

Python 2,163 234 Updated Jul 31, 2024