dmarx

Organizations

@pytti-tools

Stars

Multi-modal

198 repositories

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Python 185 18 Updated Sep 20, 2022

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

Python 6,282 1,279 Updated Aug 31, 2024

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook 4,613 615 Updated Aug 5, 2024

[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Python 193 21 Updated Jun 16, 2022

Referring Video Object Segmentation / Multi-Object Tracking Repo

Python 84 4 Updated Jul 27, 2023
Python 640 69 Updated Mar 4, 2024

Code used in "Understanding Dimensional Collapse in Contrastive Self-supervised Learning" paper.

Python 74 6 Updated Sep 6, 2022

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Python 130 13 Updated Jun 10, 2022

Official PyTorch implementation of "Large-scale Bilingual Language-Image Contrastive Learning" (ICLRW 2022)

Jupyter Notebook 94 11 Updated Apr 13, 2022

PyTorch implementation of a 1.3B text-to-image generation model trained on 14 million image-text pairs

Python 629 66 Updated Aug 9, 2022

L-Verse: Bidirectional Generation Between Image and Text

Python 109 6 Updated Nov 15, 2022

Reference models and tools for Cloud TPUs.

Jupyter Notebook 5,210 1,772 Updated Aug 29, 2024

[ICCV 2021 - Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…

Jupyter Notebook 772 106 Updated Aug 24, 2023

This is the repo for my experiments with StyleGAN2. There are many like it, but this one is mine. Contains code for the paper Audio-reactive Latent Interpolations with StyleGAN.

Python 179 29 Updated Jun 26, 2021

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)

Jupyter Notebook 137 6 Updated Nov 27, 2023

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multil…

Python 475 55 Updated Mar 20, 2023

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

Python 357 56 Updated Jul 29, 2023

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

Python 338 21 Updated Jan 7, 2022

Code release for SLIP: Self-supervision meets Language-Image Pre-training

Python 738 67 Updated Feb 9, 2023

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image

Jupyter Notebook 24,565 3,201 Updated Jul 23, 2024

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Python 675 46 Updated Oct 16, 2023

Repository for "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search"

Python 179 23 Updated Sep 30, 2021

A PyTorch implementation of Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Python 55 6 Updated Mar 27, 2023

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling @ CVPR22

Python 42 6 Updated Oct 10, 2022

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

Python 777 109 Updated Jun 30, 2021

Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"

Python 942 79 Updated Aug 3, 2022

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

989 40 Updated Jul 12, 2024

Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]

Jupyter Notebook 546 40 Updated Jun 4, 2024

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Python 330 19 Updated Aug 9, 2022

[CVPR2022 oral] A Simple and Effective Baseline for Text-to-Image Synthesis

Python 293 68 Updated Mar 4, 2023