dmarx

David Marx dmarx

Engineer / Machine Learning Researcher interested in deep learning, probabilistic ML, generative models, multi-modal SSL, visual understanding, geometric

473 followers · 327 following

Stability.ai, Eleuther.ai
Seattle, WA
https://dmarx.github.io
@DigThatData


Stars

Multi-modal

198 repositories

salesforce / ALPRO

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Python 185 18 Updated Sep 20, 2022

PaddlePaddle / ERNIE

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

Python 6,282 1,279 Updated Aug 31, 2024

salesforce / BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook 4,613 615 Updated Aug 5, 2024

snap-research / MMVID

[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Python 193 21 Updated Jun 16, 2022

JerryX1110 / awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Python 84 4 Updated Jul 27, 2023

mttr2021 / MTTR

Python 640 69 Updated Mar 4, 2024

facebookresearch / directclr

Code used in "Understanding Dimensional Collapse in Contrastive Self-supervised Learning" paper.

Python 74 6 Updated Sep 6, 2022

HFAiLab / clip-gen

CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP

Python 130 13 Updated Jun 10, 2022

navervision / KELIP

Official PyTorch implementation of "Large-scale Bilingual Language-Image Contrastive Learning" (ICLRW 2022)

Jupyter Notebook 94 11 Updated Apr 13, 2022

kakaobrain / mindall-e

PyTorch implementation of a 1.3B text-to-image generation model trained on 14 million image-text pairs

Python 629 66 Updated Aug 9, 2022

MIMICLab / L-Verse

L-Verse: Bidirectional Generation Between Image and Text

Python 109 6 Updated Nov 15, 2022

tensorflow / tpu

Reference models and tools for Cloud TPUs.

Jupyter Notebook 5,210 1,772 Updated Aug 29, 2024

hila-chefer / Transformer-MM-Explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-…

Jupyter Notebook 772 106 Updated Aug 24, 2023

JCBrouwer / maua-stylegan2

This is the repo for my experiments with StyleGAN2. There are many like it, but this one is mine. Contains code for the paper Audio-reactive Latent Interpolations with StyleGAN.

Python 179 29 Updated Jun 26, 2021

j-min / DallEval

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)

Jupyter Notebook 137 6 Updated Nov 27, 2023

Aleph-Alpha / magma

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multil…

Python 475 55 Updated Mar 20, 2023

j-min / VL-T5

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

Python 357 56 Updated Jul 29, 2023

henghuiding / Vision-Language-Transformer

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

Python 338 21 Updated Jan 7, 2022

facebookresearch / SLIP

Code release for SLIP: Self-supervision meets Language-Image Pre-training

Python 738 67 Updated Feb 9, 2023

openai / CLIP

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 24,565 3,201 Updated Jul 23, 2024

lucidrains / x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Python 675 46 Updated Oct 16, 2023

galatolofederico / clip-glass

Repository for "Generating images from caption and vice versa via CLIP-Guided Generative Latent Space Search"

Python 179 23 Updated Sep 30, 2021

salesforce / PB-OVD

A PyTorch implementation of Open Vocabulary Object Detection with Pseudo Bounding-Box Labels

Python 55 6 Updated Mar 27, 2023

hbdat / cvpr22_cross_modal_pseudo_labeling

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling @ CVPR22

Python 42 6 Updated Oct 10, 2022

ChenRocks / UNITER

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

Python 777 109 Updated Jun 30, 2021

THUDM / CogView2

Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"

Python 942 79 Updated Aug 3, 2022

google-research-datasets / wit

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

989 40 Updated Jul 12, 2024

omriav / blended-diffusion

Official implementation for "Blended Diffusion for Text-driven Editing of Natural Images" [CVPR 2022]

Jupyter Notebook 546 40 Updated Jun 4, 2024

CasualGANPapers / Make-A-Scene

PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Python 330 19 Updated Aug 9, 2022

tobran / DF-GAN

[CVPR2022 oral] A Simple and Effective Baseline for Text-to-Image Synthesis

Python 293 68 Updated Mar 4, 2023
