multimodal-deep-learning

CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

association multimodal-deep-learning humor-generation large-language-models leap-of-thought

Updated Apr 13, 2024
Python

MILVLG / prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

pytorch visual-question-answering multimodal-deep-learning gpt-3 prompt-engineering okvqa a-okvqa

Updated May 23, 2023
Python

drprojects / DeepViewAgg

[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"

image deep-learning point-cloud pytorch attention semantic-segmentation cvpr point-cloud-segmentation multimodal multimodal-deep-learning multi-view pytorch-geometric s3dis torch-points3d kitti-360 cvpr2022

Updated Aug 21, 2023
Python

YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".

audio computer-vision audio-processing multimodal multimodal-deep-learning

Updated Mar 20, 2024
Python

DavidHuji / CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

clip zero-shot-learning captioning multimodal-deep-learning gpt-2 clipcap

Updated Jan 28, 2024
Python

This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text" (S2VT). The system takes a video as input and generates an English caption describing it.

tensorflow seq2seq sequence-to-sequence video-captioning s2vt multimodal-deep-learning

Updated Oct 12, 2019
Python

declare-lab / Multimodal-Infomax

This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.

multimodal-sentiment-analysis multimodal-deep-learning multimodal-fusion

Updated Mar 14, 2023
Python

florencejt / fusilli

A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸

machine-learning cnn pytorch attention-mechanism imaging multimodality multivariate-analysis variational-autoencoder data-fusion multimodal multimodal-deep-learning multi-view-learning multi-view graph-neural-network pytorch-lightning

Updated Jun 3, 2024
Python

kyegomez / NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

vit attention-mechanism clip multimodality multimodal-learning multimodal multimodal-deep-learning gpt4

Updated Jun 17, 2024
Python

kyegomez / the-compiler

Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!

reinforcement-learning deep-learning deep-learning-algorithms artficial-intelligence agora multimodal-deep-learning multi-modality multi-modal-fusion prompt-engineering chain-of-thought chatgpt autogpt tree-of-thoughts

Updated Jul 27, 2023
Python

LeapLabTHU / Pseudo-Q

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

computer-vision deep-learning pytorch vision-and-language multimodal-deep-learning visual-grounding cvpr2022

Updated Jul 13, 2024
Python

cap-ntu / Video-to-Retail-Platform

An intelligent system based on multimodal learning for video, product, and ad analysis. It can serve as a foundation for downstream applications such as product recommendation and video retrieval.

machine-learning deep-neural-networks deep-learning multimedia network-server multimodal-deep-learning ai-system