Text-only training or language-free training for multimodal tasks (image/audio/video captioning, retrieval, text-to-image)
awesome
image-captioning
zero-shot
video-captioning
text2image
audio-captioning
composed-image-retrieval
text-only-supervision
text-only-training
language-free-training
Updated Oct 15, 2024