DAMO-NLP-SG / Video-LLaMA (Star 2.6k): [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding. Topics: llama, large-language-models, video-language-pretraining, vision-language-pretraining, cross-modal-pretraining, blip2, minigpt4, multi-modal-chatgpt. Updated Jun 4, 2024. Python.
JacobYuan7 / RLIP (Star 71): [NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Graph Generation. Topics: relation, detection-model, relation-detection, hoi-detection, cross-modal-pretraining. Updated May 26, 2024. Python.