mmaaz60

Organizations

@mbzuai-oryx


The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 10,450 824 Updated Aug 21, 2024

Official implementation of the paper "GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model"

Python 56 3 Updated Jul 19, 2024

Official repository of the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Python 183 11 Updated Aug 11, 2024

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python 789 57 Updated Jul 10, 2024

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Python 99 3 Updated Apr 24, 2024
Jupyter Notebook 54 1 Updated Aug 27, 2024

Composed Video Retrieval

Python 41 Updated May 2, 2024

[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory

Python 45 2 Updated Sep 3, 2024

MobiLlama: Small Language Model tailored for edge devices

Python 579 42 Updated Mar 3, 2024

[WACV 2025] Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.

Python 77 5 Updated Sep 3, 2024

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Python 73 10 Updated Sep 5, 2024

Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Python 115 6 Updated Mar 17, 2024

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

Python 404 29 Updated Jul 25, 2024

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python 233 11 Updated Jan 2, 2024

[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Python 91 10 Updated Feb 11, 2024
Python 37 5 Updated Nov 9, 2023

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 739 37 Updated Jun 2, 2024

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Jupyter Notebook 65 2 Updated May 18, 2024

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Python 494 33 Updated Jul 21, 2023

[CVPR 2024] 🎬💭 Chat with over 10K frames of video!

Python 488 39 Updated Sep 6, 2024

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://…

65 7 Updated Oct 19, 2023

[MICCAI 2023] Official code repository of the paper "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation".

Python 48 Updated Nov 14, 2023

[ICCV'23 Main Track, WECIA'23 Oral] Official repository of the paper "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".

Python 216 8 Updated Sep 28, 2023

[EMNLP'23] ClimateGPT: a specialized LLM for conversations on climate change and sustainability topics in both English and Arabic.

Python 73 9 Updated Jan 30, 2024

[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.

Python 459 52 Updated Aug 8, 2024

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python 1,145 98 Updated Aug 27, 2024

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,285 2,901 Updated Sep 2, 2024

Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"

Python 251 8 Updated May 3, 2024

[ICCV'23] Official repository of the paper "SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications"

Python 241 25 Updated Jan 12, 2024