Muhammad Maaz mmaaz60

An Electrical Engineer with experience in Computer Vision software development. Skilled in Machine Learning, Deep Learning and Computer Vision.

128 followers · 4 following

Lists (1)

🔮 Future ideas

1 repository

Stars

facebookresearch / segment-anything-2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook · 10,450 stars · 824 forks · Updated Aug 21, 2024

Amshaker / GroupMamba

Official implementation of paper titled "GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model"

Python · 56 stars · 3 forks · Updated Jul 19, 2024

mbzuai-oryx / VideoGPT-plus

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python · 183 stars · 11 forks · Updated Aug 11, 2024

mbzuai-oryx / LLaVA-pp

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python · 789 stars · 57 forks · Updated Jul 10, 2024

TencentARC / ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Python · 99 stars · 3 forks · Updated Apr 24, 2024

BioMedIA-MBZUAI / MedPromptX

Jupyter Notebook · 54 stars · 1 fork · Updated Aug 27, 2024

OmkarThawakar / composed-video-retrieval

Composed Video Retrieval

Python · 41 stars · Updated May 2, 2024

Amshaker / MAVOS

[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory

Python · 45 stars · 2 forks · Updated Sep 3, 2024

mbzuai-oryx / MobiLlama

MobiLlama: Small Language Model tailored for edge devices

Python · 579 stars · 42 forks · Updated Mar 3, 2024

mbzuai-oryx / PALO

(WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, Bengali and Urdu.

Python · 77 stars · 5 forks · Updated Sep 3, 2024

TRI-ML / vlm-evaluation

VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning

Python · 73 stars · 10 forks · Updated Sep 5, 2024

UMass-Foundation-Model / MultiPLY

Code for MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World

Python · 115 stars · 6 forks · Updated Mar 17, 2024

mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

Python · 404 stars · 29 forks · Updated Jul 25, 2024

mbzuai-oryx / Awesome-CV-Foundational-Models

Forked from awaisrauf/Awesome-CV-Foundational-Models

7 stars · Updated Jul 31, 2023

mbzuai-oryx / Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Python · 233 stars · 11 forks · Updated Jan 2, 2024

jameelhassan / PromptAlign

[NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization

Python · 91 stars · 10 forks · Updated Feb 11, 2024

akhtarvision / cal-detr

Python · 37 stars · 5 forks · Updated Nov 9, 2023

mbzuai-oryx / groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python · 739 stars · 37 forks · Updated Jun 2, 2024

hananshafi / llmblueprint

[ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts"

Jupyter Notebook · 65 stars · 2 forks · Updated May 18, 2024

magic-research / bubogpt

BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs

Python · 494 stars · 33 forks · Updated Jul 21, 2023

rese1f / MovieChat

[CVPR 2024] 🎬💭 chat with over 10K frames of video!

Python · 488 stars · 39 forks · Updated Sep 6, 2024

marslanm / Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have been cited and discussed in the survey just accepted https://…

65 stars · 7 forks · Updated Oct 19, 2023

asif-hanif / vafa

[MICCAI 2023] Official code repository of paper titled "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation" accepted in MICCAI 2023 conference.

Python · 48 stars · Updated Nov 14, 2023

muzairkhattak / PromptSRC

[ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without Forgetting".

Python · 216 stars · 8 forks · Updated Sep 28, 2023

mbzuai-oryx / ClimateGPT

[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabic languages.

Python · 73 stars · 9 forks · Updated Jan 30, 2024

mbzuai-oryx / XrayGPT

[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.

Python · 459 stars · 52 forks · Updated Aug 8, 2024

mbzuai-oryx / Video-ChatGPT

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

Python · 1,145 stars · 98 forks · Updated Aug 27, 2024

Vision-CAIR / MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python · 25,285 stars · 2,901 forks · Updated Sep 2, 2024

amazon-science / prompt-pretraining

Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"

Python · 251 stars · 8 forks · Updated May 3, 2024

Amshaker / SwiftFormer

[ICCV'23] Official repository of paper SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Python · 241 stars · 25 forks · Updated Jan 12, 2024