Stars
Official implementation for "Automatic Chain of Thought Prompting in Large Language Models" (stay tuned & more will be updated)
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
This is the official PyTorch implementation of the paper Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP.
[RSS2024] Official implementation of "Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Generate comic panels using an LLM + SDXL. Powered by Hugging Face 🤗
Create any comic page using state-of-the-art text-to-image and large language models with your limitless imagination
Python implementation of the paper Learning hierarchical relationships for object-goal navigation
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. NeurIPS 2022
Official implementation for ECCV 2024 paper "Prioritized Semantic Learning for Zero-shot Instance Navigation"
Official GitHub Repository for Paper "Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill", ICRA 2024
Layout-based Causal Inference for Object Navigation (CVPR 2023)
[CVPR 2023] We propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely corner-guided exploration policy and category-aware …
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation (ICRA 2024)
PyTorch implementation of paper: GaussNav: Gaussian Splatting for Visual Navigation
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Chinese and English multimodal conversational language model
Imagine Before Go: Self-Supervised Generative Map for Object Goal Navigation (CVPR2024)
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
A Docker image that contains both MatterSim and BEVBert.
AI Research Platform for Reinforcement Learning from Real Panoramic Images.
Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation
Official code and checkpoint release for mobile robot foundation models: GNM, ViNT, and NoMaD.
[ICCV23] Bird’s-Eye-View Scene Graph for Vision-Language Navigation
Commonsense Scene Graph-based Target Localization for Object Search
Visual Navigation with Spatial Attention