Title | Link |
---|---|
How to Grow a Mind: Statistics, Structure, and Abstraction | Paper |
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning | Paper |
Human-level concept learning through probabilistic program induction | Paper |
Compositional generalization through meta sequence-to-sequence learning | Paper |
Human-like systematic generalization through a meta-learning neural network | Paper |
Neural Discrete Representation Learning | VQ-VAE |
Generating Diverse High-Fidelity Images with VQ-VAE-2 | Paper |
Hierarchical Quantized Autoencoders | Paper |
Differentiable Graph Module (DGM) for Graph Convolutional Networks | Paper |
Compositional generalization through abstract representations in human and artificial neural networks | Paper |
Graph Attention Networks | Paper |
Graph Transformer Networks | Paper |
Systematicity Emerges in Transformers when Abstract Grammatical Roles Guide Attention | Paper |
Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings | Paper |
Title | Link |
---|---|
Yell At Your Robot: Improving On-the-Fly from Language Corrections | Paper |
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Paper |
A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter | Paper |
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts | Paper |
RT-H: Action Hierarchies Using Language | Paper |
RT-1: Robotics Transformer for Real-World Control at Scale | Paper |
Gesture-Informed Robot Assistance via Foundation Models | Paper |
Robots that ask for help: Uncertainty Alignment for Large Language Model Planners | Paper |
Language-Driven Representation Learning for Robotics | Paper |
PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem | Paper |
Scaling Instructable Agents Across Many Simulated Worlds | Paper |
PaLM-E: An Embodied Multimodal Language Model | Paper |
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Paper |
Vision-Language Models Provide Promptable Representations for Reinforcement Learning | Paper |
MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | Paper |
Interactive Language: Talking to Robots in Real Time | Paper |
Language Conditioned Imitation Learning over Unstructured Data | Paper |
Language-Conditioned Path Planning | Paper |
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics | Paper |
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | Paper |
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning | Paper |
Interactive Task Planning with Language Models | Paper |
Video Language Planning | Paper |
Large Language Models are Visual Reasoning Coordinators | Paper |
Learning Universal Policies via Text-Guided Video Generation | Paper |
Video as the New Language for Real-World Decision Making | Paper |
Title | Link |
---|---|
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Paper |
Physically Grounded Vision-Language Models for Robotic Manipulation | Paper |
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents | Paper |
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models | Paper |
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Paper |
Title | Link |
---|---|
Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Paper |
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control | Paper |
Hierarchical Reinforcement Learning in Complex 3D Environments | Paper |
Genie: Generative Interactive Environments | Paper |
Large Language Models Can Implement Policy Iteration | Paper |
Efficient Data Collection for Robotic Manipulation via Compositional Generalization | Paper |
Robots That Can See: Leveraging Human Pose for Trajectory Prediction | Paper |
Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning | Paper |
Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning | Paper |
Gradient-based Planning with World Models | Paper |
A Path Towards Autonomous Machine Intelligence | Paper |
arXiv:2403.17844 | https://arxiv.org/abs/2403.17844 |
Title | Link |
---|---|
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Paper |
MM-LLMs: Recent Advances in MultiModal Large Language Models | Paper |
Gemini: A Family of Highly Capable Multimodal Models | Paper |
Title | Link |
---|---|
Open X-Embodiment: Robotic Learning Datasets and RT-X Models | Paper |
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset | Paper |
DataComp: In search of the next generation of multimodal datasets | Paper |
Multimodal Algorithmic Reasoning Workshop (CVPR 2024): vision-and-language reasoning challenge on abstract visual puzzles (SMART-101 dataset) | 1. Workshop 2. Dataset |
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | Paper |
Talk (YouTube, Yunzhu Li channel) | https://www.youtube.com/watch?v=akDSG9FsoCk&ab_channel=YunzhuLi |