# gepizar/vla-papers

## Compositional Learning

| Title | Link |
| --- | --- |
| How to Grow a Mind: Statistics, Structure, and Abstraction | Paper |
| DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning | Paper |
| Human-level concept learning through probabilistic program induction | Paper |
| Compositional generalization through meta sequence-to-sequence learning | Paper |
| Human-like systematic generalization through a meta-learning neural network | Paper |
| Neural Discrete Representation Learning | VQ-VAE |
| Generating Diverse High-Fidelity Images with VQ-VAE-2 | Paper |
| Hierarchical Quantized Autoencoders | Paper |
| Differentiable Graph Module (DGM) for Graph Convolutional Networks | Paper |
| Compositional generalization through abstract representations in human and artificial neural networks | Paper |
| Graph Attention Networks | Paper |
| Graph Transformer Networks | Paper |
| Systematicity Emerges in Transformers when Abstract Grammatical Roles Guide Attention | Paper |
| Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings | Paper |

## vla-papers

| Title | Link |
| --- | --- |
| Yell At Your Robot: Improving On-the-Fly from Language Corrections | Paper |
| RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Paper |
| A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter | Paper |
| ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts | Paper |
| RT-H: Action Hierarchies Using Language | Paper |
| RT-1: Robotics Transformer for Real-World Control at Scale | Paper |
| Gesture-Informed Robot Assistance via Foundation Models | Paper |
| Robots that ask for help: Uncertainty Alignment for Large Language Model Planners | Paper |
| Language-Driven Representation Learning for Robotics | Paper |
| PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem | Paper |
| Scaling Instructable Agents Across Many Simulated Worlds | Paper |
| PaLM-E: An Embodied Multimodal Language Model | Paper |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Paper |
| Vision-Language Models Provide Promptable Representations for Reinforcement Learning | Paper |
| MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | Paper |
| Interactive Language: Talking to Robots in Real Time | Paper |
| Language Conditioned Imitation Learning over Unstructured Data | Paper |
| Language-Conditioned Path Planning | Paper |
| RoboVQA: Multimodal Long-Horizon Reasoning for Robotics | Paper |
| AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | Paper |
| Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning | Paper |
| Interactive Task Planning with Language Models | Paper |
| Video Language Planning | Paper |
| Large Language Models are Visual Reasoning Coordinators | Paper |
| Learning Universal Policies via Text-Guided Video Generation | Paper |
| Video as the New Language for Real-World Decision Making | Paper |

## Grounding / Affordance

| Title | Link |
| --- | --- |
| Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Paper |
| Physically Grounded Vision-Language Models for Robotic Manipulation | Paper |
| Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents | Paper |
| LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models | Paper |
| ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Paper |

## Other

| Title | Link |
| --- | --- |
| Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Paper |
| Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control | Paper |
| Hierarchical Reinforcement Learning in Complex 3D Environments | Paper |
| Genie: Generative Interactive Environments | Paper |
| Large Language Models Can Implement Policy Iteration | Paper |
| Efficient Data Collection for Robotic Manipulation via Compositional Generalization | Paper |
| Robots That Can See: Leveraging Human Pose for Trajectory Prediction | Paper |
| Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning | Paper |
| Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning | Paper |
| Gradient-based Planning with World Models | Paper |
| A Path Towards Autonomous Machine Intelligence | Paper |

https://arxiv.org/abs/2403.17844

## MM LLM

| Title | Link |
| --- | --- |
| MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Paper |
| MM-LLMs: Recent Advances in MultiModal Large Language Models | Paper |
| Gemini: A Family of Highly Capable Multimodal Models | Paper |

## Datasets / Benchmark

| Title | Link |
| --- | --- |
| Open X-Embodiment: Robotic Learning Datasets and RT-X Models | Paper |
| DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset | Paper |
| DataComp: In search of the next generation of multimodal datasets | Paper |
| Multimodal Algorithmic Reasoning Workshop (CVPR 2024). Challenge: vision-and-language reasoning on abstract visual puzzles (SMART-101 dataset) | 1. Workshop 2. Dataset |
| Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | Paper |

https://www.youtube.com/watch?v=akDSG9FsoCk&ab_channel=YunzhuLi
