Title | Link |
---|---|
How to Grow a Mind: Statistics, Structure, and Abstraction | Paper |
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning | Paper |
Human-level concept learning through probabilistic program induction | Paper |
Compositional generalization through meta sequence-to-sequence learning | Paper |
Human-like systematic generalization through a meta-learning neural network | Paper |
Neural Discrete Representation Learning | VQ-VAE |
Generating Diverse High-Fidelity Images with VQ-VAE-2 | Paper |
Hierarchical Quantized Autoencoders | Paper |
Differentiable Graph Module (DGM) for Graph Convolutional Networks | Paper |
Compositional generalization through abstract representations in human and artificial neural networks | Paper |
Graph Attention Networks | Paper |
Graph Transformer Networks | Paper |
Systematicity Emerges in Transformers when Abstract Grammatical Roles Guide Attention | Paper |
Inducing Systematicity in Transformers by Attending to Structurally Quantized Embeddings | Paper |
Title | Link |
---|---|
Yell At Your Robot: Improving On-the-Fly from Language Corrections | Paper |
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Paper |
A Joint Modeling of Vision-Language-Action for Target-oriented Grasping in Clutter | Paper |
ADAPT: Vision-Language Navigation with Modality-Aligned Action Prompts | Paper |
RT-H: Action Hierarchies Using Language | Paper |
RT-1: Robotics Transformer for Real-World Control at Scale | Paper |
Gesture-Informed Robot Assistance via Foundation Models | Paper |
Robots that ask for help: Uncertainty Alignment for Large Language Model Planners | Paper |
Language-Driven Representation Learning for Robotics | Paper |
PRISE: Learning Temporal Action Abstractions as a Sequence Compression Problem | Paper |
Scaling Instructable Agents Across Many Simulated Worlds | Paper |
PaLM-E: An Embodied Multimodal Language Model | Paper |
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Paper |
Vision-Language Models Provide Promptable Representations for Reinforcement Learning | Paper |
MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | Paper |
Interactive Language: Talking to Robots in Real Time | Paper |
Language Conditioned Imitation Learning over Unstructured Data | Paper |
Language-Conditioned Path Planning | Paper |
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics | Paper |
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | Paper |
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning | Paper |
Interactive Task Planning with Language Models | Paper |
Video Language Planning | Paper |
Large Language Models are Visual Reasoning Coordinators | Paper |
Learning Universal Policies via Text-Guided Video Generation | Paper |
Video as the New Language for Real-World Decision Making | Paper |
Title | Link |
---|---|
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances | Paper |
Physically Grounded Vision-Language Models for Robotic Manipulation | Paper |
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents | Paper |
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models | Paper |
ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Paper |
Title | Link |
---|---|
Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Paper |
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control | Paper |
Hierarchical Reinforcement Learning in Complex 3D Environments | Paper |
Genie: Generative Interactive Environments | Paper |
Large Language Models Can Implement Policy Iteration | Paper |
Efficient Data Collection for Robotic Manipulation via Compositional Generalization | Paper |
Robots That Can See: Leveraging Human Pose for Trajectory Prediction | Paper |
Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning | Paper |
Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning | Paper |
Gradient-based Planning with World Models | Paper |
A Path Towards Autonomous Machine Intelligence | Paper |
arXiv:2403.17844 | https://arxiv.org/abs/2403.17844 |
Title | Link |
---|---|
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Paper |
MM-LLMs: Recent Advances in MultiModal Large Language Models | Paper |
Gemini: A Family of Highly Capable Multimodal Models | Paper |
Title | Link |
---|---|
Open X-Embodiment: Robotic Learning Datasets and RT-X Models | Paper |
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset | Paper |
DataComp: In search of the next generation of multimodal datasets | Paper |
Multimodal Algorithmic Reasoning Workshop (CVPR 2024): vision-and-language reasoning challenge on abstract visual puzzles (SMART-101 dataset) | 1. Workshop 2. Dataset |
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives | Paper |
Talk (YouTube, Yunzhu Li channel) | https://www.youtube.com/watch?v=akDSG9FsoCk&ab_channel=YunzhuLi |