Showing 1–38 of 38 results for author: Florence, P

Searching in archive cs.
  1. arXiv:2407.07775  [pdf, other]

    cs.RO cs.AI

    Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

    Authors: Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

    Abstract: An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and image, and perform useful navigation. To achieve this, we study a widely useful category of navigation tasks we call Multimodal Instruction Navigation with demonstration Tours (MINT), in which the environment prior is provided through a previously recor…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2405.02292  [pdf, other]

    cs.RO cs.LG

    ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation

    Authors: ALOHA 2 Team, Jorge Aldaco, Travis Armstrong, Robert Baruch, Jeff Bingham, Sanky Chan, Kenneth Draper, Debidatta Dwibedi, Chelsea Finn, Pete Florence, Spencer Goodrich, Wayne Gramlich, Torr Hage, Alexander Herzog, Jonathan Hoech, Thinh Nguyen, Ian Storz, Baruch Tabanpour, Leila Takayama, Jonathan Tompson, Ayzaan Wahid, Ted Wahrburg, Sichun Xu, Sergey Yaroshenko, Kevin Zakka, et al. (1 additional author not shown)

    Abstract: Diverse demonstration datasets have powered significant advances in robot learning, but the dexterity and scale of such data can be limited by the hardware cost, the hardware robustness, and the ease of teleoperation. We introduce ALOHA 2, an enhanced version of ALOHA that has greater performance, ergonomics, and robustness compared to the original design. To accelerate research in large-scale bim…

    Submitted 7 February, 2024; originally announced May 2024.

    Comments: Project website: aloha-2.github.io

  3. arXiv:2401.12168  [pdf, other]

    cs.CV cs.CL cs.LG cs.RO

    SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

    Authors: Boyuan Chen, Zhuo Xu, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia

    Abstract: Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size differences. We hyp…

    Submitted 22 January, 2024; originally announced January 2024.

  4. arXiv:2311.00899  [pdf, other]

    cs.RO

    RoboVQA: Multimodal Long-Horizon Reasoning for Robotics

    Authors: Pierre Sermanet, Tianli Ding, Jeffrey Zhao, Fei Xia, Debidatta Dwibedi, Keerthana Gopalakrishnan, Christine Chan, Gabriel Dulac-Arnold, Sharath Maddineni, Nikhil J Joshi, Pete Florence, Wei Han, Robert Baruch, Yao Lu, Suvir Mirchandani, Peng Xu, Pannag Sanketi, Karol Hausman, Izhak Shafran, Brian Ichter, Yuan Cao

    Abstract: We present a scalable, bottom-up and intrinsically diverse data collection scheme that can be used for high-level reasoning with long and medium horizons and that has 2.2x higher throughput compared to traditional narrow top-down step-by-step collection. We collect realistic data by performing any user requests within the entirety of 3 office buildings and using multiple robot and human embodiment…

    Submitted 1 November, 2023; originally announced November 2023.

  5. arXiv:2310.10625  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Video Language Planning

    Authors: Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

    Abstract: We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present video language planning (VLP), an algorithm that consists of a tree search procedure, where we train (i) vision-language models to serve as both policies and value…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: https://video-language-planning.github.io/

  6. arXiv:2307.15818  [pdf, other]

    cs.RO cs.CL cs.CV cs.LG

    RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

    Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, et al. (29 additional authors not shown)

    Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Website: https://robotics-transformer.github.io/

  7. arXiv:2307.14535  [pdf, other]

    cs.RO

    Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition

    Authors: Huy Ha, Pete Florence, Shuran Song

    Abstract: We present a framework for robot skill acquisition, which 1) efficiently scales up data generation of language-labelled robot data and 2) effectively distills this data down into a robust multi-task language-conditioned visuo-motor policy. For (1), we use a large language model (LLM) to guide high-level planning, and sampling-based robot planners (e.g. motion or grasp samplers) for generating diver…

    Submitted 30 September, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: 25 pages, 9 figures, videos and code links on website https://www.cs.columbia.edu/~huy/scalingup/

    ACM Class: I.2.9

  8. arXiv:2307.14334  [pdf, other]

    cs.CL cs.CV

    Towards Generalist Biomedical AI

    Authors: Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, et al. (7 additional authors not shown)

    Abstract: Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench…

    Submitted 26 July, 2023; originally announced July 2023.

  9. arXiv:2307.04721  [pdf, other]

    cs.AI cs.CL cs.RO

    Large Language Models as General Pattern Machines

    Authors: Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng

    Abstract: We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion profici…

    Submitted 25 October, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: 21 pages, 25 figures. To appear at Conference on Robot Learning (CoRL) 2023

  10. arXiv:2304.04150  [pdf, other]

    cs.RO cs.AI

    RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning

    Authors: Kevin Zakka, Philipp Wu, Laura Smith, Nimrod Gileadi, Taylor Howell, Xue Bin Peng, Sumeet Singh, Yuval Tassa, Pete Florence, Andy Zeng, Pieter Abbeel

    Abstract: Replicating human-like dexterity in robot hands represents one of the largest open problems in robotics. Reinforcement learning is a promising approach that has achieved impressive progress in the last few years; however, the class of problems it has typically addressed corresponds to a rather narrow definition of dexterity as compared to human capabilities. To address this gap, we investigate pia…

    Submitted 3 December, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Accepted to the Conference on Robot Learning (CoRL) 2023

  11. arXiv:2303.03378  [pdf, other]

    cs.LG cs.AI cs.RO

    PaLM-E: An Embodied Multimodal Language Model

    Authors: Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, Pete Florence

    Abstract: Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g., for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model ar…

    Submitted 6 March, 2023; originally announced March 2023.

  12. arXiv:2303.00855  [pdf]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents

    Authors: Wenlong Huang, Fei Xia, Dhruv Shah, Danny Driess, Andy Zeng, Yao Lu, Pete Florence, Igor Mordatch, Sergey Levine, Karol Hausman, Brian Ichter

    Abstract: Recent progress in large language models (LLMs) has demonstrated the ability to learn and leverage Internet-scale knowledge through pre-training with autoregressive models. Unfortunately, applying such models to settings with embodied agents, such as robots, is challenging due to their lack of experience with the physical world, inability to parse non-language observations, and ignorance of reward…

    Submitted 11 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

  13. arXiv:2301.08556  [pdf, other]

    cs.LG cs.CV cs.RO

    NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

    Authors: Allan Zhou, Moo Jin Kim, Lirui Wang, Pete Florence, Chelsea Finn

    Abstract: Expert demonstrations are a rich source of supervision for training visual robotic manipulation policies, but imitation learning methods often require either a large number of demonstrations or expensive online expert supervision to learn reactive closed-loop behaviors. In this work, we introduce SPARTN (Synthetic Perturbations for Augmenting Robot Trajectories via NeRF): a fully-offline data augm…

    Submitted 18 January, 2023; originally announced January 2023.

  14. arXiv:2212.06764  [pdf, other]

    cs.RO

    Single-Level Differentiable Contact Simulation

    Authors: Simon Le Cleac'h, Mac Schwager, Zachary Manchester, Vikas Sindhwani, Pete Florence, Sumeet Singh

    Abstract: We present a differentiable formulation of rigid-body contact dynamics for objects and robots represented as compositions of convex primitives. Existing optimization-based approaches simulating contact between convex primitives rely on a bilevel formulation that separates collision detection and contact simulation. These approaches are unreliable in realistic contact simulation scenarios because i…

    Submitted 3 January, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

  15. arXiv:2212.06088  [pdf, other]

    cs.RO

    MIRA: Mental Imagery for Robotic Affordances

    Authors: Lin Yen-Chen, Pete Florence, Andy Zeng, Jonathan T. Barron, Yilun Du, Wei-Chiu Ma, Anthony Simeonov, Alberto Rodriguez Garcia, Phillip Isola

    Abstract: Humans form mental images of 3D scenes to support counterfactual imagination, planning, and motor control. Our abilities to predict the appearance and affordance of the scene from previously unobserved viewpoints aid us in performing manipulation tasks (e.g., 6-DoF kitting) with a level of ease that is currently out of reach for existing robot learning frameworks. In this work, we aim to build art…

    Submitted 12 December, 2022; originally announced December 2022.

    Comments: CoRL 2022, webpage: https://yenchenlin.me/mira

  16. arXiv:2210.06407  [pdf, other]

    cs.RO cs.AI cs.LG

    Interactive Language: Talking to Robots in Real Time

    Authors: Corey Lynch, Ayzaan Wahid, Jonathan Tompson, Tianli Ding, James Betker, Robert Baruch, Travis Armstrong, Pete Florence

    Abstract: We present a framework for building interactive, real-time, natural language-instructable robots in the real world, and we open source related assets (dataset, environment, benchmark, and policies). Trained with behavioral cloning on a dataset of hundreds of thousands of language-annotated trajectories, a produced policy can proficiently execute an order of magnitude more commands than previous wo…

    Submitted 12 October, 2022; originally announced October 2022.

  17. arXiv:2210.03701  [pdf, other]

    cs.RO

    VIRDO++: Real-World, Visuo-tactile Dynamics and Perception of Deformable Objects

    Authors: Youngsun Wi, Andy Zeng, Pete Florence, Nima Fazeli

    Abstract: Deformable objects manipulation can benefit from representations that seamlessly integrate vision and touch while handling occlusions. In this work, we present a novel approach for, and real-world demonstration of, multimodal visuo-tactile state-estimation and dynamics prediction for deformable objects. Our approach, VIRDO++, builds on recent progress in multimodal neural implicit representations…

    Submitted 7 October, 2022; originally announced October 2022.

  18. arXiv:2209.07753  [pdf, other]

    cs.RO

    Code as Policies: Language Model Programs for Embodied Control

    Authors: Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

    Abstract: Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) a…

    Submitted 24 May, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

  19. arXiv:2207.05608  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    Inner Monologue: Embodied Reasoning through Planning with Language Models

    Authors: Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

    Abstract: Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Project website: https://innermonologue.github.io

  20. arXiv:2206.01634  [pdf, other]

    cs.LG cs.CV cs.RO

    Reinforcement Learning with Neural Radiance Fields

    Authors: Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint

    Abstract: It is a long-standing problem to find effective representations for training reinforcement learning (RL) agents. This paper demonstrates that learning state representations with supervision from Neural Radiance Fields (NeRFs) can improve the performance of RL compared to other learned representations or even low-dimensional, hand-engineered state information. Specifically, we propose to train an e…

    Submitted 3 June, 2022; originally announced June 2022.

  21. arXiv:2205.06333  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

    Authors: Negin Heravi, Ayzaan Wahid, Corey Lynch, Pete Florence, Travis Armstrong, Jonathan Tompson, Pierre Sermanet, Jeannette Bohg, Debidatta Dwibedi

    Abstract: Perceptual understanding of the scene and the relationship between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most of the current methodologies learn task specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervi…

    Submitted 12 March, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

  22. arXiv:2204.00598  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

    Authors: Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, Pete Florence

    Abstract: Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g., spreadsheets, SAT quest…

    Submitted 27 May, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: https://socraticmodels.github.io/

  23. arXiv:2203.01983  [pdf, other]

    cs.RO

    Implicit Kinematic Policies: Unifying Joint and Cartesian Action Spaces in End-to-End Robot Learning

    Authors: Aditya Ganapathi, Pete Florence, Jake Varley, Kaylee Burns, Ken Goldberg, Andy Zeng

    Abstract: Action representation is an important yet often overlooked aspect in end-to-end robot learning with deep networks. Choosing one action space over another (e.g. target joint positions, or Cartesian end-effector poses) can result in surprisingly stark performance differences between various downstream tasks -- and as a result, considerable research has been devoted to finding the right action space…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: International Conference on Robotics and Automation (ICRA) 2022

  24. arXiv:2203.01913  [pdf, other]

    cs.RO cs.CV

    NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields

    Authors: Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Tsung-Yi Lin, Alberto Rodriguez, Phillip Isola

    Abstract: Thin, reflective objects such as forks and whisks are common in our daily lives, but they are particularly challenging for robot perception because it is hard to reconstruct them using commodity RGB-D cameras or multi-view stereo techniques. While traditional pipelines struggle with objects like these, Neural Radiance Fields (NeRFs) have recently been shown to be remarkably effective for performin…

    Submitted 27 April, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: ICRA 2022, Website: https://yenchenlin.me/nerf-supervision/

  25. arXiv:2202.00868  [pdf, other]

    cs.RO

    VIRDO: Visio-tactile Implicit Representations of Deformable Objects

    Authors: Youngsun Wi, Pete Florence, Andy Zeng, Nima Fazeli

    Abstract: Deformable object manipulation requires computationally efficient representations that are compatible with robotic sensing modalities. In this paper, we present VIRDO: an implicit, multi-modal, and continuous representation for deformable-elastic objects. VIRDO operates directly on visual (point cloud) and tactile (reaction forces) modalities and learns rich latent embeddings of contact locations a…

    Submitted 26 September, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: This work has been accepted to ICRA 2022

  26. arXiv:2109.04928  [pdf, other]

    cs.RO eess.SY

    Trajectory Optimization with Optimization-Based Dynamics

    Authors: Taylor A. Howell, Simon Le Cleac'h, Sumeet Singh, Pete Florence, Zachary Manchester, Vikas Sindhwani

    Abstract: We present a framework for bi-level trajectory optimization in which a system's dynamics are encoded as the solution to a constrained optimization problem and smooth gradients of this lower-level problem are passed to an upper-level trajectory optimizer. This optimization-based dynamics representation enables constraint handling, additional variables, and non-smooth behavior to be abstracted away…

    Submitted 11 January, 2023; v1 submitted 10 September, 2021; originally announced September 2021.

    Comments: Minor fixes. Table formatting. Terminology modifications

  27. arXiv:2109.00137  [pdf, other]

    cs.RO cs.CV cs.LG

    Implicit Behavioral Cloning

    Authors: Pete Florence, Corey Lynch, Andy Zeng, Oscar Ramirez, Ayzaan Wahid, Laura Downs, Adrian Wong, Johnny Lee, Igor Mordatch, Jonathan Tompson

    Abstract: We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counter…

    Submitted 31 August, 2021; originally announced September 2021.

  28. arXiv:2106.03911  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    XIRL: Cross-embodiment Inverse Reinforcement Learning

    Authors: Kevin Zakka, Andy Zeng, Pete Florence, Jonathan Tompson, Jeannette Bohg, Debidatta Dwibedi

    Abstract: We investigate the visual cross-embodiment imitation setting, in which agents learn policies from videos of other agents (such as humans) demonstrating the same task, but with stark differences in their embodiments -- shape, actions, end-effector dynamics, etc. In this work, we demonstrate that it is possible to automatically discover and learn vision-based reward functions from cross-embodiment d…

    Submitted 13 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Oral Accept, CoRL '21

  29. arXiv:2012.05877  [pdf, other]

    cs.CV cs.RO

    INeRF: Inverting Neural Radiance Fields for Pose Estimation

    Authors: Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, Tsung-Yi Lin

    Abstract: We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF). NeRFs have been shown to be remarkably effective for the task of view synthesis - synthesizing photorealistic novel views of real-world scenes or objects. In this work, we investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free, RGB-only 6DoF pose estimation - g…

    Submitted 10 August, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

    Comments: IROS 2021, Website: https://yenchenlin.me/inerf/

  30. arXiv:2012.03385  [pdf, other]

    cs.RO cs.LG

    Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks

    Authors: Daniel Seita, Pete Florence, Jonathan Tompson, Erwin Coumans, Vikas Sindhwani, Ken Goldberg, Andy Zeng

    Abstract: Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and…

    Submitted 18 June, 2023; v1 submitted 6 December, 2020; originally announced December 2020.

    Comments: See https://berkeleyautomation.github.io/bags/ for project website and code; v3 is ICRA 2021 version and v4 adds physical experiments and improves simulation results

  31. arXiv:2010.14406  [pdf, other]

    cs.RO

    Transporter Networks: Rearranging the Visual World for Robotic Manipulation

    Authors: Andy Zeng, Pete Florence, Jonathan Tompson, Stefan Welker, Jonathan Chien, Maria Attarian, Travis Armstrong, Ivan Krasin, Dan Duong, Ayzaan Wahid, Vikas Sindhwani, Johnny Lee

    Abstract: Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions. It makes no assumptions of…

    Submitted 5 January, 2022; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Project webpage: https://transporternets.github.io; Summary video: https://youtu.be/8afHfReCfPo?t=12214

  32. arXiv:2009.05085  [pdf, other]

    cs.RO

    Keypoints into the Future: Self-Supervised Correspondence in Model-Based Reinforcement Learning

    Authors: Lucas Manuelli, Yunzhu Li, Pete Florence, Russ Tedrake

    Abstract: Predictive models have been at the core of many robotic systems, from quadrotors to walking robots. However, it has been challenging to develop and apply such models to practical robotic manipulation due to high-dimensional sensory observations such as images. Previous approaches to learning models in the context of robotic manipulation have either learned whole image dynamics or used autoencoders…

    Submitted 10 September, 2020; originally announced September 2020.

  33. arXiv:1909.06933  [pdf, other]

    cs.RO cs.CV cs.LG

    Self-Supervised Correspondence in Visuomotor Policy Learning

    Authors: Peter Florence, Lucas Manuelli, Russ Tedrake

    Abstract: In this paper we explore using self-supervised correspondence for improving the generalization performance and sample efficiency of visuomotor policy learning. Prior work has primarily used approaches such as autoencoding, pose-based losses, and end-to-end policy optimization in order to train the visual portion of visuomotor policies. We instead propose an approach using self-supervised dense vis…

    Submitted 15 September, 2019; originally announced September 2019.

    Comments: Video at: https://sites.google.com/view/visuomotor-correspondence

  34. arXiv:1903.06684  [pdf, other]

    cs.RO

    kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation

    Authors: Lucas Manuelli, Wei Gao, Peter Florence, Russ Tedrake

    Abstract: We would like robots to achieve purposeful manipulation by placing any instance from a category of objects into a desired set of goal states. Existing manipulation pipelines typically specify the desired configuration as a target 6-DOF pose and rely on explicitly estimating the pose of the manipulated objects. However, representing an object with a parameterized transformation defined on a fixed t…

    Submitted 29 October, 2019; v1 submitted 15 March, 2019; originally announced March 2019.

    Comments: First two authors contributed equally. The video and supplemental material are available at https://sites.google.com/view/kpam

  35. arXiv:1901.05103  [pdf, other]

    cs.CV

    DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation

    Authors: Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, Steven Lovegrove

    Abstract: Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape re…

    Submitted 15 January, 2019; originally announced January 2019.

  36. arXiv:1806.08756  [pdf, other]

    cs.RO cs.CV cs.LG

    Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation

    Authors: Peter R. Florence, Lucas Manuelli, Russ Tedrake

    Abstract: What is the right object representation for manipulation? We would like robots to visually perceive scenes and learn an understanding of the objects in them that (i) is task-agnostic and can be used as a building block for a variety of manipulation tasks, (ii) is generally applicable to both rigid and non-rigid objects, (iii) takes advantage of the strong priors provided by 3D vision, and (iv) is…

    Submitted 7 September, 2018; v1 submitted 22 June, 2018; originally announced June 2018.

  37. arXiv:1802.09076  [pdf, other]

    cs.RO

    NanoMap: Fast, Uncertainty-Aware Proximity Queries with Lazy Search over Local 3D Data

    Authors: Peter R. Florence, John Carter, Jake Ware, Russ Tedrake

    Abstract: We would like robots to be able to safely navigate at high speed, efficiently use local 3D information, and robustly plan motions that consider pose uncertainty of measurements in a local map structure. This is hard to do with previously existing mapping approaches, like occupancy grids, that are focused on incrementally fusing 3D data into a common world frame. In particular, both their fragile s…

    Submitted 25 February, 2018; originally announced February 2018.

    Comments: To Appear at ICRA 2018

  38. arXiv:1707.04796  [pdf, other]

    cs.CV cs.RO

    LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes

    Authors: Pat Marion, Peter R. Florence, Lucas Manuelli, Russ Tedrake

    Abstract: Deep neural network (DNN) architectures have been shown to outperform traditional pipelines for object segmentation and pose estimation using RGBD data, but the performance of these DNN pipelines is directly tied to how representative the training data is of the true data. Hence a key requirement for employing these methods in practice is to have a large set of labeled data for your specific robot…

    Submitted 26 September, 2017; v1 submitted 15 July, 2017; originally announced July 2017.