Research Article | Open Access

CARL: controllable agent with reinforcement learning for quadruped locomotion

Published: 12 August 2020

Abstract

Motion synthesis in a dynamic environment has been a long-standing problem for character animation. Methods using motion capture data tend to scale poorly in complex environments because of their heavy capture and labeling requirements. Physics-based controllers are effective in this regard, albeit less controllable. In this paper, we present CARL, a quadruped agent that can be controlled with high-level directives and react naturally to dynamic environments. Starting with an agent that can imitate individual animation clips, we use Generative Adversarial Networks to adapt high-level controls, such as speed and heading, to action distributions that correspond to the original animations. Further fine-tuning through deep reinforcement learning enables the agent to recover from unseen external perturbations while producing smooth transitions. It then becomes straightforward to create autonomous agents in dynamic environments by adding navigation modules over the entire process. We evaluate our approach by measuring the agent's ability to follow user control and provide a visual analysis of the generated motion to show its effectiveness.
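The abstract sketches a three-stage pipeline: (1) imitation of individual animation clips, (2) a GAN-trained mapping from high-level controls such as speed and heading onto the learned action distributions, and (3) DRL fine-tuning of the composed controller. As a rough orientation only, a minimal Python sketch of that control flow follows; the class names, the stub environment, and all signatures are hypothetical placeholders, not the authors' implementation.

    # Illustrative, runnable sketch of the three-stage pipeline described
    # in the abstract. All names here (StubEnv, ImitationPolicy,
    # ControlAdapter, fine_tune) are hypothetical scaffolding; the paper's
    # actual networks, losses, and training procedures are not reproduced.

    import random


    class StubEnv:
        """Placeholder physics environment (stands in for a real simulator)."""

        def reset(self):
            self.state = [0.0] * 8  # dummy proprioceptive state
            return self.state

        def step(self, action):
            # Trivially integrate the action into the state.
            self.state = [s + 0.01 * a for s, a in zip(self.state, action)]
            reward = -sum(a * a for a in action)  # dummy smoothness reward
            done = random.random() < 0.01
            return self.state, reward, done


    class ImitationPolicy:
        """Stage 1: low-level policy trained to imitate individual
        animation clips (e.g. via DeepMimic-style DRL)."""

        def act(self, state, latent):
            # A real policy would map (state, latent) to joint targets.
            return [random.uniform(-1.0, 1.0) for _ in range(12)]


    class ControlAdapter:
        """Stage 2: GAN-trained mapping from high-level directives
        (speed, heading) to latents whose induced actions stay close to
        the action distribution of the original animations."""

        def encode(self, speed, heading):
            return (speed, heading)  # stub latent


    def fine_tune(policy, adapter, env, steps=1000):
        """Stage 3: DRL fine-tuning of the composed controller so it
        recovers from perturbations while transitioning smoothly."""
        state = env.reset()
        for _ in range(steps):
            latent = adapter.encode(speed=1.5, heading=0.0)
            action = policy.act(state, latent)
            state, reward, done = env.step(action)
            if done:
                state = env.reset()


    if __name__ == "__main__":
        fine_tune(ImitationPolicy(), ControlAdapter(), StubEnv())

In the real system, each stub would be a neural network trained with its own objective: an imitation reward for the low-level policy, an adversarial loss for the control adapter, and a task reward for the fine-tuning stage.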

Supplemental Material

• Presentation video (MP4 file)
• Transcript for: Presentation video




      Published In

      cover image ACM Transactions on Graphics
      ACM Transactions on Graphics  Volume 39, Issue 4
      August 2020
      1732 pages
      ISSN:0730-0301
      EISSN:1557-7368
      DOI:10.1145/3386569
      Issue’s Table of Contents
      This work is licensed under a Creative Commons Attribution-NonCommercial International 4.0 License.

      Publisher

Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 12 August 2020
      Published in TOG Volume 39, Issue 4


      Author Tags

      1. deep reinforcement learning (DRL)
      2. generative adversarial network (GAN)
      3. locomotion
      4. motion synthesis
      5. quadruped



Cited By

• (2024) A Literature Survey on Quadruped AI Assistant: Integrating Image Processing and Natural Language Processing for Emotional Intelligence. International Journal of Advanced Research in Science, Communication and Technology, 70-82. DOI: 10.48175/IJARSCT-15313. Online publication date: 5-Feb-2024.
• (2024) VMP: Versatile Motion Priors for Robustly Tracking Motion on Physical Characters. Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 1-11. DOI: 10.1111/cgf.15175. Online publication date: 21-Aug-2024.
• (2024) ADAPT: AI-Driven Artefact Purging Technique for IMU Based Motion Capture. Computer Graphics Forum. DOI: 10.1111/cgf.15172. Online publication date: 17-Oct-2024.
• (2024) A Hierarchical Framework for Quadruped Omnidirectional Locomotion Based on Reinforcement Learning. IEEE Transactions on Automation Science and Engineering 21(4), 5367-5378. DOI: 10.1109/TASE.2023.3310945. Online publication date: Oct-2024.
• (2024) Mastering broom-like tools for object transportation animation using deep reinforcement learning. Computer Animation and Virtual Worlds 35(3). DOI: 10.1002/cav.2255. Online publication date: 14-Jun-2024.
• (2023) Development of a Real-Time Quadruped Animal Character Rig System. Journal of Digital Contents Society 24(12), 2971-2980. DOI: 10.9728/dcs.2023.24.12.2971. Online publication date: 31-Dec-2023.
• (2023) FastMimic: Model-Based Motion Imitation for Agile, Diverse and Generalizable Quadrupedal Locomotion. Robotics 12(3), 90. DOI: 10.3390/robotics12030090. Online publication date: 20-Jun-2023.
• (2023) Bidirectional GaitNet: A Bidirectional Prediction Model of Human Gait and Anatomical Conditions. ACM SIGGRAPH 2023 Conference Proceedings, 1-9. DOI: 10.1145/3588432.3591492. Online publication date: 23-Jul-2023.
• (2023) Solving Challenging Control Problems via Learning-based Motion Planning and Imitation. 2023 20th International Conference on Ubiquitous Robots (UR), 267-274. DOI: 10.1109/UR57808.2023.10202250. Online publication date: 25-Jun-2023.
• (2023) Expanding Versatility of Agile Locomotion through Policy Transitions Using Latent State Representation. 2023 IEEE International Conference on Robotics and Automation (ICRA), 5134-5140. DOI: 10.1109/ICRA48891.2023.10160776. Online publication date: 29-May-2023.
• Show More Cited By
