
Terrain-adaptive locomotion skills using deep reinforcement learning

Published: 11 July 2016

Abstract

Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions. MACE learns more quickly than a single actor-critic approach and results in actor-critic experts that exhibit specialization. Additional elements of our solution that contribute towards efficient learning include Boltzmann exploration and the use of initial actor biases to encourage specialization. Results are demonstrated for multiple planar characters and terrain classes.
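
The abstract outlines the MACE architecture: several actor-critic expert pairs share a common input (character state plus a terrain description), the critics score each expert's proposed action, and Boltzmann exploration over those critic values decides which expert acts. The sketch below is an illustrative reconstruction of that selection mechanism only, not the authors' implementation; the PyTorch layers, layer sizes, state and action dimensions, and temperature are assumptions made for the example.

```python
# Minimal sketch (assumed, not the authors' code) of a mixture of
# actor-critic experts with Boltzmann exploration over critic values.
import torch
import torch.nn as nn


class MACE(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, num_experts: int = 3):
        super().__init__()
        self.num_experts = num_experts
        # Shared trunk over the high-dimensional state + terrain description.
        self.trunk = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU())
        # One critic output per expert: predicted value of that expert's action.
        self.critics = nn.Linear(256, num_experts)
        # One actor head per expert, each emitting one parameterized action
        # (e.g. the parameters of a leap or step).
        self.actors = nn.ModuleList(
            [nn.Linear(256, action_dim) for _ in range(num_experts)]
        )

    def forward(self, state: torch.Tensor):
        h = self.trunk(state)
        values = self.critics(h)                                # (batch, E)
        actions = torch.stack([a(h) for a in self.actors], dim=1)  # (batch, E, A)
        return values, actions


def select_action(net: MACE, state: torch.Tensor, temperature: float = 0.5):
    """Boltzmann exploration: sample an expert in proportion to
    exp(value / temperature), then return that expert's action."""
    with torch.no_grad():
        values, actions = net(state.unsqueeze(0))
        probs = torch.softmax(values / temperature, dim=-1)
        expert = torch.multinomial(probs, num_samples=1).item()
        return expert, actions[0, expert]


# Example usage with made-up dimensions (character features + terrain samples).
net = MACE(state_dim=200, action_dim=29)
expert, action = select_action(net, torch.randn(200))
```

Sampling experts in proportion to exp(value / temperature) concentrates exploration on the experts the critics currently rate highly while still occasionally trying the others, which is the mechanism that allows individual actors to specialize (for example, to particular terrain features).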

Supplementary Material

ZIP File (a81-peng-supp.zip)
Supplemental files.
MP4 File (a81.mp4)

    Published In

ACM Transactions on Graphics, Volume 35, Issue 4
    July 2016
    1396 pages
    ISSN:0730-0301
    EISSN:1557-7368
    DOI:10.1145/2897824

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 July 2016
    Published in TOG Volume 35, Issue 4

    Author Tags

    1. physics-based characters
    2. reinforcement learning

    Qualifiers

    • Research-article

    Funding Sources

    • NSERC
