Introduction to Reinforcement Learning

Welcome! This course is jointly taught by UC Berkeley and the Tsinghua-Berkeley Shenzhen Institute (TBSI).

Instructors

  • Prof. Scott Moura (UC Berkeley) <smoura [at] berkeley.edu>
  • Co-Instructor Saehong Park (UC Berkeley) <sspark [at] berkeley.edu>
  • TA Xinyi Zhou (TBSI) <zxyyx48 [at] 163.com>

Course Schedule

China Time | California Time
July 7, 8, 9, 10 (Tu-F) | July 6, 7, 8, 9 (M-Th)
July 14, 15, 16, 17 (Tu-F) | July 13, 14, 15, 16 (M-Th)
All sessions at 08:30-10:05 China Time | All sessions at 5:30pm-7:05pm PT

Add to Google Calendar

Day-by-Day Schedule

Day | Topic | Speaker | Pre-recorded Lecture | Slides / Notes | Recordings
1 | 1a. Introduction - Course Org | Scott Moura | Zoom Recording (PW: 1e*OV@Re) | LEC1a Slides | Recording Link (PW: 9L%JePa=)
  | 1b. Introduction – History of RL | Scott Moura | Zoom Recording (PW: 1k.E69^o) | LEC1a Slides |
  | 1c. Optimal Control Intro | Scott Moura | Zoom Recording (PW: 2B&=2@*@) | |
2 | 2a. Dynamic Programming | Scott Moura | Zoom Recording (PW: 3F*1rg%?) | LEC2a Notes | Recording Link (PW: 8Q?#51=J)
  | 2b. Case Study: Linear Quadratic Regulator (LQR) | Scott Moura | Zoom Recording (PW: 5Y#4=58&) | LEC2b Notes |
3 | 3a. Policy Evaluation & Policy Improvement | Scott Moura | Zoom Recording (PW: 9N@%H4&@) | LEC3a Notes | Recording Link (PW: 1A@@0G63)
  | 3b. Policy Iteration Algorithm | Scott Moura | Zoom Recording (PW: 6y+!+6#9) | LEC3b Notes |
  | 3c. Case Study: LQR | Scott Moura | Zoom Recording (PW: 6D@YkC&=) | LEC3c Notes |
4 | 4a. Approximate DP: TD Error & Value Function Approx. | Scott Moura | Zoom Recording (PW: 6v&78$We) | LEC4a Notes | Recording Link (PW: 4t=#ye7T)
  | 4b. Case Study: LQR | Scott Moura | Zoom Recording (PW: 1O^fh.8+) | LEC4b Notes |
  | 4c. Online RL with ADP | Scott Moura | Zoom Recording (PW: 0q=.4378) | LEC4c Notes |
5 | 5a. Actor-Critic Method | Scott Moura | | |
  | 5b. Case Study: Offshore Wind | Scott Moura | | |
6 | 6a. Q-Learning | Saehong Park | | |
  | 6b. Q-Learning / Policy Gradient | Saehong Park | | |
7 | 7a. Policy Gradient / Actor-Critic | Saehong Park | | |
  | 7b. Actor-Critic | Saehong Park | | |
8 | 8a. RL for Energy Systems | Saehong Park | | |
  | 8b. Case Study: Battery Fast-charging | Saehong Park | | |

Topic Outline

  1. Optimal Control

  2. Dynamic Programming

    1. Principle of Optimality & Value Functions
      • Case Study: Linear Quadratic Regulator (LQR)

  3. Policy Evaluation & Policy Improvement

    1. Policy Iteration Algorithm & Variants
      • Case Study: LQR

  4. Approximate Dynamic Programming (ADP)

    1. Temporal Difference (TD) Error
    2. Value Function Approximation
      • Case Study: LQR
    3. Online RL with ADP
    4. Actor-Critic Method
      • Case Study: Offshore Wind

  5. Q-Learning

    1. Q-learning algorithm (a minimal sketch follows this outline)
    2. Advanced Q-learning algorithm, i.e., DQN

  6. Policy Gradient

    1. Vanilla policy gradient (REINFORCE)

  7. Actor-Critic using Policy Gradient

    1. Actor-Critic using Policy Gradient
    2. Advanced Actor-Critic algorithm, i.e., DDPG

  8. RL for Energy Systems

    1. Case Study: Battery Fast-charging
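
As a companion to Topic 5, below is a minimal tabular Q-learning sketch on a toy 5-state chain MDP. The environment, reward, and hyperparameters are illustrative assumptions for self-study; they are not taken from the course materials.

```python
# Minimal tabular Q-learning sketch on a toy 5-state chain MDP.
# The environment, reward, and hyperparameters below are illustrative
# assumptions for self-study; they are not part of the course materials.
import numpy as np

N_STATES = 5        # states 0..4; reaching state 4 ends the episode with reward +1
ACTIONS = [-1, +1]  # action 0: move left, action 1: move right
GAMMA = 0.9         # discount factor
ALPHA = 0.1         # learning rate (step size)
EPSILON = 0.1       # epsilon-greedy exploration rate
EPISODES = 500

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, len(ACTIONS)))  # Q-table: Q[state, action]


def step(state, action_idx):
    """Apply one action in the chain MDP; return (next_state, reward, done)."""
    next_state = int(np.clip(state + ACTIONS[action_idx], 0, N_STATES - 1))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection (greedy ties broken at random)
        if rng.random() < EPSILON:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next, r, done = step(s, a)
        # Q-learning update: the TD target bootstraps off the best next-state action
        td_target = r + (0.0 if done else GAMMA * Q[s_next].max())
        Q[s, a] += ALPHA * (td_target - Q[s, a])
        s = s_next

print("Learned Q-table:\n", Q)
print("Greedy policy (0 = left, 1 = right):", Q.argmax(axis=1))
```

Running the script should recover a greedy policy that moves right in every state; for this toy chain the optimal action values for moving right from states 0 through 3 are 0.729, 0.81, 0.9, and 1.0, so the learned Q-table can be checked against those numbers.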

Lecture Notes
