Introduction to Reinforcement Learning

Welcome! This course is jointly taught by UC Berkeley and the Tsinghua-Berkeley Shenzhen Institute (TBSI).

Instructors

  • Prof. Scott Moura (UC Berkeley) <smoura [at] berkeley.edu>
  • Co-Instructor Saehong Park (UC Berkeley) <sspark [at] berkeley.edu>
  • TA Xinyi Zhou (TBSI) <zxyyx48 [at] 163.com>

Course Schedule

China Time | California Time
July 7, 8, 9, 10 (Tu-F) | July 6, 7, 8, 9 (M-Th)
July 14, 15, 16, 17 (Tu-F) | July 13, 14, 15, 16 (M-Th)
All sessions at 08:30-10:05 China Time | All sessions at 5:30pm-7:05pm PT

Add to Google Calendar

Day-by-Day Schedule

Day | Topic | Speaker | Pre-recorded Lecture | Slides / Notes | Recordings
1 | 1a. Introduction - Course Org | Scott Moura | Zoom Recording (PW: 1e*OV@Re) | LEC1a Slides | Recording Link (PW: 9L%JePa=)
  | 1b. Introduction – History of RL | Scott Moura | Zoom Recording (PW: 1k.E69^o) | LEC1a Slides |
  | 1c. Optimal Control Intro | Scott Moura | Zoom Recording (PW: 2B&=2@*@) | |
2 | 2a. Dynamic Programming | Scott Moura | Zoom Recording (PW: 3F*1rg%?) | LEC2a Notes | Recording Link (PW: 8Q?#51=J)
  | 2b. Case Study: Linear Quadratic Regulator (LQR) | Scott Moura | Zoom Recording (PW: 5Y#4=58&) | LEC2b Notes |
3 | 3a. Policy Evaluation & Policy Improvement | Scott Moura | Zoom Recording (PW: 9N@%H4&@) | LEC3a Notes | Recording Link (PW: 1A@@0G63)
  | 3b. Policy Iteration Algorithm | Scott Moura | Zoom Recording (PW: 6y+!+6#9) | LEC3b Notes |
  | 3c. Case Study: LQR | Scott Moura | Zoom Recording (PW: 6D@YkC&=) | LEC3c Notes |
4 | 4a. Approximate DP: TD Error & Value Function Approx. | Scott Moura | Zoom Recording (PW: 6v&78$We) | LEC4a Notes | Recording Link (PW: 4t=#ye7T)
  | 4b. Case Study: LQR | Scott Moura | Zoom Recording (PW: 1O^fh.8+) | LEC4b Notes |
  | 4c. Online RL with ADP | Scott Moura | Zoom Recording (PW: 0q=.4378) | LEC4c Notes |
5 | 5a. Actor-Critic Method | Scott Moura | | |
  | 5b. Case Study: Offshore Wind | Scott Moura | | |
6 | 6a. Q-Learning | Saehong Park | | |
  | 6b. Q-Learning / Policy Gradient | Saehong Park | | |
7 | 7a. Policy Gradient / Actor-Critic | Saehong Park | | |
  | 7b. Actor-Critic | Saehong Park | | |
8 | 8a. RL for Energy Systems | Saehong Park | | |
  | 8b. Case Study: Battery Fast-charging | Saehong Park | | |

Topic Outline

  1. Optimal Control

  2. Dynamic Programming

    1. Principle of Optimality & Value Functions
      • Case Study: Linear Quadratic Regulator (LQR)

  3. Policy Evaluation & Policy Improvement

    1. Policy Iteration Algorithm & Variants
      • Case Study: LQR

  4. Approximate Dynamic Programming (ADP)

    1. Temporal Difference (TD) Error
    2. Value Function Approximation
      • Case Study: LQR
    3. Online RL with ADP
    4. Actor-Critic Method
      • Case Study: Offshore Wind

  5. Q-Learning

    1. Q-learning algorithm (a minimal sketch follows this outline)
    2. Advanced Q-learning algorithm, i.e., DQN

  6. Policy Gradient

    1. Vanilla policy gradient (REINFORCE)

  7. Actor-Critic using Policy Gradient

    1. Actor-Critic using Policy Gradient
    2. Advanced Actor-Critic algorithm, i.e., DDPG

  8. RL for Energy Systems

    1. Case Study: Battery Fast-charging
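
As a companion to Topic 5, below is a minimal tabular Q-learning sketch on a toy 5-state chain MDP. The environment, reward, and hyperparameters are illustrative assumptions for self-study; they are not taken from the course materials.

```python
# Minimal tabular Q-learning sketch on a toy 5-state chain MDP.
# The environment, reward, and hyperparameters below are illustrative
# assumptions for self-study; they are not part of the course materials.
import numpy as np

N_STATES = 5        # states 0..4; reaching state 4 ends the episode with reward +1
ACTIONS = [-1, +1]  # action 0: move left, action 1: move right
GAMMA = 0.9         # discount factor
ALPHA = 0.1         # learning rate (step size)
EPSILON = 0.1       # epsilon-greedy exploration rate
EPISODES = 500

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, len(ACTIONS)))  # Q-table: Q[state, action]


def step(state, action_idx):
    """Apply one action in the chain MDP; return (next_state, reward, done)."""
    next_state = int(np.clip(state + ACTIONS[action_idx], 0, N_STATES - 1))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection (greedy ties broken at random)
        if rng.random() < EPSILON:
            a = int(rng.integers(len(ACTIONS)))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        s_next, r, done = step(s, a)
        # Q-learning update: the TD target bootstraps off the best next-state action
        td_target = r + (0.0 if done else GAMMA * Q[s_next].max())
        Q[s, a] += ALPHA * (td_target - Q[s, a])
        s = s_next

print("Learned Q-table:\n", Q)
print("Greedy policy (0 = left, 1 = right):", Q.argmax(axis=1))
```

Running the script should recover a greedy policy that moves right in every state; for this toy chain the optimal action values for moving right from states 0 through 3 are 0.729, 0.81, 0.9, and 1.0, so the learned Q-table can be checked against those numbers.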

Lecture Notes
