Commit

Develop (#73)
* Refactored train model function

* Update train model function

* Update kl divergence function

* Update train model

* Add repo base files

* Update requirements

* Update Makefile

* Update base PEARL codes

* Update Makefile

* Update Makefile and requirements

* Refactored trainer

* Refactored PEARL code

* Update license to md

* Update pylintrc

* Refactored PEARL variables

* Refactored PEARL variables

* Bugfix: sampler error

* Bugfix: naming error

* Bugfix: import error

* Bugfix: device error

* Bugfix: naming error

* Update PEARL

* Modify default configs

* Bugfix: meta-test error

* Bugfix: seed

* Add tensorboard elements

* Add meta-test metric

* Refactoring meta-test

* Modify import paths

* Update setup.config and env config

* Refactored variable names and code location

* Modify curr_obs name to cur_obs name

* Modify env variable name

* Refactored meta-test code and Modify variable name

* Final refactoring

* Divide into return_before_infer and return_after_infer

* First, add rl^2 codes

* Refactored buffer, networks, sampler

* Refactored all RL^2 codes

* Check runnability of RL^2 sampling

* Add comment on flattening

* Finally, add meta-train codes of RL^2 and check their runnability

* Add RL^2 meta-test code

* Add buffer clear

* Complete RL^2 cheetah-dir

* Modify requirements.txt and setup.cfg

* Change python config files to yaml config files

* RL^2 refactoring and minor bug fix

* Remove empty lines

* Minor bug of meta-test fix

* Modify .pylintrc and Makefile

* Modify config files

* Refactor RL^2 codes

* First, add rl^2 codes

* rebase develop-rl2

* Change directory and rebase develop-rl2

* Modify requirements.txt

* Modify Makefile

* Change envs directory

* Change the config files from py to yaml

* Modify config names

* Refactor pearl codes

* Refactor rl^2 codes

* Add README

* Modify image size

* Add image source

* Add text align code of image

* Add link to image

* image source link test

* Add all contributorsrc

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

* Modify readme

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

* Modify readme

* Final commit

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

* Modify README

* Change tensorboard names

* Modify image size

* Modify num_iterations config

* Refactor buffers, meta_learner, and sampler modules in PEARL

* Refactor RL^2 code to avoid the bug of buffer

* image size test

* Fix image size

* Modify the name of PPO variables

* Add num_samples config, sampler log, and buffer log

* Remove num_sample_tasks config

* Add abs function to total_run_cost

* Refactor buffer and sampler

* Add early stopping condition configs to PEARL config files

* Add early stopping condition configs to RL^2 config files

* Fix tanh bug to policy network in PEARL

* Add early stopping condition to meta-learner

* Fix the value to append to dq

* Add early stopping condition configs to config files

* Update early stopping condition to meta learner

* Add list to range

* Add type annotation to all codes of PEARL

* Change dir name from assets to img

* Refactor PEARL codes

* Fix simple code

* Update README because of changing directory from assets to img

* Apply PR comment

* Develop maml (#60)

* test commit

* Create base structure

* add a high-level structure guide for development

* sync with pearl by dongmin

* Update MAML code

* Refactored network variable

* Bugfix: import error

* Refactored all MAML codes

* delete unused files

* add pyYAML to requirements

* add meta_train

* define the number of tasks at envs

* change a format of config files

* change directory of files in the util folder

* change agent.train to agent.compute_losses to implement MAML hessian structure

* add pylint related version requirement

* modify maml_trainer for yaml configs

* Match some formats with RL^2

* move maml folder into src folder

* add pytest PATH for MAML

* Feature/maml_exp_baseline (#57)

* Refactor buffers, meta_learner, and sampler modules in PEARL

* Refactor RL^2 code to avoid the bug of buffer

* image size test

* Fix image size

* Modify the name of PPO variables

* Add num_samples config, sampler log, and buffer log

* Remove num_sample_tasks config

* Add abs function to total_run_cost

* put the get_action method into PPO.py as a staticmethod

* change hidden layer related codes and configuration

* add meta-test and logging features

* restore added codes for the assumed bug

* test commit

* test commit3

* add meta-test

* change default configurations of MAML

* Combine value function with policy as a set of meta-model

* meta-train and meta-test baseline

* Structure discussion

* Fix repeated tanh when infer actions from the TanhGaussianPolicy network

* Refactor buffer and sampler

* Add early stopping condition configs to PEARL config files

* Add early stopping condition configs to RL^2 config files

* Fix tanh bug to policy network in PEARL

* Add early stopping condition to meta-learner

* Fix the value to append to dq

* Change configs to what are used in the official repo of MAML

* Fix tanh bug to policy network in PEARL

* Add Linear-feature baseline

* Modify to compute advantage based on newly fitted baseline

* Add separated meta-update based on PPO algorithm

* Add early stopping condition configs to config files

* Update early stopping condition to meta learner

* Add list to range

* Add type annotation to all codes of PEARL

* Change dir name from assets to img

* Refactor PEARL codes

* Fix simple code

* Update README because of changing directory from assets to img

* Separate train tasks and test tasks

* Set configuration based on references

* Delete linear-feature baseline and modify get_log_prob

* Remove static method feature from get_action and append None to log_probs to prevent buffer error

* Add a method into the buffer to update a value function before compute GAE

* Replace linear-feature baseline to value network and Add a variable to store old_policy

* Remove redundant code for obtaining adaptation samples and Modify a structure to follow the reference while keeping the log format

* Apply PR comment

* Utilize num_tasks

* Modify pylint statements

* Re-arrange the order of methods in the MetaLearner class

* Rename confused methods

* Remove old_policy and change variable & argument name for enhanced intuition

* Simplify log_values

* Separate visualizing method

* Change argument name and add additional comments

* Modify conditional statements of the sampler

* Restore redundant commit of PEARL

* Utilize num_tasks while assigning goals as dictionary type

* Change argument name for logging

* Simplify saving condition of log_prob

* Transpose compute_gae and compute_value to ppo.py

* Disjoin list comprehension

* Reflect 2nd Review comments of PR57

* Reflect 3rd review comments of PR57

* Remove numpy conversion from cuda tensor

* Add interoperability for CUDA

* Reflect 4th review comments of PR57

* Change inner-optimizer to Adam

* Change configs to match with those of the MAML paper

Co-authored-by: dongminlee94 <[email protected]>

Co-authored-by: dongminlee94 <[email protected]>
Co-authored-by: seunghyun lee <[email protected]>

* Change env from pybullet to mujoco (#61)

* Feature/checkpoint saving and loading (#63)

* Remove unnecessary variable in envs

* Add checkpoint saving & loading to PEARL algorithm

* Fix log_prob issue to RL^2 algorithm

* Update PEARL configs (#65)

* Feature/replace ppo with trpo (#67)

* replace ppo with trpo

* Add type hints, saving and loading, early stopping

* gaussian policy cuda runnability modification

* remove holdout test tasks and add test interval

* change the number of test tasks to be sampled

* combine train and test batches in dir task

* modify test-batch of dir task to be deterministic

* change dir task config

* restore heldout-test set

* avoid out-of-memory error by reducing the number of adaptations

* modify early stop condition of vel task

* Resolve code reviewer's comments

* Refactoring deterministic condition line

* Resolve missed code reviewer's comments

* Feature/refactor rl2 (#71)

* Change configurations of each algorithm

* Add saving modules

* Add type annotations

* add codes for meta supervised learning (#72)

Co-authored-by: Yoon, Seungje <[email protected]>
Co-authored-by: Seunghyun Lee <[email protected]>
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: seunghyun lee <[email protected]>
5 people committed Jun 11, 2022
1 parent a166af3 commit 8e95eef
Showing 49 changed files with 3,736 additions and 8 deletions.
34 changes: 34 additions & 0 deletions .all-contributorsrc
@@ -0,0 +1,34 @@
{
  "files": [
    "README.md"
  ],
  "imageSize": 100,
  "commit": false,
  "contributors": [
    {
      "login": "dongminlee94",
      "name": "Dongmin Lee",
      "avatar_url": "https://avatars.githubusercontent.com/u/29733842?v=4",
      "profile": "https://github.com/dongminlee94/",
      "contributions": [
        "code",
        "doc"
      ]
    },
    {
      "login": "Clyde21c",
      "name": "Seunghyun Lee",
      "avatar_url": "https://avatars.githubusercontent.com/u/35162035?v=4",
      "profile": "https://github.com/Clyde21c/",
      "contributions": [
        "code"
      ]
    }
  ],
  "contributorsPerLine": 7,
  "projectName": "meta-rl",
  "projectOwner": "dongminlee94",
  "repoType": "github",
  "repoHost": "https://github.com",
  "skipCi": true
}
2 changes: 1 addition & 1 deletion .gitignore
@@ -105,4 +105,4 @@ venv.bak/
 runs/
 
 # results
-results/
+results/
8 changes: 5 additions & 3 deletions .pylintrc
@@ -142,7 +142,9 @@ disable=print-statement,
         exception-escape,
         comprehension-escape,
         no-member,
-        no-name-in-module
+        no-name-in-module,
+        import-error,
+        duplicate-code,
 
 # Enable the message, report, category or checker with the given id(s). You can
 # either give multiple identifier separated by comma (,) or put this option
@@ -202,7 +204,7 @@ logging-modules=logging
 [SPELLING]
 
 # Limits count of emitted suggestions for spelling mistakes.
-max-spelling-suggestions=4
+max-spelling-suggestions=15
 
 # Spelling dictionary name. Available dictionaries: none. To make it work,
 # install the python-enchant package.
@@ -331,7 +333,7 @@ indent-after-paren=4
 indent-string='    '
 
 # Maximum number of characters on a single line.
-max-line-length=100
+max-line-length=104
 
 # Maximum number of lines in a module.
 max-module-lines=1000
File renamed without changes.
6 changes: 4 additions & 2 deletions Makefile
@@ -1,9 +1,11 @@
 format:
-	black .
+	black . --line-length 104
 	isort .
 
 lint:
-	env PYTHONPATH=. pytest --pylint --flake8 --mypy
+	env PYTHONPATH=src/rl2 pytest src/rl2 --pylint --flake8 --mypy
+	env PYTHONPATH=src/maml pytest src/maml --pylint --flake8 --mypy
+	env PYTHONPATH=src/pearl pytest src/pearl --pylint --flake8 --mypy
 
 setup:
 	pip install -r requirements.txt
125 changes: 125 additions & 0 deletions README.md
@@ -1 +1,126 @@
<div align="center">
<br>
<img src="./img/meta-rl.png" width="500">
</div>

[Image source](https://cs330.stanford.edu/slides/cs330_lifelonglearning_karol.pdf)

<br>

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8.8](https://img.shields.io/badge/python-3.8.8-blue.svg)](https://www.python.org/downloads/release/python-388/)
[![PyTorch 1.8.0](https://img.shields.io/badge/pytorch-1.8.0-red.svg)](https://pytorch.org/blog/pytorch-1.8-released/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/imports-isort-white)](https://pycqa.github.io/isort/)
[![Linting: flake8 & mypy & pylint](https://img.shields.io/badge/linting-flake8%20%26%20mypy%20%26%20pylint-deepblue)](https://pypi.org/project/pytest-pylint/)
[![All Contributors](https://img.shields.io/badge/all_contributors-2-orange.svg?style=flat-square)](#contributors-)

# Meta-Reinforcement Learning Algorithms with PyTorch

This repository contains PyTorch implementations of meta-reinforcement learning algorithms.

## Prerequisites

This repository is implemented and verified on **python 3.8.8**.

## Installation

To run on **pytorch 1.8.0**, open the [pytorch version link](https://pytorch.org/get-started/previous-versions/#wheel) and run the installation command that matches your desired specifications.

Next, clone this repository and run the following command.

```shell
$ make setup
```

## Python Path

To set the python path, move to `meta-rl/`.

```shell
$ cd meta-rl
```

If you set the python path in `bashrc`:

```shell
$ echo "export META_HOME=$(pwd)" >> ~/.bashrc
$ echo 'export PYTHONPATH=$META_HOME:$PYTHONPATH' >> ~/.bashrc
```

If you set the python path in `zshrc`:

```shell
$ echo "export META_HOME=$(pwd)" >> ~/.zshrc
$ echo 'export PYTHONPATH=$META_HOME:$PYTHONPATH' >> ~/.zshrc
```

## Usage

The repository's high-level structure is:

    └── src
        ├── envs
        ├── rl2
            ├── algorithm
            ├── configs
            └── results
        ├── maml
            ├── algorithm
            ├── configs
            └── results
        └── pearl
            ├── algorithm
            ├── configs
            └── results

### RL^2

TBU

### MAML

TBU

### PEARL

TBU

### Development

We have set up automatic formatters and linters for this repository.

To run the formatters:

```shell
$ make format
```

To run the linters:

```shell
$ make lint
```

New code should pass the formatters and the linters before being submitted as a PR.

## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
<tr>
<td align="center"><a href="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/dongminlee94/"><img src="https://avatars.githubusercontent.com/u/29733842?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Dongmin Lee</b></sub></a><br /><a href="https://github.com/dongminlee94/meta-rl/commits?author=dongminlee94" title="Code">💻</a> <a href="https://github.com/dongminlee94/meta-rl/commits?author=dongminlee94" title="Documentation">📖</a</td>
<td align="center"><a href="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/Clyde21c/"><img src="https://avatars.githubusercontent.com/u/35162035?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Seunghyun Lee</b></sub></a><br /><a href="https://github.com/dongminlee94/meta-rl/commits?author=Clyde21c" title="Code">💻</a></td>
</tr>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
Binary file added img/meta-rl.png
5 changes: 3 additions & 2 deletions setup.cfg
@@ -1,10 +1,11 @@
 [isort]
-line_length = 88
+profile = black
+line_length = 104
 
 [mypy]
 ignore_missing_imports = True
 follow_imports = skip
 
 [flake8]
-max-line-length = 88
+max-line-length = 104
 ignore = E203,W503
29 changes: 29 additions & 0 deletions src/envs/__init__.py
@@ -0,0 +1,29 @@
"""
Registration code for Half-cheetah environments
"""

import importlib
import os

ENVS = {}


def register_env(name):
"""Register an environment"""

def register_env_fn(filename):
if name in ENVS:
raise ValueError("Cannot register duplicate env {}".format(name))
if not callable(filename):
raise TypeError("env {} must be callable".format(name))
ENVS[name] = filename
return filename

return register_env_fn


# automatically import any envs in the envs/ directory
for file in os.listdir(os.path.dirname(__file__)):
if file.endswith(".py") and not file.startswith("_"):
module = file[: file.find(".py")]
importlib.import_module("src.envs." + module)
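For context, a minimal usage sketch of the registry above (not part of the committed diff): `register_env` stores the decorated class in `ENVS` under the given name, so later code can construct environments by key. The `"cheetah-vel"` name and `CheetahVelEnv` class below are purely illustrative placeholders, and the import assumes the `src` package (and MuJoCo, pulled in by the auto-import loop) is available on the python path.

```python
# Hypothetical usage sketch; "cheetah-vel" and CheetahVelEnv are placeholders,
# not part of this commit.
from src.envs import ENVS, register_env


@register_env("cheetah-vel")
class CheetahVelEnv:
    """Placeholder environment class used only to demonstrate registration."""


# Later, environments can be constructed by name from the registry.
env_cls = ENVS["cheetah-vel"]
env = env_cls()
```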
44 changes: 44 additions & 0 deletions src/envs/half_cheetah.py
@@ -0,0 +1,44 @@
"""
Modified half-cheetah environment
Reference:
https://github.com/katerakelly/oyster/blob/master/rlkit/envs/half_cheetah.py
"""

from typing import List, Union

import numpy as np
from gym.envs.mujoco import HalfCheetahEnv as HalfCheetahEnv_


class HalfCheetahEnv(HalfCheetahEnv_):
def _get_obs(self) -> np.ndarray:
return (
np.concatenate(
[
self.sim.data.qpos.flat[1:],
self.sim.data.qvel.flat,
self.get_body_com("torso").flat,
]
)
.astype(np.float32)
.flatten()
)

def viewer_setup(self) -> None:
camera_id = self.model.camera_name2id("track")
self.viewer.cam.type = 2
self.viewer.cam.fixedcamid = camera_id
self.viewer.cam.distance = self.model.stat.extent * 0.35
# Hide the overlay
self.viewer._hide_overlay = True

def render(self, mode: str = "human") -> Union[List[float], None]:
if mode == "rgb_array":
self._get_viewer().render()
# Window size used for old mujoco-py:
width, height = 500, 500
data = self._get_viewer().read_pixels(width, height, depth=False)
return data
elif mode == "human":
self._get_viewer().render()
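As a rough sanity check (not part of the diff), the modified `_get_obs` concatenates `qpos[1:]` (8 values for half-cheetah), `qvel` (9 values), and the torso center of mass (3 values), so the observation should be 20-dimensional. The sketch below assumes mujoco-py and the old gym-style (pre-0.26) MuJoCo environments are installed, as implied by the import above.

```python
# Sketch only: inspect the observation layout of the modified HalfCheetahEnv.
# Assumes mujoco-py is installed, since HalfCheetahEnv_ comes from gym.envs.mujoco.
from src.envs.half_cheetah import HalfCheetahEnv

env = HalfCheetahEnv()
obs = env.reset()
# qpos[1:] (8) + qvel (9) + torso COM (3) should give a 20-dimensional observation.
print(obs.shape, obs.dtype)  # expected: (20,) float32
```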
65 changes: 65 additions & 0 deletions src/envs/half_cheetah_dir.py
@@ -0,0 +1,65 @@
"""
Half-cheetah environment with direction target reward
Reference:
https://github.com/katerakelly/oyster/blob/master/rlkit/envs/half_cheetah_dir.py
"""

from typing import Any, Dict, List, Tuple

import numpy as np

from src.envs import register_env
from src.envs.half_cheetah import HalfCheetahEnv


@register_env("cheetah-dir")
class HalfCheetahDirEnv(HalfCheetahEnv):
"""
Half-cheetah environment class with direction target reward, as described in [1].
The code is adapted from
https://github.com/cbfinn/maml_rl/blob/master/rllab/envs/mujoco/half_cheetah_env_rand_direc.py
The half-cheetah follows the dynamics from MuJoCo [2], and receives at each
time step a reward composed of a control cost and a reward equal to its
velocity in the target direction.
[1] Chelsea Finn, Pieter Abbeel, Sergey Levine, "Model-Agnostic
Meta-Learning for Fast Adaptation of Deep Networks", 2017
(https://arxiv.org/abs/1703.03400)
[2] Emanuel Todorov, Tom Erez, Yuval Tassa, "MuJoCo: A physics engine for
model-based control", 2012
(https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf)
"""

def __init__(self, num_tasks: int) -> None:
directions = [-1, 1, -1, 1]
self.tasks = [{"direction": direction} for direction in directions]
assert num_tasks == len(self.tasks)
self._task = self.tasks[0]
self._goal_dir = self._task["direction"]
super().__init__()

def step(self, action: np.ndarray) -> Tuple[np.ndarray, np.float64, bool, Dict[str, Any]]:
xposbefore = self.sim.data.qpos[0]
self.do_simulation(action, self.frame_skip)
xposafter = self.sim.data.qpos[0]

progress = (xposafter - xposbefore) / self.dt
run_cost = self._goal_dir * progress
control_cost = 0.5 * 1e-1 * np.sum(np.square(action))

observation = self._get_obs()
reward = run_cost - control_cost
done = False
info = dict(run_cost=run_cost, control_cost=-control_cost, task=self._task)
return observation, reward, done, info

def get_all_task_idx(self) -> List[int]:
return list(range(len(self.tasks)))

def reset_task(self, idx: int) -> None:
self._task = self.tasks[idx]
self._goal_dir = self._task["direction"]
self.reset()
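A short sketch (not part of the diff) of the task interface the meta-learners rely on: `get_all_task_idx` enumerates the tasks, `reset_task` switches the goal direction, and `step` reports `run_cost` and `control_cost` through `info`. It assumes MuJoCo is installed and that `"cheetah-dir"` is constructed via the registry from `src/envs/__init__.py`.

```python
# Usage sketch only; assumes MuJoCo is available and "cheetah-dir" was
# registered by the decorator above.
import numpy as np

from src.envs import ENVS

env = ENVS["cheetah-dir"](num_tasks=4)
for idx in env.get_all_task_idx():
    env.reset_task(idx)               # sets self._goal_dir to the task's direction
    obs = env.reset()
    action = np.zeros(env.action_space.shape)
    obs, reward, done, info = env.step(action)
    print(idx, info["task"], reward)
```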