Commit

Develop (#73)
* Refactored train model function

* Update train model function

* Update kl divergence function

* Update train model

* Add repo base files

* Update requirements

* Update Makefile

* Update base PEARL codes

* Update Makefile

* Update Makefile and requirements

* Refactored trainer

* Refactored PEARL code

* Update license to md

* Update pylintrc

* Refactored PEARL variables

* Refactored PEARL variables

* Bugfix: sampler error

* Bugfix: naming error

* Bugfix: import error

* Bugfix: device error

* Bugfix: naming error

* Update PEARL

* Modify default configs

* Bugfix: meta-test error

* Bugfix: seed

* Add tensorboard elements

* Add meta-test metric

* Refactoring meta-test

* Modify import paths

* Update setup.config and env config

* Refactored variable names and code location

* Modify curr_obs name to cur_obs name

* Modify env variable name

* Refactored meta-test code and Modify variable name

* Final refactoring

* Divide into return_before_infer and return_after_infer

* First, add rl^2 codes

* Refactored buffer, networks, sampler

* Refactored all RL^2 codes

* Check runnability of RL^2 sampling

* Add comment on flattening

* Finally, add meta-train codes of RL^2 and check their runnability

* Add RL^2 meta-test code

* Add buffer clear

* Complete RL^2 cheetah-dir

* Modify requirements.txt and setup.cfg

* Change python config files to yaml config files

* RL^2 refactoring and minor bug fix

* Remove empty lines

* Minor bug of meta-test fix

* Modify .pylintrc and Makefile

* Modify config files

* Refactor RL^2 codes

* First, add rl^2 codes

* rebase develop-rl2

* Change directory and rebase develop-rl2

* Modify requirements.txt

* Modify Makefile

* Change envs directory

* Change the config files from py to yaml

* Modify config names

* Refactor pearl codes

* Refactor rl^2 codes

* Add README

* Modify image size

* Add image source

* Add text align code of image

* Add link to image

* image source link test

* Add all contributorsrc

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

* Modify readme

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

* Modify readme

* Final commit

* docs: update README.md [skip ci]

* docs: create .all-contributorsrc [skip ci]

* Modify README

* Change tensorboard names

* Modify image size

* Modify num_iterations config

* Refactor buffers, meta_learner, and sampler modules in PEARL

* Refactor RL^2 code to avoid the bug of buffer

* image size test

* Fix image size

* Modify the name of PPO variables

* Add num_samples config, sampler log, and buffer log

* Remove num_sample_tasks config

* Add abs function to total_run_cost

* Refactor buffer and sampler

* Add early stopping condition configs to PEARL config files

* Add early stopping condition configs to RL^2 config files

* Fix tanh bug to policy network in PEARL

* Add early stopping condition to meta-learner

* Fix the value to append to dq

* Add early stopping condition configs to config files

* Update early stopping condition to meta learner

* Add list to range

* Add type annotation to all codes of PEARL

* Change dir name from assets to img

* Refactor PEARL codes

* Fix simple code

* Update README because of changing directory from assets to img

* Apply PR comment

* Develop maml (#60)

* test commit

* Create base structure

* add a high-level structure guide for development

* sync with pearl by dongmin

* Update MAML code

* Refactored network variable

* Bugfix: import error

* Refactored all MAML codes

* delete unused files

* add pyYAML to requirements

* add meta_train

* define the number of tasks at envs

* change a format of config files

* change directory of files in the util folder

* change agent.train to agent.compute_losses to implement MAML hessian structure

* add pylint related version requirement

* modify maml_trainer for yaml configs

* Match some formats with RL^2

* move maml folder into src folder

* add pytest PATH for MAML

* Feature/maml_exp_baseline (#57)

* Refactor buffers, meta_learner, and sampler modules in PEARL

* Refactor RL^2 code to avoid the bug of buffer

* image size test

* Fix image size

* Modify the name of PPO variables

* Add num_samples config, sampler log, and buffer log

* Remove num_sample_tasks config

* Add abs function to total_run_cost

* put the get_action method into PPO.py as a staticmethod

* change hidden layer related codes and configuration

* add meta-test and logging features

* restore added codes for the assumed bug

* test commit

* test commit3

* add meta-test

* change default configurations of MAML

* Combine value function with policy as a set of meta-model

* meta-train and meta-test baseline

* Structure discussion

* Fix repeated tanh when infer actions from the TanhGaussianPolicy network

* Refactor buffer and sampler

* Add early stopping condition configs to PEARL config files

* Add early stopping condition configs to RL^2 config files

* Fix tanh bug to policy network in PEARL

* Add early stopping condition to meta-learner

* Fix the value to append to dq

* Change configs to what are used in the official repo of MAML

* Fix tanh bug to policy network in PEARL

* Add Linear-feature baseline

* Modify to compute advantage based on newly fitted baseline

* Add separated meta-update based on PPO algorithm

* Add early stopping condition configs to config files

* Update early stopping condition to meta learner

* Add list to range

* Add type annotation to all codes of PEARL

* Change dir name from assets to img

* Refactor PEARL codes

* Fix simple code

* Update README because of changing directory from assets to img

* Separate train tasks and test tasks

* Set configuration based on references

* Delete linear-feature baseline and modify get_log_prob

* Remove static method feature from get_action and append None to log_probs to prevent buffer error

* Add a method into the buffer to update a value function before compute GAE

* Replace linear-feature baseline to value network and Add a variable to store old_policy

* Remove redundant code for obtaining adaptation samples and Modify a structure to follow the reference while keeping the log format

* Apply PR comment

* Utilize num_tasks

* Modify pylint statements

* Re-arrange the order of methods in the MetaLearner class

* Rename confused methods

* Remove old_policy and change variable & argument name for enhanced intuition

* Simplify log_values

* Separate visualizing method

* Change argument name and add additional comments

* Modify conditional statements of the sampler

* Restore redundant commit of PEARL

* Utilize num_tasks while assigning goals as dictionary type

* Change argument name for logging

* Simplify saving condition of log_prob

* Transpose compute_gae and compute_value to ppo.py

* Disjoin list comprehension

* Reflect 2nd Review comments of PR57

* Reflect 3rd review comments of PR57

* Remove numpy conversion from cuda tensor

* Add interoperability for CUDA

* Reflect 4th review comments of PR57

* Change inner-optimizer to Adam

* Change configs to match with those of the MAML paper

Co-authored-by: dongminlee94 <[email protected]>

Co-authored-by: dongminlee94 <[email protected]>
Co-authored-by: seunghyun lee <[email protected]>

* Change env from pybullet to mujoco (#61)

* Feature/checkpoint saving and loading (#63)

* Remove unnecessary variable in envs

* Add checkpoint saving & loading to PEARL algorithm

* Fix log_prob issue to RL^2 algorithm

* Update PEARL configs (#65)

* Feature/replace ppo with trpo (#67)

* replace ppo with trpo

* Add type hints, saving and loading, early stopping

* gaussian policy cuda runnability modification

* remove holdout test tasks and add test interval

* change the number of test tasks to be sampled

* combine train and test batches in dir task

* modify test-batch of dir task to be deterministic

* change dir task config

* restore heldout-test set

* avoid out-of-memory error by reducing the number of adaptations

* modify early stop condition of vel task

* Resolve code reviewer's comments

* Refactoring deterministic condition line

* Resolve missed code reviewer's comments

* Feature/refactor rl2 (#71)

* Change configurations of each algorithm

* Add saving modules

* Add type annotations

* add codes for meta supervised learning (#72)

Co-authored-by: Yoon, Seungje <[email protected]>
Co-authored-by: Seunghyun Lee <[email protected]>
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: seunghyun lee <[email protected]>
5 people committed Jun 11, 2022
1 parent a166af3 commit 8e95eef
Showing 49 changed files with 3,736 additions and 8 deletions.
34 changes: 34 additions & 0 deletions .all-contributorsrc
@@ -0,0 +1,34 @@
{
  "files": [
    "README.md"
  ],
  "imageSize": 100,
  "commit": false,
  "contributors": [
    {
      "login": "dongminlee94",
      "name": "Dongmin Lee",
      "avatar_url": "https://avatars.githubusercontent.com/u/29733842?v=4",
      "profile": "https://github.com/dongminlee94/",
      "contributions": [
        "code",
        "doc"
      ]
    },
    {
      "login": "Clyde21c",
      "name": "Seunghyun Lee",
      "avatar_url": "https://avatars.githubusercontent.com/u/35162035?v=4",
      "profile": "https://github.com/Clyde21c/",
      "contributions": [
        "code"
      ]
    }
  ],
  "contributorsPerLine": 7,
  "projectName": "meta-rl",
  "projectOwner": "dongminlee94",
  "repoType": "github",
  "repoHost": "https://github.com",
  "skipCi": true
}
2 changes: 1 addition & 1 deletion .gitignore
@@ -105,4 +105,4 @@ venv.bak/
 runs/
 
 # results
-results/
+results/
8 changes: 5 additions & 3 deletions .pylintrc
@@ -142,7 +142,9 @@ disable=print-statement,
         exception-escape,
         comprehension-escape,
         no-member,
-        no-name-in-module
+        no-name-in-module,
+        import-error,
+        duplicate-code,
 
 # Enable the message, report, category or checker with the given id(s). You can
 # either give multiple identifier separated by comma (,) or put this option
@@ -202,7 +204,7 @@ logging-modules=logging
 [SPELLING]
 
 # Limits count of emitted suggestions for spelling mistakes.
-max-spelling-suggestions=4
+max-spelling-suggestions=15
 
 # Spelling dictionary name. Available dictionaries: none. To make it work,
 # install the python-enchant package.
@@ -331,7 +333,7 @@ indent-after-paren=4
 indent-string='    '
 
 # Maximum number of characters on a single line.
-max-line-length=100
+max-line-length=104
 
 # Maximum number of lines in a module.
 max-module-lines=1000
File renamed without changes.
6 changes: 4 additions & 2 deletions Makefile
@@ -1,9 +1,11 @@
 format:
-	black .
+	black . --line-length 104
 	isort .
 
 lint:
-	env PYTHONPATH=. pytest --pylint --flake8 --mypy
+	env PYTHONPATH=src/rl2 pytest src/rl2 --pylint --flake8 --mypy
+	env PYTHONPATH=src/maml pytest src/maml --pylint --flake8 --mypy
+	env PYTHONPATH=src/pearl pytest src/pearl --pylint --flake8 --mypy
 
 setup:
 	pip install -r requirements.txt
125 changes: 125 additions & 0 deletions README.md
@@ -1 +1,126 @@
<div align="center">
<br>
<img src="./img/meta-rl.png" width="500">
</div>

[Image source](https://cs330.stanford.edu/slides/cs330_lifelonglearning_karol.pdf)

<br>

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8.8](https://img.shields.io/badge/python-3.8.8-blue.svg)](https://www.python.org/downloads/release/python-388/)
[![PyTorch 1.8.0](https://img.shields.io/badge/pytorch-1.8.0-red.svg)](https://pytorch.org/blog/pytorch-1.8-released/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Imports: isort](https://img.shields.io/badge/imports-isort-white)](https://pycqa.github.io/isort/)
[![Linting: flake8 & mypy & pylint](https://img.shields.io/badge/linting-flake8%20%26%20mypy%20%26%20pylint-deepblue)](https://pypi.org/project/pytest-pylint/)
[![All Contributors](https://img.shields.io/badge/all_contributors-2-orange.svg?style=flat-square)](#contributors-)

# Meta-Reinforcement Learning Algorithms with PyTorch

This repository contains PyTorch implementations of meta-reinforcement learning algorithms.

## Prerequisites

This repository is implemented and verified on **python 3.8.8**.

## Installation

To run on **pytorch 1.8.0**, open the [pytorch version link](https://pytorch.org/get-started/previous-versions/#wheel) and run the installation command that matches your desired specifications.

Next, clone this repository and run the following command.

```shell
$ make setup
```

## Python Path

To set the python path, move to `meta-rl/`.

```shell
$ cd meta-rl
```

If you set the python path in `bashrc`:

```shell
$ echo "export META_HOME=$(pwd)" >> ~/.bashrc
$ echo 'export PYTHONPATH=$META_HOME:$PYTHONPATH' >> ~/.bashrc
```

If you set the python path in `zshrc`:

```shell
$ echo "export META_HOME=$(pwd)" >> ~/.zshrc
$ echo 'export PYTHONPATH=$META_HOME:$PYTHONPATH' >> ~/.zshrc
```

## Usage

The repository's high-level structure is:

    └── src
        ├── envs
        ├── rl2
            ├── algorithm
            ├── configs
            └── results
        ├── maml
            ├── algorithm
            ├── configs
            └── results
        └── pearl
            ├── algorithm
            ├── configs
            └── results

### RL^2

TBU

### MAML

TBU

### PEARL

TBU

### Development

We have set up automatic formatters and linters for this repository.

To run the formatters:

```shell
$ make format
```

To run the linters:

```shell
$ make lint
```

New code should pass the formatters and the linters before being submitted as a PR.

## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
<table>
<tr>
<td align="center"><a href="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/dongminlee94/"><img src="https://avatars.githubusercontent.com/u/29733842?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Dongmin Lee</b></sub></a><br /><a href="https://github.com/dongminlee94/meta-rl/commits?author=dongminlee94" title="Code">💻</a> <a href="https://github.com/dongminlee94/meta-rl/commits?author=dongminlee94" title="Documentation">📖</a</td>
<td align="center"><a href="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/Clyde21c/"><img src="https://avatars.githubusercontent.com/u/35162035?v=4?s=100" width="100px;" alt=""/><br /><sub><b>Seunghyun Lee</b></sub></a><br /><a href="https://github.com/dongminlee94/meta-rl/commits?author=Clyde21c" title="Code">💻</a></td>
</tr>
</table>

<!-- markdownlint-restore -->
<!-- prettier-ignore-end -->

<!-- ALL-CONTRIBUTORS-LIST:END -->

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification. Contributions of any kind welcome!
Binary file added img/meta-rl.png
5 changes: 3 additions & 2 deletions setup.cfg
@@ -1,10 +1,11 @@
 [isort]
-line_length = 88
+profile = black
+line_length = 104
 
 [mypy]
 ignore_missing_imports = True
 follow_imports = skip
 
 [flake8]
-max-line-length = 88
+max-line-length = 104
 ignore = E203,W503
29 changes: 29 additions & 0 deletions src/envs/__init__.py
@@ -0,0 +1,29 @@
"""
Registration code for Half-cheetah environments
"""

import importlib
import os

ENVS = {}


def register_env(name):
"""Register an environment"""

def register_env_fn(filename):
if name in ENVS:
raise ValueError("Cannot register duplicate env {}".format(name))
if not callable(filename):
raise TypeError("env {} must be callable".format(name))
ENVS[name] = filename
return filename

return register_env_fn


# automatically import any envs in the envs/ directory
for file in os.listdir(os.path.dirname(__file__)):
if file.endswith(".py") and not file.startswith("_"):
module = file[: file.find(".py")]
importlib.import_module("src.envs." + module)
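For context, a minimal usage sketch of the registry above (not part of the committed diff): `register_env` stores the decorated class in `ENVS` under the given name, so later code can construct environments by key. The `"cheetah-vel"` name and `CheetahVelEnv` class below are purely illustrative placeholders, and the import assumes the `src` package (and MuJoCo, pulled in by the auto-import loop) is available on the python path.

```python
# Hypothetical usage sketch; "cheetah-vel" and CheetahVelEnv are placeholders,
# not part of this commit.
from src.envs import ENVS, register_env


@register_env("cheetah-vel")
class CheetahVelEnv:
    """Placeholder environment class used only to demonstrate registration."""


# Later, environments can be constructed by name from the registry.
env_cls = ENVS["cheetah-vel"]
env = env_cls()
```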
44 changes: 44 additions & 0 deletions src/envs/half_cheetah.py
@@ -0,0 +1,44 @@
"""
Modified half-cheetah environment
Reference:
https://github.com/katerakelly/oyster/blob/master/rlkit/envs/half_cheetah.py
"""

from typing import List, Union

import numpy as np
from gym.envs.mujoco import HalfCheetahEnv as HalfCheetahEnv_


class HalfCheetahEnv(HalfCheetahEnv_):
def _get_obs(self) -> np.ndarray:
return (
np.concatenate(
[
self.sim.data.qpos.flat[1:],
self.sim.data.qvel.flat,
self.get_body_com("torso").flat,
]
)
.astype(np.float32)
.flatten()
)

def viewer_setup(self) -> None:
camera_id = self.model.camera_name2id("track")
self.viewer.cam.type = 2
self.viewer.cam.fixedcamid = camera_id
self.viewer.cam.distance = self.model.stat.extent * 0.35
# Hide the overlay
self.viewer._hide_overlay = True

def render(self, mode: str = "human") -> Union[List[float], None]:
if mode == "rgb_array":
self._get_viewer().render()
# Window size used for old mujoco-py:
width, height = 500, 500
data = self._get_viewer().read_pixels(width, height, depth=False)
return data
elif mode == "human":
self._get_viewer().render()
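As a rough sanity check (not part of the diff), the modified `_get_obs` concatenates `qpos[1:]` (8 values for half-cheetah), `qvel` (9 values), and the torso center of mass (3 values), so the observation should be 20-dimensional. The sketch below assumes mujoco-py and the old gym-style (pre-0.26) MuJoCo environments are installed, as implied by the import above.

```python
# Sketch only: inspect the observation layout of the modified HalfCheetahEnv.
# Assumes mujoco-py is installed, since HalfCheetahEnv_ comes from gym.envs.mujoco.
from src.envs.half_cheetah import HalfCheetahEnv

env = HalfCheetahEnv()
obs = env.reset()
# qpos[1:] (8) + qvel (9) + torso COM (3) should give a 20-dimensional observation.
print(obs.shape, obs.dtype)  # expected: (20,) float32
```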
65 changes: 65 additions & 0 deletions src/envs/half_cheetah_dir.py
@@ -0,0 +1,65 @@
"""
Half-cheetah environment with direction target reward
Reference:
https://github.com/katerakelly/oyster/blob/master/rlkit/envs/half_cheetah_dir.py
"""

from typing import Any, Dict, List, Tuple

import numpy as np

from src.envs import register_env
from src.envs.half_cheetah import HalfCheetahEnv


@register_env("cheetah-dir")
class HalfCheetahDirEnv(HalfCheetahEnv):
"""
Half-cheetah environment class with direction target reward, as described in [1].
The code is adapted from
https://github.com/cbfinn/maml_rl/blob/master/rllab/envs/mujoco/half_cheetah_env_rand_direc.py
The half-cheetah follows the dynamics from MuJoCo [2], and receives at each
time step a reward composed of a control cost and a reward equal to its
velocity in the target direction.
[1] Chelsea Finn, Pieter Abbeel, Sergey Levine, "Model-Agnostic
Meta-Learning for Fast Adaptation of Deep Networks", 2017
(https://arxiv.org/abs/1703.03400)
[2] Emanuel Todorov, Tom Erez, Yuval Tassa, "MuJoCo: A physics engine for
model-based control", 2012
(https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf)
"""

def __init__(self, num_tasks: int) -> None:
directions = [-1, 1, -1, 1]
self.tasks = [{"direction": direction} for direction in directions]
assert num_tasks == len(self.tasks)
self._task = self.tasks[0]
self._goal_dir = self._task["direction"]
super().__init__()

def step(self, action: np.ndarray) -> Tuple[np.ndarray, np.float64, bool, Dict[str, Any]]:
xposbefore = self.sim.data.qpos[0]
self.do_simulation(action, self.frame_skip)
xposafter = self.sim.data.qpos[0]

progress = (xposafter - xposbefore) / self.dt
run_cost = self._goal_dir * progress
control_cost = 0.5 * 1e-1 * np.sum(np.square(action))

observation = self._get_obs()
reward = run_cost - control_cost
done = False
info = dict(run_cost=run_cost, control_cost=-control_cost, task=self._task)
return observation, reward, done, info

def get_all_task_idx(self) -> List[int]:
return list(range(len(self.tasks)))

def reset_task(self, idx: int) -> None:
self._task = self.tasks[idx]
self._goal_dir = self._task["direction"]
self.reset()
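A short sketch (not part of the diff) of the task interface the meta-learners rely on: `get_all_task_idx` enumerates the tasks, `reset_task` switches the goal direction, and `step` reports `run_cost` and `control_cost` through `info`. It assumes MuJoCo is installed and that `"cheetah-dir"` is constructed via the registry from `src/envs/__init__.py`.

```python
# Usage sketch only; assumes MuJoCo is available and "cheetah-dir" was
# registered by the decorator above.
import numpy as np

from src.envs import ENVS

env = ENVS["cheetah-dir"](num_tasks=4)
for idx in env.get_all_task_idx():
    env.reset_task(idx)               # sets self._goal_dir to the task's direction
    obs = env.reset()
    action = np.zeros(env.action_space.shape)
    obs, reward, done, info = env.step(action)
    print(idx, info["task"], reward)
```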