Enhance/update src setup (#80)
* add required package for win32

* modify commands in README

* add one-line install shell scripts for windows

* ignore warnings

Co-authored-by: shlee <[email protected]>
dongminlee94 and Clyde21c committed Jun 22, 2022
1 parent a0820f7 commit 8975569
Showing 12 changed files with 119 additions and 79 deletions.
2 changes: 1 addition & 1 deletion .flake8
@@ -3,4 +3,4 @@ max-line-length = 104
max-complexity = 18
select = B,C,E,F,W,T4,B9
extend-ignore = E203, W503
ignore = E203, E226, E266, E501, W503, E265
ignore = E203, E226, E266, E501, W503, E265, E402
1 change: 1 addition & 0 deletions .pylintrc
@@ -154,6 +154,7 @@ disable=print-statement,
used-before-assignment,
line-too-long,
too-few-public-methods,
wrong-import-position,

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
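Both linter updates cover the same situation: the meta-learner modules now call `warnings.filterwarnings("ignore")` at module level before the remaining imports (see the `meta_learner.py` diffs below), which flake8 flags as E402 and pylint as `wrong-import-position`. A minimal sketch of the import layout these exceptions allow, not taken from the repository:

```python
# Minimal sketch of the import layout these linter exceptions allow:
# module-level code before the remaining imports triggers flake8 E402 and
# pylint wrong-import-position, so both rules are now disabled.
import warnings

warnings.filterwarnings("ignore")  # silence third-party deprecation warnings

import numpy as np  # an import after executable code would normally be flagged

print(np.__version__)
```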
2 changes: 1 addition & 1 deletion Makefile
@@ -38,4 +38,4 @@ init:
init-dev:
make init
pip install -r requirements-dev.txt
bash ./hooks/install.sh
bash ./scripts/install.sh
42 changes: 33 additions & 9 deletions README.md
@@ -39,45 +39,69 @@ https://www.anaconda.com/

Next, after cloning this repository, run the following commands to install the required packages.

**macOS and Linux users**

```bash
# Users
$ make init
(meta) $ make init

# Developers
$ make init-dev
(meta) $ make init-dev
```

### 4. Train the models and check the results
**Windows users**

For Meta-SL, move into each algorithm's folder, run the algorithm with `jupyter notebook`, and check the results.
In git bash, run the following command so that the `conda` command becomes available.

```bash
$ jupyter notebook
$ echo ". /c/Users/{username}/anaconda3/etc/profile.d/conda.sh" >> ~/.profile
```

Restart git bash, then run the following commands in order.

```bash
$ conda activate meta

(meta) $ sh ./scripts/window-init.sh
```

**Colab users**

If you use Colab, enter the command below into a cell to install the PyTorch-related packages.

```python
!pip install torchmeta torchtext==0.10.1 torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```
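As a quick sanity check (a suggestion, not part of the repository), you can confirm in a cell that the CUDA build was installed and a GPU runtime is active:

```python
# Optional check that the +cu111 build was picked up and a GPU runtime is active.
import torch

print(torch.__version__)          # expected: 1.9.1+cu111
print(torch.cuda.is_available())  # True on a GPU runtime
```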

### 4. Train the models and check the results

**Meta-SL**

For Meta-SL, move into each algorithm's folder, run the algorithm with `jupyter notebook`, and check the results.

```bash
(meta) $ jupyter notebook
```

**Meta-RL**

For Meta-RL, move into each algorithm's folder and run it with the commands below.

```bash
# RL^2
$ rl2_trainer.py
(meta) $ python rl2_trainer.py

# MAML
$ maml_trainer.py
(meta) $ python maml_trainer.py

# PEARL
$ pearl_trainer.py
(meta) $ python pearl_trainer.py
```

For Meta-RL, check the training results with TensorBoard.

```bash
$ tensorboard --logdir=./results
(meta) $ tensorboard --logdir=./results
```

## Contributors ✨
1 change: 1 addition & 0 deletions requirements.txt
@@ -1,6 +1,7 @@
autopep8==1.5.0
GPUtil==1.4.0
gym>=0.24.1
imageio>=2.1.2
jupyter==1.0.0
jupyter-contrib-nbextensions==0.5.1
jupyter-nbextensions-configurator==0.4.1
2 changes: 1 addition & 1 deletion hooks/install.sh → scripts/install.sh
@@ -1,4 +1,4 @@
# -*- coding: utf-8 -*-
#!/bin/bash

REPO_ROOT=$(git rev-parse --show-toplevel)

9 changes: 9 additions & 0 deletions scripts/window-init.sh
@@ -0,0 +1,9 @@
#!/bin/bash

pip install -e .
pip install -r requirements.txt
python ./scripts/download-torch.py
conda install -y tensorboard
jupyter contrib nbextension install --user
jupyter nbextensions_configurator enable --user
python -m ipykernel install --user
127 changes: 63 additions & 64 deletions src/meta_rl/maml/algorithm/meta_learner.py
@@ -6,10 +6,13 @@
import datetime
import os
import time
import warnings
from collections import deque
from copy import deepcopy
from typing import Any, Dict, List, Tuple

warnings.filterwarnings("ignore")

import numpy as np
import torch
from gym.envs.mujoco.half_cheetah import HalfCheetahEnv
@@ -34,7 +37,6 @@ def __init__(
action_dim: int,
train_tasks: List[int],
test_tasks: List[int],
test_interval: int,
save_exp_name: str,
save_file_name: str,
load_exp_name: str,
@@ -49,7 +51,6 @@
self.agent = agent
self.train_tasks = train_tasks
self.test_tasks = test_tasks
self.test_interval = test_interval

self.num_iterations = config["num_iterations"]
self.meta_batch_size = config["meta_batch_size"]
@@ -278,39 +279,38 @@ def visualize_within_tensorboard(self, results_summary: Dict[str, Any], iteratio
self.writer.add_scalar("train/kl_after", results_summary["kl_after"], iteration)
self.writer.add_scalar("train/policy_entropy", results_summary["policy_entropy"], iteration)

if iteration % self.test_interval == 0:
self.writer.add_scalar(
"test/return_before_grad",
results_summary["return_before_grad"],
iteration,
)
self.writer.add_scalar(
"test/return_after_grad",
results_summary["return_after_grad"],
iteration,
)
if self.env_name == "vel":
self.writer.add_scalar(
"test/return_before_grad",
results_summary["return_before_grad"],
"test/sum_run_cost_before_grad",
results_summary["sum_run_cost_before_grad"],
iteration,
)
self.writer.add_scalar(
"test/return_after_grad",
results_summary["return_after_grad"],
"test/sum_run_cost_after_grad",
results_summary["sum_run_cost_after_grad"],
iteration,
)
if self.env_name == "vel":
for step in range(len(results_summary["run_cost_before_grad"])):
self.writer.add_scalar(
"test/sum_run_cost_before_grad",
results_summary["sum_run_cost_before_grad"],
iteration,
"run_cost_before_grad/iteration_" + str(iteration),
results_summary["run_cost_before_grad"][step],
step,
)
self.writer.add_scalar(
"test/sum_run_cost_after_grad",
results_summary["sum_run_cost_after_grad"],
iteration,
"run_cost_after_grad/iteration_" + str(iteration),
results_summary["run_cost_after_grad"][step],
step,
)
for step in range(len(results_summary["run_cost_before_grad"])):
self.writer.add_scalar(
"run_cost_before_grad/iteration_" + str(iteration),
results_summary["run_cost_before_grad"][step],
step,
)
self.writer.add_scalar(
"run_cost_after_grad/iteration_" + str(iteration),
results_summary["run_cost_after_grad"][step],
step,
)

self.writer.add_scalar("time/total_time", results_summary["total_time"], iteration)
self.writer.add_scalar("time/time_per_iter", results_summary["time_per_iter"], iteration)
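The restructured logging above drops the `test_interval` gate, so test metrics are written every iteration, and the per-step run costs are logged under a tag that embeds the iteration number. A self-contained sketch of that TensorBoard pattern, with made-up values:

```python
# Self-contained sketch (made-up values) of the per-iteration run-cost logging:
# each meta-test iteration gets its own curve, indexed by environment step.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="results/demo")  # "results/demo" is an assumed path
iteration = 10
run_cost_before_grad = [1.2, 0.9, 0.7]  # assumed per-step run costs

for step, cost in enumerate(run_cost_before_grad):
    writer.add_scalar("run_cost_before_grad/iteration_" + str(iteration), cost, step)
writer.close()
```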
@@ -336,52 +336,51 @@ def meta_test(
results_summary["total_time"] = time.time() - total_start_time
results_summary["time_per_iter"] = time.time() - start_time

if iteration % self.test_interval == 0:
self.collect_train_data(np.array(self.test_tasks), is_eval=True)

for task in range(len(self.test_tasks)):
batch_before_grad = self.buffers.get_trajs(task, 0)
batch_after_grad = self.buffers.get_trajs(task, self.num_adapt_epochs)

rewards_before_grad = batch_before_grad["rewards"][: self.max_steps]
rewards_after_grad = batch_after_grad["rewards"][: self.max_steps]
returns_before_grad.append(torch.sum(rewards_before_grad).item())
returns_after_grad.append(torch.sum(rewards_after_grad).item())

if self.env_name == "vel":
run_costs_before_grad.append(
batch_before_grad["infos"][: self.max_steps].cpu().numpy(),
)
run_costs_after_grad.append(
batch_after_grad["infos"][: self.max_steps].cpu().numpy(),
)
self.collect_train_data(np.array(self.test_tasks), is_eval=True)

run_cost_before_grad = np.sum(run_costs_before_grad, axis=0)
run_cost_after_grad = np.sum(run_costs_after_grad, axis=0)
for task in range(len(self.test_tasks)):
batch_before_grad = self.buffers.get_trajs(task, 0)
batch_after_grad = self.buffers.get_trajs(task, self.num_adapt_epochs)

self.buffers.clear()
rewards_before_grad = batch_before_grad["rewards"][: self.max_steps]
rewards_after_grad = batch_after_grad["rewards"][: self.max_steps]
returns_before_grad.append(torch.sum(rewards_before_grad).item())
returns_after_grad.append(torch.sum(rewards_after_grad).item())

# Collect meta-test results
results_summary["return_before_grad"] = sum(returns_before_grad) / len(self.test_tasks)
results_summary["return_after_grad"] = sum(returns_after_grad) / len(self.test_tasks)
if self.env_name == "vel":
results_summary["run_cost_before_grad"] = run_cost_before_grad / len(self.test_tasks)
results_summary["run_cost_after_grad"] = run_cost_after_grad / len(self.test_tasks)
results_summary["sum_run_cost_before_grad"] = sum(
abs(run_cost_before_grad / len(self.test_tasks)),
run_costs_before_grad.append(
batch_before_grad["infos"][: self.max_steps].cpu().numpy(),
)
results_summary["sum_run_cost_after_grad"] = sum(
abs(run_cost_after_grad / len(self.test_tasks)),
run_costs_after_grad.append(
batch_after_grad["infos"][: self.max_steps].cpu().numpy(),
)

# Check if each element of self.dq satisfies early stopping condition
self.dq.append(results_summary["return_after_grad"])
if all(list(map((lambda x: x >= self.stop_goal), self.dq))):
self.is_early_stopping = True
run_cost_before_grad = np.sum(run_costs_before_grad, axis=0)
run_cost_after_grad = np.sum(run_costs_after_grad, axis=0)

# Save the trained models
if self.is_early_stopping:
ckpt_path = os.path.join(self.result_path, "checkpoint_" + str(iteration) + ".pt")
torch.save({"policy": self.agent.policy.state_dict()}, ckpt_path)
self.buffers.clear()

# Collect meta-test results
results_summary["return_before_grad"] = sum(returns_before_grad) / len(self.test_tasks)
results_summary["return_after_grad"] = sum(returns_after_grad) / len(self.test_tasks)
if self.env_name == "vel":
results_summary["run_cost_before_grad"] = run_cost_before_grad / len(self.test_tasks)
results_summary["run_cost_after_grad"] = run_cost_after_grad / len(self.test_tasks)
results_summary["sum_run_cost_before_grad"] = sum(
abs(run_cost_before_grad / len(self.test_tasks)),
)
results_summary["sum_run_cost_after_grad"] = sum(
abs(run_cost_after_grad / len(self.test_tasks)),
)

# Check if each element of self.dq satisfies early stopping condition
self.dq.append(results_summary["return_after_grad"])
if all(list(map((lambda x: x >= self.stop_goal), self.dq))):
self.is_early_stopping = True

# Save the trained models
if self.is_early_stopping:
ckpt_path = os.path.join(self.result_path, "checkpoint_" + str(iteration) + ".pt")
torch.save({"policy": self.agent.policy.state_dict()}, ckpt_path)

self.visualize_within_tensorboard(results_summary, iteration)
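The early-stopping check at the end of `meta_test` keeps a fixed-length deque of post-adaptation returns and stops once every entry reaches the goal. A small self-contained sketch of that logic, with an assumed window size and goal (in the repository both come from the experiment config):

```python
# Sketch of the early-stopping condition: stop once every return in a
# fixed-length window meets the target. Window size and goal are assumed here.
from collections import deque

stop_goal = 90.0
dq = deque(maxlen=3)

for iteration, return_after_grad in enumerate([75.0, 88.0, 91.0, 93.0, 95.0]):
    dq.append(return_after_grad)
    if len(dq) == dq.maxlen and all(x >= stop_goal for x in dq):
        print("early stopping at iteration", iteration)
        break
```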
3 changes: 1 addition & 2 deletions src/meta_rl/maml/maml_trainer.py
@@ -33,7 +33,7 @@
tasks: List[int] = env.get_all_task_idx()

# Set a random seed
env.seed(experiment_config["seed"])
env.reset(seed=experiment_config["seed"])
np.random.seed(experiment_config["seed"])
torch.manual_seed(experiment_config["seed"])

@@ -65,7 +65,6 @@
action_dim=action_dim,
train_tasks=tasks[: env_target_config["train_tasks"]],
test_tasks=tasks[-env_target_config["test_tasks"] :],
test_interval=experiment_config["test_interval"],
save_exp_name=experiment_config["save_exp_name"],
save_file_name=experiment_config["save_file_name"],
load_exp_name=experiment_config["load_exp_name"],
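The seeding change in the trainer scripts replaces `env.seed(...)` with `env.reset(seed=...)`, matching the newer Gym API that the updated requirement `gym>=0.24.1` brings in. A minimal sketch of the updated pattern, using a standard Gym environment for illustration rather than the repository's HalfCheetah variants:

```python
# Minimal sketch of seeding under newer Gym: the seed goes to reset() instead
# of the removed env.seed(). CartPole is used only for illustration.
import gym
import numpy as np
import torch

seed = 0
env = gym.make("CartPole-v1")
env.reset(seed=seed)         # replaces env.seed(seed)
env.action_space.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
```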
4 changes: 4 additions & 0 deletions src/meta_rl/pearl/algorithm/meta_learner.py
@@ -6,9 +6,12 @@
import datetime
import os
import time
import warnings
from collections import deque
from typing import Any, Dict, List

warnings.filterwarnings("ignore")

import numpy as np
import torch
from gym.envs.mujoco.half_cheetah import HalfCheetahEnv
@@ -80,6 +83,7 @@ def __init__(
if not save_file_name:
save_file_name = datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
self.result_path = os.path.join("results", save_exp_name, save_file_name)

self.writer = SummaryWriter(log_dir=self.result_path)

if load_exp_name and load_file_name:
2 changes: 1 addition & 1 deletion src/meta_rl/pearl/pearl_trainer.py
@@ -33,7 +33,7 @@
tasks: List[int] = env.get_all_task_idx()

# Set a random seed
env.seed(experiment_config["seed"])
env.reset(seed=experiment_config["seed"])
np.random.seed(experiment_config["seed"])
torch.manual_seed(experiment_config["seed"])

3 changes: 3 additions & 0 deletions src/meta_rl/rl2/algorithm/meta_learner.py
@@ -5,9 +5,12 @@
import datetime
import os
import time
import warnings
from collections import deque
from typing import Any, Dict, List

warnings.filterwarnings("ignore")

import numpy as np
import torch
from gym.envs.mujoco.half_cheetah import HalfCheetahEnv
