Commit
* Refactored train model function (repeated)
* Update train model function (repeated)
* Update kl divergence function (repeated)
* Update train model (repeated)
* Add repo base files
* Update requirements
* Update Makefile
* Update base PEARL codes
* Update Makefile
* Update Makefile and requirements
* Refactored trainer
* Refactored PEARL code
* Update license to md
* Update pylintrc
* Refactored PEARL variables
* Bugfix: sampler error
* Bugfix: naming error
* Bugfix: import error
* Bugfix: device error
* Bugfix: naming error
* Update PEARL
* Modify default configs
* Bugfix: meta-test error
* Bugfix: seed
* Add tensorboard elements
* Add meta-test metric
* Refactor meta-test
* Modify import paths
* Update setup.config and env config
* Refactored variable names and code location
* Modify curr_obs name to cur_obs name
* Modify env variable name
* Refactored meta-test code and modified variable name
* Final refactoring
* Divide into return_before_infer and return_after_infer
* First, add RL^2 codes
* Refactored buffer, networks, sampler
* Refactored all RL^2 codes
* Check runnability of RL^2 sampling
* Add comment on flattening
* Finally, add meta-train codes of RL^2 and check their runnability
* Add RL^2 meta-test code
* Add buffer clear
* Complete RL^2 cheetah-dir
* Modify requirements.txt and setup.cfg
* Change Python config files to YAML config files
* RL^2 refactoring and minor bug fix
* Remove empty lines
* Fix minor bug in meta-test
* Modify .pylintrc and Makefile
* Modify config files
* Refactor RL^2 codes
* First, add RL^2 codes
* Rebase develop-rl2
* Change directory and rebase develop-rl2
* Modify requirements.txt
* Modify Makefile
* Change envs directory
* Change the config files from py to yaml
* Modify config names
* Refactor PEARL codes
* Refactor RL^2 codes
* Add README
* Modify image size
* Add image source
* Add text-align code of image
* Add link to image
* Image source link test (repeated)
* Add all-contributorsrc
* docs: update README.md [skip ci]
* docs: create .all-contributorsrc [skip ci]
* Modify README (repeated)
* docs: update README.md [skip ci]
* docs: create .all-contributorsrc [skip ci]
* Modify README
* Final commit
* docs: update README.md [skip ci]
* docs: create .all-contributorsrc [skip ci]
* Modify README
* Change tensorboard names
* Modify image size
* Modify num_iterations config
* Refactor buffers, meta_learner, and sampler modules in PEARL
* Refactor RL^2 code to avoid the buffer bug
* Image size test (repeated)
* Fix image size
* Modify the names of PPO variables
* Add num_samples config, sampler log, and buffer log
* Remove num_sample_tasks config
* Add abs function to total_run_cost
* Refactor buffer and sampler
* Add early stopping condition configs to PEARL config files
* Add early stopping condition configs to RL^2 config files
* Fix tanh bug in PEARL policy network
* Add early stopping condition to meta-learner
* Fix the value appended to dq
* Add early stopping condition configs to config files
* Update early stopping condition in meta-learner
* Add list to range
* Add type annotations to all PEARL code
* Change dir name from assets to img
* Refactor PEARL codes
* Fix simple code
* Update README after changing directory from assets to img
* Apply PR comment
* Develop maml (#60)
  * test commit
  * Create base structure
  * Add a high-level structure guide for development
  * Sync with PEARL by dongmin
  * Update MAML code
  * Refactored network variable
  * Bugfix: import error
  * Refactored all MAML codes
  * Delete unused files
  * Add PyYAML to requirements
  * Add meta_train
  * Define the number of tasks in envs
  * Change the format of config files
  * Change directory of files in the util folder
  * Change agent.train to agent.compute_losses to implement the MAML Hessian structure
  * Add pylint-related version requirement
  * Modify maml_trainer for YAML configs
  * Match some formats with RL^2
  * Move maml folder into src folder
  * Add pytest PATH for MAML
  * Feature/maml_exp_baseline (#57)
    * Refactor buffers, meta_learner, and sampler modules in PEARL
    * Refactor RL^2 code to avoid the buffer bug
    * Image size test (repeated)
    * Fix image size
    * Modify the names of PPO variables
    * Add num_samples config, sampler log, and buffer log
    * Remove num_sample_tasks config
    * Add abs function to total_run_cost
    * Put the get_action method into PPO.py as a staticmethod
    * Change hidden-layer code and configuration
    * Add meta-test and logging features
    * Restore code added for the assumed bug
    * test commit
    * test commit3
    * Add meta-test
    * Change default configurations of MAML
    * Combine value function with policy as a set of meta-models
    * Meta-train and meta-test baseline
    * Structure discussion
    * Fix repeated tanh when inferring actions from the TanhGaussianPolicy network
    * Refactor buffer and sampler
    * Add early stopping condition configs to PEARL config files
    * Add early stopping condition configs to RL^2 config files
    * Fix tanh bug in PEARL policy network
    * Add early stopping condition to meta-learner
    * Fix the value appended to dq
    * Change configs to those used in the official MAML repo
    * Fix tanh bug in PEARL policy network
    * Add linear-feature baseline
    * Compute advantage based on the newly fitted baseline
    * Add separated meta-update based on the PPO algorithm
    * Add early stopping condition configs to config files
    * Update early stopping condition in meta-learner
    * Add list to range
    * Add type annotations to all PEARL code
    * Change dir name from assets to img
    * Refactor PEARL codes
    * Fix simple code
    * Update README after changing directory from assets to img
    * Separate train tasks and test tasks
    * Set configuration based on references
    * Delete linear-feature baseline and modify get_log_prob
    * Remove static method feature from get_action and append None to log_probs to prevent buffer error
    * Add a method to the buffer that updates the value function before computing GAE
    * Replace linear-feature baseline with value network and add a variable to store old_policy
    * Remove redundant code for obtaining adaptation samples and modify the structure to follow the reference while keeping the log format
    * Apply PR comment
    * Utilize num_tasks
    * Modify pylint statements
    * Re-arrange the order of methods in the MetaLearner class
    * Rename confusing methods
    * Remove old_policy and change variable and argument names for clarity
    * Simplify log_values
    * Separate visualizing method
    * Change argument name and add additional comments
    * Modify conditional statements of the sampler
    * Restore redundant commit of PEARL
    * Utilize num_tasks while assigning goals as dictionary type
    * Change argument name for logging
    * Simplify saving condition of log_prob
    * Move compute_gae and compute_value to ppo.py
    * Split list comprehension
    * Reflect 2nd review comments of PR57
    * Reflect 3rd review comments of PR57
    * Remove NumPy conversion from CUDA tensor
    * Add interoperability for CUDA
    * Reflect 4th review comments of PR57
    * Change inner optimizer to Adam
    * Change configs to match those of the MAML paper
  * Co-authored-by: dongminlee94 <[email protected]>
  * Co-authored-by: seunghyun lee <[email protected]>
* Change env from pybullet to mujoco (#61)
* Feature/checkpoint saving and loading (#63)
  * Remove unnecessary variable in envs
  * Add checkpoint saving and loading to PEARL algorithm
  * Fix log_prob issue in RL^2 algorithm
* Update PEARL configs (#65)
* Feature/replace ppo with trpo (#67)
  * Replace PPO with TRPO
  * Add type hints, saving and loading, early stopping
  * Gaussian policy CUDA runnability modification
  * Remove holdout test tasks and add test interval
  * Change the number of test tasks to be sampled
  * Combine train and test batches in dir task
  * Modify test batch of dir task to be deterministic
  * Change dir task config
  * Restore held-out test set
  * Avoid out-of-memory error by reducing the number of adaptations
  * Modify early stopping condition of vel task
  * Resolve code reviewer's comments
  * Refactor deterministic condition line
  * Resolve missed code reviewer's comments
* Feature/refactor rl2 (#71)
  * Change configurations of each algorithm
  * Add saving modules
  * Add type annotations
* Add codes for meta supervised learning (#72)
* Co-authored-by: Yoon, Seungje <[email protected]>
* Co-authored-by: Seunghyun Lee <[email protected]>
* Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* Co-authored-by: seunghyun lee <[email protected]>
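Several of the commits above chase a "repeated tanh" bug when inferring actions from the TanhGaussianPolicy network. A minimal sketch of the intended single-squash sampling, assuming a PyTorch-style Gaussian policy (function and variable names here are illustrative, not the repo's actual API):

```python
import torch

def sample_action(mean: torch.Tensor, log_std: torch.Tensor):
    """Sample a tanh-squashed Gaussian action.

    The bug class referenced in the commits: applying tanh both inside
    sampling and again at inference squashes the action twice, which
    skews actions toward zero and invalidates the log-probability.
    The fix is to squash exactly once and apply the tanh
    change-of-variables correction to the Gaussian log-prob.
    """
    std = log_std.exp()
    normal = torch.distributions.Normal(mean, std)
    pre_tanh = normal.rsample()      # reparameterized Gaussian sample
    action = torch.tanh(pre_tanh)    # squash exactly once
    # log pi(a) = log N(u) - log(1 - tanh(u)^2), with epsilon for stability
    log_prob = normal.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
    return action, log_prob
```

The returned action is guaranteed to lie in (-1, 1); returning `action` again through `tanh` anywhere downstream would reintroduce the bug.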
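The MAML baseline commits replace the linear-feature baseline with a value network that is fitted before advantages are computed. A sketch of generalized advantage estimation (GAE) against such a fitted value baseline, with illustrative names and default coefficients (not the repo's actual signatures):

```python
import torch

def compute_gae(rewards: torch.Tensor,
                values: torch.Tensor,
                last_value: torch.Tensor,
                gamma: float = 0.99,
                lam: float = 0.95):
    """Compute GAE advantages and returns for one trajectory.

    `values` holds the fitted baseline V(s_t) for each step; it must be
    updated (refit) before this call, which is what the 'update a value
    function before computing GAE' commit refers to.
    """
    # Append bootstrap value V(s_T) so values[t + 1] is defined at the end.
    values = torch.cat([values, last_value.unsqueeze(0)])
    advantages = torch.zeros_like(rewards)
    gae = torch.tensor(0.0)
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future deltas
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```

Fitting the baseline first and then computing advantages from it keeps the advantage estimates consistent with the current value function, which matters when the baseline is a learned network rather than a fixed linear-feature model.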