Develop #73
Merged
…_in_buffer Bugfix/fix number of samples in buffer
…ondition Feature/add early stopping condition
Feature/remove torch utils
* test commit
* Create base structure
* add a high-level structure guide for development
* add a high-level structure guide for development
* add a high-level structure guide for development
* sync with pearl by dongmin
* Update MAML code
* Refactored network variable
* Bugfix: import error
* Refactored all MAML codes
* delete unused files
* add pyYAML to requirements
* add meta_train
* define the number of tasks at envs
* change a format of config files
* change directory of files in the util folder
* change agent.train to agent.compute_losses to implement MAML hessian structure
* add pylint related version requirement
* modify maml_trainer for yaml configs
* Match some formats with RL^2
* move maml folder into src folder
* add pytest PATH for MAML
* Feature/maml_exp_baseline (#57)
* Refactor buffers, meta_learner, and sampler modules in PEARL
* Refactor RL^2 code to avoid the bug of buffer
* image size test
* image size test
* image size test
* Fix image size
* Modify the name of PPO variables
* Add num_samples config, sampler log, and buffer log
* Remove num_sample_tasks config
* Add abs function to total_run_cost
* Add abs function to total_run_cost
* put the get_action method into the PPO.py as a staticmethod
* change hidden layer related codes and configuration
* add meta-test and logging features
* restore added codes for the assumed bug
* test commit
* test commit3
* add meta-test
* change default configurations of MAML
* Combine value function with policy as a set of meta-model
* meta-train and meta-test baseline
* Structure discussion
* Fix repeated tanh when inferring actions from the TanhGaussianPolicy network
* Refactor buffer and sampler
* Add early stopping condition configs to PEARL config files
* Add early stopping condition configs to RL^2 config files
* Fix tanh bug to policy network in PEARL
* Add early stopping condition to meta-learner
* Fix the value to append to dq
* Change configs to what are used in the official repo of MAML
* Fix tanh bug to policy network in PEARL
* Add Linear-feature baseline
* Modify to compute advantage based on newly fitted baseline
* Add separated meta-update based on PPO algorithm
* Add early stopping condition configs to config files
* Update early stopping condition to meta learner
* Add list to range
* Add type annotation to all codes of PEARL
* Change dir name from assets to img
* Refactor PEARL codes
* Fix simple code
* Update README because of changing directory from assets to img
* Separate train tasks and test tasks
* Set configuration based on references
* Delete linear-feature baseline and modify get_log_prob
* Remove static method feature from get_action and append None to log_probs to prevent buffer error
* Add a method into the buffer to update a value function before computing GAE
* Replace linear-feature baseline with value network and add a variable to store old_policy
* Remove redundant code for obtaining adaptation samples and modify a structure to follow the reference while keeping the log format
* Apply PR comment
* Utilize num_tasks
* Modify pylint statements
* Re-arrange the order of methods in the MetaLearner class
* Rename confused methods
* Remove old_policy and change variable & argument name for enhanced intuition
* Simplify log_values
* Separate visualizing method
* Change argument name and add additional comments
* Modify conditional statements of the sampler
* Restore redundant commit of PEARL
* Utilize num_tasks while assigning goals as dictionary type
* Change argument name for logging
* Simplify saving condition of log_prob
* Transpose compute_gae and compute_value to ppo.py
* Disjoin list compression
* Reflect 2nd review comments of PR57
* Reflect 3rd review comments of PR57
* Remove numpy conversion from cuda tensor
* Add interoperability for CUDA
* Reflect 4th review comments of PR57
* Change inner-optimizer to Adam
* Change configs to match with those of the MAML paper

Co-authored-by: dongminlee94 <[email protected]>
Co-authored-by: dongminlee94 <[email protected]>
Co-authored-by: seunghyun lee <[email protected]>

* Remove unnecessary variable in envs
* Add checkpoint saving & loading to PEARL algorithm
* Fix log_prob issue to RL^2 algorithm

* replace ppo with trpo
* Add type-hint, saving and loading, early stopping
* gaussian policy cuda runnability modification
* remove holdout test tasks and add test interval
* change the number of test tasks to be sampled
* combine train and test batches in dir task
* modify test-batch of dir task to be deterministic
* change dir task config
* restore heldout-test set
* avoid out-of-memory error by reducing the number of adaptation
* modify early stop condition of vel task
* Resolve code reviewer's comments
* Refactoring deterministic condition line
* Resolve missed code reviewer's comments

* Change configurations of each algorithm
* Add saving modules
* Add type annotations
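
Several commits above mention computing GAE (generalized advantage estimation) from a value function, for example via `compute_gae`. The snippet below is only a minimal sketch of that general technique, not the code merged in this PR: the function signature, the default `gamma` and `lam` values, and the choice to bootstrap with a value of zero after the last step are all assumptions made for illustration.

```python
from typing import List, Sequence


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor
    lam: float = 0.97,    # assumed GAE lambda
) -> List[float]:
    """Hypothetical GAE sketch; the value after the final step is taken as 0."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0
    next_advantage = 0.0
    # Walk backwards so each step can reuse the advantage of the step after it.
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # cut the recursion at episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        advantages[t] = delta + gamma * lam * next_advantage * mask
        next_value = values[t]
        next_advantage = advantages[t]
    return advantages
```

With `gamma=1.0` and `lam=1.0` and an all-zero value estimate, the advantage at each step reduces to the undiscounted return from that step onward, which is a convenient sanity check.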
PR Description
Related Issues
Checklist
Optional section (e.g., code usage, experimental results, TODO)
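
As a possible code-usage note for the early stopping condition named in the commit log ("Add early stopping condition to meta-learner", "Fix the value to append to dq"), here is a minimal sketch of one common way such a check can be built on a fixed-size deque. It does not reproduce the repository's implementation: the class name `EarlyStopper`, the parameters `window` and `threshold`, and the rule "stop once the mean of the most recent test returns reaches the threshold" are assumptions.

```python
from collections import deque


class EarlyStopper:
    """Hypothetical helper: stop when the recent mean test return is high enough."""

    def __init__(self, window: int, threshold: float) -> None:
        # A deque with maxlen drops the oldest return once the window is full.
        self.returns = deque(maxlen=window)
        self.threshold = threshold

    def update(self, test_return: float) -> bool:
        """Record one meta-test return and report whether training may stop."""
        self.returns.append(test_return)
        if len(self.returns) < self.returns.maxlen:
            return False  # not enough evaluations yet
        mean_return = sum(self.returns) / len(self.returns)
        return mean_return >= self.threshold
```

A training loop would call `update` after each meta-test evaluation and break out once it returns `True`. Appending the scalar return itself, rather than some other logged quantity, matters because the stopping decision depends only on what is stored in the deque.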