- Updated the `Distribution` class to include dimensions.
- Updated Mujoco to v1.31
- Fixed `tensor_utils.concat_tensor_dict_list` to handle nested situations properly.
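The nested handling can be sketched as follows. This is a simplified, hypothetical reimplementation for illustration only, not the actual rllab helper:

```python
import numpy as np

def concat_tensor_dict_list(dict_list):
    # Recursively concatenate a list of (possibly nested) dicts of arrays
    # along the first axis. Nested dicts (e.g. per-path "agent_infos")
    # are merged key by key; leaf arrays are concatenated.
    result = {}
    for key in dict_list[0]:
        example = dict_list[0][key]
        if isinstance(example, dict):
            result[key] = concat_tensor_dict_list([d[key] for d in dict_list])
        else:
            result[key] = np.concatenate([d[key] for d in dict_list])
    return result

paths = [
    {"obs": np.zeros((3, 2)), "agent_infos": {"mean": np.zeros((3, 2))}},
    {"obs": np.ones((2, 2)), "agent_infos": {"mean": np.ones((2, 2))}},
]
merged = concat_tensor_dict_list(paths)
```

Here `merged["obs"]` has shape `(5, 2)`, and the nested `merged["agent_infos"]["mean"]` is concatenated the same way.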
- Default nonlinearity for `CategoricalMLPPolicy` changed to `tanh` as well, for consistency.
- Added `flatten_n` and `unflatten_n` support for `Discrete` and `Product` spaces.
- Changed the `dist_info_sym` and `dist_info` interface for policies. Previously, they took both the observations and actions as input arguments, where the actions were needed by recurrent policies that take both the current state and the previous action into account. However, this was rather artificial. The interface now takes the observations plus a dictionary of state-related information. An extra property `state_info_keys` is added to specify the list of keys used for state-related information. By default this is an empty list.
- Removed `lasagne_recurrent.py` since it's not used anywhere, and its functionality is replaced by `GRUNetwork` implemented in `rllab.core.network`.
- Restored the default value of the `whole_paths` parameter in `BatchPolopt` back to `True`. This is more consistent with previous configurations.
- Removed the helper method `rllab.misc.ext.merge_dict`. It turns out Python's `dict` constructor already supports this functionality: `merge_dict(dict1, dict2) == dict(dict1, **dict2)`.
- Added a `min_std` option to `GaussianMLPPolicy`. This prevents the gradients from becoming unstable near deterministic policies.
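The idea behind `min_std` can be sketched as a floor on the policy's log-standard-deviation: log-likelihood gradients involve `1/std` terms, which blow up as the policy becomes deterministic. The `apply_min_std` helper below is a hypothetical NumPy illustration, not the actual `GaussianMLPPolicy` code:

```python
import numpy as np

def apply_min_std(log_std, min_std=1e-4):
    # Floor the per-dimension log-std at log(min_std), so that the
    # effective std never drops below min_std and 1/std stays bounded.
    return np.maximum(log_std, np.log(min_std))

log_std = np.array([-20.0, -2.0, 0.5])
floored = apply_min_std(log_std, min_std=1e-4)
```

Only the nearly-deterministic first dimension is clipped; the others pass through unchanged.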
- Added a method `truncate_paths` to the `rllab.sampler.parallel_sampler` module. This should be sufficient to replace the old configurable parameter `whole_paths`, which has been removed during refactoring.
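A simplified sketch of what such a truncation helper does: keep whole paths while the sample budget allows, then cut the final path so the total matches the budget. This is a hypothetical reimplementation for illustration, not the actual rllab code:

```python
import numpy as np

def truncate_paths(paths, max_samples):
    # Accumulate whole paths until adding the next one would exceed
    # max_samples, then truncate that path so the total is exact.
    truncated, total = [], 0
    for path in paths:
        n = len(path["rewards"])
        if total + n <= max_samples:
            truncated.append(path)
            total += n
        else:
            keep = max_samples - total
            if keep > 0:
                # Slice every per-timestep array in the path.
                truncated.append({k: v[:keep] for k, v in path.items()})
            break
    return truncated

paths = [{"rewards": np.arange(4)}, {"rewards": np.arange(5)}]
result = truncate_paths(paths, max_samples=6)
```

The first path (4 samples) is kept whole and the second is cut to 2 samples, for a total of exactly 6.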
- Known issues:
  - TRPO does not work well with ReLU since the Hessian is undefined at 0, sometimes causing NaNs. This Theano issue is tracked at Theano/Theano#4353. If ReLU must be used, try `theano.tensor.maximum(x, 0.)` as opposed to `theano.tensor.nnet.relu`.
- Fixed a bug in TNPG (`max_backtracks` should be set to 1 instead of 0).
- Neural network policies now use `tanh` nonlinearities by default.
- Refactored the interface for `rllab.sampler.parallel_sampler`. Extracted a new module `rllab.sampler.stateful_pool` containing general parallelization utilities.
- Fixed numerous issues in tests that caused them to take too long to run.
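The general pattern behind a stateful worker pool, as extracted into `rllab.sampler.stateful_pool`, can be sketched with threads. The `run_map` function below is an illustrative toy; the real module's interface may differ:

```python
import queue
import threading

def run_map(init_fn, fn, args_list, n_workers=2):
    # Each worker builds its own persistent state once via init_fn,
    # then repeatedly applies fn(state, *args) to the tasks it pulls.
    # This avoids re-initializing expensive state (e.g. a simulator)
    # for every task.
    tasks = queue.Queue()
    results = [None] * len(args_list)

    def worker():
        state = init_fn()
        while True:
            item = tasks.get()
            if item is None:  # sentinel: shut down this worker
                break
            idx, args = item
            results[idx] = fn(state, *args)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for idx, args in enumerate(args_list):
        tasks.put((idx, args))
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, after all tasks
    for t in threads:
        t.join()
    return results

out = run_map(lambda: 10, lambda state, x: state + x, [(1,), (2,), (3,)])
```

Each worker starts from the shared initial state `10`, so `out` collects `state + x` per task in submission order.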
- Merged the release branch onto master and removed the release branch, to avoid potential confusion.
Features:
- Upgraded Mujoco interface to accommodate v1.30