Download experts from hf inside tutorials and docs #766
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #766   +/-   ##
=======================================
  Coverage   96.33%   96.33%
=======================================
  Files          93       93
  Lines        8789     8789
=======================================
  Hits         8467     8467
  Misses        322      322
Thanks, these changes mostly look good, but there's a type error as the code is currently written. The tutorials typically pass in venv=env, where env is a gym.Env created by gym.make, but load_policy expects a vec_env.VecEnv.
Clearly the notebooks work with this as they're passing tests, so one option to fix it would be to loosen the type annotation on load_policy to stable_baselines3.common.type_aliases.GymEnv = Union[gym.Env, vec_env.VecEnv], since this is what BaseAlgorithm.load accepts, although there's a chance this will cause type-checking errors elsewhere (e.g. do any of our other policy loaders strictly require a vec_env.VecEnv?).
Alternatively, we could stick to the vec_env.VecEnv convention we've mostly used in imitation (reasoning that VecEnv is strictly more general than a gym.Env: you can always have a VecEnv wrapping a single gym.Env) and change your envs to venvs, e.g. using the util.make_vec_env function.
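For concreteness, here is a minimal sketch of the two options, assuming simplified signatures (imitation's real load_policy takes more arguments, and util.make_vec_env's exact signature varies across versions):

```python
from typing import Union

import gym
import seals  # noqa: F401 -- registers the seals/* environments
from stable_baselines3.common import vec_env
# SB3 already ships this alias: GymEnv = Union[gym.Env, vec_env.VecEnv]
from stable_baselines3.common.type_aliases import GymEnv


# Option 1: loosen the annotation so both plain and vectorized envs type-check.
# (Hypothetical, simplified signature.)
def load_policy(policy_type: str, venv: GymEnv) -> None:
    ...


# Option 2: keep the VecEnv convention and wrap single envs at the call site.
venv = vec_env.DummyVecEnv([lambda: gym.make("seals/CartPole-v0")])
load_policy("ppo", venv)
```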
Thanks for spotting the inconsistent typing, @AdamGleave.
Neither option fully fixes it, though. This is because the typing of the policy registry is insufficient: after agent_loader = policy_registry.get(policy_type), reveal_type(agent_loader) gives an imprecise type. There is an open issue for that already: #574
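To illustrate the registry problem with a hypothetical, simplified registry (these names are illustrative, not imitation's actual API): because every loader is stored under one loose callable type, the checker cannot recover a per-entry signature.

```python
from typing import Any, Callable, Dict

# All loaders share one deliberately loose type, so per-entry information
# (such as whether a loader wants a gym.Env or a VecEnv) is erased.
PolicyLoaderFn = Callable[..., Any]

policy_registry: Dict[str, PolicyLoaderFn] = {}
policy_registry["ppo"] = lambda venv: None  # dummy loader for illustration


def get(policy_type: str) -> PolicyLoaderFn:
    return policy_registry[policy_type]


agent_loader = get("ppo")
# reveal_type(agent_loader) -> Callable[..., Any]; mypy can neither flag nor
# verify the type of the env argument passed to the loader.
```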
Given how widespread the assumption of working with
@AdamGleave it's ready for a final review.
Thanks for the changes! Looks like we might be able to reduce code duplication by re-using some VecEnv objects; WDYT?
Good catch! I grepped and did not find other instances of redundant env creation in tutorials or docs. So it should be good for final review and merge, @AdamGleave.
LGTM
Description
This updates the following files to download pretrained experts from the HuggingFace Model hub (instead of training them from scratch):
docs/tutorials
docs/algorithms
examples/quickstart.py
This saves computation time and allows users to experiment with competent, fully trained experts.
Note that this required changing the default example environment from "CartPole-v1" to "seals/CartPole-v0" in some places (since https://huggingface.co/HumanCompatibleAI only contains pretrained experts for the latter).
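For illustration, a sketch of downloading one such expert with huggingface_sb3; the repo id and filename below are assumptions based on the HumanCompatibleAI hub organization's naming convention, and the tutorials may use imitation's own loading helpers instead:

```python
import gym
import seals  # noqa: F401 -- registers the seals/* environments
from huggingface_sb3 import load_from_hub
from stable_baselines3 import PPO

# Download a checkpoint from the HuggingFace Model hub instead of training.
# Repo id and filename are assumed to follow huggingface_sb3's conventions.
checkpoint = load_from_hub(
    repo_id="HumanCompatibleAI/ppo-seals-CartPole-v0",
    filename="ppo-seals-CartPole-v0.zip",
)
expert = PPO.load(checkpoint)

env = gym.make("seals/CartPole-v0")
obs = env.reset()
action, _ = expert.predict(obs, deterministic=True)
```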
The only remaining calls to expert.train are now in
docs/tutorials/8_train_custom_env.ipynb, where we train on a custom environment (hence, no pretrained expert available)
tests/data/test_rollout.py, where experts are trained for a single timestep (hence, negligible overhead)

Solves #764
Testing
I have run all notebooks and the example script and checked that they still run to completion.