Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Download experts from hf inside tutorials and docs #766

Conversation

jas-ho
Copy link
Contributor

@jas-ho jas-ho commented Aug 4, 2023

Description

This updates the following files to download pretrained experts from the HuggingFace Model hub (instead of training them from scratch):

  • most notebooks in docs/tutorials
  • some algorithm doc files in docs/algorithms
  • the example script examples/quickstart.py

This saves computation time and allows users to experiment with competent, fully trained experts.

Note that this required changing the default example environment from "CartPole-v1" to "seals/CartPole-v0" in some places (since https://huggingface.co/HumanCompatibleAI only contains pretrained experts for the latter).

The only remaining calls to expert.train are now in

  • docs/tutorials/8_train_custom_env.ipynb where we train on a custom environment (hence, no pretrained expert available)
  • tests/data/test_rollout.py where experts are trained for a single timestep (hence, negligible overhead)

Solves #764

Testing

I have run all notebooks and the example script and checked that they still go through.

@jas-ho jas-ho linked an issue Aug 4, 2023 that may be closed by this pull request
@codecov
Copy link

codecov bot commented Aug 4, 2023

Codecov Report

Merging #766 (5bf9e13) into master (19c7f35) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #766   +/-   ##
=======================================
  Coverage   96.33%   96.33%           
=======================================
  Files          93       93           
  Lines        8789     8789           
=======================================
  Hits         8467     8467           
  Misses        322      322           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@jas-ho jas-ho changed the title 764 dont train experts in the tutorials download from hf instead Download experts from hf inside tutorials and docs Aug 4, 2023
Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, these changes mostly looks good but there's a type error as the code is currently written.

The tutorials are typically passing in venv=env where env is a gym.Env created by gym.make. But load_policy expects a vec_env.VecEnv.

Clearly the notebooks work with this as they're passing tests, so one option to fix this would be to loosen up the type annotation on load_policy to a stable_baselines3.common.type_aliases.GymEnv = Union[gym.Env, vec_env.VecEnv] since this is what BaseAlgorithm.load accepts, although there's a chance this will cause type checking errors elsewhere (e.g. do any of our other policy loaders strictly require a vec_env.VecEnv)?

Alternatively we could stick to the vec_env.VecEnv convention we've mostly used in imitation (reasoning that VecEnv is strictly more general than a gym.Env, you can always have a VecEnv with a single gym.Env) and change your env's to a venv by e.g. using the util.make_vec_env function.

docs/algorithms/airl.rst Show resolved Hide resolved
docs/algorithms/bc.rst Show resolved Hide resolved
docs/algorithms/dagger.rst Show resolved Hide resolved
docs/algorithms/dagger.rst Outdated Show resolved Hide resolved
docs/algorithms/gail.rst Show resolved Hide resolved
docs/tutorials/2_train_dagger.ipynb Show resolved Hide resolved
docs/tutorials/3_train_gail.ipynb Show resolved Hide resolved
docs/tutorials/4_train_airl.ipynb Show resolved Hide resolved
docs/tutorials/9_compare_baselines.ipynb Outdated Show resolved Hide resolved
examples/quickstart.py Show resolved Hide resolved
@jas-ho jas-ho mentioned this pull request Aug 7, 2023
8 tasks
@jas-ho
Copy link
Contributor Author

jas-ho commented Aug 7, 2023

Thanks for spotting the inconsistent typing @AdamGleave .

@jas-ho
Copy link
Contributor Author

jas-ho commented Aug 7, 2023

one option to fix this would be to loosen up the type annotation on load_policy to a stable_baselines3.common.type_aliases.GymEnv = Union[gym.Env, vec_env.VecEnv] since this is what BaseAlgorithm.load accepts, although there's a chance this will cause type checking errors elsewhere (e.g. do any of our other policy loaders strictly require a vec_env.VecEnv)?

Neither mypy nor pytype complain if the annotation is changed to GymEnv. Looking over the loader factories in registry.py and serialize.py however shows that this is actually inconsistent with the type annotations in those loaders (which also expect VecEnv acc. to the type annotation).

This is because the typing of the policy registry is insufficient

agent_loader = policy_registry.get(policy_type)
reveal_type(agent_loader)

gives Revealed type is "def (*Any, **Any) -> stable_baselines3.common.policies.BasePolicy" where actually it should be def (vec_env.VecEnv, *Any, **Any) -> stable_baselines3.common.policies.BasePolicy

There is an open issue for that already: #574

@jas-ho
Copy link
Contributor Author

jas-ho commented Aug 7, 2023

Given how widespread the assumption of working with vec_env.VecEnv is atm the simpler solution you suggested seems safer to me:

stick to the vec_env.VecEnv convention we've mostly used in imitation (reasoning that VecEnv is strictly more general than a gym.Env, you can always have a VecEnv with a single gym.Env) and change your env's to a venv by e.g. using the util.make_vec_env function.

docs/algorithms/dagger.rst Outdated Show resolved Hide resolved
@jas-ho jas-ho requested a review from AdamGleave August 7, 2023 11:27
Copy link
Contributor Author

@jas-ho jas-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AdamGleave it's ready for a final review.

docs/tutorials/3_train_gail.ipynb Show resolved Hide resolved
docs/tutorials/3_train_gail.ipynb Show resolved Hide resolved
docs/tutorials/3_train_gail.ipynb Show resolved Hide resolved
docs/algorithms/dagger.rst Outdated Show resolved Hide resolved
Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes! Looks like we might be able to reduce code duplication by re-using some VecEnv objects; WDYT?

docs/algorithms/gail.rst Outdated Show resolved Hide resolved
docs/algorithms/airl.rst Outdated Show resolved Hide resolved
docs/tutorials/3_train_gail.ipynb Outdated Show resolved Hide resolved
docs/tutorials/4_train_airl.ipynb Outdated Show resolved Hide resolved
@jas-ho
Copy link
Contributor Author

jas-ho commented Aug 9, 2023

Thanks for the changes! Looks like we might be able to reduce code duplication by re-using some VecEnv objects; WDYT?

Good catch! I grepped and did not find other instances of redundant env creations in tutorials or docs. So should be good for final review and merge @AdamGleave

@jas-ho jas-ho requested a review from AdamGleave August 9, 2023 08:23
Copy link
Member

@AdamGleave AdamGleave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@AdamGleave AdamGleave merged commit 60d8686 into master Aug 9, 2023
15 checks passed
@AdamGleave AdamGleave deleted the 764-dont-train-experts-in-the-tutorials-download-from-hf-instead branch August 9, 2023 23:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't train experts in the tutorials, download from HF instead
2 participants