Add CLI for SQIL #784

lukasberglund · 2023-09-12T15:09:42Z

Description

This PR adds an option to src/imitation/scripts to train using SQIL, as requested in #780. This is still a work in progress, but I would welcome feedback. Some uncertainties I have include:

What arguments to explicitly add to the sacred config for SQIL -- Right now I'm only including the keyword args for SQIL, as well as log_interval, total_timesteps, and progress_bar, similar to how it's done for BC and DAgger. The user can set all the other arguments too, but including them explicitly would make them more salient.
Adding warmstart capabilities -- BC and DAgger both allow continuing training from a previously trained model, but the SQIL API doesn't. Consequently, I didn't include this in the SQIL CLI. I could add this feature pretty easily, though.
Logging -- Currently, I don't see any logs on stdout when training with SQIL, except at the end. I'm not sure whether I'm using the wrong hyperparameters or this is a bug. I will look into this more tomorrow.

Testing

I added three tests that mirror those for DAgger and BC. Right now I have more tests for SQIL than for DAgger, which might be overkill. Let me know if I should remove some.

AdamGleave

Tests are currently failing (looks like a missing dependency for tqdm), please address.

Some small comments on the code itself.

setup.py

src/imitation/algorithms/base.py

src/imitation/scripts/config/train_imitation.py

AdamGleave · 2023-09-12T20:31:02Z

src/imitation/scripts/train_imitation.py

+ sqil_trainer = SQIL(
+ venv=venv,
+ demonstrations=expert_trajs,
+ policy=sqil["policy_model"],


This duplicates the policy ingredient: https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/scripts/ingredients/policy.py

I think you can replace this with policy.make_policy(venv) and remove the policy_model config parameter.

> I think can replace with policy.make_policy(venv) and remove the policy_model config parameter.

This wouldn't work because policy.make_policy(venv) returns policies.BasePolicy, whereas SQIL requires type[policies.BasePolicy] (i.e. a constructor, not an instance).

But you are right, I could use policy["policy_cls"]. One issue is that SQIL uses a DQN by default, which requires a DQNPolicy, and the default for policy_cls is base.FeedForward32Policy, which is incompatible. This is a bit unfortunate, but I can't think of a way around it currently. To make it easy, I've made a named_config called sqil.dqn which lets users set a DQN policy.

I currently can't think of a way to make sqil work by default. We would somehow want to override a config conditional on the sqil command being used. Not sure if that's possible/desirable.

> This wouldn't work because policy.make_policy(venv) returns policies.BasePolicy, whereas SQIL requires type[policies.BasePolicy] (i.e. a constructor, not an instance).

Ah, good point. Yes, the difference between SQIL expecting classes vs. the rest of our code expecting objects bites again.
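To illustrate the class-versus-instance mismatch discussed here, the following is a minimal sketch using stand-in classes. The names BasePolicy, DQNLikePolicy, make_policy, and build_trainer are hypothetical and are not the imitation or Stable Baselines3 APIs; they only mimic the shape of the problem (a factory that returns an instance versus a consumer that wants a class).

```python
from typing import Type


class BasePolicy:
    """Stand-in for a policy base class (hypothetical, not the real library class)."""

    def __init__(self, hidden_size: int = 32) -> None:
        self.hidden_size = hidden_size


class DQNLikePolicy(BasePolicy):
    """Stand-in for a Q-network style policy (hypothetical)."""


def make_policy() -> BasePolicy:
    """Returns an already-constructed instance, as an ingredient factory might."""
    return BasePolicy()


def build_trainer(policy_cls: Type[BasePolicy]) -> BasePolicy:
    """Expects a class and constructs the policy itself with its own arguments."""
    if not isinstance(policy_cls, type):
        raise TypeError(f"expected a policy class, got an instance: {policy_cls!r}")
    return policy_cls(hidden_size=64)


# Passing the class works: the consumer decides how to construct it.
built = build_trainer(DQNLikePolicy)

# Passing the factory's output (an instance) is rejected.
try:
    build_trainer(make_policy())
    rejected = False
except TypeError:
    rejected = True
```

In this sketch the consumer controls construction, which is why an already-built instance cannot simply be substituted for the class.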

> But you are right, I could use policy["policy_cls"]. One issue is that SQIL uses a DQN by default, which requires a DQNPolicy, and the default for policy_cls is base.FeedForward32Policy, which is incompatible. This is a bit unfortunate, but I can't think of a way around it currently. To make it easy, I've made a named_config called sqil.dqn which lets users set a DQN policy.

> I currently can't think of a way to make sqil work by default. We would somehow want to override a config conditional on the sqil command being used. Not sure if that's possible/desirable.

Mm, indeed, this is messy and highlights a design flaw in Sacred. It is possible to have a different default depending on the context using a config hook; https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/scripts/ingredients/rl.py#L41 is an example of this. You can add a hook in SQIL that checks if policy_cls == base.FeedForward32Policy and, if so, changes it to a DQNPolicy. This is a bit nasty, since if the user manually sets the policy to FeedForward32Policy for some reason we'll just silently override that, but it seems OK.

This seems good. I've made the change.
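The hook logic agreed on in this thread can be sketched roughly as below. This is a plain-Python illustration rather than the code merged in the PR: the two policy classes are stand-ins, and override_policy_cls is written as an ordinary function with the (config, command_name, logger) argument shape that Sacred config hooks receive. How the returned updates are scoped by Sacred is not modeled here; the sketch assumes they apply to the "policy" section.

```python
from typing import Any, Dict


class FeedForward32Policy:
    """Stand-in for the default policy class (hypothetical)."""


class DQNPolicy:
    """Stand-in for a DQN-compatible policy class (hypothetical)."""


def override_policy_cls(
    config: Dict[str, Any],
    command_name: str,
    logger: Any = None,
) -> Dict[str, Any]:
    """Return updates for the "policy" section; an empty dict means no change."""
    updates: Dict[str, Any] = {}
    is_default = config["policy"]["policy_cls"] is FeedForward32Policy
    # Only touch the config for the sqil command, and only when the policy
    # class still holds the incompatible default.
    if command_name == "sqil" and is_default:
        updates["policy_cls"] = DQNPolicy
    return updates


default_cfg = {"policy": {"policy_cls": FeedForward32Policy}}
sqil_updates = override_policy_cls(default_cfg, "sqil")
bc_updates = override_policy_cls(default_cfg, "bc")

custom_cfg = {"policy": {"policy_cls": DQNPolicy}}
custom_updates = override_policy_cls(custom_cfg, "sqil")
```

Note that the check cannot tell a user who explicitly chose FeedForward32Policy from one who left the default in place, which is the silent-override caveat raised above.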

tests/scripts/test_scripts.py

src/imitation/scripts/ingredients/sqil.py

AdamGleave

Thanks for making these changes! I think this is almost ready; just a few minor comments.

src/imitation/scripts/ingredients/rl.py

src/imitation/scripts/ingredients/sqil.py

AdamGleave · 2023-09-14T20:56:14Z

src/imitation/scripts/ingredients/sqil.py

+ locals() # quieten flake8 unused variable warning
+
+
+@rl.rl_ingredient.config_hook


Why is this a config hook on the RL ingredient? I think all ingredients can modify any part of the config, so this could be a config hook on the SQIL ingredient directly. This would avoid mutating other ingredients (remember this is what caused issues with the tests previously), and would let you combine this with override_policy_cls.

I tried doing this originally, but it didn't work. For some reason it would set the variables inside of sqil. E.g. it would set config["sqil"]["rl"]["rl_cls"] instead of config["rl"]["rl_cls"] as intended.

Can confirm that having it in the SQIL ingredient will set variables inside of SQIL. More problematically, moving the hook to the train_imitation experiment then seems to have no effect on sub-ingredients. So, ugly though it is, I think we probably do need to keep it here. The good news is that these hooks are a no-op when the command name is not sqil.
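The scoping behavior reported in this thread can be pictured with a toy model. This is not Sacred's implementation: apply_hook and merge are hypothetical helpers that encode only the observation above, namely that a hook's updates land under the namespace of the ingredient the hook is attached to.

```python
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]
Hook = Callable[[Config, str], Dict[str, Any]]


def merge(target: Config, updates: Dict[str, Any]) -> None:
    """Recursively merge nested updates into target in place."""
    for key, value in updates.items():
        if isinstance(value, dict):
            merge(target.setdefault(key, {}), value)
        else:
            target[key] = value


def apply_hook(config: Config, owner_path: List[str], hook: Hook, command: str) -> None:
    """Toy rule: a hook's updates are merged under its owning ingredient's path."""
    target = config
    for key in owner_path:
        target = target.setdefault(key, {})
    merge(target, hook(config, command))


def hook_on_sqil(config: Config, command: str) -> Dict[str, Any]:
    # Attached to the sqil ingredient and trying to reach into "rl".
    return {"rl": {"rl_cls": "DQN"}} if command == "sqil" else {}


def hook_on_rl(config: Config, command: str) -> Dict[str, Any]:
    # Attached to the rl ingredient, so keys are relative to "rl".
    return {"rl_cls": "DQN"} if command == "sqil" else {}


cfg_a: Config = {"rl": {"rl_cls": "PPO"}, "sqil": {}}
apply_hook(cfg_a, ["sqil"], hook_on_sqil, "sqil")  # lands in cfg_a["sqil"]["rl"]

cfg_b: Config = {"rl": {"rl_cls": "PPO"}, "sqil": {}}
apply_hook(cfg_b, ["rl"], hook_on_rl, "sqil")  # lands in cfg_b["rl"]

cfg_c: Config = {"rl": {"rl_cls": "PPO"}, "sqil": {}}
apply_hook(cfg_c, ["rl"], hook_on_rl, "bc")  # no-op for other commands
```

Under this toy rule, only the hook attached to the rl ingredient changes config["rl"]["rl_cls"], which matches the reason given for keeping the hook there.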

tests/test_benchmarking.py

AdamGleave

LGTM

lukasberglund added 4 commits September 12, 2023 16:41

Add sqil cli

9858281

Lints

556e642

More lints

b6d1a6b

Add shine requirement, used for DQN progress bar.

c7340d3

lukasberglund requested review from AdamGleave and ernestum September 12, 2023 15:09

lukasberglund added 2 commits September 12, 2023 17:11

Undo removal of src.policy

81e0f60

Remove old comment

8eafb12

lukasberglund added the "enhancement" (New feature or request) label Sep 12, 2023

lukasberglund linked an issue Sep 12, 2023 that may be closed by this pull request

Add CLI for SQIL #780

Closed

Add trailing commas

12293ca

AdamGleave requested changes Sep 12, 2023

lukasberglund and others added 8 commits September 13, 2023 10:06

change dependencies

f6852e2

Update src/imitation/scripts/config/train_imitation.py

ebcf1f0

Co-authored-by: Adam Gleave

Move save_policy and reconstruct_policy"

837809e

Respond to fix save_policy issue

f00f44c

Remove some boilerplate

7ae8891

Merge remote-tracking branch 'origin' into sqil_cli

34e2d5e

fix use of save_policy

e6c8d63

Fix bug in sqil

3a13ef3

lukasberglund requested a review from AdamGleave September 13, 2023 17:45

AdamGleave requested changes Sep 14, 2023

lukasberglund and others added 5 commits September 14, 2023 03:02

Update src/imitation/scripts/ingredients/sqil.py

cd35a59

Co-authored-by: Adam Gleave

address PR

a8e5866

fix typing error

d83ba64

fix typing error

4f3bdc1

change shine to rich

00e9ab8

lukasberglund commented Sep 14, 2023

setup.py

remove line

5cd4d41

lukasberglund requested a review from AdamGleave September 14, 2023 16:59

AdamGleave requested changes Sep 14, 2023

lukasberglund and others added 3 commits September 15, 2023 09:52

Update src/imitation/scripts/ingredients/sqil.py

d5f5e77

Co-authored-by: Adam Gleave

respond to adam comments

6309d7a

make line shorter

b1e8a86

lukasberglund requested a review from AdamGleave September 15, 2023 18:45

Simplify RL hook

e55ab9d

AdamGleave approved these changes Sep 16, 2023

AdamGleave merged commit 885beff into master Sep 16, 2023
1 of 8 checks passed

AdamGleave deleted the sqil_cli branch September 16, 2023 00:35
