How can I keep action sampling within the range specified by my environment when using onpolicy_trainer? #1142
Comments
sorry, I don't know what's wrong with the code T_T

I can take a look soon. Could you pls re-upload the code?

OK, thanks, I will re-upload the code.
Hi, I am new to tianshou and RL. I created an env and used PPO from tianshou to train on it, but I found that the sampled actions are out of the action space's range. I searched around and found `map_action`, but it does not seem to be used in the trainer.
So, how can I solve this problem? Thanks a lot.
```python
import numpy as np
from gymnasium import spaces  # or `from gym import spaces` on older setups

# continuous actions:
orn_low = np.array([-30, -30, -30]) * np.pi / 180
orn_high = np.array([30, 30, 30]) * np.pi / 180
v_low = np.array([0.001])
v_high = np.array([0.1])
distance_low = np.array([0.01])
distance_high = np.array([0.5])
act_low = np.concatenate((orn_low, v_low, distance_low))
act_high = np.concatenate((orn_high, v_high, distance_high))
bias = ()  # unused in this snippet
self.action_space = spaces.Box(low=act_low, high=act_high, dtype=np.float64)
# fixed: dtype must be the space's dtype, not the space object itself
self.action = np.zeros(self.action_space.shape, dtype=self.action_space.dtype)
```
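For context, when a tianshou policy is constructed with `action_scaling=True` and `action_bound_method="clip"`, `BasePolicy.map_action` turns the raw network output into an action inside these `Box` bounds. A simplified sketch of that transformation (not the library source, just the idea):

```python
import numpy as np

def map_action(raw_act: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Simplified version of BasePolicy.map_action with
    action_bound_method="clip" and action_scaling=True."""
    raw_act = np.clip(raw_act, -1.0, 1.0)              # bound the raw output to [-1, 1]
    return low + (high - low) * (raw_act + 1.0) / 2.0  # rescale into [low, high]
```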
```python
import torch.nn as nn
from tianshou.utils.net.common import ActorCritic, Net
from tianshou.utils.net.continuous import ActorProb, Critic

# model
net_a = Net(
    args.state_shape,
    hidden_sizes=args.hidden_sizes,
    activation=nn.Tanh,
    device=args.device,
)
actor = ActorProb(
    net_a,
    args.action_shape,
    unbounded=True,  # the actor emits unbounded outputs; mapping them back is the policy's job
    device=args.device,
).to(args.device)
net_c = Net(
    args.state_shape,
    hidden_sizes=args.hidden_sizes,
    activation=nn.Tanh,
    device=args.device,
)
critic = Critic(net_c, device=args.device).to(args.device)
actor_critic = ActorCritic(actor, critic)
```
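The usual fix is to pass the environment's action space to the policy so `map_action` knows the bounds; the Collector applies `map_action` right before `env.step`, which is why you don't see it called in `onpolicy_trainer` itself. A minimal sketch of the policy wiring, assuming `env`, `args`, and the distribution setup from tianshou's continuous-control PPO examples:

```python
import torch
from torch.distributions import Independent, Normal
from tianshou.policy import PPOPolicy

# dist_fn as in tianshou's continuous-control PPO examples
def dist(*logits):
    return Independent(Normal(*logits), 1)

optim = torch.optim.Adam(actor_critic.parameters(), lr=args.lr)
policy = PPOPolicy(
    actor=actor,
    critic=critic,
    optim=optim,
    dist_fn=dist,
    action_space=env.action_space,  # required: map_action reads low/high from here
    action_scaling=True,            # rescale raw actions from [-1, 1] into the Box bounds
    action_bound_method="clip",     # clip the unbounded network output to [-1, 1] first
)
```

Note that the Collector stores the raw, unscaled actions in the replay buffer and only maps them when stepping the env, so inspecting `buffer.act` will still show out-of-range values; check the actions your env actually receives inside `step()` instead.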