
[RLlib]: PPO agent training error: Invalid NaN values in Normal distribution parameters #46442

Open
InigoGastesi opened this issue Jul 5, 2024 · 3 comments
Assignees
Labels
bug Something that is supposed to be working; but isn't P3 Issue moderate in impact or severity rllib RLlib related issues rllib-oldstack-cleanup Issues related to cleaning up classes, utilities on the old API stack

Comments

@InigoGastesi

What happened + What you expected to happen

Hello,

I am encountering an error while training a PPO agent using RLlib. During training, I receive the following error message:
File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py", line 85, in loss
    curr_action_dist = dist_class(logits, model)
File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/ray/rllib/models/torch/torch_action_dist.py", line 250, in __init__
    self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/torch/distributions/normal.py", line 56, in __init__
    super().__init__(batch_shape, validate_args=validate_args)
File "/opt/conda/envs/prueba3.11/lib/python3.11/site-packages/torch/distributions/distribution.py", line 68, in __init__
    raise ValueError(
ValueError: Expected parameter loc (Tensor of shape (128, 2)) of distribution Normal(loc: torch.Size([128, 2]), scale: torch.Size([128, 2])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan],
        [nan, nan],
        ...
        [nan, nan]], grad_fn=<SplitBackward0>)  (all 128 rows are NaN)
I have checked all observations to ensure there are no NaN values, but the error persists. Could you please help me identify the cause of this issue and suggest how to resolve it?

Thank you for your assistance.
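To illustrate why checking observations alone may not be enough: NaNs in the policy output can originate inside the network even when every input is finite, for instance after a single bad gradient step blows a weight up to inf. A minimal pure-Python sketch (no RLlib involved; the toy "policy head" is purely illustrative):

```python
import math

def forward(obs, weights):
    """Toy linear 'policy head': dot product of observation and weights."""
    return sum(o * w for o, w in zip(obs, weights))

obs = [0.5, -1.2, 0.0]        # all observations are finite
weights = [0.1, 0.2, 0.3]     # healthy weights -> finite output
print(math.isfinite(forward(obs, weights)))   # True

# After one extreme gradient step a weight can blow up to inf;
# inf * 0.0 is NaN, so the output turns NaN despite clean inputs.
weights[2] = float("inf")
print(math.isnan(forward(obs, weights)))      # True
```

This is why the usual advice is to monitor the network's weights and gradients for NaN/inf, not only the observations.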

Versions / Dependencies

ray: 2.31
Python: 3.11
torch: 2.3.1

Reproduction script

tuner = Tuner(
    trainable=PPO,
    param_space=...,
    run_config=...,
)
tuner.fit()

Issue Severity

Low: It annoys or frustrates me.

@InigoGastesi InigoGastesi added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 5, 2024
@anyscalesam anyscalesam added the rllib RLlib related issues label Jul 8, 2024
@simonsays1980
Collaborator

@InigoGastesi Thanks for raising this issue. I guess this behavior is not an error but a result of the training process. Could you check whether your KL divergence is very high? I suspect the logits turn NaN because of overly extreme gradients that lead to numerical problems.

You could try increasing the kl_coeff and decreasing the learning rate. If the behavior persists, we will need a reproducible example to analyze the code.
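Not from the thread itself, but a sketch of how those suggestions might look with Ray's PPOConfig (method and parameter names per the Ray 2.x builder API; the environment and the exact values are placeholders, and grad_clip is an additional common guard against NaNs, not something suggested above):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("Pendulum-v1")   # placeholder env, swap in your own
    .framework("torch")
    .training(
        lr=1e-5,        # lower learning rate to tame gradient steps
        kl_coeff=1.0,   # stronger KL penalty (RLlib's default is 0.2)
        grad_clip=0.5,  # clip gradients as an extra safeguard
    )
)
```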

@simonsays1980 simonsays1980 self-assigned this Jul 9, 2024
@simonsays1980 simonsays1980 added P3 Issue moderate in impact or severity rllib-oldstack-cleanup Issues related to cleaning up classes, utilities on the old API stack and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Jul 9, 2024
@man2machine

I am getting the same error as well.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/anaconda3/envs/edge/lib/python3.11/site-packages/ray/rllib/policy/torch_policy_v2.py:1348, in TorchPolicyV2._multi_gpu_parallel_grad_calc.<locals>._worker(shard_idx, model, sample_batch, device)
   1344 with NullContextManager() if device.type == "cpu" else torch.cuda.device(  # noqa: E501
   1345     device
   1346 ):
   1347     loss_out = force_list(
-> 1348         self.loss(model, self.dist_class, sample_batch)
   1349     )
   1351     # Call Model's custom-loss with Policy loss outputs and
   1352     # train_batch.

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/ray/rllib/algorithms/ppo/ppo_torch_policy.py:85, in PPOTorchPolicy.loss(self, model, dist_class, train_batch)
     84 logits, state = model(train_batch)
---> 85 curr_action_dist = dist_class(logits, model)
     87 # RNN case: Mask away 0-padded chunks at end of time axis.

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/ray/rllib/models/torch/torch_action_dist.py:512, in TorchMultiActionDistribution.__init__(self, inputs, model, child_distributions, input_lens, action_space)
    511 split_inputs = torch.split(inputs, self.input_lens, dim=1)
--> 512 self.flat_child_distributions = tree.map_structure(
    513     lambda dist, input_: dist(input_, model),
    514     flat_child_distributions,
    515     list(split_inputs),
    516 )

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/tree/__init__.py:430, in map_structure(func, *structures, **kwargs)
    428   assert_same_structure(structures[0], other, check_types=check_types)
    429 return unflatten_as(structures[0],
--> 430                     [func(*args) for args in zip(*map(flatten, structures))])

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/tree/__init__.py:430, in <listcomp>(.0)
    428   assert_same_structure(structures[0], other, check_types=check_types)
    429 return unflatten_as(structures[0],
--> 430                     [func(*args) for args in zip(*map(flatten, structures))])

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/ray/rllib/models/torch/torch_action_dist.py:513, in TorchMultiActionDistribution.__init__.<locals>.<lambda>(dist, input_)
    511 split_inputs = torch.split(inputs, self.input_lens, dim=1)
    512 self.flat_child_distributions = tree.map_structure(
--> 513     lambda dist, input_: dist(input_, model),
    514     flat_child_distributions,
    515     list(split_inputs),
    516 )

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/ray/rllib/models/torch/torch_action_dist.py:250, in TorchDiagGaussian.__init__(self, inputs, model, action_space)
    249 self.log_std = log_std
--> 250 self.dist = torch.distributions.normal.Normal(mean, torch.exp(log_std))
    251 # Remember to squeeze action samples in case action space is Box(shape)

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/torch/distributions/normal.py:56, in Normal.__init__(self, loc, scale, validate_args)
     55     batch_shape = self.loc.size()
---> 56 super().__init__(batch_shape, validate_args=validate_args)

File ~/anaconda3/envs/edge/lib/python3.11/site-packages/torch/distributions/distribution.py:68, in Distribution.__init__(self, batch_shape, event_shape, validate_args)
     67         if not valid.all():
---> 68             raise ValueError(
     69                 f"Expected parameter {param} "
     70                 f"({type(value).__name__} of shape {tuple(value.shape)}) "
     71                 f"of distribution {repr(self)} "
     72                 f"to satisfy the constraint {repr(constraint)}, "
     73                 f"but found invalid values:\n{value}"
     74             )
     75 super().__init__()

ValueError: Expected parameter loc (Tensor of shape (32, 1)) of distribution Normal(loc: torch.Size([32, 1]), scale: torch.Size([32, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan],
        [nan]], device='cuda:0', grad_fn=<SplitBackward0>)

@InigoGastesi
Author

@simonsays1980 Sorry for not answering sooner, I had not received the notification. On TensorBoard I have two KL-related parameters, cur_kl_coeff and kl. In my last training run, I set lr=0.00001 with these params:

model:
  fcnet_hiddens:
    - 256
    - 256
  fcnet_activation: "tanh"
framework: "torch"
batch_mode: "complete_episodes"
train_batch_size: 4000

Before the error occurred again, kl was at 0.01.
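Another mitigation worth noting for this failure mode: the exception is raised from `torch.exp(log_std)` feeding `Normal`, so bounding `log_std` before exponentiating keeps the scale finite and strictly positive. A hypothetical pure-Python sketch (RLlib's old-stack `TorchDiagGaussian` does not clamp here by default; the bounds below are illustrative, not RLlib defaults):

```python
import math

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0   # illustrative bounds

def safe_std(log_std):
    """Clamp log_std before exponentiating so Normal's scale stays
    finite (no overflow to inf) and strictly positive."""
    clamped = max(LOG_STD_MIN, min(LOG_STD_MAX, log_std))
    return math.exp(clamped)

print(safe_std(500.0))   # capped at exp(2.0), instead of overflowing
print(safe_std(-500.0))  # floored at exp(-20.0), still > 0
```

In a real model this would be a `torch.clamp` on the log-std half of the output layer, e.g. in a custom model or action distribution.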


4 participants