Matteo's Notes

Notes from Matteo Hessel based on personal correspondence:

I have come across two versions of the Noisy Net algorithm for DQN on arXiv: (v1) where independent noise is sampled for each transition in the minibatch, and (v2) where the same noise is used for the whole minibatch, but a separate set of noise is used to perform action selection for the target network. Which one is used in Rainbow, and are they interchangeable?

  1. v2
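
For concreteness, below is a minimal PyTorch sketch of the v2 scheme applied to a simplified (non-distributional) DQN update; the `reset_noise()` method and all variable names are assumptions for illustration, not the actual Rainbow implementation:

```python
import torch
import torch.nn.functional as F

def noisy_net_update_v2(online_net, target_net, states, actions, rewards,
                        next_states, nonterminals, gamma=0.99):
    # v2: draw one set of noise and reuse it for every transition in the
    # minibatch (v1 would instead draw independent noise per transition).
    online_net.reset_noise()  # assumed method on a network built from noisy layers
    q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # A separate, freshly drawn set of noise is used when the target network
    # evaluates the next states for the bootstrap target.
    target_net.reset_noise()  # assumed method, as above
    with torch.no_grad():
        next_q = target_net(next_states).max(1)[0]
        target = rewards + gamma * nonterminals * next_q

    return F.mse_loss(q, target)
```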

In the original Distributional RL paper, the loss used is the cross-entropy: -m · log(p(s, a)). However, the standard KL loss is: m · (log(m) - log(p(s, a))). Which is used for Rainbow (considering that the scaling difference affects Prioritised Experience Replay)?

  1. cross entropy
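
As a rough sketch of the two forms, assuming `m` is the projected target distribution and `log_p` the predicted log-probabilities for the chosen actions (illustrative names, both of shape [batch, atoms]):

```python
import torch

def distributional_losses(m, log_p):
    """m: projected target distribution (no gradient); log_p: online network
    log-probabilities for the taken actions; both [batch_size, num_atoms]."""
    # Cross-entropy used in Rainbow: -sum_j m_j * log p_j(s, a)
    cross_entropy = -(m * log_p).sum(dim=1)

    # Full KL: sum_j m_j * (log m_j - log p_j(s, a)). The extra entropy term has
    # no gradient (m is a constant target), but it changes the per-sample
    # magnitude that would be fed to prioritised experience replay as a priority.
    kl = (m * (m.clamp(min=1e-8).log() - log_p)).sum(dim=1)
    return cross_entropy, kl
```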

Are losses summed or averaged across a minibatch?

  1. averaged
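
A minimal sketch of that reduction, combining the per-sample losses with the PER importance weights before averaging (all names are illustrative):

```python
import torch

def reduce_loss(per_sample_loss: torch.Tensor,
                importance_weights: torch.Tensor) -> torch.Tensor:
    # Weight each sample's loss by its PER importance weight, then average
    # over the minibatch (rather than summing).
    return (importance_weights * per_sample_loss).mean()
```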

Could you provide more info on the importance of σ_0 and how noise generation differs for you between CPU and GPU?

  1. it's not critical in many games, but it can quite significantly affect a handful, and median scores can be quite sensitive to games whose performance is close to the median itself. It's something I observed only on some GPUs; placing the noise generation on the CPU seems the safer way to go.
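
A minimal sketch of generating the noise on the CPU first and only then moving it to the device, assuming the factorised-noise transform f(x) = sign(x)·sqrt(|x|) from the NoisyNet paper (function and argument names are illustrative):

```python
import torch

def factorised_noise_on_cpu(size: int, device: torch.device) -> torch.Tensor:
    # Draw the Gaussian noise on the CPU and only afterwards move it to the
    # training device, side-stepping discrepancies seen with some GPU generators.
    x = torch.randn(size)                            # CPU by default
    return (x.sign() * x.abs().sqrt()).to(device)    # f(x) = sign(x) * sqrt(|x|)
```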

The data-efficient Rainbow paper mentioned using a max gradient norm of 10 for both the normal and data-efficient versions, but I didn't see this in the original paper. Can you confirm if it is/isn't used?

Yes, a max gradient norm of 10 was indeed used in both papers. It is not, however, a particularly sensitive hyper-parameter.
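
A sketch of where the clipping would sit in a standard PyTorch training step (the surrounding object names are assumed, not the actual implementation):

```python
import torch

def optimisation_step(loss, online_net, optimiser):
    # Clip the global gradient norm to 10 before applying the update.
    optimiser.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(online_net.parameters(), max_norm=10)
    optimiser.step()
```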


Matteo also noted that using the importance weights in the experience replay priorities worked better for Rainbow.
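
Reading this as feeding the importance-weighted per-sample loss back as the new priority, a minimal sketch could look as follows (the `update_priorities` method and all other names are illustrative assumptions, not a fixed API):

```python
def set_new_priorities(memory, idxs, per_sample_loss, importance_weights):
    # Use the importance-weighted per-sample loss as the new PER priority,
    # rather than the raw (unweighted) loss.
    new_priorities = (importance_weights * per_sample_loss).detach().abs()
    memory.update_priorities(idxs, new_priorities.cpu().numpy())
```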
