Question about the Actor update procedure #3
Original issue:

A few questions about the details:

1. Above Eq. (15), the paper states: "Before the TD error 𝛿 converges to threshold 𝜖, we update the current actor networks parameterized by 𝜃𝑘 through the gradients back-propagation of loss function for each tower layer after the forward process of each batch transitions from B."
Does this mean the Actor is updated via Eq. (15) at the same time the Critic is being updated by TD learning?
However, the implementation details in Section 3.2 say: "In order to improve data efficiency, we initialize the Actor with pretrained parameters from the MTL models and keep them frozen until the Critics converge." So the Actor's pretrained parameters stay frozen until the Critics converge? Don't these two statements contradict each other? Could you clarify the exact procedure?

2. Section 3.2 also says: "Then we multiply the critic value in the total loss and retrain the MTL model."
Does this mean the critic's Q-value is plugged into Eq. (10) to fine-tune the Actor?

Looking forward to your reply, thanks!

Reply from @LZR-S:

Hi, thanks for your interest in RMTL. On the first question: we actually train the actor network first, then freeze the actor's parameters and train the critic network until it converges. On the second question, your understanding is correct: we use the updated total loss to fine-tune the actor.

Follow-up:

@LZR-S Thanks for the reply. A few more questions about the details:
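The two-stage schedule described in the reply (actor frozen while the critic is trained to TD convergence, then the actor fine-tuned on a critic-weighted total loss) can be sketched as follows. This is a toy illustration under stated assumptions: the linear models, random data, convergence threshold, and the sigmoid used to turn the critic value into a positive loss weight are all inventions for this sketch, not the authors' RMTL implementation.

```python
# Toy sketch of the two-stage schedule from the reply:
#   Stage 1: actor frozen, critic trained by TD learning until the mean
#            TD error |delta| falls below a threshold eps.
#   Stage 2: actor fine-tuned on a total loss weighted by the critic value.
# All models and data here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma, lr, eps = 256, 4, 0.9, 0.05, 1e-2

states = rng.normal(size=(n, d))
next_states = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
# Rewards constructed so an exact linear critic exists (toy setting only).
rewards = (states - gamma * next_states) @ w_true
targets = rng.normal(size=n)            # stand-in MTL labels

w_actor = 0.1 * rng.normal(size=d)      # "pretrained" actor params, frozen in stage 1
w_critic = np.zeros(d)                  # linear critic: Q(s) = s @ w_critic

# Stage 1: semi-gradient TD updates on the critic; the actor is untouched.
for _ in range(5000):
    delta = rewards + gamma * (next_states @ w_critic) - states @ w_critic
    w_critic += lr * (states * delta[:, None]).mean(axis=0)
    if np.abs(delta).mean() < eps:      # "until the Critics converge"
        break

def weighted_mtl_loss(w):
    # Critic value mapped to a positive per-sample weight via a sigmoid
    # (an assumption for this sketch, standing in for Eq. 10's weighting).
    weight = 1.0 / (1.0 + np.exp(-(states @ w_critic)))
    return float(np.mean(weight * (states @ w - targets) ** 2)), weight

# Stage 2: unfreeze the actor and fine-tune it on the critic-weighted loss.
initial_loss, weight = weighted_mtl_loss(w_actor)
for _ in range(200):
    grad = (2 * weight * (states @ w_actor - targets))[:, None] * states
    w_actor -= lr * grad.mean(axis=0)
final_loss, _ = weighted_mtl_loss(w_actor)
```

The key point the sketch captures is the ordering: `w_actor` is never updated inside the stage-1 loop, and `w_critic` is held fixed while the actor is fine-tuned in stage 2, matching "keep them frozen until the Critics converge" followed by "multiply the critic value in the total loss and retrain the MTL model".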