Question about Actor updates #3

Coastchb · 2023-07-04T06:48:27Z

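I'd like to ask about a few details:
1. Just above Equation 15 in the paper, there is this passage:
Before the TD error 𝛿 converges to threshold 𝜖, we update the current actor networks parameterized by 𝜃𝑘 through the gradients back-propagation of loss function for each tower layer after the forward process of each batch transitions from B.
Does this mean the Actor is updated with Equation 15 at the same time as the Critic is updated with the TD algorithm?
However, the implementation details in Section 3.2 say: "In order to improve data efficiency, we initialize the Actor with pretrained parameters from the MTL models and keep them frozen until the Critics converge." So the pretrained Actor parameters stay fixed until the Critics converge? Don't these two passages contradict each other? Could you explain what is actually done?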

2. The implementation details in Section 3.2 also say: "Then we multiply the critic value in the total loss and retrain the MTL model."
Does this mean plugging the Q-value into Equation 10 and fine-tuning the Actor?

Looking forward to your reply, thanks!

LZR-S · 2023-07-07T01:50:53Z

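Hi, thanks for your interest in RMTL. On the first question: we actually train the actor network first, then freeze the actor's parameters and train the critic network until it converges. On the second question, your understanding is correct: we fine-tune the actor using the updated total loss.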
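
As an illustration of the fine-tuning step described above (not code from the RMTL repository), here is a minimal pure-Python sketch of a critic-weighted total loss. The thread only says the critic value is multiplied into the total loss; the weight form `1 - lam * q`, the value of `lam`, the task names, and every function name below are assumptions made for this example.

```python
import math

# Assumed three-stage schedule, following the reply above:
#   1. train the actor (the MTL model) on its own;
#   2. freeze the actor and train the critic until it converges;
#   3. fine-tune the actor with a total loss that includes the critic value.
# This sketch covers only the loss used in stage 3.

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def task_weight(q, lam=0.5):
    """Hypothetical per-sample weight built from a critic value q (an assumption)."""
    return 1.0 - lam * q

def critic_weighted_total_loss(preds, labels, q_values, lam=0.5):
    """Sum over tasks of the mean critic-weighted BCE.

    preds, labels, q_values: dicts mapping task name -> list of per-sample values.
    q_values are treated as constants produced by an already-trained critic.
    """
    total = 0.0
    for task in preds:
        per_sample = [
            task_weight(q, lam) * bce(p, y)
            for p, y, q in zip(preds[task], labels[task], q_values[task])
        ]
        total += sum(per_sample) / len(per_sample)
    return total

# Toy batch with two tasks; all numbers are made up.
preds = {"ctr": [0.8, 0.3], "cvr": [0.6, 0.1]}
labels = {"ctr": [1, 0], "cvr": [1, 0]}
q_zero = {"ctr": [0.0, 0.0], "cvr": [0.0, 0.0]}
q_some = {"ctr": [0.5, 0.2], "cvr": [0.4, 0.1]}

plain = critic_weighted_total_loss(preds, labels, q_zero)     # unweighted BCE
weighted = critic_weighted_total_loss(preds, labels, q_some)  # critic-weighted
```

With all critic values at zero the function reduces to the ordinary sum of per-task BCE losses, which makes it easy to check that the weighting is the only thing that changes between the first and third stages.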

Coastchb · 2023-07-10T12:32:32Z

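@LZR-S Thanks for the reply.

A couple more details:
(1) Do the samples look something like {user-side attribute features such as gender, age, ...; attribute features of the 1st item in the session; attribute features of the 2nd item in the session; ...; the user's feedback on each item in the session (clicked or not, converted or not)}?
(2) If the model input does not use sequence features (the user's historical behavior sequence), are the model inputs for all predictions within the same session exactly identical?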
