
How Paper input matches the code state s(t)? #141

Open
ahmad-hl opened this issue Dec 9, 2021 · 3 comments

Comments


ahmad-hl commented Dec 9, 2021

Dear Hongzi,

I was trying to figure out how the RL agent's state s(t) in the code matches the input described in the paper.

Input: After the download of each chunk t, Pensieve's learning agent takes state inputs st = (xt, τt, nt, bt, ct, lt) to its neural networks. xt is the network throughput measurements for the past k video chunks; τt is the download time of the past k video chunks; nt is a vector of m available sizes for the next video chunk; bt is the current buffer level; ct is the number of chunks remaining in the video; and lt is the bitrate at which the last chunk was downloaded.
First of all, which code package do we need to look at, multi-video_sim or sim?

When I look at sim, I see in def agent that the input state is:

0: last quality ?
1: buffer_size (bt)
2: chunk_size ?
3: delay ? is it the download time (τt)?
4: next_chunk_sizes (nt)
5: remain_chunks (ct)

Could you please illustrate the matching, and the actor & critic networks (Figure 5), if possible?

@hongzimao (Owner)

multi-video sim is for agents that can generalize to videos with different numbers and levels of bitrate encodings.

From what you wrote above, your understanding of the code and the paper looks correct to me.
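
For reference, here is roughly how the state matrix gets filled after each chunk download in sim (a paraphrased sketch rather than the exact code; the constant names and values below, e.g. BUFFER_NORM_FACTOR, M_IN_K, CHUNK_TIL_VIDEO_END_CAP, follow the repo and may differ in your upgraded version):

import numpy as np

# Assumed constants, following the sim code (values may differ in your copy).
S_INFO, S_LEN, A_DIM = 6, 8, 6                        # state rows, history length k, number of bitrates
VIDEO_BIT_RATE = [300, 750, 1200, 1850, 2850, 4300]   # Kbps
BUFFER_NORM_FACTOR = 10.0                             # seconds
M_IN_K = 1000.0
CHUNK_TIL_VIDEO_END_CAP = 48.0

def fill_state(state, bit_rate, buffer_size, video_chunk_size, delay,
               next_video_chunk_sizes, video_chunk_remain):
    state = np.roll(state, -1, axis=1)  # shift the k-chunk history left by one slot
    state[0, -1] = VIDEO_BIT_RATE[bit_rate] / float(np.max(VIDEO_BIT_RATE))  # lt: last chunk's bitrate
    state[1, -1] = buffer_size / BUFFER_NORM_FACTOR                          # bt: current buffer level
    state[2, -1] = float(video_chunk_size) / float(delay) / M_IN_K           # xt: throughput of last chunk
    state[3, -1] = float(delay) / M_IN_K / BUFFER_NORM_FACTOR                # τt: download time of last chunk
    state[4, :A_DIM] = np.array(next_video_chunk_sizes) / M_IN_K / M_IN_K    # nt: sizes of the next chunk
    state[5, -1] = np.minimum(video_chunk_remain, CHUNK_TIL_VIDEO_END_CAP) \
                   / float(CHUNK_TIL_VIDEO_END_CAP)                          # ct: chunks remaining
    return state

So rows 0-5 of the state correspond to lt, bt, xt, τt, nt, and ct from the paper, with k = S_LEN past chunks kept per row.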


ahmad-hl commented Dec 14, 2021

I have upgraded the code to work on Python 3.8 and used cooked_traces to train the multi-agent RL model in the sim dir.
Given that I'm using a computer with 2 GPUs and TensorBoard to monitor training, how much time is required for the model to converge?
How do you know whether the model has converged?

Can you also explain the main components in the objective function?

# Compute the objective (log action_vector and entropy)
self.obj = tf.reduce_sum(tf.multiply(
               tf.log(tf.reduce_sum(tf.multiply(self.out, self.acts),
                                    axis=1, keepdims=True)),
               -self.act_grad_weights)) \
           + ENTROPY_WEIGHT * tf.reduce_sum(tf.multiply(self.out, tf.log(self.out + ENTROPY_EPS)))

@hongzimao (Owner)

Thanks again for upgrading the codebase. The training wall time really depends on your physical hardware. You can monitor the learning curve and see when the performance on the validation set stabilizes. To determine whether the model has converged, you can use a heuristic such as the relative performance not improving much over the past xxx iterations. At the time, we just eyeballed it.
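
A minimal sketch of that kind of stopping rule, assuming you record the mean validation reward once per testing epoch (the function name and thresholds here are made up for illustration):

def has_converged(val_rewards, window=100, tol=0.01):
    # Heuristic: the mean validation reward over the last `window` epochs
    # improved by less than `tol` (relative) compared to the window before it.
    if len(val_rewards) < 2 * window:
        return False
    prev = sum(val_rewards[-2 * window:-window]) / window
    curr = sum(val_rewards[-window:]) / window
    return (curr - prev) / (abs(prev) + 1e-8) < tol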

The main objective is just the policy gradient expression (the expression after the gradient operator). It's basically log pi_t * (R_t - baseline_t) + an entropy regularizer, summed over the training batch.
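
Written out with NumPy, that objective looks roughly like this (a sketch for clarity only, not the training code; out is the batch of action distributions from the policy, acts the one-hot actions taken, and adv stands in for R_t - baseline_t):

import numpy as np

def pg_objective(out, acts, adv, entropy_weight=0.5, eps=1e-6):
    # log pi(a_t | s_t): log-probability of the action actually taken
    log_pi = np.log(np.sum(out * acts, axis=1, keepdims=True) + eps)
    # policy gradient term: log pi_t * (R_t - baseline_t), summed over the batch
    pg_term = np.sum(log_pi * adv)
    # entropy regularizer that encourages exploration
    entropy = -np.sum(out * np.log(out + eps))
    return pg_term + entropy_weight * entropy

Note that the TensorFlow snippet above carries the opposite sign (-self.act_grad_weights and +out * log(out)), so minimizing that expression is equivalent to maximizing this one.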

Hope these help.
