Reuse gpt2_graph #697

datduonguva · 2024-01-14T06:46:32Z

In the gpt-2 example, I can see that during inference each step invokes gpt2_eval(), which in turn invokes gpt2_graph() to recreate the graph.

Why can't we create it just once and reuse it?

ggerganov · 2024-01-14T08:01:47Z

Even though the nodes in the graph have the same types on each invocation (with some exceptions), the tensor dimensions do change. This is mainly because the sizes of the tensors in the attention are a function of the number of tokens, and the number of input tokens in the batch can also differ. In certain scenarios, it could be beneficial to pre-init the graphs at the start (let's say, one graph for each possible n_past with a single new token) in order to avoid creating them at runtime.
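
To make the shape dependence concrete, here is a minimal sketch in plain Python. It does not use the ggml API: `attention_shapes`, `GraphCache`, and `preinit_single_token` are hypothetical names introduced only for illustration, and the "graph" is just a dictionary of tensor shapes. The defaults `n_embd=768` and `n_head=12` are assumed values matching the smallest GPT-2 model. The shape layout follows the usual attention arithmetic, where the K tensor covers `n_past + n_tokens` positions.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def attention_shapes(n_past: int, n_tokens: int,
                     n_embd: int = 768, n_head: int = 12) -> Dict[str, Shape]:
    """Return illustrative attention tensor shapes for one evaluation step.

    n_past   : number of tokens already stored in the KV cache
    n_tokens : number of new tokens in the current batch
    """
    head_dim = n_embd // n_head
    n_kv = n_past + n_tokens  # keys span cached tokens plus the new ones
    return {
        "Q": (head_dim, n_tokens, n_head),
        "K": (head_dim, n_kv, n_head),
        "KQ": (n_kv, n_tokens, n_head),  # attention scores
    }


class GraphCache:
    """Stand-in for a cache of pre-built graphs keyed by (n_past, n_tokens)."""

    def __init__(self) -> None:
        self.graphs: Dict[Tuple[int, int], Dict[str, Shape]] = {}
        self.builds = 0  # how many times a graph had to be constructed

    def get(self, n_past: int, n_tokens: int) -> Dict[str, Shape]:
        key = (n_past, n_tokens)
        if key not in self.graphs:
            # In a real implementation this is where the graph would be built.
            self.graphs[key] = attention_shapes(n_past, n_tokens)
            self.builds += 1
        return self.graphs[key]


def preinit_single_token(cache: GraphCache, n_ctx: int) -> None:
    """Pre-build one graph per possible n_past, each for a single new token."""
    for n_past in range(n_ctx):
        cache.get(n_past, 1)
```

With this sketch, a prompt of 4 tokens (`n_past=0, n_tokens=4`) and the next single-token step (`n_past=4, n_tokens=1`) produce different `K` and `KQ` shapes, which is why one fixed graph cannot serve both. Pre-initializing covers only the single-token case; a batch of a different size still needs its own graph.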

datduonguva · 2024-01-14T15:19:49Z

Thank you!

datduonguva closed this as completed Jan 14, 2024
