
ggml : remove GGML_MAX_NODES limit #567

Closed
ggerganov opened this issue Oct 10, 2023 · 10 comments · Fixed by #586
Labels
refactoring Refactoring

Comments

@ggerganov
Owner

We should be able to create graphs with a large number of nodes. One application is support for LSTM networks:

#467

The exact approach for implementing this is not clear yet, but some ideas and discussion are available in the linked issue above.

ggerganov added the refactoring label on Oct 10, 2023
@datduonguva

My approach for this would be to create the graph for just 1 cell. For each time step, we simply update the input and states, call ggml_graph_compute_with_ctx and get the output. That way, the graph's size doesn't grow with the number of time steps.

Example:

https://github.com/datduonguva/ggml-experiments/blob/master/rnn_text_gen/rnn_text_generation.cpp
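For illustration, a rough sketch of that pattern (the cell-building helper build_rnn_cell and the surrounding buffers input, h_state, n_input, n_hidden, n_steps are hypothetical - the real graph depends on the model):

// build the graph for a single cell once ...
struct ggml_tensor * x_t   = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_input);  // input at time t
struct ggml_tensor * h_in  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_hidden); // previous hidden state
struct ggml_tensor * h_out = build_rnn_cell(ctx, model, x_t, h_in);            // hypothetical helper

struct ggml_cgraph gf = ggml_build_forward(h_out);

// ... then reuse it for every time step by overwriting the inputs in place
for (int t = 0; t < n_steps; ++t) {
    memcpy(x_t->data,  input + t*n_input, ggml_nbytes(x_t));
    memcpy(h_in->data, h_state,           ggml_nbytes(h_in));

    ggml_graph_compute_with_ctx(ctx, &gf, /*n_threads =*/ 4);

    memcpy(h_state, h_out->data, ggml_nbytes(h_out));
}

The graph size stays constant regardless of n_steps; the only per-step cost is the compute call itself.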

@ggerganov
Owner Author

> My approach for this would be to create the graph for just 1 cell.

cc @PABannier

@PABannier
Contributor

Yes, we discussed that in #467! But for large sequences, doesn't the overhead of building the graph per cell slow this down dramatically?

@datduonguva Have you tried building a general graph by setting GGML_MAX_NODES to 100,000, for instance?

@ggerganov
Owner Author

ggerganov commented Oct 11, 2023

@PABannier They build the graph only once and then, for each iteration, they put a different input in the input tensor, read the output from the output tensor, and so on. I don't know if this is valid for all LSTM models, but this specific one has a fixed graph that does not change its shape with time.

This approach does not have the overhead of building the graph each time, but it does have an overhead of starting and stopping the thread pool for every iteration.

@ggerganov
Owner Author

ggerganov commented Oct 19, 2023

Let's fix the ggml_cgraph MAX_NODES limit. We should implement the second option from #586 (comment)

We take the ideas from #467 and drop the requirement for having graphs on the stack.
So basically we have:

    struct ggml_cgraph {
        int n_nodes;
        int n_leafs;

        // arrays of node pointers, allocated in the owning ggml_context
        // (previously fixed-size arrays of GGML_MAX_NODES entries)
        struct ggml_tensor ** nodes;
        struct ggml_tensor ** grads;
        struct ggml_tensor ** leafs;

        void ** visited_hash_table;

        // performance
        int     perf_runs;
        int64_t perf_cycles;
        int64_t perf_time_us;
    };

We allow creation of new graphs only through ggml_new_graph(). Extend the interface with the maximum number of nodes:

GGML_API struct ggml_cgraph * ggml_new_graph        (struct ggml_context * ctx, int n_nodes_max);
GGML_API struct ggml_cgraph * ggml_build_forward_ctx(struct ggml_context * ctx, struct ggml_tensor * tensor, int n_nodes_max);

I.e. all graphs are always created inside a ggml_context - is this a good idea?

Deprecate ggml_build_forward(struct ggml_tensor * tensor). Instead, we should use:

// old
gf = ggml_build_forward(x);

// new
gf = ggml_build_forward_ctx(ctx, x, n_nodes_max);

The proposed struct ggml_cgraph can be used as a view to a graph allocated in a context.
Add API for creating views:

struct ggml_cgraph ggml_graph_view(struct ggml_cgraph * cgraph, int n0, int n1);

I think leafs always have to be "viewed in full", i.e. only the nodes/grads can be split? The hash table can be reused by the views I think? Probably OK to return graph views on the stack?
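For illustration, here is how such a view might be used - a sketch against the proposed signatures only, nothing here exists yet; out and n_threads stand in for the model output tensor and the thread count:

struct ggml_cgraph * gf = ggml_build_forward_ctx(ctx, out, /*n_nodes_max =*/ 8192);

// split the graph in two halves, e.g. to compute them separately or hand them to different backends
struct ggml_cgraph gv0 = ggml_graph_view(gf, 0,             gf->n_nodes/2);
struct ggml_cgraph gv1 = ggml_graph_view(gf, gf->n_nodes/2, gf->n_nodes);

ggml_graph_compute_with_ctx(ctx, &gv0, n_threads);
ggml_graph_compute_with_ctx(ctx, &gv1, n_threads);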

Probably need to extend:

GGML_API size_t ggml_graph_overhead(int n_nodes_max);
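For reference, the overhead helper is typically used to size a no_alloc context that only holds the graph and tensor metadata; with the extra parameter that would look roughly like this (a sketch against the extended signatures proposed above; n_nodes_max is whatever limit the caller chooses):

struct ggml_init_params params = {
    /*.mem_size   =*/ ggml_tensor_overhead()*n_nodes_max + ggml_graph_overhead(n_nodes_max),
    /*.mem_buffer =*/ NULL,
    /*.no_alloc   =*/ true,
};

struct ggml_context * ctx = ggml_init(params);
struct ggml_cgraph  * gf  = ggml_new_graph(ctx, n_nodes_max);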

@slaren
Collaborator

slaren commented Oct 19, 2023

> Deprecate ggml_build_forward (struct ggml_tensor * tensor);

Deprecate usually means that there would be a transition period where both APIs would work, but I am not sure that would be possible here. Do you actually mean that, or just removing ggml_build_forward entirely?

> I think leafs always have to be "viewed in full", i.e. only the nodes/grads can be split?

For the purposes of ggml-backend, I don't care about leafs at all. I don't think that backends ever use them while computing a graph either. So I would be fine with just setting it to NULL. I don't know in what cases a list of leafs is required.

> The hash table can be reused by the views I think?

The hash table is only used to avoid revisiting nodes while expanding the graph, and views shouldn't be allowed to be expanded since they would just overwrite the parent's memory. I think it would be ok to just set the hash table to NULL.

> Probably OK to return graph views on the stack?

It's probably OK, but it doesn't really solve anything and it may reduce options in the future. I.e. we may need to change something that would require dynamic memory allocation even for views. So I think the safer route would be to also allocate views in a context.

Should we do something to avoid allocating the list of grads on forward-only graphs? It is just wasted memory during inference.

@ggerganov
Owner Author

> Do you actually mean that, or just removing ggml_build_forward entirely?

Remove it entirely. We will add a comment in the header explaining how to migrate, so it is easier for 3rd-party projects.

> I don't know in what cases a list of leafs is required.

Hm, interesting. leafs do seem like an unnecessary categorization. I guess I must have been thinking about some use case when I separated the nodes and leafs, but as it stands this seems unneeded - it just doubles the logic for handling nodes. We should remove the leafs from the graph - we can do it in a separate issue.

> So I think the safer route would be to also allocate views in a context.

OK

> Should we do something to avoid allocating the list of grads on forward-only graphs? It is just wasted memory during inference.

Yes, probably a flag that the user sets when they know that backward passes will be computed.

@slaren
Collaborator

slaren commented Oct 20, 2023

Most of the changes to ggml.c/h are done, but I have not updated the tests and examples yet because I am not entirely sure about the interface. ggml_new_graph should be easy to use and shouldn't have too many parameters for basic cases, and these parameters should not need to be replicated in ggml_build_xxx_ctx.

I propose keeping a simple ggml_new_graph function that doesn't take any parameters (other than a context). More complex initialization can be done with another function, such as ggml_new_graph_custom. The ggml_build_xxx_ctx functions are removed. The complete interface would be something like this:

// Graph interface
GGML_API void ggml_build_forward_expand (struct ggml_cgraph * cgraph, struct ggml_tensor * tensor);
GGML_API void ggml_build_backward_expand(struct ggml_context * ctx, struct ggml_cgraph * gf, struct ggml_cgraph * gb, bool keep);

GGML_API struct ggml_cgraph * ggml_new_graph(struct ggml_context * ctx); // = ggml_new_graph_custom(GGML_GRAPH_DEFAULT_SIZE, true);
GGML_API struct ggml_cgraph * ggml_new_graph_custom(struct ggml_context * ctx, size_t size, bool grads);

GGML_API void ggml_graph_reset(struct ggml_cgraph * graph);

GGML_API struct ggml_cgraph * ggml_graph_view(struct ggml_context * ctx, struct ggml_cgraph * cgraph, int i0, int i1);
// Previously, graphs could be copied simply with the operator =, but this does not work anymore.
// Restore this functionality with these functions:
GGML_API void ggml_graph_cpy (struct ggml_cgraph * src, struct ggml_cgraph * dst);
GGML_API struct ggml_cgraph * ggml_graph_dup (struct ggml_context * ctx, struct ggml_cgraph * src); // ggml_graph_cpy(src, ggml_new_graph(...));

GGML_API size_t ggml_graph_overhead(); // = ggml_graph_overhead_custom(GGML_GRAPH_DEFAULT_SIZE, true);
GGML_API size_t ggml_graph_overhead_custom(size_t size, bool grads);

// Removed:
// GGML_API struct ggml_cgraph * ggml_build_forward_ctx (struct ggml_context * ctx, struct ggml_tensor * tensor, size_t size);
// GGML_API struct ggml_cgraph * ggml_build_backward_ctx(struct ggml_context * ctx, struct ggml_cgraph * gf, bool keep);

// Instead:
// struct ggml_cgraph * gf = ggml_new_graph(ctx);
// ggml_graph_reset(gf); // ggml_build_forward() also resets the graph, add this function to support this
// ggml_build_forward_expand(gf, tensor);
//
// struct ggml_cgraph * gb = ggml_new_graph(ctx);
// ggml_build_backward_expand(ctx, gf, gb, keep);

I noticed that the graph reset functionality in ggml_build_forward isn't really working currently because the hash table is not reset, only n_nodes and n_leafs, so I imagine nobody depends on this behavior at the moment. Nonetheless, a ggml_graph_reset function would be easy to implement and could be useful in some cases.
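One usage note on the copy functions: where code previously relied on plain assignment of ggml_cgraph values, it would now look roughly like this sketch (gf and gf_copy stand for previously created graphs, with gf_copy at least as large as gf):

// old: struct ggml_cgraph gf_copy = *gf;

// new: copy into an existing graph ...
ggml_graph_cpy(gf, gf_copy);

// ... or duplicate into a context in one call
struct ggml_cgraph * gf_dup = ggml_graph_dup(ctx, gf);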

@ggerganov
Owner Author

> GGML_API struct ggml_cgraph * ggml_new_graph(struct ggml_context * ctx); // = ggml_new_graph_custom(GGML_GRAPH_DEFAULT_SIZE, true);

By default, we probably don't want to have grads, since 99% of the use cases are currently just a forward pass, so maybe change the default to false.

For the graph resetting in ggml_build_forward we probably need a new ggml_graph_reset or ggml_graph_clear that zeros the nodes and clears the hash table.
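A minimal sketch of what such a clear function could look like (the hash table size constant is illustrative - it is currently internal to ggml.c):

void ggml_graph_clear(struct ggml_cgraph * cgraph) {
    cgraph->n_nodes = 0;
    cgraph->n_leafs = 0;
    // without this, tensors visited in a previous build would be silently skipped on reuse
    memset(cgraph->visited_hash_table, 0, GGML_GRAPH_HASHTABLE_SIZE*sizeof(void *));
}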

Below is a short description I wrote about the existing ggml_graph_reset(), before I realized you were referring to the resetting in ggml_build_forward(), so not very relevant, but keeping it for FYI because I already wrote it anyway:

> GGML_API void ggml_graph_reset(struct ggml_cgraph * graph);

The purpose of this function is to just set the gradient contents to 0. The test1.c test demonstrates sample use cases. We first set the input against which we will do automatic differentiation with ggml_set_param and then we build the forward graph. This will cause each node in the graph that depends directly or indirectly on any of the inputs to have a grad tensor allocated. For a given forward graph, we can build the backward graph. Here we have 2 options: keep == true means that the backward graph will make a copy of the original grad tensors, while keep == false means it will reuse them. With the latter option we save memory, but we can no longer build a backward graph of the backward graph (if we wanted to compute the second derivative, for example).
Back to ggml_graph_reset: let's assume we have already built the forward and backward graphs and now we want to compute the gradients for different values of the input. Before each ggml_graph_compute, we zero out the gradients and set the values of the inputs. This mechanism allows us to evaluate the function (i.e. the forward graph) and its derivatives (i.e. the backward graph) many times without recreating the graphs.
In that sense, we should probably rename the function to ggml_graph_reset_grads.
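For reference, the pattern described above, lightly adapted from test1.c (assuming an initialized ctx; the expression is just a toy f(x) = x^2):

struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
ggml_set_param(ctx, x);                        // x participates in automatic differentiation

struct ggml_tensor * f = ggml_mul(ctx, x, x);  // f(x) = x^2

struct ggml_cgraph gf = ggml_build_forward(f);
struct ggml_cgraph gb = ggml_build_backward(ctx, &gf, /*keep =*/ false);

for (int i = 1; i <= 3; ++i) {
    ggml_graph_reset(&gf);                     // zero all gradients
    ggml_set_f32(x, (float) i);                // new input value
    ggml_set_f32(f->grad, 1.0f);               // seed df/df = 1 for the backward pass

    ggml_graph_compute_with_ctx(ctx, &gb, /*n_threads =*/ 1);

    printf("x = %d: f = %f, df/dx = %f\n", i, ggml_get_f32_1d(f, 0), ggml_get_f32_1d(x->grad, 0));
}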

Other than that, I think the API is good.

@slaren
Collaborator

slaren commented Oct 21, 2023

I didn't realize that there is already a function called ggml_graph_reset, sorry for the confusion.
