ggml_allocr_alloc_graph allocated overlapping tensor memory #700

Closed
bssrdf opened this issue Jan 18, 2024 · 4 comments

bssrdf (Contributor) commented Jan 18, 2024

Hi, I have encountered a strange issue using ggml_allocr_alloc_graph to allocate tensor data. When building the graph, I used a no_alloc context and later used ggml_allocr_alloc_graph to allocate all of the tensors' data. However, I noticed that two particular tensors have exactly the same memory address in their data member. Is this a bug?

You can replicate the issue using my branch here. After building ggml, run ./bin/test-alloc-graph.

The graph is a simple one:
[attached image: test-alloc-graph-forward.dot]

slaren (Collaborator) commented Jan 18, 2024

This is not a bug; it is actually the main function of ggml-alloc. The memory of tensors holding intermediate results is reused as soon as they are no longer needed, to reduce the size of the compute buffers. If you want every tensor to have a different address, you can use a context without no_alloc, or ggml_backend_alloc_ctx_tensors.
If you only want to inspect the results of intermediate computations, you can also compute the graph one node at a time, such as:

    // compute the graph one node at a time; after each call, the result of
    // node i is available in t1 and can be inspected before its memory is reused
    for (int i = 0; i < g1->n_nodes; i++) {
        struct ggml_tensor * t1 = g1->nodes[i];
        struct ggml_cgraph g1v = ggml_graph_view(g1, i, i + 1);
        ggml_backend_graph_compute(backend, &g1v);
    }

There was also a callback added to ggml_backend_sched for this purpose in ggerganov/llama.cpp#4935.
If you want to keep some of the intermediate results, the recommended approach would be to pre-allocate some tensors in a different buffer and use ggml_cpy to copy the result there (sketched below). Technically it is also possible to add a dependency at the end of the graph with a no-op such as ggml_scale(ctx, a, 1), but I wouldn't recommend that.
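
A minimal sketch of the ggml_cpy pattern (not from the thread; it assumes ctx is the no_alloc graph context used to build the graph, gf is the compute graph, result is the intermediate tensor to preserve, and keep_ctx is a hypothetical second context created with no_alloc == false so its tensors own their memory):

    // create a separately allocated tensor with the same type and shape as the result
    struct ggml_tensor * keep = ggml_dup_tensor(keep_ctx, result);
    // append a copy node so the intermediate value is written into `keep`
    // during graph computation, before ggml-alloc reuses its memory
    ggml_build_forward_expand(gf, ggml_cpy(ctx, result, keep));

The same idea should also work with a tensor placed in a dedicated backend buffer; the key point is that the destination tensor is not managed by the graph allocator.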

bssrdf (Contributor, Author) commented Jan 18, 2024

Thanks for the quick response.

Sorry, I am new to ggml. I understand this memory overwrite is fine for inference (i.e., forward compute). How about backward compute? Won't this memory overwrite defeat the purpose of backpropagation for training? I noticed this behavior when training a VAE.

slaren (Collaborator) commented Jan 18, 2024

I don't know much about training, but I believe that the way the training examples in llama.cpp handle this is by adding dependencies at the end of the graph with ggml_scale(ctx, a, 1), which may be the best way to do this at the moment if you need to keep a lot of the intermediate results.
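
For reference, a hedged sketch of that dependency trick (assuming ctx is the graph context, gf is the compute graph, and a is an intermediate tensor whose result should be kept):

    // appending a no-op scale makes `a` a dependency of a node at the very end
    // of the graph, so ggml-alloc will not reuse its memory for earlier nodes
    ggml_build_forward_expand(gf, ggml_scale(ctx, a, 1.0f));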

bssrdf (Contributor, Author) commented Jan 18, 2024

Thanks for the suggestions.
