
[QUESTION] Implementing RNN/LSTM with ggml #136

Closed
PABannier opened this issue Oct 2, 2023 · 4 comments
@PABannier
Hi @saharNooby !

I did not know where to post this message to reach out to you, so I opened an issue :)

We currently have this issue on ggml (ggerganov/ggml#467). I'm trying to implement an LSTM layer with ggml. However, since the computational graph of the LSTM layer grows with the sequence length, I am often limited by the GGML_MAX_NODES constant. I was told rwkv.cpp implements a serial graph that closely resembles the graph an RNN would have.

Diving into your code, I realize that you're storing the computational graph on the heap to avoid stack overflows with these very large graphs. What is less clear is how you're making sure your computational graph is not made up of more than GGML_MAX_NODES nodes.

Could you explain to me how you designed your forward pass? Do you build a computational graph per time point of the sequence?

Thanks in advance for your answer!

@saharNooby
Collaborator

Hi!

> I realize that you're storing the computational graph on the heap to avoid stack overflows with these very large graphs

That's correct; please also note that other structures (cplan at least) also need to be allocated on the heap, because they grow with GGML_MAX_NODES just like cgraphs.

> What is less clear is how you're making sure your computational graph is not made up of more than GGML_MAX_NODES nodes

It's simple -- I don't! I just use a fork of ggml with GGML_MAX_NODES raised from 4096 to 80K. This allows inference of 14B models (the largest RWKV size available at the moment) at sequence length 64 (@LoganDark experimentally checked that performance gains beyond this length are minimal).

See this single commit in my fork of ggml -- aside from upping GGML_MAX_NODES, some APIs also need to be changed to support heap allocation.

Since ggml is already pinned to a specific commit when used in rwkv.cpp, introducing a fork with a single small commit doesn't add much development overhead.

> Do you build a computational graph per time point of the sequence?

I don't really know; the sequence eval code was contributed by @LoganDark. As far as I know, for some operations the ggml tensor count grows with sequence length, while for others it stays constant (for example, there is only one op for the head matmul).


I'll close the issue so it's not taking space in Open, but feel free to continue the conversation here.

@LoganDark
Contributor

> Do you build a computational graph per time point of the sequence?

The computation graph is rebuilt only when the sequence length changes. As long as the sequence length stays the same, the existing sequence-mode computation graph is reused.

@LoganDark
Contributor

> What is less clear is how you're making sure your computational graph is not made up of more than GGML_MAX_NODES nodes.

we simply increase that constant before building :)

@PABannier
Author

@saharNooby Alright, thanks for the detailed answer.
