[QUESTION] Implementing RNN/LSTM with ggml #136
Hi!
That's correct; please also note that other structures (
It's simple -- I don't! I just use a fork of See this single commit in my fork of Since
I don't really know; the sequence evaluation code was contributed by @LoganDark. As far as I know, the ggml tensor count grows with sequence length for some operations, while for others it stays constant (for example, there is only one op for the head matmul). I'll close the issue so it doesn't take up space in the Open list, but feel free to continue the conversation here.
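Here is a toy sketch of why an unrolled (serial) graph grows with sequence length. The per-token ops are recorded once per token, while an output head applied only after the last token is recorded exactly once. The op names and the four-ops-per-token cell are hypothetical. They do not reproduce rwkv.cpp's or ggml's actual graph.

```python
def build_unrolled_rnn_graph(seq_len):
    """Record the ops of a toy RNN cell unrolled over seq_len tokens.

    Hypothetical cell: each token contributes the same four ops, so this
    part of the graph grows linearly with seq_len. The output head is
    applied only to the final state, so it is added once.
    """
    nodes = []
    for t in range(seq_len):
        # Per-token ops: repeated for every time step.
        nodes += [f"matmul_x[{t}]", f"matmul_h[{t}]", f"add[{t}]", f"tanh[{t}]"]
    # Head op: constant count, independent of seq_len.
    nodes.append("head_matmul")
    return nodes


print(len(build_unrolled_rnn_graph(1000)))
```

For this toy cell the node count is `4 * seq_len + 1`, so any fixed node limit puts a hard cap on how long a sequence can be unrolled into one graph.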
The sequence-mode computation graph is only built when the sequence length changes. The existing graph is then reused for as long as the sequence length stays the same.
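A minimal sketch of that caching behaviour, assuming a hypothetical `build_fn(seq_len)` that returns a graph object. It keeps only the most recently built graph and calls the builder again only when the requested sequence length differs from the cached one.

```python
class GraphCache:
    """Cache a single graph, keyed by the sequence length it was built for."""

    def __init__(self, build_fn):
        self._build_fn = build_fn  # hypothetical builder: seq_len -> graph
        self._seq_len = None
        self._graph = None
        self.builds = 0  # number of times the builder actually ran

    def get(self, seq_len):
        # Rebuild only if nothing is cached yet or the length changed.
        if self._graph is None or seq_len != self._seq_len:
            self._graph = self._build_fn(seq_len)
            self._seq_len = seq_len
            self.builds += 1
        return self._graph


cache = GraphCache(lambda n: list(range(n)))
for n in (5, 5, 7, 7, 5):
    cache.get(n)
print(cache.builds)
```

Because only one graph is kept, alternating between two lengths rebuilds on every switch (the sequence above triggers three builds). Keeping a small dictionary of graphs per length would avoid that, at the cost of holding several large graphs in memory at once.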
We simply increase that constant before building :)
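If the graph is unrolled, the constant has to be at least the node count for the longest sequence you plan to evaluate. Below is a back-of-the-envelope sketch. `per_token_nodes` and `fixed_nodes` are hypothetical counts you would measure for your own model, and rounding up to a power of two is just one arbitrary choice.

```python
def nodes_needed(seq_len, per_token_nodes, fixed_nodes):
    """Node count of a graph unrolled over seq_len tokens."""
    return per_token_nodes * seq_len + fixed_nodes


def max_seq_len(node_limit, per_token_nodes, fixed_nodes):
    """Longest sequence whose unrolled graph still fits in node_limit nodes."""
    return max(0, (node_limit - fixed_nodes) // per_token_nodes)


def limit_for(seq_len, per_token_nodes, fixed_nodes):
    """Smallest power-of-two limit that fits the unrolled graph for seq_len."""
    need = nodes_needed(seq_len, per_token_nodes, fixed_nodes)
    limit = 1
    while limit < need:
        limit *= 2
    return limit
```

For example, with 50 nodes per token and 3 fixed nodes, a 4096-node limit fits at most 81 tokens (4053 nodes). 82 tokens need 4103 nodes, which rounds up to 8192.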
@saharNooby Alright, thanks for the detailed answer.
Hi @saharNooby!
I did not know where else to post this message to reach you, so I opened an issue :)
We currently have this issue on ggml (ggerganov/ggml#467). I'm trying to implement an LSTM layer with ggml. However, since the computational graph of the LSTM layer grows with the sequence length, I am often limited by the `GGML_MAX_NODES` constant. I was told `rwkv.cpp` implemented a serial graph which closely resembles the graph an RNN would have. Diving into your code, I realized that you're storing the computational graph on the heap to avoid stack overflows with these very large graphs. What is less clear is how you make sure your computational graph does not contain more than `GGML_MAX_NODES` nodes. Could you explain how you designed your forward pass? Do you build a computational graph per time step of the sequence?
Thanks in advance for your answer.
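One way to keep the graph from growing with sequence length is to evaluate one time step at a time and carry the recurrent state `(h, c)` between steps. That way the per-step work has a fixed size, and so would a per-step graph in a graph library. Below is a plain-Python sketch of that idea for a standard LSTM cell (input, forget and output gates plus the candidate `g`). It uses no ggml calls. The weight layout and the `random_params` helper are illustrative assumptions, not how rwkv.cpp or ggml structure things.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(params, x, h, c):
    """One LSTM time step; its op count does not depend on sequence length."""
    pre = {}
    for gate in ("i", "f", "o", "g"):
        W, U, b = params[gate]  # input weights, recurrent weights, bias
        pre[gate] = [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    g = [math.tanh(z) for z in pre["g"]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def lstm_forward(params, xs, hidden):
    """Run the sequence step by step, carrying (h, c) between steps."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in xs:
        h, c = lstm_step(params, x, h, c)
    return h, c


def random_params(n_in, hidden, seed=0):
    """Hypothetical helper: small random weights for each gate."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {gate: (mat(hidden, n_in), mat(hidden, hidden), [0.0] * hidden)
            for gate in ("i", "f", "o", "g")}


params = random_params(n_in=3, hidden=4)
h, c = lstm_forward(params, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], hidden=4)
```

In a graph-based library, the same pattern would mean building the graph for a single `lstm_step` once and re-evaluating it per token with updated `h`/`c` inputs. The trade-off is one graph evaluation per token instead of a single evaluation over the whole unrolled sequence.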