Suggestion: Interactive Demo (second example code) #31
Comments
Hi @ggerganov, would you be open to a contribution on this / a slightly more advanced use case? We basically want to run GGML in an interactive process that accepts new prompts via a socket and writes the responses back. For demo purposes we could also add the ability to answer multiple prompts via stdin. I could have a basic draft of what I am thinking this week, if that makes sense, to figure out if we are on the right track. Basically I want to add …
I think at some point https://github.com/ggerganov/llama.cpp will start supporting most of the available LLMs (not just LLaMA), so it will serve as a good "interactive" example. But maybe we can also have a simple …
* Apply fixes suggested to build on Windows (Issue: ggerganov/llama.cpp#22)
* Remove unsupported VLAs
* MSVC: remove features that are only available in MSVC C++20
* Fix zero-initialization of the other fields
* Change to the use of vector for stack allocations
Is it possible to modify the C++ code to create a second example source file that loads the model once, then reads new prompts from STDIN in a loop?
After backing up the original C++ source file, I modified the code to read a prompt from STDIN in a loop, instead of from argv. There were no errors going through the loop and generating responses, except that I seem to be generating new responses to the first prompt read from STDIN over and over, instead of processing the subsequent prompts read from STDIN.
The funny part is that these unintended results may be useful for prompt engineering in the future, to keep the context. But first, the goal is to save time by avoiding reloading the model for each new prompt in the loop. Lastly, this is a suggestion for a second, separate example source file; the first example source file is correct and very useful.
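One plausible cause of the repeated-response symptom, sketched under assumptions: if the per-prompt state (the count of tokens already evaluated and the token buffers, roughly analogous to `n_past` and the token vectors in the llama.cpp example) is declared outside the loop and never reset, every iteration keeps evaluating from the first prompt's tokens. The `PromptState` struct and `reset` function below are hypothetical names, not the actual example's API.

```cpp
#include <vector>

// Hypothetical per-prompt state, assuming the real code tracks
// something like the example's n_past counter and token buffers.
struct PromptState {
    int n_past = 0;            // tokens already evaluated by the model
    std::vector<int> embd;     // tokens pending evaluation
    std::vector<int> history;  // full token history for this prompt
};

// Call this at the top of each loop iteration to start fresh.
// Skipping the reset (deliberately) would instead carry the previous
// context forward, which is the "keep the context" behavior noted above.
void reset(PromptState &s) {
    s.n_past = 0;
    s.embd.clear();
    s.history.clear();
}
```

So the loop body would begin with `reset(state);` before tokenizing the newly read prompt, while the model itself stays loaded across iterations.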