
Alternative for instruct mode (Vicuna support, etc.) #863

Merged · 14 commits · Apr 14, 2023

Conversation

aroidzap
Collaborator

@aroidzap aroidzap commented Apr 9, 2023

  • Added support for --config (loading arguments from a .txt file).
  • Added support for multiline mode, disabled by --disable-multiline (allows entering newlines; the mode is toggled by sending EOF: Ctrl+D on Linux/Mac, Ctrl+Z then Return on Windows).
  • Marked -ins mode as deprecated. It can now be replaced with user-defined --ins-prefix and --ins-suffix.
  • Added --stop-prompt as an alternative to --reverse-prompt; it works the same, but without adding instruct prefixes and suffixes (useful when you're talking to an agent and want to force its response to something).
  • Added flexibility to --reverse-prompt and --stop-prompt to stop even when the trailing space character is missing (tokens usually start with a space character, so the stop/reverse prompt wouldn't trigger otherwise); enabled with --rm-trailing-space-workaround.
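The trailing-space behaviour can be sketched in a few lines of Python (the function name and logic here are illustrative, not the PR's actual code), showing why a stop prompt ending in a space never matches unless the workaround strips it:

```python
def antiprompt_triggered(output: str, antiprompt: str, rm_trailing_space: bool = False) -> bool:
    """Check whether generated text ends with the reverse/stop prompt.

    With the workaround enabled, also match when the antiprompt's
    trailing space is missing, since the space is usually carried by
    the *next* token the model emits.
    """
    if output.endswith(antiprompt):
        return True
    if rm_trailing_space and antiprompt.endswith(" "):
        return output.endswith(antiprompt.rstrip(" "))
    return False

# The model has emitted "User:" so far; the following token would start with " ":
print(antiprompt_triggered("...\nUser:", "User: "))        # False
print(antiprompt_triggered("...\nUser:", "User: ", True))  # True
```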

There is some code duplication and a few TODOs left, and it's overall a bit messy.
But it should work and could be merged, if you don't mind.
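For illustration, a config file for the new --config option could look like the configs/alpaca.txt that appears later in this thread, one option per line (the --stop-prompt line is a variant, not the file as merged):

```
--keep -1
--ins-prefix-bos
--ins-prefix "\n\n### Instruction:\n\n"
--ins-suffix "\n\n### Response:\n\n"
--stop-prompt "### Instruction:\n\n"
--rm-trailing-space-workaround
```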

set_console_color(con_st, CONSOLE_COLOR_USER_INPUT);
fflush(stdout);
break;
}
}
if (!antiprompt.any) {
Collaborator Author

Unnecessary code duplication.

@prusnak
Sponsor Collaborator

prusnak commented Apr 9, 2023

I like this approach!

One idea: I think it would be nice if configs had only one option per line. This would make future changes easier to review. Since you already allow empty lines in the config files, we can group options by separating them with empty lines; there is no need to specify multiple options on one line.

It would also be nice for review purposes if we used only long options in the config files, i.e. --prompt instead of -p.

Last point: it's fine that --prompt spans multiple lines.

@anzz1
Contributor

anzz1 commented Apr 9, 2023

While good as an idea, input prefix/suffix and prompt/reverse prompt are not the same thing and should not be put together, or it messes with the tokenization.

-r "User:" --in-prefix " " and -r "User: " are different things with different results.

More info about this in the original --in-prefix PR #426

Basically, having both prefix/suffix tokens and prefix/suffix strings means you'd need four argument options. --reverse-prompt already acts exactly like --stop-prompt, so that doesn't need to be changed; just understand the "consumed" part of instruct mode: instruct mode only changes whether these tokens are "consumed" and thus invisible to the user. With a custom instruct mode like this, the -ins option could simply be changed to mean whether the prefix/suffix tokens are visible to the user or not.

Edit: the functionality you're looking for could easily be implemented by simply replacing the hardcoded "\n\n### Instruction:\n\n" and "\n\n### Response:\n\n" with configurable arguments right here:

llama.cpp/main.cpp

Lines 243 to 244 in c96a80a

const auto inp_pfx = ::llama_tokenize(ctx, "\n\n### Instruction:\n\n", true);
const auto inp_sfx = ::llama_tokenize(ctx, "\n\n### Response:\n\n", false);

To be honest, the current implementation of instruct mode ain't perfect and could definitely use a rework, but the relation between tokens and strings needs to be understood when implementing the change.

@aroidzap
Collaborator Author

aroidzap commented Apr 9, 2023

@anzz1 I have updated my implementation a bit to revert the changes to --in-prefix.

@aroidzap
Collaborator Author

aroidzap commented Apr 9, 2023

I have also added a "multiline" mode that does not send input until EOF is sent (Ctrl+D on Linux/Mac, Ctrl+Z on Windows). I also looked into something like Shift+Enter and Ctrl+V to keep multiple lines (it somehow worked on Windows, but was a bit complicated on other platforms). In the end I dropped it; it's just not worth hacking the CLI terminal to support fancy features. It would probably be best if llama.cpp later worked as a server with an HTTP REST API (possibly with multiple agents running in parallel, as is now possible with the mmap approach).

@@ -25,7 +25,8 @@ static bool is_interacting = false;
#if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__)) || defined (_WIN32)
void sigint_handler(int signo) {
set_console_color(con_st, CONSOLE_COLOR_DEFAULT);
printf("\n"); // this also force flush stdout.
fflush(stdout);
fflush(stderr);
Collaborator Author

Maybe not needed, but it seemed better. Previously I sometimes had a colored terminal left over after killing llama.cpp. (Not sure if this fixes that, though.)

@@ -147,22 +150,20 @@ int main(int argc, char ** argv) {
}

// number of tokens to keep when resetting context
if (params.n_keep < 0 || params.n_keep > (int)embd_inp.size() || params.instruct) {
if (params.n_keep < 0 || params.n_keep > (int)embd_inp.size()) {
Collaborator Author

FYI, the user now has to pass --keep -1 explicitly. It is added to the configs, so it should not be an obstacle for new users.

@aroidzap
Collaborator Author

aroidzap commented Apr 9, 2023

Changed multiline mode to be toggled by sending EOF.

@aroidzap aroidzap changed the title Draft: Alternative for instruct mode (Vicuna support, etc.) Alternative for instruct mode (Vicuna support, etc.) Apr 9, 2023
@slaren
Collaborator

slaren commented Apr 10, 2023

Should we split main.cpp into two examples: one basic generation example that only takes a prompt and lets the LLM generate uninterrupted, and one chat example with all of this? I am afraid that main.cpp is becoming too complex and is no longer a good example for getting started with the library.

@prusnak
Sponsor Collaborator

prusnak commented Apr 10, 2023

Should we split main.cpp into two examples

I think this is a great idea, but I'd probably save it for after this PR is merged. No strong opinions here, though.

@niltonvasques

The new --config option really makes testing easier. I tested it here and it's working fine.

Owner

@ggerganov ggerganov left a comment

main.cpp could use some refactoring, like moving small parts of the code into utility functions.
We could also make a separate minimal example just for text generation from a given prompt, but this can be done later.

@aroidzap
Collaborator Author

aroidzap commented Apr 14, 2023

Can I merge it now, or is there anything else I should address? @ggerganov

@ggerganov ggerganov merged commit f4d277a into ggerganov:master Apr 14, 2023
@rabidcopy
Contributor

rabidcopy commented Apr 14, 2023

I'm not sure what is going on, but I can't seem to use basic interactive mode with a reverse prompt anymore. It refuses to give any output and continuously hands control back to me. Disabling multiline mode doesn't seem to change this. I'm thoroughly confused. Below is me repeatedly pressing Enter while it keeps giving me back input, and I don't know why.
./main --keep -1 -c 2048 -n -1 -m [model] -t 6 --seed 5 -b 512 -r "Human:" -i -p "a conversation between a human and an assistant."

 a conversation between a human and an assistant.\
Human: Hello?

Ass
istant
:
 Hi







!






 How




 can I assist you today?

And even without the --interactive-first flag, it doesn't even generate and immediately gives me control at the end of the starting prompt.

As a sanity test I compiled with cmake instead of make and still got the same behavior and compile warnings. It seems to give one token of output, then gives me back control, almost as if it thinks an antiprompt is being detected on every token.

 a conversation between a human and an assistant.\
Human: hello?

Ass
istant
:
 Hello
!
 How
 can
 I
 help
 you
 today
?
 Is
 there
 something
 you
 would
 like
 to
 talk
 about
 or
 ask
 me
 a
 question
?
 I
'
m
 here
 to
 assist
 you
 with
 anything
 you
 need
.

@aroidzap
Collaborator Author

That's weird; we could probably revert this PR until it's solved.

@ggerganov
Owner

If needed, it's OK to revert. I won't be able to look at the examples code soon, so I'm fully counting on the collaborators to assist with this.

@prusnak
Sponsor Collaborator

prusnak commented Apr 14, 2023

Revert in #982

@aroidzap please open a new pull request when you are ready

ggerganov pushed a commit that referenced this pull request Apr 14, 2023
@KevinColemanInc

Could we break this PR into smaller ones, one per feature? A bug in one feature is preventing all the other features from landing.

@deadprogram
Collaborator

That is interesting: the following command worked well for me on the master branch before this was reverted:

./main -m ~/Downloads/alpaca2/ggml-model-q4_1.bin --config ./configs/alpaca.txt --ctx_size 2048 -n -1 -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7 -p "A continuación hay una instrucción que describe una tarea. Escriba una respuesta que complete adecuadamente la solicitud."

I did modify configs/alpaca.txt by changing --reverse-prompt to --stop-prompt:

$ git diff configs/alpaca.txt
diff --git a/configs/alpaca.txt b/configs/alpaca.txt
index 99a3ab4..3a65e61 100644
--- a/configs/alpaca.txt
+++ b/configs/alpaca.txt
@@ -4,6 +4,6 @@
 --ins-prefix-bos
 --ins-prefix "\n\n### Instruction:\n\n"
 --ins-suffix "\n\n### Response:\n\n"
---reverse-prompt "### Instruction:\n\n"
+--stop-prompt "### Instruction:\n\n"

Perhaps a session with git bisect could help? @rabidcopy which model did you use when you ran into this issue?

@rabidcopy
Contributor

rabidcopy commented Apr 14, 2023

Perhaps a session with git bisect could help? @rabidcopy which model did you use when you ran into this issue?

All of them. Vicuna, Vicuna 1.1, Alpaca, gpt4all, gpt4-x-alpaca, etc. They all dole out a token per line and give back control.

However, I think I'm onto something. When using the --config option instead of manually setting options in the terminal, the new instruct mode works. The only difference is that --rm-trailing-space-workaround was passed this time, from vicuna.txt. I put --rm-trailing-space-workaround into my terminal command that previously had issues, and now the issue is gone and things seem to work. New observed issue: pressing Ctrl+D to toggle multiline mode only works to enable it; it can't seem to be disabled after that (thus the session is effectively over, because you will only be able to input new lines and never get output).

So there are two issues here, plus an unforeseen one:

  1. Interactive mode with a reverse prompt / the new alternative instruct mode doesn't work without --rm-trailing-space-workaround. (Why is this an optional flag then? That's probably why the issue was missed: the flag was always being passed in the config files by default.)
  2. Multiline mode can be enabled with Ctrl+D but not disabled on Linux.
  3. This section in particular seems to no longer function (just the "inject first reverse prompt on EOS" part; EOS is still ignored and replaced with a newline in interactive mode):
    // tokenize and inject first reverse prompt
    const auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
    embd_inp.insert(embd_inp.end(), first_antiprompt.begin(), first_antiprompt.end());

@jxy
Contributor

jxy commented Apr 15, 2023

It looks like just a few really wrong shell scripts, in which the author falsely thinks the shell string "\n" represents a single newline (0x0a). Typically it's the shell builtin echo that does the translation of \. So just change those to something like:

    --ins-prefix '

### Instruction:

'
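This point can be verified directly in a shell with standard POSIX tools (variable names here are just for the demo):

```shell
# The two characters backslash + 'n' are NOT a newline (0x0a) in shell;
# only echo -e or a printf *format* string translates them.
two_chars="\n"
printf '%s' "$two_chars" | wc -c    # 2 bytes: '\' and 'n'

# To pass a real newline in an argument, embed it literally in single quotes:
real_newline='
'
printf '%s' "$real_newline" | wc -c # 1 byte: 0x0a
```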

@ejones ejones mentioned this pull request May 10, 2023