
Alternative for instruct mode (Vicuna support, etc.) #863

Merged · 14 commits · Apr 14, 2023

Conversation

aroidzap
Collaborator

@aroidzap aroidzap commented Apr 9, 2023

  • Added support for --config (loading arguments from a .txt file).
  • Added support for multiline mode, disabled by --disable-multiline (allows entering newlines; the mode is toggled by sending EOF: Ctrl+D on Linux/Mac, Ctrl+Z then Return on Windows).
  • Marked -ins mode as deprecated. It can now be replaced with user-defined --ins-prefix and --ins-suffix.
  • Added --stop-prompt as an alternative to --reverse-prompt; it works the same, but without adding instruct prefixes and suffixes (useful when you're talking to an agent and want to force its response to something).
  • Added flexibility to --reverse-prompt and --stop-prompt to stop even when the trailing space character is missing (tokens usually start with a space character, so the stop/reverse prompt wouldn't trigger otherwise); enabled with --rm-trailing-space-workaround.
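The trailing-space behaviour can be sketched in a few lines of Python (the function name and logic here are illustrative, not the PR's actual code), showing why a stop prompt ending in a space never matches unless the workaround strips it:

```python
def antiprompt_triggered(output: str, antiprompt: str, rm_trailing_space: bool = False) -> bool:
    """Check whether generated text ends with the reverse/stop prompt.

    With the workaround enabled, also match when the antiprompt's
    trailing space is missing, since the space is usually carried by
    the *next* token the model emits.
    """
    if output.endswith(antiprompt):
        return True
    if rm_trailing_space and antiprompt.endswith(" "):
        return output.endswith(antiprompt.rstrip(" "))
    return False

# The model has emitted "User:" so far; the following token would start with " ":
print(antiprompt_triggered("...\nUser:", "User: "))        # False
print(antiprompt_triggered("...\nUser:", "User: ", True))  # True
```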

There is some code duplication and a few TODOs left, and it's overall a bit messy.
But it should work and could be merged, if you don't mind.
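For illustration, a config file for the new --config option could look like the configs/alpaca.txt that appears later in this thread, one option per line (the --stop-prompt line is a variant, not the file as merged):

```
--keep -1
--ins-prefix-bos
--ins-prefix "\n\n### Instruction:\n\n"
--ins-suffix "\n\n### Response:\n\n"
--stop-prompt "### Instruction:\n\n"
--rm-trailing-space-workaround
```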

set_console_color(con_st, CONSOLE_COLOR_USER_INPUT);
fflush(stdout);
break;
}
}
if (!antiprompt.any) {
Collaborator Author

Unnecessary code duplication.

@prusnak
Sponsor Collaborator

prusnak commented Apr 9, 2023

I like this approach!

One idea: I think it would be nice if configs had only one option per line. This would make future changes easier to review. Since you already allow empty lines in the config files, we can group options by separating them with empty lines; there is no need to specify multiple options on one line.

It would also be nice for review purposes if we used only long options in the config files, i.e. --prompt instead of -p.

Last point: it's fine that --prompt spans multiple lines.

@anzz1
Contributor

anzz1 commented Apr 9, 2023

While good as an idea, input prefix/suffix and prompt/reverse prompt are not the same thing and should not be put together, or it messes with the tokenization.

-r "User:" --in-prefix " " and -r "User: " are different things with different results.

More info about this in the original --in-prefix PR #426

Basically, having both prefix/suffix tokens and prefix/suffix strings means you'd need four argument options. --reverse-prompt already acts exactly like --stop-prompt, so that doesn't need to be changed; just understand the "consumed" part of instruct mode: instruct mode only changes whether these tokens are "consumed" and thus invisible to the user. With a custom instruct mode like this, the -ins option could simply be changed to mean whether the prefix/suffix tokens are visible to the user or not.

Edit: the functionality you're looking for could easily be implemented by simply replacing the hardcoded "\n\n### Instruction:\n\n" and "\n\n### Response:\n\n" with configurable arguments right here:

llama.cpp/main.cpp

Lines 243 to 244 in c96a80a

const auto inp_pfx = ::llama_tokenize(ctx, "\n\n### Instruction:\n\n", true);
const auto inp_sfx = ::llama_tokenize(ctx, "\n\n### Response:\n\n", false);

To be honest, the current implementation of instruct mode ain't perfect and could definitely use a rework, but the relation between tokens and strings needs to be understood when implementing the change.

@aroidzap
Collaborator Author

aroidzap commented Apr 9, 2023

@anzz1 I have updated my implementation a bit to revert the changes to --in-prefix.

@aroidzap
Collaborator Author

aroidzap commented Apr 9, 2023

I have also added a "multiline" mode that does not send input until EOF is sent (Ctrl+D on Linux/Mac, Ctrl+Z on Windows). I also looked into something like Shift+Enter and Ctrl+V to keep multiple lines (it somehow worked on Windows, but was a bit complicated on other platforms). In the end I dropped it; it's just not worth hacking the CLI terminal to support fancy features. It would probably be best if llama.cpp later worked as a server with an HTTP REST API (possibly with multiple agents running in parallel, as is now possible with the mmap approach).

@@ -25,7 +25,8 @@ static bool is_interacting = false;
#if defined (__unix__) || (defined (__APPLE__) && defined (__MACH__)) || defined (_WIN32)
void sigint_handler(int signo) {
set_console_color(con_st, CONSOLE_COLOR_DEFAULT);
printf("\n"); // this also force flush stdout.
fflush(stdout);
fflush(stderr);
Collaborator Author

Maybe not needed, but it seemed better. Previously I sometimes had a colored terminal left over after killing llama.cpp. (Not sure if this fixes that, though.)

@@ -147,22 +150,20 @@ int main(int argc, char ** argv) {
}

// number of tokens to keep when resetting context
if (params.n_keep < 0 || params.n_keep > (int)embd_inp.size() || params.instruct) {
if (params.n_keep < 0 || params.n_keep > (int)embd_inp.size()) {
Collaborator Author

FYI, the user now has to pass --keep -1 explicitly. It is added to the configs, so it should not be an obstacle for new users.

@aroidzap
Collaborator Author

aroidzap commented Apr 9, 2023

Changed multiline mode to be toggled by sending EOF.

@aroidzap aroidzap changed the title Draft: Alternative for instruct mode (Vicuna support, etc.) Alternative for instruct mode (Vicuna support, etc.) Apr 9, 2023
@slaren
Collaborator

slaren commented Apr 10, 2023

Should we split main.cpp into two examples: one basic generation example that only takes a prompt and lets the LLM generate uninterrupted, and one chat example with all of this? I am afraid that main.cpp is becoming too complex and is no longer a good example for getting started with the library.

@prusnak
Sponsor Collaborator

prusnak commented Apr 10, 2023

Should we split main.cpp into two examples

I think this is a great idea, but I'd probably save it for after this PR is merged. No strong opinions here, though.

@niltonvasques

The new --config option really makes testing easier. I tested it here and it's working fine.

Owner

@ggerganov ggerganov left a comment

main.cpp could use some refactoring, like moving small parts of the code into utility functions.
We could also make a separate minimal example just for text generation from a given prompt, but this can be done later.

@aroidzap
Collaborator Author

aroidzap commented Apr 14, 2023

Can I merge it now, or is there anything else I should address? @ggerganov

@ggerganov ggerganov merged commit f4d277a into ggerganov:master Apr 14, 2023
@rabidcopy
Contributor

rabidcopy commented Apr 14, 2023

I'm not sure what is going on, but I can't seem to use basic interactive mode with a reverse prompt anymore. It refuses to give any output and continuously hands control back to me. Disabling multiline mode doesn't seem to change this. I'm thoroughly confused. Below is me repeatedly pressing Enter while it keeps giving me back input, and I don't know why.
./main --keep -1 -c 2048 -n -1 -m [model] -t 6 --seed 5 -b 512 -r "Human:" -i -p "a conversation between a human and an assistant."

 a conversation between a human and an assistant.\
Human: Hello?

Ass
istant
:
 Hi







!






 How




 can I assist you today?

And even without the --interactive-first flag, it doesn't even generate and immediately gives me control at the end of the starting prompt.

As a sanity test I compiled with cmake instead of make and still got the same behavior and compile warnings. It seems to give one token of output, then gives me back control, almost as if it thinks an antiprompt is being detected on every token.

 a conversation between a human and an assistant.\
Human: hello?

Ass
istant
:
 Hello
!
 How
 can
 I
 help
 you
 today
?
 Is
 there
 something
 you
 would
 like
 to
 talk
 about
 or
 ask
 me
 a
 question
?
 I
'
m
 here
 to
 assist
 you
 with
 anything
 you
 need
.

@aroidzap
Collaborator Author

That's weird; we could probably revert this PR until it's solved.

@ggerganov
Owner

If needed, it's OK to revert. I won't be able to look at the examples code soon, so I'm fully counting on the collaborators to assist with this.

@prusnak
Sponsor Collaborator

prusnak commented Apr 14, 2023

Revert in #982

@aroidzap please open a new pull request when you are ready

ggerganov pushed a commit that referenced this pull request Apr 14, 2023
@KevinColemanInc

Could we break this PR into smaller ones, one per feature? A bug in one feature is preventing all the other features from landing.

@deadprogram
Collaborator

That is interesting: the following command worked well for me on the master branch before this was reverted:

./main -m ~/Downloads/alpaca2/ggml-model-q4_1.bin --config ./configs/alpaca.txt --ctx_size 2048 -n -1 -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7 -p "A continuación hay una instrucción que describe una tarea. Escriba una respuesta que complete adecuadamente la solicitud."

I did modify configs/alpaca.txt by changing --reverse-prompt to --stop-prompt:

$ git diff configs/alpaca.txt
diff --git a/configs/alpaca.txt b/configs/alpaca.txt
index 99a3ab4..3a65e61 100644
--- a/configs/alpaca.txt
+++ b/configs/alpaca.txt
@@ -4,6 +4,6 @@
 --ins-prefix-bos
 --ins-prefix "\n\n### Instruction:\n\n"
 --ins-suffix "\n\n### Response:\n\n"
---reverse-prompt "### Instruction:\n\n"
+--stop-prompt "### Instruction:\n\n"

Perhaps a session with git bisect could help? @rabidcopy which model did you use when you ran into this issue?

@rabidcopy
Contributor

rabidcopy commented Apr 14, 2023

Perhaps a session with git bisect could help? @rabidcopy which model did you use when you ran into this issue?

All of them. Vicuna, Vicuna 1.1, Alpaca, gpt4all, gpt4-x-alpaca, etc. They all dole out a token per line and give back control.

However, I think I'm onto something. When using the --config option instead of manually setting options in the terminal, the new instruct mode works. The only difference is that --rm-trailing-space-workaround was passed this time, from vicuna.txt. I put --rm-trailing-space-workaround into my terminal command that previously had issues, and now the issue is gone and things seem to work. New observed issue: pressing Ctrl+D to toggle multiline mode only works to enable it; it can't seem to be disabled after that (thus the session is effectively over, because you will only be able to input new lines and never get output).

So there are two issues here, plus an unforeseen one:

  1. Interactive mode with a reverse prompt / the new alternative instruct mode doesn't work without --rm-trailing-space-workaround. (Why is this an optional flag then? That's probably why the issue was missed: the flag was always being passed in the config files by default.)
  2. Multiline mode can be enabled with Ctrl+D but not disabled on Linux.
  3. This section in particular seems to no longer function (just the "inject first reverse prompt on EOS" part; EOS is still ignored and replaced with a newline in interactive mode):
    // tokenize and inject first reverse prompt
    const auto first_antiprompt = ::llama_tokenize(ctx, params.antiprompt.front(), false);
    embd_inp.insert(embd_inp.end(), first_antiprompt.begin(), first_antiprompt.end());

@jxy
Contributor

jxy commented Apr 15, 2023

It looks like just a few really wrong shell scripts, in which the author falsely thinks the shell string "\n" represents a single newline (0x0a). Typically it's the shell builtin echo that does the translation of \. So just change those to something like:

    --ins-prefix '

### Instruction:

'
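This point can be verified directly in a shell with standard POSIX tools (variable names here are just for the demo):

```shell
# The two characters backslash + 'n' are NOT a newline (0x0a) in shell;
# only echo -e or a printf *format* string translates them.
two_chars="\n"
printf '%s' "$two_chars" | wc -c    # 2 bytes: '\' and 'n'

# To pass a real newline in an argument, embed it literally in single quotes:
real_newline='
'
printf '%s' "$real_newline" | wc -c # 1 byte: 0x0a
```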

@ejones ejones mentioned this pull request May 10, 2023