Support for OpenAI completion models, in particular gpt-3.5-turbo-instruct #284
Using this model involves the completions API rather than the chat completions API. Documentation: https://platform.openai.com/docs/api-reference/completions/create

Example from https://platform.openai.com/docs/guides/gpt/completions-api:

response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a tagline for an ice cream shop."
)

The default for max_tokens on this API is just 16 tokens.
Got this prototype working:

diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index fd8c689..8b7854e 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -22,6 +22,10 @@ def register_models(register):
register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
register(Chat("gpt-4"), aliases=("4", "gpt4"))
register(Chat("gpt-4-32k"), aliases=("4-32k",))
+ register(
+ Completion("gpt-3.5-turbo-instruct"),
+ aliases=("3.5-instruct", "chatgpt-instruct"),
+ )
# Load extra models
extra_path = llm.user_dir() / "extra-openai-models.yaml"
if not extra_path.exists():
@@ -249,6 +253,32 @@ class Chat(Model):
messages.append({"role": "system", "content": prompt.system})
messages.append({"role": "user", "content": prompt.prompt})
response._prompt_json = {"messages": messages}
+ kwargs = self.build_kwargs(prompt)
+ if stream:
+ completion = openai.ChatCompletion.create(
+ model=self.model_name or self.model_id,
+ messages=messages,
+ stream=True,
+ **kwargs,
+ )
+ chunks = []
+ for chunk in completion:
+ chunks.append(chunk)
+ content = chunk["choices"][0].get("delta", {}).get("content")
+ if content is not None:
+ yield content
+ response.response_json = combine_chunks(chunks)
+ else:
+ completion = openai.ChatCompletion.create(
+ model=self.model_name or self.model_id,
+ messages=messages,
+ stream=False,
+ **kwargs,
+ )
+ response.response_json = completion.to_dict_recursive()
+ yield completion.choices[0].message.content
+
+ def build_kwargs(self, prompt):
kwargs = dict(not_nulls(prompt.options))
if self.api_base:
kwargs["api_base"] = self.api_base
@@ -267,29 +297,45 @@ class Chat(Model):
kwargs["api_key"] = "DUMMY_KEY"
if self.headers:
kwargs["headers"] = self.headers
+ return kwargs
+
+
+class Completion(Chat):
+ def __str__(self):
+ return "OpenAI Completion: {}".format(self.model_id)
+
+ def execute(self, prompt, stream, response, conversation=None):
+ messages = []
+ if conversation is not None:
+ for prev_response in conversation.responses:
+ messages.append(prev_response.prompt.prompt)
+ messages.append(prev_response.text())
+ messages.append(prompt.prompt)
+ response._prompt_json = {"messages": messages}
+ kwargs = self.build_kwargs(prompt)
if stream:
- completion = openai.ChatCompletion.create(
+ completion = openai.Completion.create(
model=self.model_name or self.model_id,
- messages=messages,
+ prompt="\n".join(messages),
stream=True,
**kwargs,
)
chunks = []
for chunk in completion:
chunks.append(chunk)
- content = chunk["choices"][0].get("delta", {}).get("content")
+ content = chunk["choices"][0].get("text") or ""
if content is not None:
yield content
response.response_json = combine_chunks(chunks)
else:
- completion = openai.ChatCompletion.create(
+ completion = openai.Completion.create(
model=self.model_name or self.model_id,
- messages=messages,
+ prompt="\n".join(messages),
stream=False,
**kwargs,
)
response.response_json = completion.to_dict_recursive()
- yield completion.choices[0].message.content
+ yield completion.choices[0]["text"]
def not_nulls(data) -> dict:
@@ -303,6 +349,9 @@ def combine_chunks(chunks: List[dict]) -> dict:
for item in chunks:
for choice in item["choices"]:
+ if "text" in choice and "delta" not in choice:
+ content += choice["text"]
+ continue
if "role" in choice["delta"]:
role = choice["delta"]["role"]
if "content" in choice["delta"]: Example usage: llm -m chatgpt-instruct 'A poem about otters:'
llm -m chatgpt-instruct 'A poem about otters:' -o max_tokens 128
One option for system prompts: chuck it in …
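The comment above is cut off, but the idea of wrapping the system prompt in bold and folding it into the regular prompt comes up again later in this thread. A minimal sketch of that approach, with a helper name made up for illustration rather than taken from the diff above:

from typing import Optional


# Hypothetical helper: fold an optional system prompt into the plain-text
# prompt sent to the completions endpoint, wrapping it in bold markers so it
# reads as an instruction. One possible imitation of chat-style system
# prompts, not the committed implementation.
def build_completion_prompt(system: Optional[str], user_prompt: str) -> str:
    if system:
        return "**{}**\n\n{}".format(system, user_prompt)
    return user_prompt


print(build_completion_prompt("Answer in French.", "A poem about otters:"))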
That default token size of 16 is really small. https://platform.openai.com/playground?mode=complete defaults to 256 so I'm going to use that default instead.
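One way that default could be applied - just a sketch with made-up names, not the plugin's actual Options plumbing - is to fill in max_tokens whenever the user didn't set it:

# Sketch: default max_tokens to 256 for completion-style requests when the
# user didn't pass -o max_tokens; 256 matches the Playground default noted
# above. A plain dict stands in for the real Options class here.
DEFAULT_COMPLETION_MAX_TOKENS = 256


def build_completion_kwargs(options):
    # Drop unset options, then apply the completion-specific default.
    kwargs = {key: value for key, value in options.items() if value is not None}
    kwargs.setdefault("max_tokens", DEFAULT_COMPLETION_MAX_TOKENS)
    return kwargs


print(build_completion_kwargs({"temperature": 0.5, "max_tokens": None}))
# {'temperature': 0.5, 'max_tokens': 256}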
This is annoying: with a default of 256 the output from this command gets cut off:

llm -m instruct 'poem about an otter'

I checked and the output was exactly 256 tokens. I was hoping it would be somehow aware of that limit.
I'm going to add a logprobs option.
Tried this:

llm -m instruct 'poem about an otter' -o max_tokens 10 -o logprobs 3

Looks like I need to take extra steps to get it to show up in the logged database response.
I tried storing the full recursive dictionary version of the response. Here's what that looks like when you pull it back out of the logs database:

sqlite-utils "$(llm logs path)" 'select * from responses order by id desc limit 1' | jq '.[0].response_json' -r | jq

{
"content": "\n\nIn the river, sleek and sly,\n",
"role": null,
"finish_reason": null,
"chunks": [
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "\n\n",
"index": 0,
"logprobs": {
"tokens": [
"\n\n"
],
"token_logprobs": [
-0.19434144
],
"top_logprobs": [
{
"\n\n": -0.19434144,
"\n": -2.2880914,
" \n\n": -5.4443407
}
],
"text_offset": [
19
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "In",
"index": 0,
"logprobs": {
"tokens": [
"In"
],
"token_logprobs": [
-1.7583014
],
"top_logprobs": [
{
"In": -1.7583014,
"S": -1.3833013,
"Grace": -1.7426763
}
],
"text_offset": [
21
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " the",
"index": 0,
"logprobs": {
"tokens": [
" the"
],
"token_logprobs": [
-0.13828558
],
"top_logprobs": [
{
" the": -0.13828558,
" a": -2.6539104,
" rivers": -3.9507854
}
],
"text_offset": [
23
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " river",
"index": 0,
"logprobs": {
"tokens": [
" river"
],
"token_logprobs": [
-0.78688633
],
"top_logprobs": [
{
" river": -0.78688633,
" water": -3.0681362,
" sparkling": -3.099386
}
],
"text_offset": [
27
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": ",",
"index": 0,
"logprobs": {
"tokens": [
","
],
"token_logprobs": [
-0.94506526
],
"top_logprobs": [
{
",": -0.94506526,
"'s": -1.0700654,
" she": -2.6325655
}
],
"text_offset": [
33
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " sleek",
"index": 0,
"logprobs": {
"tokens": [
" sleek"
],
"token_logprobs": [
-1.239402
],
"top_logprobs": [
{
" sleek": -1.239402,
" swift": -0.723777,
" graceful": -3.2550268
}
],
"text_offset": [
34
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " and",
"index": 0,
"logprobs": {
"tokens": [
" and"
],
"token_logprobs": [
-0.00032777296
],
"top_logprobs": [
{
" and": -0.00032777296,
" as": -8.484702,
" ot": -10.187827
}
],
"text_offset": [
40
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " s",
"index": 0,
"logprobs": {
"tokens": [
" s"
],
"token_logprobs": [
-1.6860211
],
"top_logprobs": [
{
" s": -1.6860211,
" swift": -1.1078961,
" quick": -1.7485211
}
],
"text_offset": [
44
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "ly",
"index": 0,
"logprobs": {
"tokens": [
"ly"
],
"token_logprobs": [
-0.014393331
],
"top_logprobs": [
{
"ly": -0.014393331,
"velte": -4.3425183,
"li": -7.6550174
}
],
"text_offset": [
46
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": ",\n",
"index": 0,
"logprobs": {
"tokens": [
",\n"
],
"token_logprobs": [
-0.79212964
],
"top_logprobs": [
{
",\n": -0.79212964,
"\n": -0.7452547,
" \n": -3.3858798
}
],
"text_offset": [
48
]
},
"finish_reason": "length"
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "",
"index": 0,
"logprobs": {
"tokens": [],
"token_logprobs": [],
"top_logprobs": [],
"text_offset": []
},
"finish_reason": "length"
}
],
"model": "gpt-3.5-turbo-instruct"
}
],
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"model": "gpt-3.5-turbo-instruct",
"created": 1695093353
}

That's pretty noisy though! Imagine that for a much longer response.
I'll try storing a simplified version of the logprobs instead.
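A sketch of how the streamed chunks could be collapsed into that compact shape - this mirrors the response_json shown in the next comment, but it is an illustration rather than the exact combine_chunks() change:

# Sketch: collapse streamed completion chunks into a compact list of
# {"text", "top_logprobs"} entries. Illustrative only.
def collect_logprobs(chunks):
    logprobs = []
    for chunk in chunks:
        for choice in chunk["choices"]:
            lp = choice.get("logprobs") or {}
            logprobs.append(
                {
                    "text": choice.get("text", ""),
                    "top_logprobs": lp.get("top_logprobs", []),
                }
            )
    return logprobs


example_chunk = {
    "choices": [
        {"text": "\n", "logprobs": {"top_logprobs": [{"\n": -1.97, "\n\n": -0.31}]}}
    ]
}
print(collect_logprobs([example_chunk]))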
That's a bit neater:

llm -m instruct 'slogan for an an otter-run bakery' -o max_tokens 16 -o logprobs 3
sqlite-utils "$(llm logs path)" 'select * from responses order by id desc limit 1' \
  | jq '.[0].response_json' -r | jq

{
"content": "\n\n\"Fresh Treats That'll Make You Otterly Happy!\"",
"role": null,
"finish_reason": null,
"logprobs": [
{
"text": "\n",
"top_logprobs": [
{
"\n": -1.9688251,
"\n\n": -0.3125751,
":": -3.968825
}
]
},
{
"text": "\n",
"top_logprobs": [
{
"\n": -0.33703142,
"\"": -1.5089064,
"\"S": -3.8995314
}
]
},
{
"text": "\"",
"top_logprobs": [
{
"\"": -0.13510895,
"\"S": -2.588234,
"\"B": -4.119484
}
]
},
{
"text": "Fresh",
"top_logprobs": [
{
"Fresh": -1.4623433,
"Where": -1.1498433,
"Ind": -2.5560932
}
]
},
{
"text": " Treat",
"top_logprobs": [
{
"ly": -0.39274898,
" treats": -1.861499,
" baked": -3.4396236
}
]
},
{
"text": "s",
"top_logprobs": [
{
"s": -6.5092986e-06,
",": -13.406255,
"z": -13.82813
}
]
},
{
"text": " That",
"top_logprobs": [
{
" from": -0.78117555,
" Straight": -1.6874255,
",": -1.7968005
}
]
},
{
"text": "'ll",
"top_logprobs": [
{
"'ll": -1.1900537,
" Will": -1.4088038,
" Make": -1.4556787
}
]
},
{
"text": " Make",
"top_logprobs": [
{
" Make": -0.14272869,
" Have": -2.8614783,
" Leave": -3.5802286
}
]
},
{
"text": " You",
"top_logprobs": [
{
" You": -0.26677564,
" Your": -1.6730256,
" a": -3.1574006
}
]
},
{
"text": " Ot",
"top_logprobs": [
{
" Ot": -0.58028656,
" Flip": -2.2677865,
" Float": -2.2990365
}
]
},
{
"text": "ter",
"top_logprobs": [
{
"ter": -0.0004432111,
"term": -8.4223175,
"terr": -9.3598175
}
]
},
{
"text": "ly",
"top_logprobs": [
{
"ly": -0.040374786,
"-": -3.7903748,
"-L": -4.8216248
}
]
},
{
"text": " Happy",
"top_logprobs": [
{
" Happy": -0.18345116,
" S": -1.9959509,
" Del": -3.8553257
}
]
},
{
"text": "!\"",
"top_logprobs": [
{
"!\"": -0.035835575,
"\"": -3.4264605,
"!\"\n": -7.2077103
}
]
},
{
"text": "",
"top_logprobs": []
}
],
"id": "cmpl-80LaT4oS6rQPFUg3BfbNn7xBsoIix",
"object": "text_completion",
"model": "gpt-3.5-turbo-instruct",
"created": 1695093673
}
It's a bit annoying that the only way to see the logprobs is to dig around in the SQLite database for them. I can't think of a clean way to let people opt into seeing them on …

One option would be to teach the …

It's a bit weird to have code in …
Since I opened a fresh issue for it I won't consider …
Suffix support is unique to completion models and looks interesting too: https://platform.openai.com/docs/api-reference/completions/create#completions/create-suffix
In trying to write a test for the new … I used this hook to dump the JSON that comes back from the API:

import requests
import json
def log_json(response, *args, **kwargs):
    try:
        data = response.json()
        print(json.dumps(data, indent=4))
    except ValueError:
        # No JSON data in the response
        pass
    return response
import openai
openai.requestssession = requests.Session()
openai.requestssession.hooks['response'].append(log_json)

This worked, but only with --no-stream:

llm -m instruct 'slogan for an an otter-run bakery' -o max_tokens 16 -o logprobs 3 --no-stream

{
"id": "cmpl-80M2iXLmVOtxrxosK1crpgoxvpk2x",
"object": "text_completion",
"created": 1695095424,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": "\n\n\"Fresh treats straight from the riverbank!\"",
"index": 0,
"logprobs": {
"tokens": [
"\n\n",
"\"",
"Fresh",
" treats",
" straight",
" from",
" the",
" river",
"bank",
"!\""
],
"token_logprobs": [
-0.3125751,
-0.18438435,
-1.3464811,
-1.6608365,
-2.354677,
-0.0038274676,
-0.013088844,
-0.15179907,
-1.417837,
-0.41310275
],
"top_logprobs": [
{
"\n\n": -0.3125751,
"\n": -1.9688251,
":": -3.968825
},
{
"\"": -0.18438435,
"\"S": -2.293759,
"\"B": -3.9812593
},
{
"Fresh": -1.3464811,
"Where": -1.1277312,
"Making": -2.549606
},
{
" treats": -1.6608365,
"ly": -0.48896152,
" baked": -3.2389612
},
{
" from": -1.0734268,
",": -1.276552,
" that": -2.167177
},
{
" from": -0.0038274676,
" out": -5.738202,
" ot": -7.519452
},
{
" the": -0.013088844,
" our": -4.4662137,
" nature": -6.825588
},
{
" river": -0.15179907,
" ot": -2.308049,
" water": -3.917424
},
{
"bank": -1.417837,
"'s": -0.5584622,
" bank": -2.917837
},
{
"!\"": -0.41310275,
" to": -2.0537276,
",": -2.2099779
}
],
"text_offset": [
33,
35,
36,
41,
48,
57,
62,
66,
72,
76
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 10,
"total_tokens": 19
}
}

This is notably different from the format that I get for streaming responses. It also highlights that the tests should really mock streaming responses too.
Changing that debug function to just …
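The comment above is truncated, so it isn't clear what the change was. As a guess at the wiring, the LLM_OPENAI_SHOW_RESPONSES variable used in the next comment could gate the logging hook something like this (a sketch, not the shipped code):

# Sketch: only attach the response-logging hook when LLM_OPENAI_SHOW_RESPONSES
# is set, so normal runs stay quiet. Same requests session trick as the
# earlier comment; written against openai-python < 1.0.
import json
import os

import openai
import requests


def show_response(response, *args, **kwargs):
    # Print whatever JSON body the API returned, if any.
    try:
        print(json.dumps(response.json(), indent=4))
    except ValueError:
        pass
    return response


if os.environ.get("LLM_OPENAI_SHOW_RESPONSES"):
    session = requests.Session()
    session.hooks["response"].append(show_response)
    openai.requestssession = session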
I'm going to need tests for the …

LLM_OPENAI_SHOW_RESPONSES=1 llm -m 3.5-instruct 'say hi, one word' -o logprobs 2

Outputs:
And with --no-stream:

LLM_OPENAI_SHOW_RESPONSES=1 llm -m 3.5-instruct 'say hi, one word' -o logprobs 2 --no-stream

{
"id": "cmpl-80MeBfKJutM0uMNJkRrebJLeP3bxL",
"object": "text_completion",
"created": 1695097747,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": "\n\nHi!",
"index": 0,
"logprobs": {
"tokens": [
"\n\n",
"Hi",
"!"
],
"token_logprobs": [
-0.61127675,
-1.0273004,
-0.9450184
],
"top_logprobs": [
{
"\n\n": -0.61127675,
"\n": -1.9706517
},
{
"Hi": -1.0273004,
"Hello": -0.73042536
},
{
"!": -0.9450184,
".": -1.1168935
}
],
"text_offset": [
16,
18,
20
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 3,
"total_tokens": 8
}
}
My mocking trick isn't working here (despite working in #287).
https://github.com/openai/openai-python/blob/main/openai/tests/test_api_requestor.py is how OpenAI do it, but it's not very useful.
My mistake, that trick DOES work, I was using the wrong fixture.
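For reference, mocking the non-streaming completions endpoint with the requests-mock pytest fixture can look roughly like this - the fixture usage and invocation are illustrative, not copied from llm's actual test suite:

# Sketch of a non-streaming test using the requests-mock pytest fixture.
# Names here are illustrative; llm's real tests may be wired differently.
from click.testing import CliRunner

from llm.cli import cli


def test_completion_no_stream(requests_mock, monkeypatch, tmp_path):
    monkeypatch.setenv("LLM_USER_PATH", str(tmp_path))  # keep logs out of the real DB
    monkeypatch.setenv("OPENAI_API_KEY", "x")
    requests_mock.post(
        "https://api.openai.com/v1/completions",
        json={
            "id": "cmpl-xxx",
            "object": "text_completion",
            "created": 1695097747,
            "model": "gpt-3.5-turbo-instruct",
            "choices": [
                {"text": "\n\nHi!", "index": 0, "logprobs": None, "finish_reason": "stop"}
            ],
            "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
        },
    )
    result = CliRunner().invoke(
        cli,
        ["prompt", "-m", "gpt-3.5-turbo-instruct", "say hi, one word", "--no-stream"],
    )
    assert result.exit_code == 0, result.output
    assert "Hi!" in result.output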
Since I'm already doing this:

completion = openai.Completion.create(
    model=self.model_name or self.model_id,
    prompt="\n".join(messages),
    stream=False,
    **kwargs,
)
response.response_json = completion.to_dict_recursive()
yield completion.choices[0]["text"]

Which dumps the entire … They'll end up in the DB in a slightly different format. I'm OK with that:

{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": {
"text_offset": [16, 18, 20],
"token_logprobs": [-0.6, -1.1, -0.9],
"tokens": ["\n\n", "Hi", "1"],
"top_logprobs": [
{"\n": -1.9, "\n\n": -0.6},
{"Hello": -0.7, "Hi": -1.1},
{"!": -1.1, ".": -0.9},
],
},
"text": "\n\nHi.",
}
],
"created": 1695097747,
"id": "cmpl-80MeBfKJutM0uMNJkRrebJLeP3bxL",
"model": "gpt-3.5-turbo-instruct",
"object": "text_completion",
"usage": {"completion_tokens": 3, "prompt_tokens": 5, "total_tokens": 8},
}
Despite being listed in the documentation, the suffix parameter is rejected by this model:

curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Say this is a test",
"max_tokens": 7,
"suffix": "dog",
"temperature": 0
}'

{
"error": {
"message": "Unrecognized request argument supplied: suffix",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
I managed to submit …

Dropping suffix support.
I'm going to do the …
Wouldn't throwing an error be a better alternative? These models aren't designed to use system prompts. It's better to communicate that to end users so that they can adjust their API calls accordingly.
I'm really torn on this. The reason I'm leaning towards keeping system prompts working is that a really useful application of LLM is to compare the results you get from different models. It would be frustrating if you tried to compare the results of a prompt with a system prompt and got an error back because one of the dozen models you chose to use didn't support system prompts. Plus, really when you look at what system prompts actually do in other models, they're basically just injected into the regular prompt with extra markup around them. For Llama 2 that looks like this, for example:
So wrapping the system prompt in bold is actually a pretty honest imitation of how they work everywhere else!
Hmmm... thinking about it, we actually have a bit of a precedent problem here. llm-claude silently ignores system prompts, which caught me out already - I fed it a system prompt and it didn't work, but I didn't realize because I didn't get an error. On that basis, an error would actually be a better solution.
Based on that decision in …, I'm going to have this plugin raise an error if you try to send it a system prompt. I'll get rid of that and replace it with the mechanism from #288 once that is implemented.
I don't yet have a pattern for what exception should be raised by a …
I'm going to raise …
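The exception class is cut off above, so the sketch below uses a placeholder; it just shows where the check could live, at the top of Completion.execute():

from types import SimpleNamespace


# Sketch: refuse system prompts for completion models. ValueError is a
# placeholder - the comment above is truncated, so the exception the plugin
# actually raises isn't shown in this thread.
def reject_system_prompt(prompt):
    # prompt mirrors llm's Prompt object; .system holds any --system value.
    if prompt.system:
        raise ValueError(
            "System prompts are not supported for OpenAI completion models"
        )


reject_system_prompt(SimpleNamespace(system=None, prompt="A poem about otters:"))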
Released today - no OpenAI blog post yet: https://news.ycombinator.com/item?id=37558911
Got a working version in. Still needed:

- max_tokens setting? Yes, to 256.
- logprobs support
- Suffix support (did not implement this, it's not supported by gpt-3.5-turbo-instruct)