Support for OpenAI completion models, in particular gpt-3.5-turbo-instruct #284
Using this model involves the completions API rather than the chat completions API. Documentation: https://platform.openai.com/docs/api-reference/completions/create

Example from https://platform.openai.com/docs/guides/gpt/completions-api:

response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a tagline for an ice cream shop."
)

The default for max_tokens on this API is just 16 tokens.
Got this prototype working:

diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index fd8c689..8b7854e 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -22,6 +22,10 @@ def register_models(register):
register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
register(Chat("gpt-4"), aliases=("4", "gpt4"))
register(Chat("gpt-4-32k"), aliases=("4-32k",))
+ register(
+ Completion("gpt-3.5-turbo-instruct"),
+ aliases=("3.5-instruct", "chatgpt-instruct"),
+ )
# Load extra models
extra_path = llm.user_dir() / "extra-openai-models.yaml"
if not extra_path.exists():
@@ -249,6 +253,32 @@ class Chat(Model):
messages.append({"role": "system", "content": prompt.system})
messages.append({"role": "user", "content": prompt.prompt})
response._prompt_json = {"messages": messages}
+ kwargs = self.build_kwargs(prompt)
+ if stream:
+ completion = openai.ChatCompletion.create(
+ model=self.model_name or self.model_id,
+ messages=messages,
+ stream=True,
+ **kwargs,
+ )
+ chunks = []
+ for chunk in completion:
+ chunks.append(chunk)
+ content = chunk["choices"][0].get("delta", {}).get("content")
+ if content is not None:
+ yield content
+ response.response_json = combine_chunks(chunks)
+ else:
+ completion = openai.ChatCompletion.create(
+ model=self.model_name or self.model_id,
+ messages=messages,
+ stream=False,
+ **kwargs,
+ )
+ response.response_json = completion.to_dict_recursive()
+ yield completion.choices[0].message.content
+
+ def build_kwargs(self, prompt):
kwargs = dict(not_nulls(prompt.options))
if self.api_base:
kwargs["api_base"] = self.api_base
@@ -267,29 +297,45 @@ class Chat(Model):
kwargs["api_key"] = "DUMMY_KEY"
if self.headers:
kwargs["headers"] = self.headers
+ return kwargs
+
+
+class Completion(Chat):
+ def __str__(self):
+ return "OpenAI Completion: {}".format(self.model_id)
+
+ def execute(self, prompt, stream, response, conversation=None):
+ messages = []
+ if conversation is not None:
+ for prev_response in conversation.responses:
+ messages.append(prev_response.prompt.prompt)
+ messages.append(prev_response.text())
+ messages.append(prompt.prompt)
+ response._prompt_json = {"messages": messages}
+ kwargs = self.build_kwargs(prompt)
if stream:
- completion = openai.ChatCompletion.create(
+ completion = openai.Completion.create(
model=self.model_name or self.model_id,
- messages=messages,
+ prompt="\n".join(messages),
stream=True,
**kwargs,
)
chunks = []
for chunk in completion:
chunks.append(chunk)
- content = chunk["choices"][0].get("delta", {}).get("content")
+ content = chunk["choices"][0].get("text") or ""
if content is not None:
yield content
response.response_json = combine_chunks(chunks)
else:
- completion = openai.ChatCompletion.create(
+ completion = openai.Completion.create(
model=self.model_name or self.model_id,
- messages=messages,
+ prompt="\n".join(messages),
stream=False,
**kwargs,
)
response.response_json = completion.to_dict_recursive()
- yield completion.choices[0].message.content
+ yield completion.choices[0]["text"]
def not_nulls(data) -> dict:
@@ -303,6 +349,9 @@ def combine_chunks(chunks: List[dict]) -> dict:
for item in chunks:
for choice in item["choices"]:
+ if "text" in choice and "delta" not in choice:
+ content += choice["text"]
+ continue
if "role" in choice["delta"]:
role = choice["delta"]["role"]
if "content" in choice["delta"]: Example usage: llm -m chatgpt-instruct 'A poem about otters:'
llm -m chatgpt-instruct 'A poem about otters:' -o max_tokens 128
One option for system prompts: chuck it in …
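The comment above is cut off, but the idea of wrapping the system prompt in bold and folding it into the regular prompt comes up again later in this thread. A minimal sketch of that approach, with a helper name made up for illustration rather than taken from the diff above:

from typing import Optional


# Hypothetical helper: fold an optional system prompt into the plain-text
# prompt sent to the completions endpoint, wrapping it in bold markers so it
# reads as an instruction. One possible imitation of chat-style system
# prompts, not the committed implementation.
def build_completion_prompt(system: Optional[str], user_prompt: str) -> str:
    if system:
        return "**{}**\n\n{}".format(system, user_prompt)
    return user_prompt


print(build_completion_prompt("Answer in French.", "A poem about otters:"))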
That default token size of 16 is really small. https://platform.openai.com/playground?mode=complete defaults to 256 so I'm going to use that default instead.
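One way that default could be applied - just a sketch with made-up names, not the plugin's actual Options plumbing - is to fill in max_tokens whenever the user didn't set it:

# Sketch: default max_tokens to 256 for completion-style requests when the
# user didn't pass -o max_tokens; 256 matches the Playground default noted
# above. A plain dict stands in for the real Options class here.
DEFAULT_COMPLETION_MAX_TOKENS = 256


def build_completion_kwargs(options):
    # Drop unset options, then apply the completion-specific default.
    kwargs = {key: value for key, value in options.items() if value is not None}
    kwargs.setdefault("max_tokens", DEFAULT_COMPLETION_MAX_TOKENS)
    return kwargs


print(build_completion_kwargs({"temperature": 0.5, "max_tokens": None}))
# {'temperature': 0.5, 'max_tokens': 256}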
This is annoying: with a default of 256 the output from this command gets cut off:

llm -m instruct 'poem about an otter'

I checked and the output was exactly 256 tokens. I was hoping it would be somehow aware of that limit.
I'm going to add a logprobs option.
Tried this:

llm -m instruct 'poem about an otter' -o max_tokens 10 -o logprobs 3

Looks like I need to take extra steps to get it to show up in the logged database response.
I tried storing the full recursive dictionary version of the response. Here's what that looks like when you pull it back out of the logs database:

sqlite-utils "$(llm logs path)" 'select * from responses order by id desc limit 1' | jq '.[0].response_json' -r | jq

{
"content": "\n\nIn the river, sleek and sly,\n",
"role": null,
"finish_reason": null,
"chunks": [
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "\n\n",
"index": 0,
"logprobs": {
"tokens": [
"\n\n"
],
"token_logprobs": [
-0.19434144
],
"top_logprobs": [
{
"\n\n": -0.19434144,
"\n": -2.2880914,
" \n\n": -5.4443407
}
],
"text_offset": [
19
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "In",
"index": 0,
"logprobs": {
"tokens": [
"In"
],
"token_logprobs": [
-1.7583014
],
"top_logprobs": [
{
"In": -1.7583014,
"S": -1.3833013,
"Grace": -1.7426763
}
],
"text_offset": [
21
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " the",
"index": 0,
"logprobs": {
"tokens": [
" the"
],
"token_logprobs": [
-0.13828558
],
"top_logprobs": [
{
" the": -0.13828558,
" a": -2.6539104,
" rivers": -3.9507854
}
],
"text_offset": [
23
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " river",
"index": 0,
"logprobs": {
"tokens": [
" river"
],
"token_logprobs": [
-0.78688633
],
"top_logprobs": [
{
" river": -0.78688633,
" water": -3.0681362,
" sparkling": -3.099386
}
],
"text_offset": [
27
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": ",",
"index": 0,
"logprobs": {
"tokens": [
","
],
"token_logprobs": [
-0.94506526
],
"top_logprobs": [
{
",": -0.94506526,
"'s": -1.0700654,
" she": -2.6325655
}
],
"text_offset": [
33
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " sleek",
"index": 0,
"logprobs": {
"tokens": [
" sleek"
],
"token_logprobs": [
-1.239402
],
"top_logprobs": [
{
" sleek": -1.239402,
" swift": -0.723777,
" graceful": -3.2550268
}
],
"text_offset": [
34
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " and",
"index": 0,
"logprobs": {
"tokens": [
" and"
],
"token_logprobs": [
-0.00032777296
],
"top_logprobs": [
{
" and": -0.00032777296,
" as": -8.484702,
" ot": -10.187827
}
],
"text_offset": [
40
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": " s",
"index": 0,
"logprobs": {
"tokens": [
" s"
],
"token_logprobs": [
-1.6860211
],
"top_logprobs": [
{
" s": -1.6860211,
" swift": -1.1078961,
" quick": -1.7485211
}
],
"text_offset": [
44
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "ly",
"index": 0,
"logprobs": {
"tokens": [
"ly"
],
"token_logprobs": [
-0.014393331
],
"top_logprobs": [
{
"ly": -0.014393331,
"velte": -4.3425183,
"li": -7.6550174
}
],
"text_offset": [
46
]
},
"finish_reason": null
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": ",\n",
"index": 0,
"logprobs": {
"tokens": [
",\n"
],
"token_logprobs": [
-0.79212964
],
"top_logprobs": [
{
",\n": -0.79212964,
"\n": -0.7452547,
" \n": -3.3858798
}
],
"text_offset": [
48
]
},
"finish_reason": "length"
}
],
"model": "gpt-3.5-turbo-instruct"
},
{
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"created": 1695093353,
"choices": [
{
"text": "",
"index": 0,
"logprobs": {
"tokens": [],
"token_logprobs": [],
"top_logprobs": [],
"text_offset": []
},
"finish_reason": "length"
}
],
"model": "gpt-3.5-turbo-instruct"
}
],
"id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
"object": "text_completion",
"model": "gpt-3.5-turbo-instruct",
"created": 1695093353
}

That's pretty noisy though! Imagine that for a much longer response.
I'll try storing a simplified version of the logprobs instead.
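A sketch of how the streamed chunks could be collapsed into that compact shape - this mirrors the response_json shown in the next comment, but it is an illustration rather than the exact combine_chunks() change:

# Sketch: collapse streamed completion chunks into a compact list of
# {"text", "top_logprobs"} entries. Illustrative only.
def collect_logprobs(chunks):
    logprobs = []
    for chunk in chunks:
        for choice in chunk["choices"]:
            lp = choice.get("logprobs") or {}
            logprobs.append(
                {
                    "text": choice.get("text", ""),
                    "top_logprobs": lp.get("top_logprobs", []),
                }
            )
    return logprobs


example_chunk = {
    "choices": [
        {"text": "\n", "logprobs": {"top_logprobs": [{"\n": -1.97, "\n\n": -0.31}]}}
    ]
}
print(collect_logprobs([example_chunk]))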
That's a bit neater:

llm -m instruct 'slogan for an an otter-run bakery' -o max_tokens 16 -o logprobs 3
sqlite-utils "$(llm logs path)" 'select * from responses order by id desc limit 1' \
  | jq '.[0].response_json' -r | jq

{
"content": "\n\n\"Fresh Treats That'll Make You Otterly Happy!\"",
"role": null,
"finish_reason": null,
"logprobs": [
{
"text": "\n",
"top_logprobs": [
{
"\n": -1.9688251,
"\n\n": -0.3125751,
":": -3.968825
}
]
},
{
"text": "\n",
"top_logprobs": [
{
"\n": -0.33703142,
"\"": -1.5089064,
"\"S": -3.8995314
}
]
},
{
"text": "\"",
"top_logprobs": [
{
"\"": -0.13510895,
"\"S": -2.588234,
"\"B": -4.119484
}
]
},
{
"text": "Fresh",
"top_logprobs": [
{
"Fresh": -1.4623433,
"Where": -1.1498433,
"Ind": -2.5560932
}
]
},
{
"text": " Treat",
"top_logprobs": [
{
"ly": -0.39274898,
" treats": -1.861499,
" baked": -3.4396236
}
]
},
{
"text": "s",
"top_logprobs": [
{
"s": -6.5092986e-06,
",": -13.406255,
"z": -13.82813
}
]
},
{
"text": " That",
"top_logprobs": [
{
" from": -0.78117555,
" Straight": -1.6874255,
",": -1.7968005
}
]
},
{
"text": "'ll",
"top_logprobs": [
{
"'ll": -1.1900537,
" Will": -1.4088038,
" Make": -1.4556787
}
]
},
{
"text": " Make",
"top_logprobs": [
{
" Make": -0.14272869,
" Have": -2.8614783,
" Leave": -3.5802286
}
]
},
{
"text": " You",
"top_logprobs": [
{
" You": -0.26677564,
" Your": -1.6730256,
" a": -3.1574006
}
]
},
{
"text": " Ot",
"top_logprobs": [
{
" Ot": -0.58028656,
" Flip": -2.2677865,
" Float": -2.2990365
}
]
},
{
"text": "ter",
"top_logprobs": [
{
"ter": -0.0004432111,
"term": -8.4223175,
"terr": -9.3598175
}
]
},
{
"text": "ly",
"top_logprobs": [
{
"ly": -0.040374786,
"-": -3.7903748,
"-L": -4.8216248
}
]
},
{
"text": " Happy",
"top_logprobs": [
{
" Happy": -0.18345116,
" S": -1.9959509,
" Del": -3.8553257
}
]
},
{
"text": "!\"",
"top_logprobs": [
{
"!\"": -0.035835575,
"\"": -3.4264605,
"!\"\n": -7.2077103
}
]
},
{
"text": "",
"top_logprobs": []
}
],
"id": "cmpl-80LaT4oS6rQPFUg3BfbNn7xBsoIix",
"object": "text_completion",
"model": "gpt-3.5-turbo-instruct",
"created": 1695093673
}
It's a bit annoying that the only way to see the logprobs is to dig around in the SQLite database for them. I can't think of a clean way to let people opt into seeing them on …

One option would be to teach the …

It's a bit weird to have code in …
Since I opened a fresh issue for it I won't consider …
Suffix support is unique to completion models and looks interesting too: https://platform.openai.com/docs/api-reference/completions/create#completions/create-suffix
In trying to write a test for the new … I used this hook to dump the JSON that comes back from the API:

import requests
import json
def log_json(response, *args, **kwargs):
    try:
        data = response.json()
        print(json.dumps(data, indent=4))
    except ValueError:
        # No JSON data in the response
        pass
    return response
import openai
openai.requestssession = requests.Session()
openai.requestssession.hooks['response'].append(log_json)

This worked, but only with --no-stream:

llm -m instruct 'slogan for an an otter-run bakery' -o max_tokens 16 -o logprobs 3 --no-stream

{
"id": "cmpl-80M2iXLmVOtxrxosK1crpgoxvpk2x",
"object": "text_completion",
"created": 1695095424,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": "\n\n\"Fresh treats straight from the riverbank!\"",
"index": 0,
"logprobs": {
"tokens": [
"\n\n",
"\"",
"Fresh",
" treats",
" straight",
" from",
" the",
" river",
"bank",
"!\""
],
"token_logprobs": [
-0.3125751,
-0.18438435,
-1.3464811,
-1.6608365,
-2.354677,
-0.0038274676,
-0.013088844,
-0.15179907,
-1.417837,
-0.41310275
],
"top_logprobs": [
{
"\n\n": -0.3125751,
"\n": -1.9688251,
":": -3.968825
},
{
"\"": -0.18438435,
"\"S": -2.293759,
"\"B": -3.9812593
},
{
"Fresh": -1.3464811,
"Where": -1.1277312,
"Making": -2.549606
},
{
" treats": -1.6608365,
"ly": -0.48896152,
" baked": -3.2389612
},
{
" from": -1.0734268,
",": -1.276552,
" that": -2.167177
},
{
" from": -0.0038274676,
" out": -5.738202,
" ot": -7.519452
},
{
" the": -0.013088844,
" our": -4.4662137,
" nature": -6.825588
},
{
" river": -0.15179907,
" ot": -2.308049,
" water": -3.917424
},
{
"bank": -1.417837,
"'s": -0.5584622,
" bank": -2.917837
},
{
"!\"": -0.41310275,
" to": -2.0537276,
",": -2.2099779
}
],
"text_offset": [
33,
35,
36,
41,
48,
57,
62,
66,
72,
76
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 9,
"completion_tokens": 10,
"total_tokens": 19
}
}

This is notably different from the format that I get for streaming responses. It also highlights that the tests should really mock streaming responses too.
Changing that debug function to just …
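The comment above is truncated, so it isn't clear what the change was. As a guess at the wiring, the LLM_OPENAI_SHOW_RESPONSES variable used in the next comment could gate the logging hook something like this (a sketch, not the shipped code):

# Sketch: only attach the response-logging hook when LLM_OPENAI_SHOW_RESPONSES
# is set, so normal runs stay quiet. Same requests session trick as the
# earlier comment; written against openai-python < 1.0.
import json
import os

import openai
import requests


def show_response(response, *args, **kwargs):
    # Print whatever JSON body the API returned, if any.
    try:
        print(json.dumps(response.json(), indent=4))
    except ValueError:
        pass
    return response


if os.environ.get("LLM_OPENAI_SHOW_RESPONSES"):
    session = requests.Session()
    session.hooks["response"].append(show_response)
    openai.requestssession = session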
I'm going to need tests for the …

LLM_OPENAI_SHOW_RESPONSES=1 llm -m 3.5-instruct 'say hi, one word' -o logprobs 2

Outputs:
And with --no-stream:

LLM_OPENAI_SHOW_RESPONSES=1 llm -m 3.5-instruct 'say hi, one word' -o logprobs 2 --no-stream

{
"id": "cmpl-80MeBfKJutM0uMNJkRrebJLeP3bxL",
"object": "text_completion",
"created": 1695097747,
"model": "gpt-3.5-turbo-instruct",
"choices": [
{
"text": "\n\nHi!",
"index": 0,
"logprobs": {
"tokens": [
"\n\n",
"Hi",
"!"
],
"token_logprobs": [
-0.61127675,
-1.0273004,
-0.9450184
],
"top_logprobs": [
{
"\n\n": -0.61127675,
"\n": -1.9706517
},
{
"Hi": -1.0273004,
"Hello": -0.73042536
},
{
"!": -0.9450184,
".": -1.1168935
}
],
"text_offset": [
16,
18,
20
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 5,
"completion_tokens": 3,
"total_tokens": 8
}
}
My mocking trick isn't working here (despite working in #287).
https://github.com/openai/openai-python/blob/main/openai/tests/test_api_requestor.py is how OpenAI do it, but it's not very useful.
My mistake, that trick DOES work, I was using the wrong fixture.
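For reference, mocking the non-streaming completions endpoint with the requests-mock pytest fixture can look roughly like this - the fixture usage and invocation are illustrative, not copied from llm's actual test suite:

# Sketch of a non-streaming test using the requests-mock pytest fixture.
# Names here are illustrative; llm's real tests may be wired differently.
from click.testing import CliRunner

from llm.cli import cli


def test_completion_no_stream(requests_mock, monkeypatch, tmp_path):
    monkeypatch.setenv("LLM_USER_PATH", str(tmp_path))  # keep logs out of the real DB
    monkeypatch.setenv("OPENAI_API_KEY", "x")
    requests_mock.post(
        "https://api.openai.com/v1/completions",
        json={
            "id": "cmpl-xxx",
            "object": "text_completion",
            "created": 1695097747,
            "model": "gpt-3.5-turbo-instruct",
            "choices": [
                {"text": "\n\nHi!", "index": 0, "logprobs": None, "finish_reason": "stop"}
            ],
            "usage": {"prompt_tokens": 5, "completion_tokens": 3, "total_tokens": 8},
        },
    )
    result = CliRunner().invoke(
        cli,
        ["prompt", "-m", "gpt-3.5-turbo-instruct", "say hi, one word", "--no-stream"],
    )
    assert result.exit_code == 0, result.output
    assert "Hi!" in result.output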
Since I'm already doing this:

completion = openai.Completion.create(
    model=self.model_name or self.model_id,
    prompt="\n".join(messages),
    stream=False,
    **kwargs,
)
response.response_json = completion.to_dict_recursive()
yield completion.choices[0]["text"]

Which dumps the entire … They'll end up in the DB in a slightly different format. I'm OK with that:

{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": {
"text_offset": [16, 18, 20],
"token_logprobs": [-0.6, -1.1, -0.9],
"tokens": ["\n\n", "Hi", "1"],
"top_logprobs": [
{"\n": -1.9, "\n\n": -0.6},
{"Hello": -0.7, "Hi": -1.1},
{"!": -1.1, ".": -0.9},
],
},
"text": "\n\nHi.",
}
],
"created": 1695097747,
"id": "cmpl-80MeBfKJutM0uMNJkRrebJLeP3bxL",
"model": "gpt-3.5-turbo-instruct",
"object": "text_completion",
"usage": {"completion_tokens": 3, "prompt_tokens": 5, "total_tokens": 8},
}
Despite being listed in the documentation, the suffix parameter is rejected by this model:

curl https://api.openai.com/v1/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
"model": "gpt-3.5-turbo-instruct",
"prompt": "Say this is a test",
"max_tokens": 7,
"suffix": "dog",
"temperature": 0
}'

{
"error": {
"message": "Unrecognized request argument supplied: suffix",
"type": "invalid_request_error",
"param": null,
"code": null
}
}
I managed to submit …

Dropping suffix support.
I'm going to do the …
Wouldn't throwing an error be a better alternative? These models aren't designed to use system prompts. It's better to communicate that to end users so that they can adjust their API calls accordingly.
I'm really torn on this. The reason I'm leaning towards keeping system prompts working is that a really useful application of LLM is to compare the results you get from different models. It would be frustrating if you tried to compare the results of a prompt with a system prompt and got an error back because one of the dozen models you chose to use didn't support system prompts. Plus, really when you look at what system prompts actually do in other models, they're basically just injected into the regular prompt with extra markup around them. For Llama 2 that looks like this, for example:
So wrapping the system prompt in bold is actually a pretty honest imitation of how they work everywhere else!
Hmmm... thinking about it, we actually have a bit of a precedent problem here. llm-claude silently ignores system prompts, which caught me out already - I fed it a system prompt and it didn't work, but I didn't realize because I didn't get an error. On that basis, an error would actually be a better solution.
Based on that decision in …, I'm going to have this plugin raise an error if you try to send it a system prompt. I'll get rid of that and replace it with the mechanism from #288 once that is implemented.
I don't yet have a pattern for what exception should be raised by a …
I'm going to raise …
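The exception class is cut off above, so the sketch below uses a placeholder; it just shows where the check could live, at the top of Completion.execute():

from types import SimpleNamespace


# Sketch: refuse system prompts for completion models. ValueError is a
# placeholder - the comment above is truncated, so the exception the plugin
# actually raises isn't shown in this thread.
def reject_system_prompt(prompt):
    # prompt mirrors llm's Prompt object; .system holds any --system value.
    if prompt.system:
        raise ValueError(
            "System prompts are not supported for OpenAI completion models"
        )


reject_system_prompt(SimpleNamespace(system=None, prompt="A poem about otters:"))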
Released today - no OpenAI blog post yet: https://news.ycombinator.com/item?id=37558911
Got a working version in. Still needed:

- max_tokens setting? Yes, to 256.
- logprobs support
- Suffix support (did not implement this, it's not supported by gpt-3.5-turbo-instruct)