Support for OpenAI completion models, in particular gpt-3.5-turbo-instruct #284

Closed · 6 tasks done
simonw opened this issue Sep 19, 2023 · 28 comments
Labels: enhancement (New feature or request)

simonw commented Sep 19, 2023

Released today - no OpenAI blog post yet: https://news.ycombinator.com/item?id=37558911

Got a working version in. Still needed:

  • Am I going to increase the default max_tokens setting? Yes, to 256.
  • logprobs support
  • A mechanism for registering more completion models + docs
  • Suffix support (did not implement this, it's not supported by gpt-3.5-turbo-instruct)
  • Decide what to do about system prompts. This model doesn't have them as a concept - should I throw an error on a system prompt or silently ignore or stick it on the beginning of the message anyway?
  • Documentation should include completion model example
simonw added the enhancement label Sep 19, 2023

simonw commented Sep 19, 2023

Using this model involves `openai.Completion.create` as opposed to `openai.ChatCompletion.create`.

API documentation: https://platform.openai.com/docs/api-reference/completions/create

Example from https://platform.openai.com/docs/guides/gpt/completions-api

response = openai.Completion.create(
  model="gpt-3.5-turbo-instruct",
  prompt="Write a tagline for an ice cream shop."
)
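
For contrast, this is the shape of the chat-style call the plugin already makes; the key difference is messages= (a list of role/content dicts) versus the single prompt= string above. A minimal sketch against the same 0.x openai client:

import openai

# chat endpoint: input is a list of messages rather than a plain prompt string
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": "Write a tagline for an ice cream shop."}
    ],
)
print(response.choices[0].message.content)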

The default for max_tokens is just 16. Maybe I should increase that for LLM? It's really tight.

simonw commented Sep 19, 2023

Got this prototype working:

diff --git a/llm/default_plugins/openai_models.py b/llm/default_plugins/openai_models.py
index fd8c689..8b7854e 100644
--- a/llm/default_plugins/openai_models.py
+++ b/llm/default_plugins/openai_models.py
@@ -22,6 +22,10 @@ def register_models(register):
     register(Chat("gpt-3.5-turbo-16k"), aliases=("chatgpt-16k", "3.5-16k"))
     register(Chat("gpt-4"), aliases=("4", "gpt4"))
     register(Chat("gpt-4-32k"), aliases=("4-32k",))
+    register(
+        Completion("gpt-3.5-turbo-instruct"),
+        aliases=("3.5-instruct", "chatgpt-instruct"),
+    )
     # Load extra models
     extra_path = llm.user_dir() / "extra-openai-models.yaml"
     if not extra_path.exists():
@@ -249,6 +253,32 @@ class Chat(Model):
             messages.append({"role": "system", "content": prompt.system})
         messages.append({"role": "user", "content": prompt.prompt})
         response._prompt_json = {"messages": messages}
+        kwargs = self.build_kwargs(prompt)
+        if stream:
+            completion = openai.ChatCompletion.create(
+                model=self.model_name or self.model_id,
+                messages=messages,
+                stream=True,
+                **kwargs,
+            )
+            chunks = []
+            for chunk in completion:
+                chunks.append(chunk)
+                content = chunk["choices"][0].get("delta", {}).get("content")
+                if content is not None:
+                    yield content
+            response.response_json = combine_chunks(chunks)
+        else:
+            completion = openai.ChatCompletion.create(
+                model=self.model_name or self.model_id,
+                messages=messages,
+                stream=False,
+                **kwargs,
+            )
+            response.response_json = completion.to_dict_recursive()
+            yield completion.choices[0].message.content
+
+    def build_kwargs(self, prompt):
         kwargs = dict(not_nulls(prompt.options))
         if self.api_base:
             kwargs["api_base"] = self.api_base
@@ -267,29 +297,45 @@ class Chat(Model):
             kwargs["api_key"] = "DUMMY_KEY"
         if self.headers:
             kwargs["headers"] = self.headers
+        return kwargs
+
+
+class Completion(Chat):
+    def __str__(self):
+        return "OpenAI Completion: {}".format(self.model_id)
+
+    def execute(self, prompt, stream, response, conversation=None):
+        messages = []
+        if conversation is not None:
+            for prev_response in conversation.responses:
+                messages.append(prev_response.prompt.prompt)
+                messages.append(prev_response.text())
+        messages.append(prompt.prompt)
+        response._prompt_json = {"messages": messages}
+        kwargs = self.build_kwargs(prompt)
         if stream:
-            completion = openai.ChatCompletion.create(
+            completion = openai.Completion.create(
                 model=self.model_name or self.model_id,
-                messages=messages,
+                prompt="\n".join(messages),
                 stream=True,
                 **kwargs,
             )
             chunks = []
             for chunk in completion:
                 chunks.append(chunk)
-                content = chunk["choices"][0].get("delta", {}).get("content")
+                content = chunk["choices"][0].get("text") or ""
                 if content is not None:
                     yield content
             response.response_json = combine_chunks(chunks)
         else:
-            completion = openai.ChatCompletion.create(
+            completion = openai.Completion.create(
                 model=self.model_name or self.model_id,
-                messages=messages,
+                prompt="\n".join(messages),
                 stream=False,
                 **kwargs,
             )
             response.response_json = completion.to_dict_recursive()
-            yield completion.choices[0].message.content
+            yield completion.choices[0]["text"]
 
 
 def not_nulls(data) -> dict:
@@ -303,6 +349,9 @@ def combine_chunks(chunks: List[dict]) -> dict:
 
     for item in chunks:
         for choice in item["choices"]:
+            if "text" in choice and "delta" not in choice:
+                content += choice["text"]
+                continue
             if "role" in choice["delta"]:
                 role = choice["delta"]["role"]
             if "content" in choice["delta"]:

Example usage:

llm -m chatgpt-instruct 'A poem about otters:'

Graceful and sleek, through the water they glide
With playful spirits and
llm -m chatgpt-instruct 'A poem about otters:' -o max_tokens 128

Graceful creatures of the sea
With fur so soft and eyes so keen
Otters, playful, wild and free
Inhabitants of a watery scene

With sleek bodies and flippers strong
They glide through water with ease
Their movements seem to flow along
As if dancing to the ocean’s breeze

Their playful nature knows no bounds
As they frolic and play all day
In search of fish, they can be found
In a game of hide and seek they play

Their laughter echoes through the waves
A joyful sound for all to hear
In their underwater caves
They have nothing to fear

simonw commented Sep 19, 2023

One option for system prompts: chuck it in **bold** at the start of the message.
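
A rough sketch of what that could mean when assembling the completion prompt (build_completion_prompt is a hypothetical helper name, purely illustrative):

def build_completion_prompt(system, user_prompt):
    # the completions API has no system role, so one option is to prepend the
    # system prompt wrapped in **bold** markers before the user's prompt
    parts = []
    if system:
        parts.append("**{}**".format(system))
    parts.append(user_prompt)
    return "\n\n".join(parts)

print(build_completion_prompt("Speak like a pirate", "Describe otters"))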

simonw commented Sep 19, 2023

That default token size of 16 is really small.

https://platform.openai.com/playground?mode=complete defaults to 256 so I'm going to use that default instead.

[Screenshot of the OpenAI Playground settings showing the default maximum length of 256]
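
Roughly what that default means in practice, sketched directly against the completions endpoint (the real change would live in the plugin's option handling):

import openai

# options collected from -o flags (none here) get a max_tokens fallback of
# 256 instead of relying on the API's own default of 16
options = {}
options.setdefault("max_tokens", 256)

response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="A poem about otters:",
    **options,
)
print(response.choices[0]["text"])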

simonw commented Sep 19, 2023

This is annoying: with a default max_tokens of 256 it seems to just cut off halfway through the output:

llm -m instruct 'poem about an otter'

Graceful and sleek, the otter glides
Through the water, a creature of tides
With fur so soft and eyes so bright
It's hard not to be captivated by its sight

In streams and rivers, it loves to play
Diving, twisting, in a carefree way
It's an expert swimmer, with webbed feet
And a long tail, it uses to steer and greet

Its diet consists of fish and shellfish too
Crunching and munching, with a satisfied chew
But it's not just about survival, for this creature
It enjoys each meal, like it's a special feature

With a playful nature and curious mind
The otter explores, never one to find
A boring moment, in its watery home
It loves to adventure and freely roam

Cuddling with its family, on a cozy bed
The otter rests its tired head
A protector, provider, a loving mate
In its own way, it shows its love and fate

A symbol of joy and adaptability
The otter teaches us, with its own ability
To find happiness, in the simple things
And the importance of family and the joy it brings

Oh otter, with your

I checked and the output there is exactly 256 tokens. I was hoping it would somehow be aware of that limit.
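
The truncation is at least detectable after the fact, since the API reports a finish_reason of "length" when it runs into max_tokens (a sketch of checking it manually):

import openai

completion = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="poem about an otter",
    max_tokens=256,
)
choice = completion.choices[0]
# "length" means the model hit max_tokens; "stop" means it finished naturally
if choice["finish_reason"] == "length":
    print("[output truncated at max_tokens]")
print(choice["text"])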

simonw commented Sep 19, 2023

I'm going to add a logprobs option: https://platform.openai.com/docs/api-reference/completions/create#completions/create-logprobs

integer or null, Optional, Defaults to null

Include the log probabilities on the logprobs most likely tokens, as well as the chosen tokens. For example, if logprobs is 5, the API will return a list of the 5 most likely tokens. The API will always return the logprob of the sampled token, so there may be up to logprobs+1 elements in the response.

The maximum value for logprobs is 5.

simonw commented Sep 19, 2023

Tried this:

llm -m instruct 'poem about an otter' -o max_tokens 10 -o logprobs 3

Looks like I need to take extra steps to get it to show up in the logged database response.

simonw commented Sep 19, 2023

I tried storing the full recursive dictionary version of the response if logprobs are present, ended up with this:

sqlite-utils "$(llm logs path)" 'select * from responses order by id desc limit 1' | jq '.[0].response_json' -r | jq
{
  "content": "\n\nIn the river, sleek and sly,\n",
  "role": null,
  "finish_reason": null,
  "chunks": [
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": "\n\n",
          "index": 0,
          "logprobs": {
            "tokens": [
              "\n\n"
            ],
            "token_logprobs": [
              -0.19434144
            ],
            "top_logprobs": [
              {
                "\n\n": -0.19434144,
                "\n": -2.2880914,
                " \n\n": -5.4443407
              }
            ],
            "text_offset": [
              19
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": "In",
          "index": 0,
          "logprobs": {
            "tokens": [
              "In"
            ],
            "token_logprobs": [
              -1.7583014
            ],
            "top_logprobs": [
              {
                "In": -1.7583014,
                "S": -1.3833013,
                "Grace": -1.7426763
              }
            ],
            "text_offset": [
              21
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": " the",
          "index": 0,
          "logprobs": {
            "tokens": [
              " the"
            ],
            "token_logprobs": [
              -0.13828558
            ],
            "top_logprobs": [
              {
                " the": -0.13828558,
                " a": -2.6539104,
                " rivers": -3.9507854
              }
            ],
            "text_offset": [
              23
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": " river",
          "index": 0,
          "logprobs": {
            "tokens": [
              " river"
            ],
            "token_logprobs": [
              -0.78688633
            ],
            "top_logprobs": [
              {
                " river": -0.78688633,
                " water": -3.0681362,
                " sparkling": -3.099386
              }
            ],
            "text_offset": [
              27
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": ",",
          "index": 0,
          "logprobs": {
            "tokens": [
              ","
            ],
            "token_logprobs": [
              -0.94506526
            ],
            "top_logprobs": [
              {
                ",": -0.94506526,
                "'s": -1.0700654,
                " she": -2.6325655
              }
            ],
            "text_offset": [
              33
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": " sleek",
          "index": 0,
          "logprobs": {
            "tokens": [
              " sleek"
            ],
            "token_logprobs": [
              -1.239402
            ],
            "top_logprobs": [
              {
                " sleek": -1.239402,
                " swift": -0.723777,
                " graceful": -3.2550268
              }
            ],
            "text_offset": [
              34
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": " and",
          "index": 0,
          "logprobs": {
            "tokens": [
              " and"
            ],
            "token_logprobs": [
              -0.00032777296
            ],
            "top_logprobs": [
              {
                " and": -0.00032777296,
                " as": -8.484702,
                " ot": -10.187827
              }
            ],
            "text_offset": [
              40
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": " s",
          "index": 0,
          "logprobs": {
            "tokens": [
              " s"
            ],
            "token_logprobs": [
              -1.6860211
            ],
            "top_logprobs": [
              {
                " s": -1.6860211,
                " swift": -1.1078961,
                " quick": -1.7485211
              }
            ],
            "text_offset": [
              44
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": "ly",
          "index": 0,
          "logprobs": {
            "tokens": [
              "ly"
            ],
            "token_logprobs": [
              -0.014393331
            ],
            "top_logprobs": [
              {
                "ly": -0.014393331,
                "velte": -4.3425183,
                "li": -7.6550174
              }
            ],
            "text_offset": [
              46
            ]
          },
          "finish_reason": null
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": ",\n",
          "index": 0,
          "logprobs": {
            "tokens": [
              ",\n"
            ],
            "token_logprobs": [
              -0.79212964
            ],
            "top_logprobs": [
              {
                ",\n": -0.79212964,
                "\n": -0.7452547,
                "  \n": -3.3858798
              }
            ],
            "text_offset": [
              48
            ]
          },
          "finish_reason": "length"
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    },
    {
      "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
      "object": "text_completion",
      "created": 1695093353,
      "choices": [
        {
          "text": "",
          "index": 0,
          "logprobs": {
            "tokens": [],
            "token_logprobs": [],
            "top_logprobs": [],
            "text_offset": []
          },
          "finish_reason": "length"
        }
      ],
      "model": "gpt-3.5-turbo-instruct"
    }
  ],
  "id": "cmpl-80LVJQkySEtmJWWug1AH0t7IN8vOO",
  "object": "text_completion",
  "model": "gpt-3.5-turbo-instruct",
  "created": 1695093353
}

That's pretty noisy though! Imagine that for a much longer response.

simonw commented Sep 19, 2023

I'll try storing a logprobs list of {"text": "...", "top_logprobs": {...}} instead.
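
Something along these lines, operating on the list of streamed chunks captured above (condense_logprobs is a hypothetical helper name, not the exact code that landed):

def condense_logprobs(chunks):
    # collapse the per-chunk completion events into a compact list of
    # {"text": ..., "top_logprobs": [...]} entries for the response_json log
    condensed = []
    for chunk in chunks:
        choice = chunk["choices"][0]
        logprobs = choice.get("logprobs") or {}
        condensed.append(
            {
                "text": choice.get("text", ""),
                "top_logprobs": logprobs.get("top_logprobs", []),
            }
        )
    return condensed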

simonw commented Sep 19, 2023

That's a bit neater:

llm -m instruct 'slogan for an an otter-run bakery' -o max_tokens 16 -o logprobs 3                   

"Fresh Treats That'll Make You Otterly Happy!"
sqlite-utils "$(llm logs path)" 'select * from responses order by id desc limit 1' \
  | jq '.[0].response_json' -r | jq
{
  "content": "\n\n\"Fresh Treats That'll Make You Otterly Happy!\"",
  "role": null,
  "finish_reason": null,
  "logprobs": [
    {
      "text": "\n",
      "top_logprobs": [
        {
          "\n": -1.9688251,
          "\n\n": -0.3125751,
          ":": -3.968825
        }
      ]
    },
    {
      "text": "\n",
      "top_logprobs": [
        {
          "\n": -0.33703142,
          "\"": -1.5089064,
          "\"S": -3.8995314
        }
      ]
    },
    {
      "text": "\"",
      "top_logprobs": [
        {
          "\"": -0.13510895,
          "\"S": -2.588234,
          "\"B": -4.119484
        }
      ]
    },
    {
      "text": "Fresh",
      "top_logprobs": [
        {
          "Fresh": -1.4623433,
          "Where": -1.1498433,
          "Ind": -2.5560932
        }
      ]
    },
    {
      "text": " Treat",
      "top_logprobs": [
        {
          "ly": -0.39274898,
          " treats": -1.861499,
          " baked": -3.4396236
        }
      ]
    },
    {
      "text": "s",
      "top_logprobs": [
        {
          "s": -6.5092986e-06,
          ",": -13.406255,
          "z": -13.82813
        }
      ]
    },
    {
      "text": " That",
      "top_logprobs": [
        {
          " from": -0.78117555,
          " Straight": -1.6874255,
          ",": -1.7968005
        }
      ]
    },
    {
      "text": "'ll",
      "top_logprobs": [
        {
          "'ll": -1.1900537,
          " Will": -1.4088038,
          " Make": -1.4556787
        }
      ]
    },
    {
      "text": " Make",
      "top_logprobs": [
        {
          " Make": -0.14272869,
          " Have": -2.8614783,
          " Leave": -3.5802286
        }
      ]
    },
    {
      "text": " You",
      "top_logprobs": [
        {
          " You": -0.26677564,
          " Your": -1.6730256,
          " a": -3.1574006
        }
      ]
    },
    {
      "text": " Ot",
      "top_logprobs": [
        {
          " Ot": -0.58028656,
          " Flip": -2.2677865,
          " Float": -2.2990365
        }
      ]
    },
    {
      "text": "ter",
      "top_logprobs": [
        {
          "ter": -0.0004432111,
          "term": -8.4223175,
          "terr": -9.3598175
        }
      ]
    },
    {
      "text": "ly",
      "top_logprobs": [
        {
          "ly": -0.040374786,
          "-": -3.7903748,
          "-L": -4.8216248
        }
      ]
    },
    {
      "text": " Happy",
      "top_logprobs": [
        {
          " Happy": -0.18345116,
          " S": -1.9959509,
          " Del": -3.8553257
        }
      ]
    },
    {
      "text": "!\"",
      "top_logprobs": [
        {
          "!\"": -0.035835575,
          "\"": -3.4264605,
          "!\"\n": -7.2077103
        }
      ]
    },
    {
      "text": "",
      "top_logprobs": []
    }
  ],
  "id": "cmpl-80LaT4oS6rQPFUg3BfbNn7xBsoIix",
  "object": "text_completion",
  "model": "gpt-3.5-turbo-instruct",
  "created": 1695093673
}

simonw commented Sep 19, 2023

It's a bit annoying that the only way to see the log probs is to dig around in the SQLite database for them.

I can't think of a clean way to let people opt into seeing them on stderr or similar though.

One option would be to teach the llm logs Markdown output how to display them. That's nicer than messing around in SQLite directly.

It's a bit weird to have code in llm logs that's specific to the OpenAI models though. Maybe I should add a model plugin mechanism that allows models to influence the display of logs?
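
In the meantime, pulling them out of the log by hand is only a few lines (a throwaway sketch, not a proposed feature; it assumes the responses table and response_json column shown in the queries above):

import json
import sqlite3
import subprocess

# find the logs database via the llm CLI, then print the logprobs stored
# against the most recent logged response
db_path = subprocess.check_output(["llm", "logs", "path"], text=True).strip()
(response_json,) = sqlite3.connect(db_path).execute(
    "select response_json from responses order by id desc limit 1"
).fetchone()
for entry in json.loads(response_json).get("logprobs") or []:
    print(repr(entry["text"]), entry["top_logprobs"])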

simonw commented Sep 19, 2023

Since I opened a fresh issue for it I won't consider logprobs display any more here.

simonw commented Sep 19, 2023

Suffix support is unique to completion models and looks interesting too: https://platform.openai.com/docs/api-reference/completions/create#completions/create-suffix

simonw commented Sep 19, 2023

In trying to write a test for the new logprobs stuff I dropped this into the top of cli.py:

import requests
import json

def log_json(response, *args, **kwargs):
    try:
        data = response.json()
        print(json.dumps(data, indent=4))
    except ValueError:
        # No JSON data in the response
        pass
    return response


import openai
openai.requestssession = requests.Session()
openai.requestssession.hooks['response'].append(log_json)

This worked, but only with --no-stream (since otherwise the event source stuff broke the JSON parsing). I got back this:

llm -m instruct 'slogan for an an otter-run bakery' -o max_tokens 16 -o logprobs 3 --no-stream
{
    "id": "cmpl-80M2iXLmVOtxrxosK1crpgoxvpk2x",
    "object": "text_completion",
    "created": 1695095424,
    "model": "gpt-3.5-turbo-instruct",
    "choices": [
        {
            "text": "\n\n\"Fresh treats straight from the riverbank!\"",
            "index": 0,
            "logprobs": {
                "tokens": [
                    "\n\n",
                    "\"",
                    "Fresh",
                    " treats",
                    " straight",
                    " from",
                    " the",
                    " river",
                    "bank",
                    "!\""
                ],
                "token_logprobs": [
                    -0.3125751,
                    -0.18438435,
                    -1.3464811,
                    -1.6608365,
                    -2.354677,
                    -0.0038274676,
                    -0.013088844,
                    -0.15179907,
                    -1.417837,
                    -0.41310275
                ],
                "top_logprobs": [
                    {
                        "\n\n": -0.3125751,
                        "\n": -1.9688251,
                        ":": -3.968825
                    },
                    {
                        "\"": -0.18438435,
                        "\"S": -2.293759,
                        "\"B": -3.9812593
                    },
                    {
                        "Fresh": -1.3464811,
                        "Where": -1.1277312,
                        "Making": -2.549606
                    },
                    {
                        " treats": -1.6608365,
                        "ly": -0.48896152,
                        " baked": -3.2389612
                    },
                    {
                        " from": -1.0734268,
                        ",": -1.276552,
                        " that": -2.167177
                    },
                    {
                        " from": -0.0038274676,
                        " out": -5.738202,
                        " ot": -7.519452
                    },
                    {
                        " the": -0.013088844,
                        " our": -4.4662137,
                        " nature": -6.825588
                    },
                    {
                        " river": -0.15179907,
                        " ot": -2.308049,
                        " water": -3.917424
                    },
                    {
                        "bank": -1.417837,
                        "'s": -0.5584622,
                        " bank": -2.917837
                    },
                    {
                        "!\"": -0.41310275,
                        " to": -2.0537276,
                        ",": -2.2099779
                    }
                ],
                "text_offset": [
                    33,
                    35,
                    36,
                    41,
                    48,
                    57,
                    62,
                    66,
                    72,
                    76
                ]
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 10,
        "total_tokens": 19
    }
}

This is notably different from the format that I get for streaming responses.

It also highlights that the tests should really mock streaming responses too.
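
One way to mock the streaming case would be to monkeypatch openai.Completion.create so that stream=True returns an iterator of chunk-shaped dicts; this is only a sketch, not the fixture the test suite actually uses:

import openai
import pytest


@pytest.fixture
def fake_streaming_completion(monkeypatch):
    # chunk dicts shaped like the "data: {...}" events shown above
    chunks = [
        {"choices": [{"text": "\n\n", "index": 0, "finish_reason": None}]},
        {"choices": [{"text": "Hi.", "index": 0, "finish_reason": "stop"}]},
    ]

    def fake_create(*args, **kwargs):
        assert kwargs.get("stream"), "this fake only covers the streaming path"
        return iter(chunks)

    monkeypatch.setattr(openai.Completion, "create", fake_create)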

simonw commented Sep 19, 2023

Changing that debug function to just print(response.text) gave me this:

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":"\n\n","index":0,"logprobs":{"tokens":["\n\n"],"token_logprobs":[-0.3125751],"top_logprobs":[{"\n\n":-0.3125751,"\n":-1.9688251,":":-3.968825}],"text_offset":[33]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":"\"B","index":0,"logprobs":{"tokens":["\"B"],"token_logprobs":[-3.9812593],"top_logprobs":[{"\"B":-3.9812593,"\"":-0.18438435,"\"S":-2.293759}],"text_offset":[35]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":"aking","index":0,"logprobs":{"tokens":["aking"],"token_logprobs":[-0.64748794],"top_logprobs":[{"aking":-0.64748794,"ite":-1.444363,"aked":-1.7568629}],"text_offset":[37]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" up","index":0,"logprobs":{"tokens":[" up"],"token_logprobs":[-0.21909842],"top_logprobs":[{" up":-0.21909842," with":-2.6878486," happiness":-2.8597233}],"text_offset":[42]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" pure","index":0,"logprobs":{"tokens":[" pure"],"token_logprobs":[-5.3262787],"top_logprobs":[{" a":-0.8575289," smiles":-1.9825288," happiness":-2.4200287}],"text_offset":[45]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" ot","index":0,"logprobs":{"tokens":[" ot"],"token_logprobs":[-0.46827942],"top_logprobs":[{" ot":-0.46827942," joy":-1.3276544," delight":-2.8901541}],"text_offset":[50]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":"ter","index":0,"logprobs":{"tokens":["ter"],"token_logprobs":[-0.0015959481],"top_logprobs":[{"ter":-0.0015959481,"terr":-7.126596,"term":-7.845346}],"text_offset":[53]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" goodness","index":0,"logprobs":{"tokens":[" goodness"],"token_logprobs":[-2.569492],"top_logprobs":[{"-":-1.2726171,"ly":-1.5382422,"lic":-1.8663673}],"text_offset":[56]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":",","index":0,"logprobs":{"tokens":[","],"token_logprobs":[-0.84223104],"top_logprobs":[{",":-0.84223104,"!\"":-1.154731," in":-2.232856}],"text_offset":[65]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" one","index":0,"logprobs":{"tokens":[" one"],"token_logprobs":[-0.010558619],"top_logprobs":[{" one":-0.010558619," every":-5.432433," bite":-6.3074327}],"text_offset":[66]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" treat","index":0,"logprobs":{"tokens":[" treat"],"token_logprobs":[-0.31707865],"top_logprobs":[{" treat":-0.31707865," bite":-2.3639536," delicious":-2.6139536}],"text_offset":[70]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" at","index":0,"logprobs":{"tokens":[" at"],"token_logprobs":[-0.00004501652],"top_logprobs":[{" at":-0.00004501652," a":-10.82817," ot":-11.421921}],"text_offset":[76]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" a","index":0,"logprobs":{"tokens":[" a"],"token_logprobs":[-0.00006635395],"top_logprobs":[{" a":-0.00006635395," at":-10.390691," ":-10.609441}],"text_offset":[79]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":" time","index":0,"logprobs":{"tokens":[" time"],"token_logprobs":[-0.00006468596],"top_logprobs":[{" time":-0.00006468596," t":-10.562564," tim":-11.51569}],"text_offset":[81]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":"!\"","index":0,"logprobs":{"tokens":["!\""],"token_logprobs":[-0.07164093],"top_logprobs":[{"!\"":-0.07164093,".\"":-2.8060157,"\"":-5.399766}],"text_offset":[86]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80M56thRFHyy1fPDQ79XeMwVGsJ4v","object":"text_completion","created":1695095572,"choices":[{"text":"","index":0,"logprobs":{"tokens":[],"token_logprobs":[],"top_logprobs":[],"text_offset":[]},"finish_reason":"stop"}],"model":"gpt-3.5-turbo-instruct"}

data: [DONE]
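
For reference, each event is a plain data: <json> line and the stream ends with the data: [DONE] sentinel, so it can be parsed by hand if needed (a sketch; the openai client normally handles this itself):

import json

def iter_completion_events(lines):
    # yield the decoded JSON payload of each "data: ..." event,
    # stopping at the "[DONE]" sentinel
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)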

simonw commented Sep 19, 2023

I'm going to need tests for the logprobs stuff.

LLM_OPENAI_SHOW_RESPONSES=1 llm -m 3.5-instruct 'say hi, one word' -o logprobs 2

Outputs:

data: {"id":"cmpl-80MdSaou7NnPuff5ZyRMysWBmgSPS","object":"text_completion","created":1695097702,"choices":[{"text":"\n\n","index":0,"logprobs":{"tokens":["\n\n"],"token_logprobs":[-0.61127675],"top_logprobs":[{"\n\n":-0.61127675,"\n":-1.9706517}],"text_offset":[16]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80MdSaou7NnPuff5ZyRMysWBmgSPS","object":"text_completion","created":1695097702,"choices":[{"text":"Hi","index":0,"logprobs":{"tokens":["Hi"],"token_logprobs":[-1.0273004],"top_logprobs":[{"Hi":-1.0273004,"Hello":-0.73042536}],"text_offset":[18]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80MdSaou7NnPuff5ZyRMysWBmgSPS","object":"text_completion","created":1695097702,"choices":[{"text":".","index":0,"logprobs":{"tokens":["."],"token_logprobs":[-1.1168935],"top_logprobs":[{".":-1.1168935,"!":-0.9450184}],"text_offset":[20]},"finish_reason":null}],"model":"gpt-3.5-turbo-instruct"}

data: {"id":"cmpl-80MdSaou7NnPuff5ZyRMysWBmgSPS","object":"text_completion","created":1695097702,"choices":[{"text":"","index":0,"logprobs":{"tokens":[],"token_logprobs":[],"top_logprobs":[],"text_offset":[]},"finish_reason":"stop"}],"model":"gpt-3.5-turbo-instruct"}

data: [DONE]



Hi.

And with --no-stream:

LLM_OPENAI_SHOW_RESPONSES=1 llm -m 3.5-instruct 'say hi, one word' -o logprobs 2 --no-stream
{
  "id": "cmpl-80MeBfKJutM0uMNJkRrebJLeP3bxL",
  "object": "text_completion",
  "created": 1695097747,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
      "text": "\n\nHi!",
      "index": 0,
      "logprobs": {
        "tokens": [
          "\n\n",
          "Hi",
          "!"
        ],
        "token_logprobs": [
          -0.61127675,
          -1.0273004,
          -0.9450184
        ],
        "top_logprobs": [
          {
            "\n\n": -0.61127675,
            "\n": -1.9706517
          },
          {
            "Hi": -1.0273004,
            "Hello": -0.73042536
          },
          {
            "!": -0.9450184,
            ".": -1.1168935
          }
        ],
        "text_offset": [
          16,
          18,
          20
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 3,
    "total_tokens": 8
  }
}

simonw commented Sep 19, 2023

My mocking trick isn't working here (despite working in #287).

    @pytest.mark.parametrize("streaming", (True, False))
    def test_openai_completion_logprobs(mocked_openai_completion, user_path, streaming):
        log_path = user_path / "logs.db"
        log_db = sqlite_utils.Database(str(log_path))
        log_db["responses"].delete_where()
        runner = CliRunner()
        args = ["-m", "gpt-3.5-turbo-instruct", "Say hi", "-o", "logprobs", "2", "--key", "x"]
        if not streaming:
            args.append("--no-stream")
>       result = runner.invoke(cli, args, catch_exceptions=False)


        if stream:
            # must be an iterator
>           assert not isinstance(response, OpenAIResponse)
E           AssertionError

../../../.local/share/virtualenvs/llm-p4p8CDpq/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py:165: AssertionError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> /Users/simon/.local/share/virtualenvs/llm-p4p8CDpq/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py(165)create()
-> assert not isinstance(response, OpenAIResponse)
(Pdb) response
<openai.openai_response.OpenAIResponse object at 0x29f21fd60>
(Pdb) list
160  	            request_timeout=request_timeout,
161  	        )
162  	
163  	        if stream:
164  	            # must be an iterator
165  ->	            assert not isinstance(response, OpenAIResponse)
166  	            return (
167  	                util.convert_to_openai_object(
168  	                    line,
169  	                    api_key,
170  	                    api_version,

https://github.com/openai/openai-python/blob/main/openai/tests/test_api_requestor.py is how OpenAI do it, but it's not very useful.

simonw commented Sep 19, 2023

My mistake, that trick DOES work, I was using the wrong fixture.

simonw commented Sep 19, 2023

Since I'm already doing this:

            completion = openai.Completion.create(
                model=self.model_name or self.model_id,
                prompt="\n".join(messages),
                stream=False,
                **kwargs,
            )
            response.response_json = completion.to_dict_recursive()
            yield completion.choices[0]["text"]

Which dumps the entire completion.to_dict_recursive() to the DB log for non-streaming responses, I don't need to do anything special to log logprobs for the non-streaming case.

They'll end up in the DB in a slightly different format. I'm OK with that:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": {
                "text_offset": [16, 18, 20],
                "token_logprobs": [-0.6, -1.1, -0.9],
                "tokens": ["\n\n", "Hi", "."],
                "top_logprobs": [
                    {"\n": -1.9, "\n\n": -0.6},
                    {"Hello": -0.7, "Hi": -1.1},
                    {"!": -1.1, ".": -0.9},
                ],
            },
            "text": "\n\nHi.",
        }
    ],
    "created": 1695097747,
    "id": "cmpl-80MeBfKJutM0uMNJkRrebJLeP3bxL",
    "model": "gpt-3.5-turbo-instruct",
    "object": "text_completion",
    "usage": {"completion_tokens": 3, "prompt_tokens": 5, "total_tokens": 8},
}

simonw commented Sep 19, 2023

Despite being listed in the documentation, the suffix option does not appear to work - at least not for gpt-3.5-turbo-instruct:

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo-instruct",
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "suffix": "dog",
    "temperature": 0
  }'
{
  "error": {
    "message": "Unrecognized request argument supplied: suffix",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

simonw commented Sep 19, 2023

I managed to submit suffix to an old text-davinci-003 completion model but it didn't seem to have the expected effect.

Dropping suffix support entirely.

simonw commented Sep 19, 2023

I'm going to do the **bold** thing for system prompts, and hope it doesn't turn out to be a bad idea later.

sgondala commented:

Wouldn't throwing an error be a better alternative? These models aren't designed to use system prompts. It's better to communicate that to end users so that they can adjust their API calls accordingly.

simonw commented Sep 19, 2023

> Wouldn't throwing an error be a better alternative? These models aren't designed to use system prompts. It's better to communicate that to end users so that they can adjust their API calls accordingly.

I'm really torn on this.

The reason I'm leaning towards keeping system prompts working is that a really useful application of LLM is to compare the results you get from different models.

It would be frustrating if you tried to compare the results of a prompt with a system prompt and got an error back because one of the dozen models you chose to use didn't support system prompts.

Plus, really when you look at what system prompts actually do in other models, they're basically just injected into the regular prompt with extra markup around them. For Llama 2 that looks like this, for example:

<s>[INST] <<SYS>>
{{ system_prompt }}
<</SYS>>

{{ user_message }} [/INST]

So wrapping the system prompt in bold is actually a pretty honest imitation of how they work everywhere else!

simonw commented Sep 19, 2023

Hmmm... thinking about it, we actually have a bit of a precedent problem here.

llm-claude silently ignores system prompts, which caught me out already - I fed it a system prompt and it didn't work, but I didn't realize because I didn't get an error.

On that basis, an error would actually be a better solution.

simonw commented Sep 19, 2023

Based on that decision:

I'm going to have this plugin raise an error if you try to send it a system prompt. I'll get rid of that and replace it with the mechanism from #288 once that is implemented.

simonw added a commit that referenced this issue Sep 19, 2023

simonw commented Sep 19, 2023

I don't yet have a pattern for what exception should be raised by a Model.execute() method if something goes wrong.

simonw commented Sep 19, 2023

I'm going to raise NotImplementedError for this, because the system prompt support is not implemented.
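
Roughly this shape, reusing names from the prototype diff earlier in the thread (a sketch of the guard, not the exact commit):

# a sketch of the guard in Completion.execute(); raising keeps the failure
# loud instead of silently dropping the system prompt the way llm-claude does
def execute(self, prompt, stream, response, conversation=None):
    if prompt.system:
        raise NotImplementedError(
            "System prompts are not supported for OpenAI completion models"
        )
    # ... rest of the completion call as in the prototype ...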
