
[Feature]: Add support for OpenAI's echo parameter. #699

Closed
ishaan-jaff opened this issue Oct 25, 2023 · 18 comments

Comments

@ishaan-jaff
Contributor

The Feature

What: Add support for OpenAI's echo parameter.

Why: Frameworks like lm-evaluation-harness rely on the echo parameter to get the logprobs of the prompt tokens. This could allow for a significant speed-up of lm-eval evaluation by using TGI's echo equivalent decoder_input_details. Though not all back-ends support it, it could also enable much easier comparisons of different model providers!

Bonus: TGI also supports top_n_tokens , which can return the log prob of the most likely tokens at each timestep, semi-equivalent of OpenAI's logprobs parameter.

Motivation, pitch

from @Vinno97

Twitter / LinkedIn details

No response

@ishaan-jaff ishaan-jaff added the enhancement New feature or request label Oct 25, 2023
@ishaan-jaff
Contributor Author

Is echo supported for chat completions? I only see it for completions: https://platform.openai.com/docs/api-reference/completions/object

@Vinno97 commented Oct 25, 2023

It appears it's only supported in the legacy completions API (which I only just learned is legacy).

@krrishdholakia
Contributor

@Vinno97 @ishaan-jaff what are the next steps on this?

@Vinno97 commented Oct 25, 2023

I'd be willing to help, if you agree on the value this brings to LiteLLM.

If this gets added, EleutherAI/lm-evaluation-harness#804 and EleutherAI/lm-evaluation-harness#869 are perhaps both solved already.

@krrishdholakia
Contributor

@Vinno97 recapping my understanding:

  • the 2 issues are for vllm and tgi
  • we support both (vllm also provides an openai compatible endpoint)
  • your issue is to add a way to see logprobs from litellm
  • we have this today for TGI via ['choices'][0]['message']._logprob -
    model_response["choices"][0]["message"]._logprob = sum_logprob

Open Questions

  • What is the problem you face today?
  • Does vllm's openai chatcompletions endpoint return logprobs?
  • What else do we need to add to solve it?

@Vinno97 commented Oct 26, 2023

I'm sorry for the confusion. My main point was about the echo feature, which makes the prompt part of the API response, including the logprobs for every token in the prompt that was sent.

If you run a prompt through an LLM, it inherently outputs next-token logits for every token, not only the last one. OpenAI exposes this information via the echo parameter, TGI can do it via decoder_input_details, and vLLM is currently working on it (vllm-project/vllm#833, vllm-project/vllm#201). This is a core feature that lm-eval relies on.

As an example: if I send two prompts, "the doctor is a man" and "the doctor is a woman", I can use echo to compare the exact logprobs of the words "man" and "woman" in this context.

I'd provide an example OpenAI response if I had access at the moment, but here's a TGI response (look at details.prefill):

Prompt: "The doctor is a man"

{
  "generated_text": " of",
  "details": {
    "finish_reason": "length",
    "generated_tokens": 1,
    "seed": null,
    "prefill": [
      {
        "id": 1410,
        "text": "the",
        "logprob": null
      },
      {
        "id": 5032,
        "text": " doctor",
        "logprob": -25.640625
      },
      {
        "id": 304,
        "text": " is",
        "logprob": -2.6445312
      },
      {
        "id": 241,
        "text": " a",
        "logprob": -2.8496094
      },
      {
        "id": 546,
        "text": " man",
        "logprob": -4.2695312
      }
    ],
    "tokens": [
      {
        "id": 275,
        "text": " of",
        "logprob": -1.8183594,
        "special": false
      }
    ],
    "top_tokens": null,
    "best_of_sequences": null
  }
}

Prompt: "The doctor is a woman"

{
  "generated_text": ",",
  "details": {
    "finish_reason": "length",
    "generated_tokens": 1,
    "seed": null,
    "prefill": [
      {
        "id": 1410,
        "text": "the",
        "logprob": null
      },
      {
        "id": 5032,
        "text": " doctor",
        "logprob": -25.640625
      },
      {
        "id": 304,
        "text": " is",
        "logprob": -2.6445312
      },
      {
        "id": 241,
        "text": " a",
        "logprob": -2.8496094
      },
      {
        "id": 2961,
        "text": " woman",
        "logprob": -3.1992188
      }
    ],
    "tokens": [
      {
        "id": 23,
        "text": ",",
        "logprob": -1.8242188,
        "special": false
      }
    ],
    "top_tokens": null,
    "best_of_sequences": null
  }
}

Here you can see that the model I'm using actually thinks that "the doctor is a woman" is a more likely sentence than "the doctor is a man".
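
To make the comparison concrete, here is a small sketch that scores each prompt by summing its prefill logprobs (assuming the two TGI payloads above have been parsed into dicts named response_man and response_woman; those names are mine, not part of the API):

# Sum the prompt-token logprobs from a TGI response's details.prefill.
def prompt_logprob(response: dict) -> float:
    prefill = response["details"]["prefill"]
    # the first token has no logprob (nothing precedes it), so skip None values
    return sum(t["logprob"] for t in prefill if t["logprob"] is not None)

print(prompt_logprob(response_man))    # ~ -35.40 for "The doctor is a man"
print(prompt_logprob(response_woman))  # ~ -34.33 for "The doctor is a woman"

The less negative total for the second prompt is exactly the comparison described above.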

@leocnj commented Oct 26, 2023

This PR, vllm-project/vllm#959, adds support for echo=True in both the engine and the OpenAI API server.

Using this branch, I can obtain logprobs for prompt tokens. Please give it a try.
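
For anyone who wants to try it, here is a rough, unverified sketch of what a request against that branch's OpenAI-compatible server might look like (the host, port, and model name are placeholders; I haven't confirmed the exact parameter handling on that branch):

# Unverified sketch: request prompt-token logprobs from vLLM's
# OpenAI-compatible /v1/completions route with echo enabled.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",   # whatever model the server was started with
        "prompt": "The doctor is a man",
        "max_tokens": 1,
        "echo": True,
        "logprobs": 1,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["logprobs"]["token_logprobs"])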

@krrishdholakia
Contributor

Hey @Vinno97, we'd welcome a PR for echo - excited to see the approach!

@ishaan-jaff
Contributor Author

working on this PR

@ishaan-jaff ishaan-jaff self-assigned this Oct 31, 2023
@ishaan-jaff
Contributor Author

  • We already support echo for text-davinci-003

from litellm import completion
response = completion(model="text-davinci-003", messages=messages, echo=True)

@ishaan-jaff
Contributor Author

It looks like lm-eval harness is not adding support for gpt-3.5-turbo, since it does not return logprobs:
EleutherAI/lm-evaluation-harness#541

@ishaan-jaff
Contributor Author

I was trying to use our text_completion with the eval harness and it failed: lm harness passes prompt as a list of token IDs, so we need to add support for this (a rough sketch of one way to handle it follows the payload below)

{'engine': 'text-davinci-003', 'prompt': [[3152, 833, 396, 2596, 338, 1306, 2239, 339, 373, 5055, 13970, 257, 13546, 11, 290, 262, 26839, 2971, 44193, 35254, 319, 262, 1660, 26, 290, 788, 339, 373, 6155, 832, 262, 12269, 11, 832, 21757, 1067, 3775, 11, 810, 262, 26839, 2971, 373, 12548, 287, 262, 2951, 286, 262, 8109, 286, 262, 1029, 3013, 1666, 26, 290, 788, 339, 373, ..
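
For context, lm-eval sends prompts pre-tokenized, so each prompt arrives as a list of token IDs rather than a string. A rough sketch of one way to handle that for providers that only accept text (the helper name is hypothetical, not the actual LiteLLM fix):

# Hypothetical helper: normalize lm-eval's pre-tokenized prompts back to strings.
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")

def normalize_prompt(prompt):
    if isinstance(prompt, str):
        return [prompt]                             # single string prompt
    if prompt and isinstance(prompt[0], list):
        return [enc.decode(ids) for ids in prompt]  # list of token-id lists
    return list(prompt)                             # list of string prompts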

@ishaan-jaff
Contributor Author

fixed here: b4e14ae

@ishaan-jaff
Contributor Author

It looks like the lm eval harness passes 'temperature': 0.0, 'max_tokens': 0, 'echo': True, 'logprobs': 10

Current issues:

  • Our litellm.text_completion() does not return logprobs when it is set for text-davinci-003
  • HF TGI log probs would not be compatible with the lm eval harness, since we expose them as ._logprob but the harness expects this format:
"choices": [
    {
      "text": "on Guardian you get:\n\n1. Secure",
      "index": 0,
      "logprobs": {
        "tokens": [
          "on",
          " Guardian",
          " you",
          " get",
          ":",
          "\n",
          "\n",
          "1",
          ".",
          " Secure"
        ],
        "token_logprobs": [
          -3.7846956,
          -12.922583,
          -2.2359743,
          -3.0041907,
          -2.0863824,
          -0.029573089,
          -0.013009035,
          -1.3277724,
          -0.06319551,
          -1.4571579
        ],
        "top_logprobs": [
          {
            "ac": -2.6180239,
            "acey": -3.0217085,
            "usted": -3.2943392,
            "im": -3.4510107,
            "ish": -3.5101204
          },
          {
            ",": -1.683592,
            "\n": -3.2098136,
            "bytes:\\xe2\\x80": -3.2249804,
            "Wallet": -3.2496285,
            " Legacy": -3.2982492
          },
          {
            " you": -2.2359743,
            "\n": -1.2495747,
            ",": -1.2551193,
            " and": -3.8073368,
            "bytes:\\xe2\\x80": -4.5486817
          },
          {
            " get": -3.0041907,
            " can": -0.5294326,
            " will": -2.023661,
            " are": -2.8523924,
            " have": -3.0540316
          },
          {
            ":": -2.0863824,
            " a": -1.7326131,
            "\n": -1.8805203,
            " the": -1.9610744,
            " access": -2.7664504
          },
          {
            "\n": -0.029573089,
            " ": -4.1331725,
            "\n\n": -4.616098,
            "</": -7.5522456,
            "  ": -7.807361
          },
          {
            "\n": -0.013009035,
            "-": -5.246068,
            "*": -6.2495985,
            " ": -6.570405,
            " \u00a7\u00a7": -6.9500294
          },
          {
            "1": -1.3277724,
            "-": -0.9693186,
            "\u2022": -1.1041319,
            "*": -4.3544083,
            "T": -5.119253
          },
          {
            ".": -0.06319551,
            ")": -2.8080018,
            " -": -7.942205,
            "-": -8.040881,
            " ": -9.436379
          },
          {
            " Secure": -1.4571579,
            " Security": -1.6510513,
            " A": -1.8732746,
            " Enhanced": -2.6642444,
            " Increased": -3.5049694
          }
        ],
        "text_offset": [
          7,
          9,
          18,
          22,
          26,
          27,
          28,
          29,
          30,
          31
        ]
      },
      "finish_reason": "length"
    }
  ],

@krrishdholakia
Contributor

Since we already read and translate the chat completions output in the text completions endpoint, can't we just do the same for logprobs? @ishaan-jaff
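
For illustration, a rough sketch of what that translation could look like (the function name and the exact handling of TGI's fields are assumptions, not LiteLLM's actual code):

# Hypothetical sketch: map a TGI `details` block into the OpenAI-style
# logprobs object that lm-eval expects on text completion choices.
def tgi_details_to_openai_logprobs(details: dict, echo: bool = False) -> dict:
    items = []
    if echo:
        items.extend(details.get("prefill", []))   # prompt tokens
    items.extend(details.get("tokens", []))        # generated tokens
    return {
        "tokens": [t["text"] for t in items],
        "token_logprobs": [t["logprob"] for t in items],
        "top_logprobs": None,   # could be filled from TGI's top_n_tokens
        "text_offset": None,
    }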

@ishaan-jaff
Contributor Author

Added support for transformed logprobs for TGI LLMs:

{
   "id":"chatcmpl-8e87a54f-5cf7-401f-8ff4-e5d32c20c41a",
   "object":"text_completion",
   "created":1698797307.028908,
   "model":"bigcode/starcoder",
   "choices":[
      {
         "text":", I'm going to make you a sand",
         "index":0,
         "logprobs":{
            "tokens":[
               ",",
               " I",
               "'m",
               " going",
               " to",
               " make",
               " you",
               " a",
               " s",
               "and"
            ],
            "token_logprobs":[
               -2.2285156,
               -2.734375,
               -2.0957031,
               -2.0917969,
               -0.09429932,
               -3.1132812,
               -1.3203125,
               -1.2304688,
               -1.6201172,
               -0.010292053
            ]
         },
         "finish_reason":"length"
      }
   ],
   "usage":"<Usage at 0x1231fd210> JSON":{
      "completion_tokens":9,
      "prompt_tokens":2,
      "total_tokens":11
   }
}

@ishaan-jaff
Contributor Author

This is done - we added support for echo for HF TGI LLMs. Here's how you can use it, @Vinno97:

from litellm import text_completion
response = text_completion(
            model="huggingface/bigcode/starcoder", 
            prompt="good morning", 
            max_tokens=10, logprobs=10,
            echo=True
        )

Here's the response - you can see the input prompt as part of the logprobs:

{
  "id":"chatcmpl-3fc71792-c442-4ba1-a611-19dd0ac371ad",
  "object":"text_completion",
  "created":1698801125.936519,
  "model":"bigcode/starcoder",
  "choices":[
     {
        "text":", I'm going to make you a sand",
        "index":0,
        "logprobs":{
           "tokens":[
              "good",
              " morning",
              ",",
              " I",
              "'m",
              " going",
              " to",
              " make",
              " you",
              " a",
              " s",
              "and"
           ],
           "token_logprobs":[
              "None",
              -14.96875,
              -2.2285156,
              -2.734375,
              -2.0957031,
              -2.0917969,
              -0.09429932,
              -3.1132812,
              -1.3203125,
              -1.2304688,
              -1.6201172,
              -0.010292053
           ]
        },
        "finish_reason":"length"
     }
  ],
  "usage":{
     "completion_tokens":9,
     "prompt_tokens":2,
     "total_tokens":11
  }
}
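
For example, the echoed prompt tokens and their logprobs can be read back out of that response like this (assuming dict-style access as shown above):

# The first two entries correspond to the echoed prompt "good morning";
# the very first token has no logprob since nothing precedes it.
logprobs = response["choices"][0]["logprobs"]
for token, logprob in zip(logprobs["tokens"], logprobs["token_logprobs"]):
    print(f"{token!r}: {logprob}")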
