
[Feature]: Add support for OpenAI's echo parameter. #699

Closed
ishaan-jaff opened this issue Oct 25, 2023 · 18 comments

Comments

@ishaan-jaff
Contributor

The Feature

What: Add support for OpenAI's echo parameter.

Why: Frameworks like lm-evaluation-harness rely on the echo parameter to get the logprobs of the prompt tokens. This could allow for a significant speed-up of lm-eval evaluation by using TGI's echo equivalent decoder_input_details. Though not all back-ends support it, it could also enable much easier comparisons of different model providers!

Bonus: TGI also supports top_n_tokens , which can return the log prob of the most likely tokens at each timestep, semi-equivalent of OpenAI's logprobs parameter.

Motivation, pitch

from @Vinno97

Twitter / LinkedIn details

No response

@ishaan-jaff ishaan-jaff added the enhancement New feature or request label Oct 25, 2023
@ishaan-jaff
Contributor Author

Is echo supported for chat completions? I only see it for completions: https://platform.openai.com/docs/api-reference/completions/object

@Vinno97 commented Oct 25, 2023

It appears it's only supported in the legacy completions API (which I only just learned is legacy).

@krrishdholakia
Contributor

@Vinno97 @ishaan-jaff what are the next steps on this?

@Vinno97 commented Oct 25, 2023

I'd be willing to help, if you agree on the value this brings to LiteLLM.

If this gets added, EleutherAI/lm-evaluation-harness#804 and EleutherAI/lm-evaluation-harness#869 are perhaps both solved already.

@krrishdholakia
Contributor

@Vinno97 recapping my understanding:

  • the 2 issues are for vllm and tgi
  • we support both (vllm also provides an openai compatible endpoint)
  • your issue is to add a way to see logprobs from litellm
  • we have this today for TGI via ['choices'][0]['message']._logprob -
    model_response["choices"][0]["message"]._logprob = sum_logprob

Open Questions

  • What is the problem you face today?
  • Does vllm's openai chatcompletions endpoint return logprobs?
  • What else do we need to add to solve it?

@Vinno97 commented Oct 26, 2023

I'm sorry for the confusion. My main point was about the echo feature, which makes the prompt part of the API response, including the logprobs for every token in the prompt that was sent.

If you run a prompt through an LLM, it inherently outputs next-token logits for every token, not only the last one. OpenAI exposes this information via the echo parameter, TGI can do it via decoder_input_details, and vLLM is currently working on it (vllm-project/vllm#833, vllm-project/vllm#201). This is a core feature that lm-eval relies on.

As an example: if I send two prompts, "the doctor is a man" and "the doctor is a woman", I can use echo to compare the exact logprobs of the words "man" and "woman" in this context.

I'd provide an example OpenAI response if I had access at the moment, but here's a TGI response (look at details.prefill):

Prompt: "The doctor is a man"

{
  "generated_text": " of",
  "details": {
    "finish_reason": "length",
    "generated_tokens": 1,
    "seed": null,
    "prefill": [
      {
        "id": 1410,
        "text": "the",
        "logprob": null
      },
      {
        "id": 5032,
        "text": " doctor",
        "logprob": -25.640625
      },
      {
        "id": 304,
        "text": " is",
        "logprob": -2.6445312
      },
      {
        "id": 241,
        "text": " a",
        "logprob": -2.8496094
      },
      {
        "id": 546,
        "text": " man",
        "logprob": -4.2695312
      }
    ],
    "tokens": [
      {
        "id": 275,
        "text": " of",
        "logprob": -1.8183594,
        "special": false
      }
    ],
    "top_tokens": null,
    "best_of_sequences": null
  }
}

Prompt: "The doctor is a woman"

{
  "generated_text": ",",
  "details": {
    "finish_reason": "length",
    "generated_tokens": 1,
    "seed": null,
    "prefill": [
      {
        "id": 1410,
        "text": "the",
        "logprob": null
      },
      {
        "id": 5032,
        "text": " doctor",
        "logprob": -25.640625
      },
      {
        "id": 304,
        "text": " is",
        "logprob": -2.6445312
      },
      {
        "id": 241,
        "text": " a",
        "logprob": -2.8496094
      },
      {
        "id": 2961,
        "text": " woman",
        "logprob": -3.1992188
      }
    ],
    "tokens": [
      {
        "id": 23,
        "text": ",",
        "logprob": -1.8242188,
        "special": false
      }
    ],
    "top_tokens": null,
    "best_of_sequences": null
  }
}

Here you can see that the model I'm using actually thinks that "the doctor is a woman" is a more likely sentence than "the doctor is a man".
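
To make the comparison concrete, here is a small sketch that scores each prompt by summing its prefill logprobs (assuming the two TGI payloads above have been parsed into dicts named response_man and response_woman; those names are mine, not part of the API):

# Sum the prompt-token logprobs from a TGI response's details.prefill.
def prompt_logprob(response: dict) -> float:
    prefill = response["details"]["prefill"]
    # the first token has no logprob (nothing precedes it), so skip None values
    return sum(t["logprob"] for t in prefill if t["logprob"] is not None)

print(prompt_logprob(response_man))    # ~ -35.40 for "The doctor is a man"
print(prompt_logprob(response_woman))  # ~ -34.33 for "The doctor is a woman"

The less negative total for the second prompt is exactly the comparison described above.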

@leocnj commented Oct 26, 2023

This PR, vllm-project/vllm#959, adds support for echo=True in both the engine and the OpenAI API server.

Using this branch, I can obtain logprobs for prompt tokens. Please give it a try.
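
For anyone who wants to try it, here is a rough, unverified sketch of what a request against that branch's OpenAI-compatible server might look like (the host, port, and model name are placeholders; I haven't confirmed the exact parameter handling on that branch):

# Unverified sketch: request prompt-token logprobs from vLLM's
# OpenAI-compatible /v1/completions route with echo enabled.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "facebook/opt-125m",   # whatever model the server was started with
        "prompt": "The doctor is a man",
        "max_tokens": 1,
        "echo": True,
        "logprobs": 1,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["logprobs"]["token_logprobs"])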

@krrishdholakia
Contributor

Hey @Vinno97, we'd welcome a PR for echo - excited to see the approach!

@ishaan-jaff
Contributor Author

working on this PR

@ishaan-jaff ishaan-jaff self-assigned this Oct 31, 2023
@ishaan-jaff
Contributor Author

  • We already support echo for text-davinci-003

from litellm import completion
response = completion(model="text-davinci-003", messages=messages, echo=True)

@ishaan-jaff
Contributor Author

It looks like lm-eval harness is not adding support for gpt-3.5-turbo, since it does not return logprobs:
EleutherAI/lm-evaluation-harness#541

@ishaan-jaff
Contributor Author

I was trying to use our text_completion with the eval harness and it failed: lm harness passes prompt as a list of token IDs, so we need to add support for this (a rough sketch of one way to handle it follows the payload below)

{'engine': 'text-davinci-003', 'prompt': [[3152, 833, 396, 2596, 338, 1306, 2239, 339, 373, 5055, 13970, 257, 13546, 11, 290, 262, 26839, 2971, 44193, 35254, 319, 262, 1660, 26, 290, 788, 339, 373, 6155, 832, 262, 12269, 11, 832, 21757, 1067, 3775, 11, 810, 262, 26839, 2971, 373, 12548, 287, 262, 2951, 286, 262, 8109, 286, 262, 1029, 3013, 1666, 26, 290, 788, 339, 373, ..
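
For context, lm-eval sends prompts pre-tokenized, so each prompt arrives as a list of token IDs rather than a string. A rough sketch of one way to handle that for providers that only accept text (the helper name is hypothetical, not the actual LiteLLM fix):

# Hypothetical helper: normalize lm-eval's pre-tokenized prompts back to strings.
import tiktoken

enc = tiktoken.encoding_for_model("text-davinci-003")

def normalize_prompt(prompt):
    if isinstance(prompt, str):
        return [prompt]                             # single string prompt
    if prompt and isinstance(prompt[0], list):
        return [enc.decode(ids) for ids in prompt]  # list of token-id lists
    return list(prompt)                             # list of string prompts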

@ishaan-jaff
Contributor Author

fixed here: b4e14ae

@ishaan-jaff
Contributor Author

It looks like the lm eval harness passes 'temperature': 0.0, 'max_tokens': 0, 'echo': True, 'logprobs': 10

Current issues:

  • Our litellm.text_completion() does not return logprobs when it is set for text-davinci-003
  • HF TGI log probs would not be compatible with the lm eval harness, since we expose them as ._logprob but the harness expects this format:
"choices": [
    {
      "text": "on Guardian you get:\n\n1. Secure",
      "index": 0,
      "logprobs": {
        "tokens": [
          "on",
          " Guardian",
          " you",
          " get",
          ":",
          "\n",
          "\n",
          "1",
          ".",
          " Secure"
        ],
        "token_logprobs": [
          -3.7846956,
          -12.922583,
          -2.2359743,
          -3.0041907,
          -2.0863824,
          -0.029573089,
          -0.013009035,
          -1.3277724,
          -0.06319551,
          -1.4571579
        ],
        "top_logprobs": [
          {
            "ac": -2.6180239,
            "acey": -3.0217085,
            "usted": -3.2943392,
            "im": -3.4510107,
            "ish": -3.5101204
          },
          {
            ",": -1.683592,
            "\n": -3.2098136,
            "bytes:\\xe2\\x80": -3.2249804,
            "Wallet": -3.2496285,
            " Legacy": -3.2982492
          },
          {
            " you": -2.2359743,
            "\n": -1.2495747,
            ",": -1.2551193,
            " and": -3.8073368,
            "bytes:\\xe2\\x80": -4.5486817
          },
          {
            " get": -3.0041907,
            " can": -0.5294326,
            " will": -2.023661,
            " are": -2.8523924,
            " have": -3.0540316
          },
          {
            ":": -2.0863824,
            " a": -1.7326131,
            "\n": -1.8805203,
            " the": -1.9610744,
            " access": -2.7664504
          },
          {
            "\n": -0.029573089,
            " ": -4.1331725,
            "\n\n": -4.616098,
            "</": -7.5522456,
            "  ": -7.807361
          },
          {
            "\n": -0.013009035,
            "-": -5.246068,
            "*": -6.2495985,
            " ": -6.570405,
            " \u00a7\u00a7": -6.9500294
          },
          {
            "1": -1.3277724,
            "-": -0.9693186,
            "\u2022": -1.1041319,
            "*": -4.3544083,
            "T": -5.119253
          },
          {
            ".": -0.06319551,
            ")": -2.8080018,
            " -": -7.942205,
            "-": -8.040881,
            " ": -9.436379
          },
          {
            " Secure": -1.4571579,
            " Security": -1.6510513,
            " A": -1.8732746,
            " Enhanced": -2.6642444,
            " Increased": -3.5049694
          }
        ],
        "text_offset": [
          7,
          9,
          18,
          22,
          26,
          27,
          28,
          29,
          30,
          31
        ]
      },
      "finish_reason": "length"
    }
  ],

@krrishdholakia
Contributor

Since we already read and translate the chat completions output in the text completions endpoint, can't we just do the same for logprobs? @ishaan-jaff
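
For illustration, a rough sketch of what that translation could look like (the function name and the exact handling of TGI's fields are assumptions, not LiteLLM's actual code):

# Hypothetical sketch: map a TGI `details` block into the OpenAI-style
# logprobs object that lm-eval expects on text completion choices.
def tgi_details_to_openai_logprobs(details: dict, echo: bool = False) -> dict:
    items = []
    if echo:
        items.extend(details.get("prefill", []))   # prompt tokens
    items.extend(details.get("tokens", []))        # generated tokens
    return {
        "tokens": [t["text"] for t in items],
        "token_logprobs": [t["logprob"] for t in items],
        "top_logprobs": None,   # could be filled from TGI's top_n_tokens
        "text_offset": None,
    }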

@ishaan-jaff
Contributor Author

Added support for transformed logprobs for TGI LLMs:

{
   "id":"chatcmpl-8e87a54f-5cf7-401f-8ff4-e5d32c20c41a",
   "object":"text_completion",
   "created":1698797307.028908,
   "model":"bigcode/starcoder",
   "choices":[
      {
         "text":", I'm going to make you a sand",
         "index":0,
         "logprobs":{
            "tokens":[
               ",",
               " I",
               "'m",
               " going",
               " to",
               " make",
               " you",
               " a",
               " s",
               "and"
            ],
            "token_logprobs":[
               -2.2285156,
               -2.734375,
               -2.0957031,
               -2.0917969,
               -0.09429932,
               -3.1132812,
               -1.3203125,
               -1.2304688,
               -1.6201172,
               -0.010292053
            ]
         },
         "finish_reason":"length"
      }
   ],
   "usage":"<Usage at 0x1231fd210> JSON":{
      "completion_tokens":9,
      "prompt_tokens":2,
      "total_tokens":11
   }
}

@ishaan-jaff
Contributor Author

This is done - we added support for echo for HF TGI LLMs. Here's how you can use it, @Vinno97:

from litellm import text_completion
response = text_completion(
            model="huggingface/bigcode/starcoder", 
            prompt="good morning", 
            max_tokens=10, logprobs=10,
            echo=True
        )

Here's the response - you can see the input prompt as part of the logprobs:

{
  "id":"chatcmpl-3fc71792-c442-4ba1-a611-19dd0ac371ad",
  "object":"text_completion",
  "created":1698801125.936519,
  "model":"bigcode/starcoder",
  "choices":[
     {
        "text":", I'm going to make you a sand",
        "index":0,
        "logprobs":{
           "tokens":[
              "good",
              " morning",
              ",",
              " I",
              "'m",
              " going",
              " to",
              " make",
              " you",
              " a",
              " s",
              "and"
           ],
           "token_logprobs":[
              "None",
              -14.96875,
              -2.2285156,
              -2.734375,
              -2.0957031,
              -2.0917969,
              -0.09429932,
              -3.1132812,
              -1.3203125,
              -1.2304688,
              -1.6201172,
              -0.010292053
           ]
        },
        "finish_reason":"length"
     }
  ],
  "usage":{
     "completion_tokens":9,
     "prompt_tokens":2,
     "total_tokens":11
  }
}
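
For example, the echoed prompt tokens and their logprobs can be read back out of that response like this (assuming dict-style access as shown above):

# The first two entries correspond to the echoed prompt "good morning";
# the very first token has no logprob since nothing precedes it.
logprobs = response["choices"][0]["logprobs"]
for token, logprob in zip(logprobs["tokens"], logprobs["token_logprobs"]):
    print(f"{token!r}: {logprob}")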
