Chat completion request streaming #271

Open
1 task done
luisfarzati opened this issue Aug 31, 2023 · 7 comments
Labels: enhancement (New feature or request)

Comments

@luisfarzati commented Aug 31, 2023

Confirm this is a feature request for the Node library and not the underlying OpenAI API.

  • This is a feature request for the Node library

Describe the feature or improvement you're requesting

It would be great to have an additional overload for chat.completions.create that can be used to stream requests.

In the most basic form it could be something like:

create(
    body: ReadableStream<string>,
    options?: Core.RequestOptions,
): APIPromise<Stream<ChatCompletionChunk>>;
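
For illustration, calling that first form could look roughly like this (hypothetical usage, since the overload doesn't exist today; the stream has to carry the entire serialized JSON payload):

const body = new ReadableStream<string>({
  start(controller) {
    // the caller is responsible for serializing the whole request body
    controller.enqueue(JSON.stringify({
      model: 'gpt-3.5-turbo',
      stream: true,
      messages: [{ role: 'user', content: 'Hello' }],
    }));
    controller.close();
  },
});

const res = await openai.chat.completions.create(body);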

However, passing the completion parameters in the stream may be cumbersome. And since streaming only really makes sense with large message contexts, a better option could be to use the stream just for the messages:

create(
    body: Omit<CompletionCreateParamsStreaming, "messages">,
    messages: ReadableStream<string>,
    options?: Core.RequestOptions,
): APIPromise<Stream<ChatCompletionChunk>>;

Usage example (of the second approach)

// getMessageHistory makes a call to a database and returns a `ReadableStream<string>`
// The content of the stream is a JSON array of `CreateChatCompletionRequestMessage`
const stream = await getMessageHistory();

const params = {
  model: 'gpt-3.5-turbo',
  stream: true
}

const res = await openai.chat.completions.create(params, stream);

Additional context

This feature would be particularly useful in Edge environments.

luisfarzati changed the title from "Request streaming support" to "Chat completion request streaming" on Aug 31, 2023
@rattrayalex (Collaborator)

The API doesn't accept a stream of messages; the request must be a single JSON payload. If you'd like to request that feature in the API, you may do so here: https://community.openai.com

rattrayalex added the "openai api" (Related to underlying OpenAI API) label on Aug 31, 2023
@luisfarzati (Author) commented Aug 31, 2023

Hi @rattrayalex, thanks for your response.

To clarify, I didn't mean to ask for any changes or features in the API. The stream of messages was an optimization of the first suggestion, which is simply to be able to pass a stream as the body for openai.chat.completions.create.

I can do this right now by using fetch like this (not coding in IDE - may have errors):

const encoder = new TextEncoder();

// a one-chunk stream containing the serialized request body
const stream = new ReadableStream({
  async pull(controller) {
    const body = JSON.stringify({
       model: "gpt-3.5-turbo",
       messages: [{ role: 'system', content: 'you are a helpful assistant' }]
    });
    controller.enqueue(encoder.encode(body));
    controller.close();
  }
});

fetch(CHAT_COMPLETION_ENDPOINT, {
  method: "POST",
  headers: { /* apiKey */ },
  body: stream,
  // some runtimes (e.g. Node 18+/undici, Chromium) also require duplex: "half" for stream bodies
  duplex: "half"
});

As of now, if I want to accomplish the same using this library instead of fetch, I can't, because openai.chat.completions.create only accepts a plain payload object as the body.

Having the above solution provided out of the box (either as an overload of openai.chat.completions.create or via a different method) would be nice.

Additionally, an improvement of the above, IMO, would be to be able to send only the messages in the stream, while still being able to pass the options in an object. As in the previous example, today I can do this with fetch:

// stream with a JSON array of CreateChatCompletionRequestMessage[]
// this is typically what we would want to stream - because it can get large
// (and as models get bigger context windows, this approach will become more desirable)
const messageStream = await db.fetchMessages();

const encoder = new TextEncoder();

const stream = new ReadableStream({
  async start(controller) {
    // the options could still come from the object currently passed in the `body` argument
    const bytes = encoder.encode(`{"model":"gpt-3.5-turbo","temperature":0.5,"messages":`);
    controller.enqueue(bytes);
  },

  async pull(controller) {
    // here we combine the chat completion options sent above with the stream of messages
    // coming from some source, e.g. a database
    // (ReadableStream is async-iterable in Node 18+; other runtimes may need a reader loop)
    for await (const chunk of messageStream) {
      controller.enqueue(encoder.encode(chunk));
    }
    controller.enqueue(encoder.encode(`}`));
    // close the stream so fetch knows the body is complete
    controller.close();
  }
});

fetch(CHAT_COMPLETION_ENDPOINT, {
  method: "POST",
  headers: { /* apiKey */ },
  body: stream,
  // some runtimes (e.g. Node 18+/undici, Chromium) also require duplex: "half" for stream bodies
  duplex: "half"
});
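
The same prefix/suffix trick could be factored into a small helper. This is just a sketch under the same assumptions as above (the messages source yields pre-serialized chunks of the JSON messages array, and the runtime's ReadableStream is async-iterable):

function buildRequestBody(
  params: Record<string, unknown>,   // model, temperature, etc. (everything except messages)
  messages: AsyncIterable<string>,   // chunks of the serialized messages array
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const paramsJson = JSON.stringify(params);
  // reopen the params object and splice the streamed "messages" array in before closing it
  const prefix = paramsJson === '{}'
    ? '{"messages":'
    : paramsJson.slice(0, -1) + ',"messages":';
  return new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode(prefix));
      for await (const chunk of messages) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.enqueue(encoder.encode('}'));
      controller.close();
    },
  });
}

// e.g. body: buildRequestBody({ model: 'gpt-3.5-turbo', temperature: 0.5 }, messageStream)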

rattrayalex removed the "openai api" (Related to underlying OpenAI API) label on Aug 31, 2023
@rattrayalex (Collaborator)

Interesting… can you share more about the motivating use-case for streaming request bodies?

@luisfarzati (Author) commented Aug 31, 2023

In my case it's mainly for constrained environments such as Edge runtimes (e.g. Cloudflare Workers). In addition, when conversations max out the context window and every round trip carries 16k tokens (and we're already looking at GPT-4's 32k), we want to handle this efficiently in high-traffic scenarios.

@rattrayalex (Collaborator)

Right, thank you, that makes sense. Sorry for misunderstanding at first. I'm not sure that we'll be able to do this in the short term but it does sound worthwhile.

rattrayalex reopened this on Sep 1, 2023
@luisfarzati (Author)

I might take a shot at opening a PR for this, if that's ok.

@rattrayalex (Collaborator)

Thanks @luisfarzati, I appreciate your willingness to do that – unfortunately, there's a very slim chance it'd be accepted at this time, since this library is generated and we'd want to implement this in a way that allows arbitrary streamable request bodies to all methods with minimal impact to the happy-path DX (which overloads can cause problems with).

You're welcome to make a fork and share it here for others to use in the meantime, however!
