Chat completion request streaming #271

Open
1 task done
luisfarzati opened this issue Aug 31, 2023 · 7 comments
Labels: enhancement (New feature or request)

Comments

@luisfarzati commented Aug 31, 2023

Confirm this is a feature request for the Node library and not the underlying OpenAI API.

  • This is a feature request for the Node library

Describe the feature or improvement you're requesting

It would be great to have an additional overload for chat.completions.create that can be used to stream requests.

In the most basic form it could be something like:

create(
    body: ReadableStream<string>,
    options?: Core.RequestOptions,
): APIPromise<Stream<ChatCompletionChunk>>;
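
For illustration, calling that first form could look roughly like this (hypothetical usage, since the overload doesn't exist today; the stream has to carry the entire serialized JSON payload):

const body = new ReadableStream<string>({
  start(controller) {
    // the caller is responsible for serializing the whole request body
    controller.enqueue(JSON.stringify({
      model: 'gpt-3.5-turbo',
      stream: true,
      messages: [{ role: 'user', content: 'Hello' }],
    }));
    controller.close();
  },
});

const res = await openai.chat.completions.create(body);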

However, passing the completion parameters in the stream may be cumbersome. And since streaming only really makes sense with large message contexts, a better option could be to use the stream just for the messages:

create(
    body: Omit<CompletionCreateParamsStreaming, "messages">,
    messages: ReadableStream<string>,
    options?: Core.RequestOptions,
): APIPromise<Stream<ChatCompletionChunk>>;

Usage example (of the second approach)

// getMessageHistory makes a call to a database and returns a `ReadableStream<string>`
// The content of the stream is a JSON array of `CreateChatCompletionRequestMessage`
const stream = await getMessageHistory();

const params = {
  model: 'gpt-3.5-turbo',
  stream: true
}

const res = await openai.chat.completions.create(params, stream);

Additional context

This feature would be particularly useful in Edge environments.

luisfarzati changed the title from "Request streaming support" to "Chat completion request streaming" on Aug 31, 2023
@rattrayalex (Collaborator)

The API doesn't accept a stream of messages; the request must be a single JSON payload. If you'd like to request that feature in the API, you may do so here: https://community.openai.com

rattrayalex added the "openai api" (Related to underlying OpenAI API) label on Aug 31, 2023
@luisfarzati (Author) commented Aug 31, 2023

Hi @rattrayalex, thanks for your response.

To clarify, I didn't mean to ask for any changes or features in the API. The stream of messages was an optimization of the first suggestion, which is simply to be able to pass a stream as the body for openai.chat.completions.create.

I can do this right now by using fetch like this (not coding in IDE - may have errors):

const encoder = new TextEncoder();

// a one-chunk stream containing the serialized request body
const stream = new ReadableStream({
  async pull(controller) {
    const body = JSON.stringify({
       model: "gpt-3.5-turbo",
       messages: [{ role: 'system', content: 'you are a helpful assistant' }]
    });
    controller.enqueue(encoder.encode(body));
    controller.close();
  }
});

fetch(CHAT_COMPLETION_ENDPOINT, {
  method: "POST",
  headers: { /* apiKey */ },
  body: stream,
  // some runtimes (e.g. Node 18+/undici, Chromium) also require duplex: "half" for stream bodies
  duplex: "half"
});

As of now, if I want to accomplish the same using this library instead of fetch, I can't, because openai.chat.completions.create only accepts a plain payload object as the body.

Having the above solution provided out of the box (either as an overload of openai.chat.completions.create or via a different method) would be nice.

Additionally, an improvement of the above, IMO, would be to be able to send only the messages in the stream, while still being able to pass the options in an object. As in the previous example, today I can do this with fetch:

// stream with a JSON array of CreateChatCompletionRequestMessage[]
// this is typically what we would want to stream - because it can get large
// (and as models get bigger context windows, this approach will become more desirable)
const messageStream = await db.fetchMessages();

const encoder = new TextEncoder();

const stream = new ReadableStream({
  async start(controller) {
    // the options could still come from the object currently passed in the `body` argument
    const bytes = encoder.encode(`{"model":"gpt-3.5-turbo","temperature":0.5,"messages":`);
    controller.enqueue(bytes);
  },

  async pull(controller) {
    // here we combine the chat completion options sent above with the stream of messages
    // coming from some source, e.g. a database
    // (ReadableStream is async-iterable in Node 18+; other runtimes may need a reader loop)
    for await (const chunk of messageStream) {
      controller.enqueue(encoder.encode(chunk));
    }
    controller.enqueue(encoder.encode(`}`));
    // close the stream so fetch knows the body is complete
    controller.close();
  }
});

fetch(CHAT_COMPLETION_ENDPOINT, {
  method: "POST",
  headers: { /* apiKey */ },
  body: stream,
  // some runtimes (e.g. Node 18+/undici, Chromium) also require duplex: "half" for stream bodies
  duplex: "half"
});
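
The same prefix/suffix trick could be factored into a small helper. This is just a sketch under the same assumptions as above (the messages source yields pre-serialized chunks of the JSON messages array, and the runtime's ReadableStream is async-iterable):

function buildRequestBody(
  params: Record<string, unknown>,   // model, temperature, etc. (everything except messages)
  messages: AsyncIterable<string>,   // chunks of the serialized messages array
): ReadableStream<Uint8Array> {
  const encoder = new TextEncoder();
  const paramsJson = JSON.stringify(params);
  // reopen the params object and splice the streamed "messages" array in before closing it
  const prefix = paramsJson === '{}'
    ? '{"messages":'
    : paramsJson.slice(0, -1) + ',"messages":';
  return new ReadableStream({
    async start(controller) {
      controller.enqueue(encoder.encode(prefix));
      for await (const chunk of messages) {
        controller.enqueue(encoder.encode(chunk));
      }
      controller.enqueue(encoder.encode('}'));
      controller.close();
    },
  });
}

// e.g. body: buildRequestBody({ model: 'gpt-3.5-turbo', temperature: 0.5 }, messageStream)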

rattrayalex removed the "openai api" (Related to underlying OpenAI API) label on Aug 31, 2023
@rattrayalex (Collaborator)

Interesting… can you share more about the motivating use-case for streaming request bodies?

@luisfarzati (Author) commented Aug 31, 2023

In my case it's mainly for constrained environments such as Edge runtimes (e.g. Cloudflare Workers). In addition, when conversations max out the context window and every round trip carries 16k tokens (and we're already looking at GPT-4's 32k), we want to handle this efficiently in high-traffic scenarios.

@rattrayalex (Collaborator)

Right, thank you, that makes sense. Sorry for misunderstanding at first. I'm not sure that we'll be able to do this in the short term but it does sound worthwhile.

rattrayalex reopened this on Sep 1, 2023
@luisfarzati (Author)

I might take a shot at opening a PR for this, if that's ok.

@rattrayalex (Collaborator)

Thanks @luisfarzati, I appreciate your willingness to do that – unfortunately, there's a very slim chance it'd be accepted at this time, since this library is generated and we'd want to implement this in a way that allows arbitrary streamable request bodies to all methods with minimal impact to the happy-path DX (which overloads can cause problems with).

You're welcome to make a fork and share it here for others to use in the meantime, however!
