# Chat completion request streaming #271
The API doesn't accept a stream of messages.

---
Hi @rattrayalex, thanks for your response. To clarify, I didn't mean to ask for any changes or features in the API. The stream of messages was an optimization on top of the first suggestion, which is simply to be able to pass a stream as the body for `chat.completions.create`. I can do this right now with `fetch`:

```js
const encoder = new TextEncoder();

const stream = new ReadableStream({
  async pull(controller) {
    const body = JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "system", content: "you are a helpful assistant" }],
    });
    controller.enqueue(encoder.encode(body));
    controller.close();
  },
});

fetch(CHAT_COMPLETION_ENDPOINT, {
  method: "POST",
  headers: { /* apiKey */ },
  body: stream,
});
```

As of now, if I want to accomplish the same using this library instead of `fetch`, I can't, because the library only accepts a plain object as the request body. Having the above solution provided out of the box (e.g. as an overload of `chat.completions.create`) would be great.

Additionally, an improvement on the above, IMO, would be to be able to send only the messages in the stream, while still being able to pass the options in an object. As in the previous example, today I can do this with `fetch`:

```js
// stream a JSON array of CreateChatCompletionRequestMessage[]
// this is typically what we would want to stream, because it can get large
// (and as models get bigger context windows, this approach will become more desirable)
const messageStream = await db.fetchMessages();
const encoder = new TextEncoder();

const stream = new ReadableStream({
  async start(controller) {
    // the options could still come from the object currently passed in the `body` argument
    controller.enqueue(
      encoder.encode(`{"model":"gpt-3.5-turbo","temperature":0.5,"messages":`)
    );
  },
  async pull(controller) {
    // here we combine the chat completion options sent above with the stream of
    // messages coming from some source, e.g. a database
    for await (const chunk of messageStream) {
      controller.enqueue(encoder.encode(chunk));
    }
    controller.enqueue(encoder.encode(`}`));
    // close the stream so fetch knows the body is complete
    controller.close();
  },
});

fetch(CHAT_COMPLETION_ENDPOINT, {
  method: "POST",
  headers: { /* apiKey */ },
  body: stream,
});
```

---
Interesting… can you share more about the motivating use-case for streaming request bodies?

---
In my case it's mainly for constrained environments such as Edge runtimes (e.g. Cloudflare Workers). In addition, when conversations max out the context window and every round trip carries 16k tokens (and we're already looking at GPT-4's 32k), we want to handle this efficiently in high-traffic scenarios.

---
Right, thank you, that makes sense. Sorry for misunderstanding at first. I'm not sure that we'll be able to do this in the short term, but it does sound worthwhile.

---
I might take a shot at opening a PR for this, if that's OK.

---
Thanks @luisfarzati, I appreciate your willingness to do that. Unfortunately, there's a very slim chance it'd be accepted at this time, since this library is generated and we'd want to implement this in a way that allows arbitrary streamable request bodies for all methods, with minimal impact on the happy-path DX (which overloads can cause problems with). You're welcome to make a fork and share it here for others to use in the meantime, however!
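For illustration only, a rough sketch of the kind of generic, non-overload approach that comment points at: a single request helper that accepts a `ReadableStream` anywhere a JSON body is allowed. Everything here (names, signature) is hypothetical, not the library's actual internals:

```ts
// Hypothetical sketch, not the library's real internals: one generic request
// helper that takes either a plain params object or a pre-serialized stream,
// so no per-method overloads are needed.
type RequestBody = Record<string, unknown> | ReadableStream<Uint8Array>;

async function post(path: string, body: RequestBody, apiKey: string): Promise<Response> {
  const isStream = body instanceof ReadableStream;
  const init: RequestInit & { duplex?: "half" } = {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: isStream ? body : JSON.stringify(body),
  };
  if (isStream) {
    // Node's built-in fetch (undici) requires half-duplex mode
    // when the request body is a stream
    init.duplex = "half";
  }
  return fetch(`https://api.openai.com/v1${path}`, init);
}
```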
---

### Confirm this is a feature request for the Node library and not the underlying OpenAI API.

### Describe the feature or improvement you're requesting
It would be great to have an additional overload for `chat.completions.create` that can be used to stream requests. In the most basic form it could be something like:
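A minimal sketch of the idea, where the overload accepts a `ReadableStream` carrying the full serialized request body (the stream-accepting signature is hypothetical, not something the library supports today):

```ts
import OpenAI from "openai";

const openai = new OpenAI();
const encoder = new TextEncoder();

// the full JSON request body, pre-serialized into a byte stream
const bodyStream = new ReadableStream<Uint8Array>({
  start(controller) {
    controller.enqueue(
      encoder.encode(
        JSON.stringify({
          model: "gpt-3.5-turbo",
          messages: [{ role: "system", content: "you are a helpful assistant" }],
        })
      )
    );
    controller.close();
  },
});

// hypothetical overload - not a real signature today
const completion = await openai.chat.completions.create(bodyStream);
```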
However, passing the completion parameters in the stream may be cumbersome; since streaming really only pays off with large message contexts, a better option could be to use the stream only for the messages:
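A sketch of what the params type could allow (again hypothetical; a stand-in message type is declared to keep the sketch self-contained):

```ts
// stand-in for the SDK's message type, for the sake of a self-contained sketch
type CreateChatCompletionRequestMessage = { role: string; content: string };

// Hypothetical params shape: everything stays a plain object except
// `messages`, which may alternatively be a byte stream containing a
// serialized CreateChatCompletionRequestMessage[].
interface StreamableChatCompletionParams {
  model: string;
  temperature?: number;
  messages: CreateChatCompletionRequestMessage[] | ReadableStream<Uint8Array>;
}
```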
**Usage example (of the second approach)**
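A sketch of how the call site could look under that assumption (`db.fetchMessages()` is the hypothetical message source from the earlier examples):

```ts
import OpenAI from "openai";

// hypothetical message source (e.g. a database client)
declare const db: { fetchMessages(): Promise<ReadableStream<Uint8Array>> };

const openai = new OpenAI();

// a byte stream containing a serialized CreateChatCompletionRequestMessage[]
const messageStream = await db.fetchMessages();

const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  temperature: 0.5,
  // @ts-expect-error - hypothetical streaming overload, not supported today
  messages: messageStream,
});

console.log(completion.choices[0].message);
```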
### Additional context
This feature would be particularly useful in Edge environments.