
Support for web ReadableStream without buffering the whole file. #418

Open
1 task done
nicolopadovan opened this issue Nov 3, 2023 · 3 comments
Labels
enhancement New feature or request

Comments


nicolopadovan commented Nov 3, 2023

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • This is an issue with the Node library

Describe the bug

Whenever a web ReadableStream is passed to the toFile helper function, the full contents of the file are buffered in memory before the request is forwarded to the OpenAI Whisper API endpoint.
However, it should be possible to avoid buffering the whole file in server memory, and instead use the server as a pass-through that streams the data from the file's source to the API endpoint.
The problem has also been discussed in issue #414.
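The pass-through idea can be illustrated with plain Node streams (a minimal sketch, without the openai library; `makeSource` and `sink` are hypothetical stand-ins for `bucket.file(fileName).createReadStream()` and the outgoing request body):

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Stand-in for bucket.file(fileName).createReadStream(): four 1 KiB chunks.
function makeSource(): Readable {
  return Readable.from(
    (function* () {
      for (let i = 0; i < 4; i++) yield Buffer.alloc(1024, i);
    })()
  );
}

// Stand-in for the outgoing request body: each chunk is counted and then
// released, so the whole file is never held in memory at once.
let forwarded = 0;
const sink = new Writable({
  write(chunk: Buffer, _encoding, callback) {
    forwarded += chunk.length;
    callback();
  },
});

const done = pipeline(makeSource(), sink).then(() => forwarded);
done.then((total) => console.log(`forwarded ${total} bytes`)); // forwarded 4096 bytes
```

Only one chunk is in flight at a time; peak memory stays at the chunk size rather than the file size.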

A workaround using axios and FormData that seems to work:

import FormData = require("form-data");
import axios from "axios";

async function foo() {

  const bucket = storage.bucket(bucketName);
  const file = bucket.file(fileName);
  const readStream = file.createReadStream();
  
  const form = new FormData();
  
  // Make sure that the file has the proper extension as well
  // (in this example, we just add the `webm` extension for brevity).
  if (!fileName.includes(".")) {
      fileName = `${fileName}.webm`;
  }
  
  form.append("file", readStream, fileName);
  form.append("model", "whisper-1");
  
  const apiKey = OPENAI_API_KEY;
  
  const response = await axios.post(
      "https://api.openai.com/v1/audio/transcriptions",
      form,
      {
          headers: {
              ...form.getHeaders(),
              Authorization: `Bearer ${apiKey}`,
          },
      }
  );
  
  const transcription = response.data.text;
  return transcription;
}

To Reproduce

Example uses Cloud / Firebase Storage.

Code snippets

import {OpenAI, toFile} from 'openai'

const bucket = storage.bucket(bucketName);
const file = bucket.file(fileName);

const openai = new OpenAI({
    apiKey: apiKey,
});

const completion = await openai.audio.transcriptions.create({
    file: await toFile(file, "myfile.mp3"),
    model: "whisper-1",
});

OS

Linux (Google Cloud Functions)

Node version

18

Library version

4.11.1

@nicolopadovan nicolopadovan added the bug Something isn't working label Nov 3, 2023
@rattrayalex rattrayalex added enhancement New feature or request and removed bug Something isn't working labels Nov 3, 2023
rattrayalex (Collaborator) commented Nov 3, 2023

Thanks for filing; I would like to do this at some point.

Note that this applies to all forms of streams, not just web ReadableStreams (and the GCP libraries return NodeJS ReadableStreams, not web ones).

Unfortunately, I'm not sure this will really be possible in a clean way until OpenAI's backend can infer content-types from the contents instead of from the filenames, since you can't construct a File instance from a stream, and you need the filename.

I'll leave this as an open TODO for now.

nicolopadovan (Author) commented Nov 4, 2023

> Thanks for filing, I would like to do this at some point.
>
> Note that this applies to all forms of streams, not just web ReadableStreams (and the GCP libraries return NodeJS ReadableStreams, not web ones).
>
> Unfortunately, I'm not sure this will really be possible in a clean way until OpenAI's backend can infer content-types from the contents instead of from the filenames, since you can't construct a File class with a stream, and you need the filename.
>
> I'll leave this as an open TODO for now.

Wouldn’t it be possible to infer the file type server-side, and use that to give the API a correct extension to work with?
In my proposed workaround I naively add the .webm extension for brevity, but the snippet could be improved to infer the actual file type.
Note that I pass the read stream together with the filename in the form, without ever buffering the file on the server itself.
Ideally, the API could consume the Google Storage file directly, without passing it through the server at all; but this workaround at least reduces the memory needed at any given moment, since each chunk of stream data can be released as soon as it has been forwarded.
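The server-side inference could be done by sniffing the first bytes of the stream for well-known magic numbers. A rough sketch (`sniffExtension` is a hypothetical helper, not part of any library; it covers only a few container formats):

```typescript
// Hypothetical helper: infer a file extension from the stream's first bytes
// (magic numbers), instead of naively appending ".webm".
function sniffExtension(header: Buffer): string | null {
  // EBML magic (WebM / Matroska): 1A 45 DF A3
  if (header.length >= 4 && header.readUInt32BE(0) === 0x1a45dfa3) return "webm";
  // MP3 with an ID3 tag starts with the ASCII bytes "ID3"
  if (header.length >= 3 && header.toString("ascii", 0, 3) === "ID3") return "mp3";
  // WAV: "RIFF" at offset 0 and "WAVE" at offset 8
  if (
    header.length >= 12 &&
    header.toString("ascii", 0, 4) === "RIFF" &&
    header.toString("ascii", 8, 12) === "WAVE"
  ) {
    return "wav";
  }
  // Ogg containers start with "OggS"
  if (header.length >= 4 && header.toString("ascii", 0, 4) === "OggS") return "ogg";
  return null; // unknown: fall back to a default or reject
}

console.log(sniffExtension(Buffer.from([0x1a, 0x45, 0xdf, 0xa3]))); // webm
console.log(sniffExtension(Buffer.from("ID3\x04\x00"))); // mp3
```

In practice one would read the first chunk of the read stream, sniff it, then re-emit that chunk ahead of the rest of the stream so nothing is lost.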

I will be working on a PR to implement this if I have the time :)

rattrayalex (Collaborator) commented

Thank you. This relates to #271 but I think we would approach it differently.

In this case, it seems it may be simplest to allow params: FormData in place of params: TranscriptionCreateParams, e.g.:

import {OpenAI} from 'openai'

const openai = new OpenAI();

const form = new FormData();
form.append("file", myReadStream, myFileName);
form.append("model", "whisper-1");

const transcription = await openai.audio.transcriptions.create(form);

I'm not sure how trivial this will be for us (it may be relatively simple).
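The overload described above might look roughly like this (a sketch only; `TranscriptionCreateParams` is simplified here and `create` is a stand-in, not the library's actual signature):

```typescript
// Sketch only: a simplified stand-in for the library's real params type.
interface TranscriptionCreateParams {
  file: unknown;
  model: string;
}

// The create method could branch on the argument shape: a caller-built
// FormData would be streamed through as-is, while a plain params object
// would be serialized into multipart form data as today.
function create(params: TranscriptionCreateParams | FormData): string {
  if (params instanceof FormData) {
    return `form:${params.get("model")}`;
  }
  return `params:${params.model}`;
}

const form = new FormData();
form.append("model", "whisper-1");
console.log(create(form)); // form:whisper-1
console.log(create({ file: null, model: "whisper-1" })); // params:whisper-1
```

This keeps the existing typed-params path intact while letting callers who need streaming build the multipart body themselves.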
