Add Verbose Transcription Object Return Type to Audio Transcription Endpoint #735

wrogati · 2024-03-24T16:26:02Z

Hello 👋,

I've made some enhancements to the OpenAI Node SDK, specifically around the audio transcription endpoint, which I believe will make it more useful and align it closer to the API's capabilities as documented. This pull request is in response to the need identified in issue #702.

What's Changed

Based on the API documentation for the audio transcription endpoint, there are two potential return types: a "transcription object" or a "verbose transcription object." The latter was not previously represented in the SDK. To address this, I've added the "verbose transcription object" as a possible return type for the create function in the transcription service. The updated function signature now looks like this:

create(body: TranscriptionCreateParams, options?: Core.RequestOptions): Core.APIPromise<Transcription | TranscriptionVerboseJson> {
    return this._client.post('/audio/transcriptions', multipartFormRequestOptions({ body, ...options }));
}
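To illustrate what the union return type means for a caller, here is a minimal sketch of narrowing the result before reading the verbose-only fields. The interfaces are trimmed-down stand-ins for the SDK types, and `isVerbose` and `summarize` are hypothetical helpers, not part of the SDK:

```typescript
// Hypothetical, trimmed-down stand-ins for the SDK types discussed in this PR.
interface Transcription {
  text: string;
}

interface TranscriptionVerboseJson {
  language: string;
  duration: number;
  text: string;
}

type TranscriptionResult = Transcription | TranscriptionVerboseJson;

// User-defined type guard (an assumption, not an SDK export): treats a
// response as verbose when it carries the extra `duration` field.
function isVerbose(result: TranscriptionResult): result is TranscriptionVerboseJson {
  return 'duration' in result;
}

// `text` exists on both members of the union, so it is always readable;
// `language` and `duration` are only readable after narrowing.
function summarize(result: TranscriptionResult): string {
  if (isVerbose(result)) {
    return `${result.language}, ${result.duration}s: ${result.text}`;
  }
  return result.text;
}
```

With a union, every caller that wants `language` or `duration` needs a check of this kind, regardless of which `response_format` was requested.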

Points of Discussion

There are a couple of observations worth discussing; see the linked API documentation for the verbose_json transcription object:

Language Attribute Type: The API documentation lists the "language" attribute of the "verbose transcription object" as a "string". However, considering consistency and ease of use, I propose using an enum based on the ISO-639-1 standard, which is already used in the request body of this endpoint. This change could enhance the SDK's type safety and usability.
Duration Attribute Type: The "duration" attribute is documented as a "string," but in practice, the endpoint always returns it as a "number." My code reflects this reality, defining "duration" as a "number" in the interface. This adjustment deviates from the documentation but aligns with the actual behavior observed from the API, thus improving the SDK's accuracy.

/**
 * Represents a verbose JSON transcription response returned by the model, based on the provided input.
 */
export interface TranscriptionVerboseJson {
  /**
   * The language of the input audio.
   */
  language: string;

  /**
   * The duration of the input audio.
   */
  duration: number;

  /**
   * The transcribed text.
   */
  text: string;

  /**
   * Extracted words and their corresponding timestamps.
   */
  words?: VerboseJsonWord[];

  /**
   * Segments of the transcribed text and their corresponding details.
   */
  segments?: VerboseJsonSegment[];
}
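As a rough sketch of the enum idea for "language" raised above, one option is a union of known codes that still accepts any other string. The code list below is a small hypothetical subset, the type and helper names are invented for illustration, and whether the endpoint actually returns ISO-639-1 codes in this field is an assumption that would need checking against real responses:

```typescript
// Hypothetical subset of ISO-639-1 codes, for illustration only.
type KnownLanguage = 'en' | 'pt' | 'es' | 'fr' | 'de';

// `string & {}` keeps the type open to any string while preserving
// editor autocomplete for the known codes.
type Language = KnownLanguage | (string & {});

// Illustrative variant of the interface above; not the PR's actual type.
interface TranscriptionVerboseJsonSketch {
  language: Language;
  duration: number; // typed as number, matching the observed behavior
  text: string;
}

const KNOWN_LANGUAGES: readonly string[] = ['en', 'pt', 'es', 'fr', 'de'];

// Runtime check for callers that want to branch on a recognized code.
function isKnownLanguage(value: string): value is KnownLanguage {
  return KNOWN_LANGUAGES.indexOf(value) !== -1;
}
```

Keeping the type open means an unexpected value from the API would not become a type error for existing users, at the cost of weaker guarantees than a closed enum.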

Conclusion

These enhancements aim to make the SDK more reflective of the API's capabilities and more user-friendly. I'm open to feedback on these changes, especially regarding the proposed shift from string to enum for the "language" attribute and the adjustment of the "duration" attribute's type.

This pull request resolves issue #702.

Thank you for considering these improvements!

Best,
Wellington Rogati

rattrayalex · 2024-03-26T00:50:25Z

Thanks for the PR, but as I think we've shared elsewhere, we want to fix this with overloads, not a union type. The change proposed here would be a breaking type change for ~all current users.

We do plan to do this, it'll just be a bit longer. We really are sorry for the inconvenience.
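For readers unfamiliar with the overload approach mentioned here, the following is a minimal sketch of how overloads keyed on `response_format` could look. It is not the SDK's implementation: `CreateParams` is a hypothetical, simplified parameter type, and the function body is a stand-in for the real HTTP call.

```typescript
interface Transcription {
  text: string;
}

interface TranscriptionVerboseJson {
  language: string;
  duration: number;
  text: string;
}

// Hypothetical, simplified parameters; the SDK's TranscriptionCreateParams has more fields.
interface CreateParams {
  file: string;
  response_format?: 'json' | 'verbose_json';
}

// Overload 1: callers that explicitly request verbose_json get the verbose type.
function create(body: CreateParams & { response_format: 'verbose_json' }): Promise<TranscriptionVerboseJson>;
// Overload 2: every other call keeps the existing return type.
function create(body: CreateParams): Promise<Transcription>;
// Implementation signature; not visible to callers.
async function create(body: CreateParams): Promise<Transcription | TranscriptionVerboseJson> {
  // Stand-in for the request to the transcription endpoint.
  if (body.response_format === 'verbose_json') {
    return { language: 'unknown', duration: 0, text: `transcript of ${body.file}` };
  }
  return { text: `transcript of ${body.file}` };
}
```

In this sketch, a call without `response_format` still resolves to `Promise<Transcription>`, so existing code sees no type change, while `create({ file, response_format: 'verbose_json' })` resolves to `Promise<TranscriptionVerboseJson>` without any narrowing on the caller's side.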

wrogati · 2024-04-07T14:45:33Z

Hi, Alex @rattrayalex!

Thank you for your response and guidance on the issue. I appreciate the direction you've provided on how we should address this problem. I am more than willing to create a new pull request to resolve the issue as you've suggested.

Before proceeding, I just want to confirm: would this be helpful for the project? I am eager to contribute effectively.

rattrayalex · 2024-04-07T19:42:24Z

Thanks, I really appreciate that! I think our team is still excited to solve this at the codegen level, which would make a handwritten solution here obsolete. If we don't get that done within 2-3 weeks, do ping again!

wrogati · 2024-04-07T21:57:39Z

Thanks for the update! I understand the plan and am happy to help if needed. I'll check back in a few weeks if there's no update. Wishing the team the best with the codegen solution!

rattrayalex · 2024-07-08T19:10:32Z

@wrogati sorry we haven't been able to fix this with codegen yet. Would you like to take another crack at doing this with overloads instead of a union?

Fix incorrect type when using verbose_json as the whisper transcripti…

ea2de12

…on response_format, fixes openai#702

wrogati requested a review from a team as a code owner March 24, 2024 16:26

rattrayalex closed this Mar 26, 2024
