Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Verbose Transcription Object Return Type to Audio Transcription Endpoint #735

Closed

Conversation

wrogati
Copy link

@wrogati wrogati commented Mar 24, 2024

Hello 👋,

I've made some enhancements to the OpenAI Node SDK, specifically around the audio transcription endpoint, which I believe will make it more useful and align it closer to the API's capabilities as documented. This pull request is in response to the need identified in issue #702.

What's Changed

Based on the API documentation for the audio transcription endpoint, there are two potential return types: a "transcription object" or a "verbose transcription object." The latter was not previously represented in the SDK. To address this, I've added the "verbose transcription object" as a possible return type for the create function in the transcription service. The updated function signature now looks like this:

create(body: TranscriptionCreateParams, options?: Core.RequestOptions): Core.APIPromise<Transcription | TranscriptionVerboseJson> {
    return this._client.post('/audio/transcriptions', multipartFormRequestOptions({ body, ...options }));
}

Points of Discussion

There are a couple of observations worth discussing, see the link API documentation for verbose_json transcription object:

  1. Language Attribute Type: The API documentation lists the "language" attribute of the "verbose transcription object" as a "string". However, considering consistency and ease of use, I propose using an enum based on the ISO-639-1 standard, which is already used in the request body of this endpoint. This change could enhance the SDK's type safety and usability.

  2. Duration Attribute Type: The "duration" attribute is documented as a "string," but in practice, the endpoint always returns it as a "number." My code reflects this reality, defining "duration" as a "number" in the interface. This adjustment deviates from the documentation but aligns with the actual behavior observed from the API, thus improving the SDK's accuracy.

image

/**
 * Represents a verbose JSON transcription response returned by the model, based on the provided input.
 */
export interface TranscriptionVerboseJson {
  /**
   * The language of the input audio.
   */
  language: string;

  /**
   * The duration of the input audio.
   */
  duration: number;

  /**
   * The transcribed text.
   */
  text: string;

  /**
   * Extracted words and their corresponding timestamps.
   */
  words?: VerboseJsonWord[];

  /**
   * Segments of the transcribed text and their corresponding details.
   */
  segments?: VerboseJsonSegment[];
}

Conclusion

These enhancements aim to make the SDK more reflective of the API's capabilities and more user-friendly. I'm open to feedback on these changes, especially regarding the proposed shift from string to enum for the "language" attribute and the adjustment of the "duration" attribute's type.

This pull request resolves issue #702.

Thank you for considering these improvements!

Best,
Wellington Rogati

@wrogati wrogati requested a review from a team as a code owner March 24, 2024 16:26
@rattrayalex
Copy link
Collaborator

Thanks for the PR, but as I think we've shared elsewhere, we want to fix this with overloads, not a union type. The change proposed here would be a breaking type change for ~all current users.

We do plan to do this, it'll just be bit longer. We really are sorry for the inconvenience.

@wrogati
Copy link
Author

wrogati commented Apr 7, 2024

Hi, Alex @rattrayalex!

Thank you for your response and guidance on the issue. I appreciate the direction you've provided on how we should address this problem. I am more than willing to create a new pull request to resolve the issue as you've suggested.

Before proceeding, I just want to confirm: would this be helpful for the project? I an eager to contribute effectively.

@rattrayalex
Copy link
Collaborator

Thanks, I really appreciate that! I think our team is still excited to solve this at the codegen level, which would make a handwritten solution here obsolete. If we don't get that done within 2-3 weeks, do ping again!

@wrogati
Copy link
Author

wrogati commented Apr 7, 2024

Thanks for the update! I understand the plan and am happy to help if needed. I'll check back in a few weeks if there's no update. Wishing the team the best with the codegen solution!

@rattrayalex
Copy link
Collaborator

@wrogati sorry we haven't been able to fix this with codegen yet. Would you like to take another crack at doing this with overloads instead of a union?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants