Add Verbose Transcription Object Return Type to Audio Transcription Endpoint #735
Hello 👋,
I've made some enhancements to the OpenAI Node SDK, specifically around the audio transcription endpoint, which I believe will make it more useful and align it more closely with the API's documented capabilities. This pull request responds to the need identified in issue #702.
What's Changed
Based on the API documentation for the audio transcription endpoint, there are two potential return types: a "transcription object" or a "verbose transcription object." The latter was not previously represented in the SDK. To address this, I've added the "verbose transcription object" as a possible return type for the create function in the transcription service, and the function signature has been updated to reflect this.
Points of Discussion
There are a couple of observations worth discussing; see the API documentation for the verbose_json transcription object:
Language Attribute Type: The API documentation lists the "language" attribute of the "verbose transcription object" as a "string". However, considering consistency and ease of use, I propose using an enum based on the ISO-639-1 standard, which is already used in the request body of this endpoint. This change could enhance the SDK's type safety and usability.
Duration Attribute Type: The "duration" attribute is documented as a "string," but in practice, the endpoint always returns it as a "number." My code reflects this reality, defining "duration" as a "number" in the interface. This adjustment deviates from the documentation but aligns with the actual behavior observed from the API, thus improving the SDK's accuracy.
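To make the two points above concrete, here is a minimal TypeScript sketch of how the two return shapes could be modelled and told apart. It is an illustration only, not the code in this pull request: the names Transcription, TranscriptionVerbose, TranscriptionResult, isVerbose, and summarize are hypothetical, only the text, language, and duration fields are shown, and language is kept as a plain string as currently documented.

```typescript
// Hypothetical types for illustration; they are not copied from the SDK.
interface Transcription {
  text: string;
}

interface TranscriptionVerbose {
  text: string;
  // Documented as a string; an ISO-639-1 union type is the proposed alternative.
  language: string;
  // Typed as a number to match the behavior described above, not the docs.
  duration: number;
}

// The create call could resolve to either shape, depending on the response format.
type TranscriptionResult = Transcription | TranscriptionVerbose;

// Type guard: treats a result as verbose when it carries a numeric duration.
function isVerbose(result: TranscriptionResult): result is TranscriptionVerbose {
  return 'duration' in result && typeof result.duration === 'number';
}

// Example consumer that handles both shapes safely.
function summarize(result: TranscriptionResult): string {
  if (isVerbose(result)) {
    return `${result.text} (${result.language}, ${result.duration.toFixed(2)}s)`;
  }
  return result.text;
}
```

With a union like this, callers have to narrow the result before reading duration or language, which is where typing duration as a number rather than a string becomes visible to SDK users.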
Conclusion
These enhancements aim to make the SDK more reflective of the API's capabilities and more user-friendly. I'm open to feedback on these changes, especially regarding the proposed shift from string to enum for the "language" attribute and the adjustment of the "duration" attribute's type.
This pull request resolves issue #702.
Thank you for considering these improvements!
Best,
Wellington Rogati