Add InteractMode Component with TTS and STT Functions #134

o-stahl · 2024-05-29T23:10:00Z

Summary

This pull request introduces a new InteractMode component and integrates text-to-speech (TTS) and speech-to-text (STT) functionality (the latter is not yet fully implemented in InteractMode). By default, the enhancement uses the Web Speech API together with OpenAI's Whisper API to improve speech transcription.

Key Changes

InteractMode Component:
- Implemented the InteractMode component to handle speech interactions within the chat application.
- Added functionality to monitor and visualize audio input in real-time.
fetchTTSResponse Function:
- Added fetchTTSResponse function to convert text to speech using the OpenAI API.
- Ensures high-quality audio playback of transcribed text.
fetchSTTResponse Function:
- Added fetchSTTResponse function to transcribe audio to text using the OpenAI Whisper API.
- Utilizes the Web Speech API for initial speech detection and transcription.
- Switches to Whisper API for more accurate transcription when enabled.
Toggle for Enhanced Accuracy:
- Introduced a toggle to switch between Web Speech API and Whisper API for transcription.
- Ensures only relevant speech is transcribed, reducing noise and improving accuracy.
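
The toggle described above could be reduced to a small pure function that picks which backend produces the final transcript. This is only a sketch under stated assumptions, not the code in this PR: the names `SttSettings`, `SttBackend`, and `chooseSttBackend` are hypothetical, and so is the fallback rule (Whisper only when the toggle is on and an API key is configured, otherwise the Web Speech API if the browser supports it).

```typescript
// Hypothetical names for illustration; none of them come from the PR itself.
type SttBackend = "whisper" | "web-speech" | "none";

interface SttSettings {
  useWhisper: boolean;         // state of the "enhanced accuracy" toggle
  hasApiKey: boolean;          // whether an OpenAI API key is configured
  webSpeechSupported: boolean; // result of a browser feature check
}

// Picks the backend that produces the final transcript.
// Assumption: Whisper needs both the toggle and an API key; otherwise
// fall back to the Web Speech API when the browser offers it.
function chooseSttBackend(settings: SttSettings): SttBackend {
  if (settings.useWhisper && settings.hasApiKey) return "whisper";
  if (settings.webSpeechSupported) return "web-speech";
  return "none";
}
```

Keeping the decision in one function like this would also make it easy to drive from a future settings tab.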

Benefits

Enhanced user experience by enabling multimodal interaction.
Improved usage of OpenAI's endpoints, now also including TTS and STT.
Provides users with accurate and reliable speech-to-text and text-to-speech capabilities.

Notes & future plans

This is the first revision and only implements user speech-to-message transcription, but it should be perfectly usable in its current state.

Speech to text on assistant messages when the interact mode is enabled. (40754)
Settings tab for TTS/STT-related selections, especially whether to use only the Web Speech API.
Adding TTS/STT functionalities to the other providers.

Auto Generated Notes (Do Not Change)

… always used. Switched to the `tts-1-hd` models as it seems to be performing fine. Increase audio speed 5% for a more natural conversation flow. Switch voice to nova just because...

fingerthief · 2024-05-31T21:46:18Z

Really excellent work on this!

I've done some testing and I think this is easily solid enough to go ahead and merge into the main branch.

I made one commit to tweak a few little things:

Added a dynamic check for the highest quality supported audio format for the user's current device. It starts checking with the highest quality format and falls back to the next highest quality if it isn't supported. Rinse and repeat until the highest quality format that is supported is found.
Removed showing the error for no-speech while in interact mode. Otherwise it shows as an error after a bit of silence with no speech.
Increased audio playback speed by 5%
switched to tts-1-hd model as it seems to work fine
- Soon enough this will be user configurable, along with speed etc.
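
For readers who want to picture the cascading format check from the first item, here is a rough sketch. It is not the committed code: `pickSupportedMimeType` and `CANDIDATE_MIME_TYPES` are hypothetical names, and the candidate list and its ordering are assumptions made purely for illustration.

```typescript
// Candidate recording formats, best first.
// Assumption: this list and its ordering are illustrative only.
const CANDIDATE_MIME_TYPES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4",
  "audio/webm",
];

// Walks the list in order and returns the first format the device
// accepts, or undefined when none of the candidates is supported.
function pickSupportedMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = CANDIDATE_MIME_TYPES,
): string | undefined {
  return candidates.find((mimeType) => isSupported(mimeType));
}
```

In a browser the predicate would typically wrap `MediaRecorder.isTypeSupported`; passing it in as a parameter keeps the function testable outside the browser.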
Notes

I know the mobile support for interact mode has some wonkiness, on my phone at least. I'll be creating an issue for that problem though. I also have a rough idea for a dynamic noise floor calculation, so that our speech detection floor can vary with microphone sensitivity.
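
One possible shape for that noise floor idea, purely as a sketch: track a running average of the level of quiet frames and treat a frame as speech only when it rises well above that average. Everything here is an assumption rather than a plan from this thread, including the names `rms` and `NoiseFloorTracker`, the exponential moving average, and the default `smoothing` and `margin` values.

```typescript
// Root-mean-square level of one frame of samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Hypothetical tracker: adapts the noise floor to the microphone.
class NoiseFloorTracker {
  private floor: number;
  private readonly smoothing: number; // assumed moving-average weight
  private readonly margin: number;    // assumed multiplier above the floor

  constructor(initialFloor = 0.01, smoothing = 0.05, margin = 3) {
    this.floor = initialFloor;
    this.smoothing = smoothing;
    this.margin = margin;
  }

  // Returns true when the frame is loud enough to count as speech.
  update(frame: Float32Array): boolean {
    const level = rms(frame);
    const isSpeech = level > this.floor * this.margin;
    // Adapt the floor only on frames classified as non-speech,
    // so that talking does not drag the floor upward.
    if (!isSpeech) {
      this.floor += this.smoothing * (level - this.floor);
    }
    return isSpeech;
  }

  get currentFloor(): number {
    return this.floor;
  }
}
```

In the browser, frames could come from something like `AnalyserNode.getFloatTimeDomainData`; a sensitive microphone would settle on a higher floor than a quiet one without any manual tuning.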

o-stahl · 2024-06-01T08:54:36Z

switched to tts-1-hd model as it seems to work fine

OpenAI's regular "tts-1" model is faster and 2x cheaper, while according to user feedback the quality difference is (or at least was) barely noticeable, even with audiophile gear. However, as you mentioned as well, model selection will take care of different preferences.
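
To make the model selection point concrete, here is a minimal sketch of how the model, voice, and speed could become settings rather than hard-coded values. It is not the PR's `fetchTTSResponse`: the names `TtsSettings`, `DEFAULT_TTS_SETTINGS`, and `buildTtsRequest` are hypothetical. The defaults mirror the values mentioned in this thread (`tts-1-hd`, the nova voice, 5% faster playback), and the request targets OpenAI's `/v1/audio/speech` endpoint.

```typescript
type TtsModel = "tts-1" | "tts-1-hd";

// Hypothetical settings shape; not taken from the PR.
interface TtsSettings {
  model: TtsModel;
  voice: string;
  speed: number; // 1.0 is normal speed
}

// Defaults mirror the values mentioned in this thread.
const DEFAULT_TTS_SETTINGS: TtsSettings = {
  model: "tts-1-hd",
  voice: "nova",
  speed: 1.05,
};

interface TtsRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it, so it can be inspected or tested.
function buildTtsRequest(
  text: string,
  apiKey: string,
  overrides: Partial<TtsSettings> = {},
): TtsRequest {
  const settings: TtsSettings = { ...DEFAULT_TTS_SETTINGS, ...overrides };
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: settings.model,
      voice: settings.voice,
      speed: settings.speed,
      input: text,
    }),
  };
}
```

A user who prefers the faster, cheaper model would pass `{ model: "tts-1" }` as the override, and the result can be handed to `fetch` by passing `url` as the first argument and `method`, `headers`, and `body` as the options.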

Add InteractMode Component with TTS and STT Functions

aeb5ab6

o-stahl force-pushed the feature/interact-mode-enhancements branch from bac1d77 to aeb5ab6 Compare May 30, 2024 20:19

o-stahl and others added 6 commits May 30, 2024 23:21

Merge remote-tracking branch 'upstream/main'

e3c9327

Move interactMode out of icons for better separation.

1080856

Merge branch 'fingerthief:main' into feature/interact-mode-enhancements

a019d94

Add handleTextStreamEnd in gpt-api-access and TTS for interactmode.

40754d4

Added more checks to see if features are supported.

8a7f166

Added a cascading check so that the highest possible quality audio is…

ccb45a5

… always used. Switched to the `tts-1-hd` models as it seems to be performing fine. Increase audio speed 5% for a more natural conversation flow. Switch voice to nova just because...

fingerthief merged commit 229cc86 into fingerthief:main May 31, 2024
2 of 3 checks passed

o-stahl deleted the feature/interact-mode-enhancements branch June 1, 2024 08:28
