Add stream= kwarg to Recognizer.listen #757
Open · +36 −5
Adds support for receiving captured audio one chunk at a time, while continuing to use the existing wake-word and audio-energy detection code.
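A minimal sketch of the generator pattern this enables, with a file-like object standing in for the microphone source (the real change would also run wake-word and energy detection per chunk; `listen_stream` and its parameters here are illustrative, not the PR's actual code):

```python
import io

def listen_stream(source, chunk_size=1024):
    """Illustrative sketch: yield raw audio one chunk at a time
    instead of buffering the entire phrase before returning.
    `source` is any file-like object standing in for a microphone."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Hypothetical usage: consume chunks as they become available.
fake_audio = io.BytesIO(b"\x00\x01" * 4096)  # 8192 bytes of fake PCM
chunks = list(listen_stream(fake_audio, chunk_size=1024))
```

The caller can begin processing (or transmitting) each chunk immediately rather than waiting for the phrase to end, which is the latency win described below.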
Notably, the Coqui.ai and DeepSpeech Python STT packages support a streaming interface, which greatly improves interaction latency for continuous-listening applications. Even for non-streaming interfaces, this implementation allows eager encoding (for example, converting to numpy buffers, or even precomputing transformer KVs), or simply an earlier start to transmission (when using websockets or other chunked-transfer mechanisms).
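The eager-encoding idea can be sketched with stdlib-only code: convert each 16-bit PCM chunk to samples as it arrives, rather than after the whole recording finishes (with a streaming engine, each converted chunk would be handed to the decoder immediately; the function name is illustrative):

```python
import array
import sys

def eager_pcm_to_samples(chunks):
    """Illustrative sketch: decode 16-bit little-endian PCM chunks into
    sample values incrementally, so downstream work (numpy conversion,
    feeding a streaming decoder, chunked upload) can start right away."""
    for chunk in chunks:
        samples = array.array("h")       # signed 16-bit, native byte order
        samples.frombytes(chunk)
        if sys.byteorder == "big":       # PCM is little-endian on the wire
            samples.byteswap()
        yield samples

# Two chunks arriving over time; samples are usable per chunk.
chunks = [b"\x01\x00\x02\x00", b"\x03\x00"]
all_samples = [s for block in eager_pcm_to_samples(chunks) for s in block]
# all_samples == [1, 2, 3]
```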
Note: this is a minimal extraction from a larger edit in a side project. There, I ended up carving up large chunks of Recognizer to make it more observable (i.e., triggering events on speech-detection start/stop in addition to yielding audio, as well as real-time events for the audio-energy threshold and its detected value). This is a much smaller edit, but I have not vetted it as thoroughly. I am in the process of adopting this change directly into a new project that uses self-hosted Whisper over HTTP.