[AI] Parler Text to Speech #219
The ai-worker bindings, container image, and go-livepeer gateway and orchestrator have been rebuilt and can be validated and tested via this Colab notebook: https://colab.research.google.com/drive/12UN65fV6q44IyMRL2tyKFal8OMCyyRMN
6bf7bc7 to 5abec52
This commit fixes a rebase conflict that was introduced during the last rebase.
This commit fixes several linting errors found in the codebase.
This commit simplifies the text-to-speech (T2S) codebase by ensuring the audio file is passed as an in-memory buffer. It also includes several other code cleanups for better readability and maintainability.
This commit applies isort and black linting suggestions to the T2S codebase.
This commit adds missing error types to the worker's T2S handling function.
This commit enables [flash attention 2](https://github.com/Dao-AILab/flash-attention) in the text-to-speech pipeline to speed up inference.
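For readers unfamiliar with the in-memory buffer change mentioned in the commits above, here is a minimal sketch of the general idea. It is not the code from this PR: the function name `synthesize_to_buffer`, the 16-bit mono WAV format, and the default sample rate are all assumptions chosen for illustration, and it uses only the Python standard library.

```python
import io
import math
import struct
import wave


def synthesize_to_buffer(samples, sample_rate=44100):
    """Encode float samples in [-1, 1] as 16-bit mono WAV in memory.

    Hypothetical helper: the name and audio format are assumptions,
    not taken from the PR.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit PCM
        wav_file.setframerate(sample_rate)
        # Clamp each sample, scale to int16, and pack as little-endian.
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav_file.writeframes(pcm)
    # Rewind so the caller can read the WAV bytes from the start.
    buffer.seek(0)
    return buffer


# Usage: a 0.1 s, 440 Hz test tone that never touches the filesystem.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
audio = synthesize_to_buffer(tone, sample_rate=8000)
```

The buffer can then be handed to whatever builds the response, which avoids writing a temporary file and cleaning it up afterwards.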
@pschroedl approved. Let's rebase the go-livepeer version and check if we can get rid of the base64 encoding between the runner and worker 👍🏻.
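As a rough illustration of why dropping the base64 step may be worthwhile: base64 emits four ASCII characters for every three input bytes, so an encoded audio payload is about a third larger than the raw bytes. The helper names below are hypothetical and do not come from the runner or worker code.

```python
import base64


def encode_audio(audio_bytes: bytes) -> str:
    """Base64-encode raw audio bytes for a text-only transport (hypothetical helper)."""
    return base64.b64encode(audio_bytes).decode("ascii")


def decode_audio(encoded: str) -> bytes:
    """Reverse encode_audio and recover the raw audio bytes."""
    return base64.b64decode(encoded)


# Usage: compare the raw size with the encoded size for a 3 KiB payload.
raw = bytes(range(256)) * 12          # 3072 bytes of stand-in audio data
encoded = encode_audio(raw)
overhead = len(encoded) / len(raw)    # 4/3 for inputs whose length is a multiple of 3
```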
@pschroedl please merge after last comment is resolved 👍🏻.
This commit rewords a comment in the TTS Dockerfile so that its purpose is clear.
This pull request introduces an implementation of the Text to Speech pipeline within the AI worker.
This implements the basic functionality of the Parler Text-to-Speech model (https://huggingface.co/parler-tts/parler-tts-large-v1). A request supplies two fields, text_input and description: text_input is the text to be spoken, and description is used to ‘steer’ the qualities of the voice synthesized by the model.
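A minimal sketch of how a client might assemble the two fields into a JSON request body. Only the field names text_input and description come from this PR; the helper name `build_tts_request`, the validation rule, and the sample strings are assumptions, and no endpoint is implied.

```python
import json


def build_tts_request(text_input: str, description: str) -> str:
    """Return a JSON body with the two fields named in the PR description.

    Hypothetical helper: rejecting empty text is an illustrative choice,
    not documented behaviour of the pipeline.
    """
    if not text_input.strip():
        raise ValueError("text_input must not be empty")
    payload = {"text_input": text_input, "description": description}
    return json.dumps(payload)


# Usage: the description steers the voice, while text_input is what gets spoken.
body = build_tts_request(
    text_input="Hello from the AI worker.",
    description="A calm female voice with clear audio and a moderate pace.",
)
```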
This PR follows the existing method of adding pipeline dependencies without breaking the implementations already present in ai-runner, namely by using an additional Dockerfile definition. Example usage for local development and testing: