[AI] Parler Text to Speech #219

Merged (20 commits) on Oct 31, 2024

pschroedl
Collaborator

This pull request introduces an implementation of the Text to Speech pipeline within the AI worker.

This implements the basic functionality of the Parler Text-to-Speech model (https://huggingface.co/parler-tts/parler-tts-large-v1). A request supplies two fields, text_input and description; the description is used to ‘steer’ the qualities of the voice synthesized by the model.

This PR adds the pipeline's dependencies through an additional Dockerfile definition, so that existing implementations in ai-runner are not broken. Example usage for local development and testing:

cd ai-worker/runner
docker build -t livepeer/ai-runner:base .
docker build -f docker/Dockerfile.text_to_speech -t livepeer/ai-runner:text-to-speech .

docker run --name text-to-speech -e DEEPCACHE=True -e PIPELINE=text-to-speech -e MODEL_ID=parler-tts/parler-tts-large-v1 -e HUGGINGFACE_TOKEN={token} --gpus all -p 47906:8000 -v ~/.lpData/models:/models livepeer/ai-runner:text-to-speech
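
Once the container is up, a request to the pipeline might look roughly like the sketch below. The text_input and description field names come from the PR description, and port 47906 comes from the `-p 47906:8000` mapping in the docker run command. The `/text-to-speech` route, the `model_id` field, and the helper names `build_payload` and `synthesize` are assumptions made for illustration; none of them is confirmed in this thread.

```python
import json
import urllib.request

# Host port taken from the `-p 47906:8000` mapping in the docker run command.
RUNNER_URL = "http://localhost:47906"
# Assumed route name; check the runner's API definition for the real one.
ASSUMED_ROUTE = "/text-to-speech"
DEFAULT_MODEL_ID = "parler-tts/parler-tts-large-v1"


def build_payload(text_input: str, description: str, model_id: str = DEFAULT_MODEL_ID) -> bytes:
    """Serialize the two fields named in the PR description as a JSON body."""
    body = {
        "model_id": model_id,  # assumed field, not stated in the PR description
        "text_input": text_input,
        "description": description,
    }
    return json.dumps(body).encode("utf-8")


def synthesize(text_input: str, description: str, timeout: float = 120.0) -> bytes:
    """POST the payload to the runner and return the raw response body."""
    request = urllib.request.Request(
        RUNNER_URL + ASSUMED_ROUTE,
        data=build_payload(text_input, description),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `synthesize("Hello from the AI worker.", "A calm voice with clear audio.")` only works while the container is running. The response format is not described in this thread, so the sketch returns the body unparsed.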

pschroedl marked this pull request as ready for review on October 7, 2024 07:12
pschroedl removed the request for review from rickstaa on October 7, 2024 07:12
pschroedl
Collaborator, Author

The ai-worker bindings, container image, and go-livepeer gateway and orchestrator have been rebuilt and can be validated/tested via: https://colab.research.google.com/drive/12UN65fV6q44IyMRL2tyKFal8OMCyyRMN

rickstaa force-pushed the parler_tts branch 9 times, most recently from 6bf7bc7 to 5abec52, on October 28, 2024 10:02
This commit simplifies the text-to-speech (T2S) codebase by ensuring the audio
file is passed as an in-memory buffer. It also includes several other code
cleanups for better readability and maintainability.
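
The in-memory approach mentioned in this commit message can be sketched with the standard library alone. This is an illustration of the general technique, not the code from the commit: the helper names `sine_tone` and `pcm_to_wav_buffer`, the 16-bit mono format, and the 44100 Hz default are all assumptions.

```python
import io
import math
import struct
import wave


def sine_tone(frequency: float, n_samples: int, sample_rate: int = 44100) -> list:
    """Generate a test tone as floats in [-1.0, 1.0] (stand-in for model output)."""
    return [math.sin(2.0 * math.pi * frequency * i / sample_rate) for i in range(n_samples)]


def pcm_to_wav_buffer(samples, sample_rate: int = 44100) -> io.BytesIO:
    """Encode float samples as 16-bit mono WAV inside a BytesIO, never touching disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumed)
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM (assumed)
        wav_file.setframerate(sample_rate)
        # Clip to the valid range before scaling to signed 16-bit integers.
        clipped = [max(-1.0, min(1.0, s)) for s in samples]
        frames = struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))
        wav_file.writeframes(frames)
    # wave does not close file objects it did not open, so the buffer stays usable.
    buffer.seek(0)
    return buffer
```

The returned buffer can be handed directly to anything that accepts a file-like object, which avoids writing a temporary file and cleaning it up afterwards.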
This commit applies isort and black linting suggestions to the T2S
codebase.
This commit adds missing error types to the worker's T2S handling function.
This commit enables [flash attention 2](https://github.com/Dao-AILab/flash-attention)
in the text-to-speech pipeline to improve performance.
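
How the attention backend is selected is not shown in this thread. One plausible way to do it, assuming the model is loaded through the `parler_tts` package and that `from_pretrained` accepts the `attn_implementation` argument, is sketched below. The helper names and the fallback to "sdpa" are hypothetical choices, and Flash Attention 2 generally expects half-precision weights on a CUDA device.

```python
import importlib.util

DEFAULT_MODEL_ID = "parler-tts/parler-tts-large-v1"


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package is installed, else "sdpa"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"  # assumed fallback for environments without flash_attn


def load_model(model_id: str = DEFAULT_MODEL_ID):
    """Load Parler-TTS with the selected attention backend (requires parler_tts)."""
    # Imported lazily so that pick_attn_implementation works without the heavy dependency.
    from parler_tts import ParlerTTSForConditionalGeneration

    return ParlerTTSForConditionalGeneration.from_pretrained(
        model_id,
        attn_implementation=pick_attn_implementation(),
    )
```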
Member

rickstaa left a comment

@pschroedl approved. Let's rebase the go-livepeer version and check if we can get rid of the base64 encoding between the runner and worker 👍🏻.
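
For context on why dropping the base64 step is attractive: base64 emits 4 output bytes for every 3 input bytes, so an encoded audio payload is about a third larger than the raw bytes. The short sketch below measures that ratio; the helper name `base64_overhead` is hypothetical and the code is not taken from the runner or worker.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return the ratio of base64-encoded size to raw size for a payload."""
    if not raw:
        raise ValueError("payload must not be empty")
    return len(base64.b64encode(raw)) / len(raw)


def round_trip(raw: bytes) -> bytes:
    """Encode and decode a payload, as a sender and receiver pair would."""
    return base64.b64decode(base64.b64encode(raw))
```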

Member

rickstaa left a comment

@pschroedl please merge after the last comment is resolved 👍🏻.

This commit rewords a comment in the TTS Dockerfile so that its purpose
is clear.
@pschroedl <