[AI] Parler Text to Speech #219

pschroedl · 2024-10-06T09:14:09Z

This pull request introduces an implementation of the Text to Speech pipeline within the AI worker.

It implements the basic functionality of the Parler Text-to-Speech model (https://huggingface.co/parler-tts/parler-tts-large-v1) through two input fields, text_input and description. The description is used to ‘steer’ the qualities of the voice synthesized by the model.

To add the pipeline's dependencies without breaking the existing implementations in ai-runner, this PR uses an additional Dockerfile definition. Example usage for local development and testing:

cd ai-worker/runner
docker build -t livepeer/ai-runner:base .
docker build -f docker/Dockerfile.text_to_speech -t livepeer/ai-runner:text-to-speech .

docker run --name text-to-speech -e DEEPCACHE=True -e PIPELINE=text-to-speech -e MODEL_ID=parler-tts/parler-tts-large-v1 -e HUGGINGFACE_TOKEN={token} --gpus all -p 47906:8000 -v ~/.lpData/models:/models livepeer/ai-runner:text-to-speech
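
Once the container above is running, a request to it could be built roughly as follows. This is only a sketch: the text_input and description fields come from the PR description and port 47906 from the `-p 47906:8000` mapping, but the `/text-to-speech` route, the `model_id` field, and the JSON body shape are assumptions that this PR text does not confirm.

```python
import json
import urllib.request

# Port 47906 comes from the `-p 47906:8000` mapping in the docker run command.
# The "/text-to-speech" route is an assumption, not confirmed by this PR.
RUNNER_URL = "http://localhost:47906/text-to-speech"


def build_tts_request(
    text_input: str,
    description: str,
    model_id: str = "parler-tts/parler-tts-large-v1",
    url: str = RUNNER_URL,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the runner."""
    payload = {
        "model_id": model_id,        # assumed field name
        "text_input": text_input,    # text to synthesize
        "description": description,  # steers the qualities of the voice
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending needs the container to be up, so it is left as a comment here:
#   req = build_tts_request("Hello from the AI worker.", "A calm, clear voice.")
#   with urllib.request.urlopen(req, timeout=300) as resp:
#       raw = resp.read()  # response format is not specified in this PR text
```

Keeping request construction separate from sending makes the payload easy to inspect before a GPU container is available.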

pschroedl · 2024-10-10T00:01:21Z

The ai-worker bindings, container image, and go-livepeer gateway and orchestrator have been rebuilt and can be validated and tested via: https://colab.research.google.com/drive/12UN65fV6q44IyMRL2tyKFal8OMCyyRMN

rickstaa

@pschroedl approved. Let's rebase the go-livepeer version and check if we can get rid of the base64 encoding between the runner and worker 👍🏻.
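
For context on the base64 step mentioned above, the round trip can be sketched as below. The function names are hypothetical and this is not the code from the PR (the worker side is written in Go). It only illustrates that base64 turns every 3 bytes of audio into 4 ASCII characters, which is the overhead that removing the encoding would avoid.

```python
import base64


def encode_audio(audio_bytes: bytes) -> str:
    """Encode raw audio bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(audio_bytes).decode("ascii")


def decode_audio(encoded: str) -> bytes:
    """Decode a base64 string back to raw audio bytes, rejecting invalid input."""
    return base64.b64decode(encoded, validate=True)


if __name__ == "__main__":
    raw = bytes(300)  # stand-in for audio data
    encoded = encode_audio(raw)
    # 300 bytes become 400 characters: a 4/3 size increase.
    print(len(raw), len(encoded))
```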

rickstaa

@pschroedl please merge after last comment is resolved 👍🏻.

pschroedl mentioned this pull request Oct 6, 2024

[AI] Parler Text to Speech livepeer/go-livepeer#3199 (merged, 5 tasks)

pschroedl marked this pull request as ready for review October 7, 2024 07:12

pschroedl requested a review from rickstaa as a code owner October 7, 2024 07:12

pschroedl removed the request for review from rickstaa October 7, 2024 07:12

pschroedl force-pushed the parler_tts branch from 7ed2427 to ebab897 on October 21, 2024 18:23

rickstaa reviewed Oct 25, 2024

runner/docker/Dockerfile.text_to_speech (outdated, resolved)

rickstaa force-pushed the parler_tts branch 9 times, most recently from 6bf7bc7 to 5abec52 on October 28, 2024 10:02

pschroedl and others added 9 commits October 28, 2024 14:39

add parler_tts pipeline and Dockerfile.tts

081881f

fix pipeline naming in borrowContainer

2c5906b

add github workflow task for image, cleanup

f4aea2e

revert unintended changes

d0e4880

use model_id param, remove unnecessary clearing of memory, whitespace

f425e73

add missing EOF newlines

ccf3115

use app.routes.util.HTTPError, remove unused model DL, whitespace

bb517de

fix(worker): fix rebase conflicts

49a5fb4

This commit fixes a rebase conflict that was introduced during the last rebase.

feat: fix linting errors

b98bde7

This commit fixes several linting errors found in the codebase.

rickstaa force-pushed the parler_tts branch from 5abec52 to 05ece4e on October 28, 2024 16:26

refactor(runner): simplify T2S codebase

263590c

This commit simplifies the text-to-speech (T2S) codebase by ensuring the audio file is passed as an in-memory buffer. It also includes several other code cleanups for better readability and maintainability.
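
The idea of passing audio as an in-memory buffer instead of a file on disk can be illustrated with the standard library alone. This is a sketch, not the pipeline's actual code: the function name, the 16-bit mono format, and the sample rate in the usage lines are assumptions chosen for illustration.

```python
import io
import math
import struct
import wave


def pcm_to_wav_buffer(samples: list[float], sample_rate: int) -> io.BytesIO:
    """Encode float samples in [-1.0, 1.0] as a 16-bit mono WAV held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        frames = b"".join(
            # Clamp to [-1, 1], then scale to the signed 16-bit range.
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)
    buf.seek(0)  # rewind so the caller can read from the start
    return buf


if __name__ == "__main__":
    rate = 16000  # illustrative sample rate, not taken from the PR
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    wav_buf = pcm_to_wav_buffer(tone, rate)
    print(len(wav_buf.getvalue()), "bytes of WAV data, no temp file written")
```

Because the buffer behaves like a file object, it can be handed to an HTTP response or encoder without touching the filesystem.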

rickstaa force-pushed the parler_tts branch from 05ece4e to 263590c on October 28, 2024 16:33

refactor: apply linting suggestions

26c125b

This commit applies isort and black linting suggestions to the T2S codebase.

rickstaa force-pushed the parler_tts branch from bef9b54 to 26c125b on October 28, 2024 17:13

rickstaa added 2 commits October 28, 2024 18:18

refactor(worker): add missing T2S error types

c590208

This commit adds missing error types to worker T2S handling function.

feat(runner): enable flash attention in T2S pipeline

5e18d41

This commit enables [flash attention 2](https://github.com/Dao-AILab/flash-attention) in the text-to-speech pipeline to speed up inference.
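
One way to select flash attention 2 at model load time is sketched below. It assumes the model class accepts the `attn_implementation` keyword of `from_pretrained`, as documented for transformers and Parler-TTS. The helper name and the fallback to "eager" are hypothetical, and the commit itself may enable flash attention unconditionally.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" if the flash_attn package is installed, else "eager"."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"


# Hypothetical use when loading the model (needs parler_tts, torch, and a GPU):
#   from parler_tts import ParlerTTSForConditionalGeneration
#   model = ParlerTTSForConditionalGeneration.from_pretrained(
#       "parler-tts/parler-tts-large-v1",
#       attn_implementation=pick_attn_implementation(),
#   )

if __name__ == "__main__":
    print(pick_attn_implementation())
```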

rickstaa approved these changes Oct 28, 2024

pschroedl and others added 6 commits October 29, 2024 12:31

Merge branch 'main' into parler_tts

0d7c216

add decode audio base64 function

999339f

update summary for docs, re-generate/codegen

cc1c955

update dockerfile, pin parler commit, copy app/

8af0010

Merge branch 'main' into parler_tts

54b27c2

regenerate runner.gen.go

d9934e9

rickstaa reviewed Oct 31, 2024

runner/docker/Dockerfile.text_to_speech (resolved)

rickstaa approved these changes Oct 31, 2024

refactor(runner): add small comment change

afcc56e

This commit changes a comment in the TTS Docker so that it is clear what it does.
