Feat 4155 Text To Speech in Python #172

Open
wants to merge 1 commit into
base: main
85 changes: 85 additions & 0 deletions python/text-to-speech/README.md
@@ -0,0 +1,85 @@
# 🗣️ Text To Speech with Google, Azure and AWS API

A Python cloud function for text to speech synthesis using [Google](https://cloud.google.com/text-to-speech), [Azure](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech) and [AWS](https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html).

### Supported Providers and Language Codes
| Providers | Language Code (BCP-47) |
| ----------- | ----------- |
| Google |[Google Language Code](https://cloud.google.com/text-to-speech/docs/voices) |
| Azure |[Azure Language Code](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt) |
| AWS |[AWS Language Code](https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html) |

### Example Input:
```json
{
  "provider": "<YOUR_PROVIDER_HERE>",
  "language": "<YOUR_LANGUAGE_CODE>",
  "text": "Hello world!"
}
```
### Example output:
```json
{
  "success": true,
  "audio_bytes": "iVBORw0KGgoAAAANSUhE...o6Ie+UAAAAASU5CYII="
}
```
### Example error output:
```json
{
  "success": false,
  "error": "Missing API_KEY."
}
```
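
A caller usually wants the audio as a file rather than a base64 string. The following is a minimal sketch of handling both response shapes shown above. The helper name `save_audio` is hypothetical and not part of this pull request, and the `.mp3` extension assumes the MP3 output that all three providers are configured to return in `main.py`.

```python
import base64
from pathlib import Path


def save_audio(response: dict, path: str) -> int:
    """Write the decoded audio to `path` and return the number of bytes written.

    `response` is the parsed JSON returned by the function. `save_audio` is an
    illustrative helper, not something defined in this pull request.
    """
    if not response.get("success"):
        # Error responses carry a human-readable message in "error".
        raise RuntimeError(response.get("error", "Unknown error"))
    # "audio_bytes" holds the synthesized audio as a base64 string.
    audio = base64.b64decode(response["audio_bytes"])
    return Path(path).write_bytes(audio)
```

For example, `save_audio(result, "hello.mp3")` would write the clip to disk, while an error response raises `RuntimeError` with the message from the `error` field.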

## 📝 Environment Variables
List of environment variables used by this cloud function:
- **API_KEY** - Supported with Google, Azure, and AWS.
- **PROJECT_ID** - Supported with Google.
- **SECRET_API_KEY** - Supported with AWS.

| **Google** | **AWS** | **Azure** |
| -------- | -------- | -------- |
| API_KEY | API_KEY | API_KEY |
| PROJECT_ID | SECRET_API_KEY | |
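
The per-provider requirements listed above can be checked before a request is sent. This is a rough sketch only: the `REQUIRED_VARIABLES` mapping and the `missing_variables` helper are hypothetical names introduced for illustration, and they simply mirror the variable list in this section.

```python
# Hypothetical mapping that mirrors the environment variable list above.
REQUIRED_VARIABLES = {
    "google": ("API_KEY", "PROJECT_ID"),
    "azure": ("API_KEY",),
    "aws": ("API_KEY", "SECRET_API_KEY"),
}


def missing_variables(provider: str, variables: dict) -> list[str]:
    """Return the required variable names that are absent or empty."""
    required = REQUIRED_VARIABLES[provider.lower()]
    return [name for name in required if not variables.get(name)]
```

Calling `missing_variables("aws", {"API_KEY": "..."})` reports `SECRET_API_KEY` as missing, which corresponds to the `Missing SECRET_API_KEY.` error the function itself returns.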


## 🚀 Deployment

1. Clone this repository, and enter this function folder:

```bash
git clone https://github.com/open-runtimes/examples.git && cd examples
cd python/text-to-speech
```

2. Build the code:
```bash
docker run --rm --interactive --tty --volume $PWD:/usr/code openruntimes/python:v2-3.10 sh /usr/local/src/build.sh
```
As a result, a `code.tar.gz` file will be generated.

3. Start the Open Runtime:
```bash
docker run -p 3000:3000 -e INTERNAL_RUNTIME_KEY=secret-key -e INTERNAL_RUNTIME_ENTRYPOINT=main.py --rm --interactive --tty --volume $PWD/code.tar.gz:/tmp/code.tar.gz:ro openruntimes/python:v2-3.10 sh /usr/local/src/start.sh
```

> Make sure to replace `<YOUR_API_KEY>` and the other placeholders in the requests below with your own credentials.

Your function is now listening on port `3000`, and you can execute it by sending a `POST` request with the appropriate authorization headers. To learn more about the runtime, see the Python runtime [README](https://github.com/open-runtimes/open-runtimes/tree/main/openruntimes/python:v2-3.10).

4. Send a request to the function with cURL.
> Google cURL example (requires `API_KEY` and `PROJECT_ID` in the variables)
```bash
curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "google", "language": "en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>", "PROJECT_ID": "<YOUR_PROJECT_ID>"}}'
```
> Azure cURL example (requires `API_KEY` in the variables)
```bash
curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "azure", "language": "en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>"}}'
```
> AWS cURL example (requires `API_KEY` and `SECRET_API_KEY` in the variables)
```bash
curl http://localhost:3000/ -H "X-Internal-Challenge: secret-key" -H "Content-Type: application/json" -d '{"payload": {"provider": "aws", "language": "en-US", "text": "Hello World!"}, "variables": {"API_KEY": "<YOUR_API_KEY>", "SECRET_API_KEY": "<YOUR_SECRET_API_KEY>"}}'
```
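
The same request can be built from Python using only the standard library. The sketch below assumes the local runtime from step 3 (port `3000`, runtime key `secret-key`). The `build_request` helper is a hypothetical name, not part of this pull request.

```python
import json
import urllib.request


def build_request(
    provider: str,
    language: str,
    text: str,
    variables: dict,
    url: str = "http://localhost:3000/",  # assumed: local runtime from step 3
    runtime_key: str = "secret-key",  # assumed: INTERNAL_RUNTIME_KEY from step 3
) -> urllib.request.Request:
    """Build a POST request with the same body and headers as the cURL examples."""
    body = {
        "payload": {"provider": provider, "language": language, "text": text},
        "variables": variables,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Internal-Challenge": runtime_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it requires the runtime to be up, so it is left commented out here:
# request = build_request("azure", "en-US", "Hello World!", {"API_KEY": "<YOUR_API_KEY>"})
# with urllib.request.urlopen(request) as reply:
#     result = json.load(reply)
```

Keeping request construction separate from sending makes it possible to inspect the body and headers without a running container.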
## 📝 Notes
- This function is designed for use with Appwrite Cloud Functions. You can learn more about it in [Appwrite docs](https://appwrite.io/docs/functions).
- This example is compatible with Python 3.10. Other versions may work but have not been tested.
294 changes: 294 additions & 0 deletions python/text-to-speech/main.py
@@ -0,0 +1,294 @@
"""Synthesize text to speech using Google, Azure and AWS API."""
# Standard library
import abc
import base64

# Third party
import boto3
import requests
from google.cloud import texttospeech


class TextToSpeech(abc.ABC):
    """Base class for Text to Speech."""

    def __init__(self, req: requests) -> None:
        """Validate the request when a provider is instantiated."""
        self.validate_request(req)

    @abc.abstractmethod
    def validate_request(self, req: requests) -> None:
        """Abstract validate request method for providers."""

    @abc.abstractmethod
    def speech(self, text: str, language: str) -> bytes:
        """Abstract speech method for providers."""


class Google(TextToSpeech):
    """Represent the implementation of Google text to speech."""

    def validate_request(self, req: requests) -> None:
        """
        Validate the request data for Google text to speech.

        Input:
            req (request): The request provided by the user.

        Raises:
            ValueError: If any required value is missing or invalid.
        """
        if not req.variables.get("API_KEY"):
            raise ValueError("Missing API_KEY.")
        if not req.variables.get("PROJECT_ID"):
            raise ValueError("Missing PROJECT_ID.")
        self.api_key = req.variables.get("API_KEY")
        self.project_id = req.variables.get("PROJECT_ID")

    def speech(self, text: str, language: str) -> bytes:
        """
        Convert the given text into speech with the Google text to speech API.

        Input:
            text: The text to be converted into speech.
            language: The language code (BCP-47 format).

        Returns:
            bytes: The synthesized speech in bytes.
        """
        # Instantiate a client.
        client = texttospeech.TextToSpeechClient(
            client_options={
                "api_key": self.api_key,
                "quota_project_id": self.project_id,
            }
        )
        # Set the text input to be synthesized.
        synthesis_input = texttospeech.SynthesisInput(text=text)
        # Build the voice request with the requested language code
        # (for example "en-US") and a neutral SSML voice gender.
        voice = texttospeech.VoiceSelectionParams(
            language_code=language,
            ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL,
        )
        # Select the type of audio file you want returned.
        audio_config = texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3,
        )
        # Perform the text-to-speech request on the text input
        # with the selected voice parameters and audio file type.
        response = client.synthesize_speech(
            input=synthesis_input,
            voice=voice,
            audio_config=audio_config,
        )
        return response.audio_content


class Azure(TextToSpeech):
    """Represent the implementation of Azure text to speech."""

    VOICE = "en-US-ChristopherNeural"
    GENDER = "Male"
    REGION = "westus"
    FETCH_TOKEN_URL = (
        "https://westus.api.cognitive.microsoft.com/sts/v1.0/issuetoken"
    )

    def validate_request(self, req: requests) -> None:
        """
        Validate the request data for Azure text to speech.

        Input:
            req (request): The request provided by the user.

        Raises:
            ValueError: If any required value is missing or invalid.
        """
        if not req.variables.get("API_KEY"):
            raise ValueError("Missing API_KEY.")
        self.api_key = req.variables.get("API_KEY")

    def get_token(self, subscription_key: str) -> str:
        """Return an Azure token for a given subscription key."""
        headers = {
            "Ocp-Apim-Subscription-Key": subscription_key
        }
        # Send request with subscription key.
        response = requests.post(
            self.FETCH_TOKEN_URL,
            headers=headers,
            timeout=10,
        )
        # Raise on HTTP errors, then return the access token
        # (valid for 10 minutes).
        response.raise_for_status()
        return response.text

    def speech(self, text: str, language: str) -> bytes:
        """
        Convert the given text into speech with the Azure text to speech API.

        Input:
            text: The text to be converted into speech.
            language: The language code (BCP-47 format).

        Returns:
            bytes: The synthesized speech in bytes.
        """
        # Endpoint for cognitive services speech api
        url = (
            f"https://{self.REGION}.tts."
            "speech.microsoft.com/cognitiveservices/v1"
        )
        # Headers and auth for request.
        headers_azure = {
            "Content-type": "application/ssml+xml",
            "Authorization": "Bearer " + self.get_token(self.api_key),
            "X-Microsoft-OutputFormat": "audio-16khz-32kbitrate-mono-mp3",
        }
        # Note: `text` is inserted as-is, so it must already be valid
        # XML character data (no raw "&" or "<").
        data_azure = (
            f"<speak version='1.0' xml:lang='{language}'><voice "
            f"xml:lang='{language}' xml:gender='{self.GENDER}' "
            f"name='{self.VOICE}'>{text}</voice></speak>"
        )
        response = requests.request(
            "POST",
            url,
            headers=headers_azure,
            data=data_azure,
            timeout=10,
        )
        response.raise_for_status()
        return response.content


class AWS(TextToSpeech):
    """Represent the implementation of AWS text to speech."""

    VOICE_ID = "Joanna"
    REGION = "us-west-2"

    def validate_request(self, req: requests) -> None:
        """
        Validate the request data for AWS text to speech.

        Input:
            req (request): The request provided by the user.

        Raises:
            ValueError: If any required value is missing or invalid.
        """
        if not req.variables.get("API_KEY"):
            raise ValueError("Missing API_KEY.")
        if not req.variables.get("SECRET_API_KEY"):
            raise ValueError("Missing SECRET_API_KEY.")
        self.api_key = req.variables.get("API_KEY")
        self.secret_api_key = req.variables.get("SECRET_API_KEY")

    def speech(self, text: str, language: str) -> bytes:
        """
        Convert the given text into speech with the AWS text to speech API.

        Input:
            text: The text to be converted into speech.
            language: The language code (BCP-47 format).

        Returns:
            bytes: The synthesized speech in bytes.
        """
        # Create a Polly client from a boto3 session.
        polly_client = boto3.Session(
            aws_access_key_id=self.api_key,
            aws_secret_access_key=self.secret_api_key,
            region_name=self.REGION,
        ).client("polly")

        # Get response from polly client.
        response = polly_client.synthesize_speech(
            VoiceId=AWS.VOICE_ID,
            OutputFormat="mp3",
            Text=text,
            LanguageCode=language,
        )
        return response["AudioStream"].read()


def validate_common(req: requests) -> tuple[str, str, str]:
    """
    Validate common fields in request.

    Input:
        req (request): The request provided by the user.

    Returns:
        (tuple): A tuple containing the lowercased provider, the text
        and the language from the request.

    Raises:
        ValueError: If the payload or variables are missing, or if any
        of the common fields (provider, text, language) is missing in
        the request payload.
    """
    # Check if the payload is empty.
    if not req.payload:
        raise ValueError("Missing Payload.")

    # Check if variables is empty.
    if not req.variables:
        raise ValueError("Missing Variables.")

    # Check if provider is empty.
    if not req.payload.get("provider"):
        raise ValueError("Missing Provider.")

    # Check if text is empty.
    if not req.payload.get("text"):
        raise ValueError("Missing Text.")

    # Check if language is empty.
    if not req.payload.get("language"):
        raise ValueError("Missing Language.")

    # Return the provider, text and language.
    return (
        req.payload.get("provider").lower(),
        req.payload.get("text"),
        req.payload.get("language"),
    )


def main(req: requests, res: str) -> str:
    """
    Main Function for Text to Speech.

    Input:
        req(request): The request from the user.
        res(json): The response for the user.

    Returns:
        (json): JSON representing the success value of the text to speech api
        containing the synthesized audio in base64 encoded format.
    """
    try:
        provider, text, language = validate_common(req)
        if provider == "google":
            provider_class = Google(req)
        elif provider == "azure":
            provider_class = Azure(req)
        elif provider == "aws":
            provider_class = AWS(req)
        else:
            raise ValueError("Invalid Provider.")
    except ValueError as value_error:
        return res.json({
            "success": False,
            "error": str(value_error),
        })
    try:
        audio_bytes = provider_class.speech(text, language)
    except Exception as error:
        return res.json({
            "success": False,
            "error": f"{type(error).__name__}: {error}",
        })
    return res.json({
        "success": True,
        "audio_bytes": base64.b64encode(audio_bytes).decode(),
    })
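
One point a reviewer might raise about `Azure.speech` above: the user-supplied `text` and `language` are interpolated directly into the SSML string, so input such as `Tom & Jerry` would produce malformed XML. A possible way to guard against that, shown here only as a sketch, is to escape the values with the standard library. The `build_ssml` helper is a hypothetical name and is not part of this pull request. Its default voice and gender are copied from the `Azure` class constants.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    language: str,
    voice: str = "en-US-ChristopherNeural",  # copied from Azure.VOICE
    gender: str = "Male",  # copied from Azure.GENDER
) -> str:
    """Build the SSML body with element text and attribute values escaped."""
    # quoteattr() returns the value already wrapped in quotes, with any
    # quote characters and markup inside it escaped.
    lang_attr = quoteattr(language)
    return (
        f"<speak version='1.0' xml:lang={lang_attr}>"
        f"<voice xml:lang={lang_attr} xml:gender={quoteattr(gender)} "
        f"name={quoteattr(voice)}>{escape(text)}</voice></speak>"
    )
```

With this helper, `build_ssml("Tom & Jerry", "en-US")` yields a document in which the ampersand is written as `&amp;`, so the request body stays well-formed regardless of what the caller sends.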
5 changes: 5 additions & 0 deletions python/text-to-speech/requirements.txt
@@ -0,0 +1,5 @@
boto3==1.28.9
botocore==1.31.9
google-cloud-texttospeech==2.14.1
parameterized==0.9.0
requests==2.31.0
1 change: 1 addition & 0 deletions python/text-to-speech/results.txt
@@ -0,0 +1 @@
//NExAASoAEkAUAAAXwGQ/p/gDhGY/4fADM/x/ADAzMfwfAM6/o/AyB+Y/N4A7pkc+A7AyOeH/AMwZD/0+A4P8/4DgDo/4fADo/w/gBgZ4f4/AR63R+AhfA81AUf47wD//NExAgUiyowAZNoAIHidDXtgtAXBH/EmB2CZkr/4WsZZff/+tOHPLw8P/8S8Yc3RN3Hp//+XEEXQNzM3////qNHMzcwLiCac0/////zM3bWncwNC+boKhAsSKDAV90A//NExAgS4cKcAYNQAM+vABFdTDxmRCqimDwaEZaIkBAUgNs/NLnbGEhMNPzrVIFPvJm/ONfYwkE+Ri5l/+Y11WLnAAXV+n6HjBQmc//+tLaagQESJOt3hyTZiQnjMvw6//NExA8TMWKoAc9IAFYEPV3dawoSzDBkasnJWCooQpTjVuz/P9/zz/3/fWQ/j3LQPH1nZe+++0EQcPlKrkoFWGBCMftpbLOTvFRagKXtjORQvVLmEjQcfnk70D7pU44b//NExBUWuTagAMvMcDtwm6jSR/sJNlQTAxWVzG+QdAJrAomaWUlU/yzNX+T3z//9m1medMBsUYkgeGBEYgFAbSFGO0OOAuQiizvfUOD0uoe6fx3u7JlQ3nZYLMs3UOao//NExA0VgWawAMYMlKw4YVjA12JJyT0fBI0uKzxl83MlCXgQieqwO2nyudv6pLP4Um/zt4973h+2fx+9/T03XRCSZPHJp1AIMjsoo0EPtWz/2F3AQwrvLoMXLesIVzrT//NExAoU8W60AMYQlCR77CH4FFF5QIEOLk6ALq4t1JeV0sKALV+zDpq9luT6/3VHl9Bz/sfzfp8L9+8XBrYdg8DUkB5lUJpkSKowQzVc3A1nctiP/xwx39CitfwVQwyz//NExAkSEVq8AMPMlGLWpsVa02Kdgz46LEOgRyRwm07wNAY0rsvMVTqaNSaNaBLqk1fdtfX/////d23VmoGgthZ1zc1CLJHC5Iwd+lXFogOmaGSeL2E8uLOd1kyECQp+//NExBMRIVK8AHvMlQ+zheivslKjEZU+DDLC2nWswY8a943vjWr43vb43b//vrM9MTpIBgqBA2uf6V1A6h+l3xIBQ8u7JU/BYnzwQGKPgGCQmpwHEcgkLWpzrH6vjRP4//NExCESeWa4AHvKlIeQc2jrfKlxfzTyRZtbrjOldd1/qlHFhpzCYsLsGi1GMPMokZxSeu/S3QHENVuJbVvNaxfC+qgvg1lYykKqXQlx/F4LkujlMU41Cca2Jqc4mA8q//NExCoQqS64AHvOcXi00oapjnJtRf/3Z0UdID4lF2jRXMIZNY3dcKxY4X0aFLncnn0n3LFmGErC9H2iDBLYrRNR/k9KxHHyrTdblCaBtgNCwYDxoh5Io7nL6O/+iW9R//NExDoSESasAMPOcKoPBwiJnnR4YDa///9iVbijphYgkTSaw1ltIChmtbtuzeBsrOSMQDMEwSHw4TiQwJKIqCMTAyENltitougYm95rITet3eZC3DO8SlmflZhSkg+c//NExEQSKVqkAMMElP9P//yq0quA2ohqww/Sw/UlkPxyOUTAkojm5Hgotq1QQFREOmA2uC/BAyCQPl0IJhsL6jyfheXFO5y8v7r7J02CGmoQLHQ2FHtO0////JC9kpDN//NExE4SqRakAMPScCRKAx25MZxaO1owbRoARBXA0NBUDMCC2kWrKyUXIVSDBKAoH4FC70CiFfHJt1BiN05fARmQVcPdKkv////91iPR/qXoEElSkhA0OySG4jFK0yWH//NExFYRYPagAMJScMyF0JMIDZKabcwpclYV40vrriFJiUA6fKsmuIzlQulz27tnbT7qFaOoHLmUBUjQKX////zPbSrEypUxIgujlFb0Tuy9/C+lWcqAYWnLdWNI/kta//NExGMRmQqcAMJYcLnXtaUtjdQNLSbQWGdFmSN5azCHw74RtZxtSY4yJlAYIgdH////Q7///0V+DkRh
YkxqkghyLSmH5lyn/hxaroW4NooZzz1+HbM/s8/8bV+1eCpd//NExG8RqRKYAMPScOeDQBSwJiwJAqPhVEySSHCAUpszVSn2xcMjCRUBHTVn////pvX29P+UPPDM4HNw/TEZVLxQGpgXdRNjmgcfaU02He+SJPJVIcJhz54mlxax8Lmp//NExHsUQR6QAVhIACczCuCXAKIZYXQTwE/AgKzIyWcF8DOBGi6J6MMLQXygkPUexk7VLMiEamwww9lj2JUpGxK0tfmrLrSqrQuzf/91oqN0klHupa0Eta//1MnP9kTY//NExH0hEqJ4AZpoAdb/bIFpG9ZVgkPqlevVgL5Om9klcaBqFjLNI5RxmxKalnDHdnn/tMvtZtMc5Nevu7ekBeYHYRUIxFqupOqVicgAsbol8L5rXO29ad0wct7aU91A//NExEsbMWoUAdhgAGtcIFUxp9qDKHvUTbKkhEQJE6mXVi1Jo8oDXLJJUKKYp61BEa1KIwLTSqjjMMxqmrTCrDVCpUKoHway4xry8rjHfHNj8lKcSElSCwqRKCoVPQoc//NExDEYwUYAAMJScJaWBIEg0TJiklrylcY+TSjxUioGn15INPLA1iIeCrhMDT+uIpGdeInnmDXM5Iqtj6gaCYhmCuKPnnWWtyq4hGg+XJD7LTKIhLHF2NyeSiqtdXJV//NExCESkbWMAGJElCTXhrOxioh2//oq/KiqjtuUyKn/3KGBgwTMmQEK/xUW/+Kijf6haoWFVUxBTTXYUNNntBQ28i7Tc/GCN14Jo3UyUTcUPONmtTU0xF9lFfBVi5jj//NExCkAAANIAAAAAF5UP5IJoGgaCohx+SxSiw7F3fQXPuixe0kFAeGFnkGIji9wn/yWLDv6Abn5/oiIiIn//T4iJXcO9dw4GLeaF13d3Qrz+Cw8PH+8/4AH8AQnAABH//NExHwAAANIAAAAAIAYe072jOCP8zxCBSiPGaUSFnSoXydOnhkDLKzVx9ejtbWzWtGjkgVKZkCDo6C/j/4q5LBRR++nDZYbjvm2VZFRfG7hdWSy/Oacyi/v//lRJuX+//NExMwH4CQAAPe8AEuqxZZf4+F5V3m/Kb/6/GoAqLBUhFduI/76O27EYo8cL+NGlM7PUnCRQo8y4djSij4dpOEgQEfDtqiL/9nKinb5TBlI7OVP/+YoYGDIfULC4rFR//NExP8Y6f3sAHoGmVEj//rFcVFjVbP4qKcWTEFNRTMuMTAwqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq//NExO4V0LIAAHpMTaqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq//NExOkUQbmYAMGElKqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq