This repo is a framework for a voice command Discord bot. Voice and text commands follow the command pattern, and adding new commands is simple.
This project is built and run using Docker. The repo includes a script, buildandrundocker.ps1, that builds the Docker image and runs a new container. Before building, go through Configuration to set up your .env file, and Cloud Services to set up any cloud services you're using. Configuration parameters are at the top of the script. The first build will take a while.
The following parameters are available:
Parameter Name | Description |
---|---|
CONTAINER_NAME | The name of the Docker container that is created |
RECORDINGS_PATH | OPTIONAL: Absolute path to the recordings folder in the repo. This folder holds the {user_id}.wav command files that are processed for commands. If this variable is set, the recordings folder is mounted from the Docker host into the container |
Create a .env file in the root directory of the bot and set the values marked 'Required'. To get started, rename the '.env.example' file to '.env'. If .env values are set to use cloud services, those services must be set up correctly (see Cloud Services).
The .env file holds the configuration values for the bot. The following table describes each value:
PROPERTY_NAME | Description | Required? | Default Value |
---|---|---|---|
BOT_TOKEN | The bot token for the Discord bot. Discord's servers use it to identify which bot account this bot is logging in to. You can get this by creating a bot at https://discord.com/developers/applications. | Required | |
SPEECH_TO_TEXT_METHOD | What method to use for speech to text (some require API keys). Possible values are: (LOCAL/GOOGLE/IBM_WATSON) | Optional | LOCAL |
TEXT_TO_SPEECH_METHOD | What method to use for text to speech (some require API keys). Possible values are: (LOCAL/GOOGLE) | Optional | LOCAL |
TEXT_TO_SPEECH_VOLUME_GAIN_DB | Volume gain in dB for Google text-to-speech. Default value is 0, range is (-10, 10). +6 dB is approximately twice the amplitude of 0 dB, and -6 dB is approximately half. | Optional | 0 |
WAKE_WORD_SENSITIVITY | Value from 0.0 to 1.0 that determines how sensitive the wake word detection is. 0.0 is not sensitive at all and 1.0 is very sensitive. | Optional | 0.5 |
IBM_WATSON_SERVICE_URL | Service URL for the IBM Speech to Text service. Get from https://cloud.ibm.com/apidocs/assistant/assistant-v2?code=node#endpoint-cloud depending on where you set up your service. | Required if SPEECH_TO_TEXT_METHOD is set to IBM_WATSON | |
MUSIC_CHANNEL_NAME | Name of the music channel for the server. Used to post messages about playlists. | Optional | music |
BOT_CHANNEL_NAME | Name of the bot channel for the server. Used to post generic bot messages (e.g. Processing Command: ${command_text}) | Optional | bot |
BIRTHDAY_CHANNEL_NAME | Name of the birthday channel for the server. Used to post birthday wishes | Optional | general |
FREE_GAMES_CHANNEL_NAME | Name of the channel to post free games updates to (used with the free games command which will periodically scan for free games and post results to this channel) | Optional | free-games |
LEAGUE_META_CHANNEL_NAME | Name of the channel to post daily updates for the current League of Legends meta. Used by the LeagueMetaCommand. | Optional | league-meta |
SPOTIFY_CLIENT_ID | Client ID for access to the Spotify API. This is used to find similar songs in the radio command | Required for the radio command | |
SPOTIFY_CLIENT_SECRET | Client Secret for access to the Spotify API. This is used to find similar songs in the radio command | Required for the radio command | |
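
The rules in the table above (required values, defaults, allowed methods) can be checked before starting the bot. The following Python sketch is illustrative only and is not the bot's own loader: `parse_env_text` and `validate_config` are hypothetical helper names, and the parser assumes simple `KEY=VALUE` lines.

```python
# Defaults copied from the table above. Values without a default are omitted.
DEFAULTS = {
    "SPEECH_TO_TEXT_METHOD": "LOCAL",
    "TEXT_TO_SPEECH_METHOD": "LOCAL",
    "TEXT_TO_SPEECH_VOLUME_GAIN_DB": "0",
    "WAKE_WORD_SENSITIVITY": "0.5",
    "MUSIC_CHANNEL_NAME": "music",
    "BOT_CHANNEL_NAME": "bot",
    "BIRTHDAY_CHANNEL_NAME": "general",
    "FREE_GAMES_CHANNEL_NAME": "free-games",
    "LEAGUE_META_CHANNEL_NAME": "league-meta",
}

STT_METHODS = {"LOCAL", "GOOGLE", "IBM_WATSON"}
TTS_METHODS = {"LOCAL", "GOOGLE"}


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def validate_config(raw):
    """Apply defaults and return (config, list_of_problems)."""
    # Empty values fall back to the default, as if the key were missing.
    config = {**DEFAULTS, **{k: v for k, v in raw.items() if v != ""}}
    problems = []
    if not config.get("BOT_TOKEN"):
        problems.append("BOT_TOKEN is required")
    if config["SPEECH_TO_TEXT_METHOD"] not in STT_METHODS:
        problems.append("SPEECH_TO_TEXT_METHOD must be LOCAL, GOOGLE or IBM_WATSON")
    if config["TEXT_TO_SPEECH_METHOD"] not in TTS_METHODS:
        problems.append("TEXT_TO_SPEECH_METHOD must be LOCAL or GOOGLE")
    if (config["SPEECH_TO_TEXT_METHOD"] == "IBM_WATSON"
            and not config.get("IBM_WATSON_SERVICE_URL")):
        problems.append("IBM_WATSON_SERVICE_URL is required for IBM_WATSON")
    try:
        if not 0.0 <= float(config["WAKE_WORD_SENSITIVITY"]) <= 1.0:
            problems.append("WAKE_WORD_SENSITIVITY must be between 0.0 and 1.0")
    except ValueError:
        problems.append("WAKE_WORD_SENSITIVITY must be a number")
    return config, problems


if __name__ == "__main__":
    # Placeholder token for illustration only.
    config, problems = validate_config(parse_env_text("BOT_TOKEN=example-token\n"))
    print(problems or "configuration looks valid")
```

Returning a list of problems rather than raising on the first one lets every missing value be reported in a single pass.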
Both TEXT_TO_SPEECH_METHOD and SPEECH_TO_TEXT_METHOD have a 'LOCAL' option. The container for Jarvis contains software for performing speech-to-text and text-to-speech, and selecting the 'LOCAL' option will use these local services.
These options are completely free and do not require API keys.
Depending on the specs of the system running the bot, both of these tasks could take a while to complete (on my system, LOCAL speech-to-text takes about 4 seconds and LOCAL text-to-speech takes about 6 seconds). Using cloud services is much quicker and more accurate than using the local services.
Another consideration is image size. Using the LOCAL services will pull models that significantly increase the size of the image. If you don't set any service to LOCAL, those models will not be pulled.
I was unable to get the local services installed correctly on a Raspberry Pi. If you're using the cloud services instead, then you should have no issues.
Google Cloud services are used for speech-to-text (if GOOGLE is set as the SPEECH_TO_TEXT_METHOD) and text-to-speech (if GOOGLE is set as the TEXT_TO_SPEECH_METHOD). To use those services, create a Google Cloud account, create a new project, and enable the text-to-speech and speech-to-text APIs. See APIs Dashboard for enabling Google APIs and Creating an API key for creating the key.
Once the key is created, download the google_key.json file into the keys/ folder in the root of the project: /keys/google_key.json
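When Google text-to-speech is enabled, TEXT_TO_SPEECH_VOLUME_GAIN_DB is expressed in decibels. As a rough guide to choosing a value, the standard conversion from a decibel gain to an amplitude ratio is 10^(dB/20). The helper name below is hypothetical and is not part of the bot.

```python
def gain_db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude ratio (10^(dB/20))."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    for gain_db in (-6, 0, 6):
        print(f"{gain_db:+d} dB -> x{gain_db_to_amplitude_ratio(gain_db):.2f}")
```

With this formula, +6 dB comes out to roughly double the amplitude and -6 dB to roughly half.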
IBM Watson cloud services are used for speech-to-text (if IBM_WATSON is set as the speech-to-text method). See IBM Watson speech-to-text to create a speech-to-text service instance. After you have created the instance, visit the service instance overview page. Click on the 'Manage' tab and find the 'Credentials' section. Click the 'Download' button to get the ibm_credentials.env credentials file.
Place the ibm_credentials.env file in the keys/ folder in the root of the project: /keys/ibm_credentials.env
Spotify is used to find similar songs for the radio command. This is a free service, and just requires you to create an application in the Spotify developer portal. See Spotify Developer Portal to create an application. Once the application is created, copy the client ID and client secret to the .env file at the root of the project.
To run the bot, run the buildandrundocker.ps1 script in the root directory. This will build the Docker image and run it in a new container. After the container is running, the bot should log in to the bot account and listen for chat messages on any of the joined servers.
After joining a voice channel, the bot will listen for a wake word from any user (each user audio stream is separate). The wake word is 'Jarvis' by default. If the wake word is heard on the user's audio stream, the bot will respond with "Yes {USER_NAME}". The bot will then listen for a command on that user audio stream. The command audio will be saved to ./recordings/{userId}.wav. After 2 seconds of silence, the command audio will be processed using speech-to-text. The first word of the command denotes the command type (e.g. 'play').
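The listening flow described above can be sketched as a small per-user state tracker. This is a Python illustration under stated assumptions, not the bot's actual code: `UserStream` and `parse_command` are hypothetical names, timestamps are plain seconds, and wake word detection and audio capture are left out.

```python
SILENCE_TIMEOUT_SECONDS = 2.0  # silence that ends a command, per the text above


class UserStream:
    """Tracks one user's audio stream: idle until the wake word, then recording."""

    def __init__(self, user_id, user_name):
        self.user_id = user_id
        self.user_name = user_name
        self.listening = False
        self.last_voice_time = None

    def on_wake_word(self, now):
        """Start listening for a command and return the spoken reply."""
        self.listening = True
        self.last_voice_time = now
        return f"Yes {self.user_name}"

    def on_voice(self, now):
        """Record that the user was still speaking at time `now`."""
        if self.listening:
            self.last_voice_time = now

    def command_ready(self, now):
        """True once the user has been silent for the timeout."""
        return (self.listening
                and now - self.last_voice_time >= SILENCE_TIMEOUT_SECONDS)

    def recording_path(self):
        return f"./recordings/{self.user_id}.wav"


def parse_command(transcript):
    """Split transcribed text into (command_type, arguments)."""
    words = transcript.split()
    if not words:
        return None, ""
    # The first word denotes the command type, e.g. 'play'.
    return words[0].strip(".,!?").lower(), " ".join(words[1:])
```

For example, `parse_command("Play music best hits")` yields the command type `play` with the remaining words as its arguments.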
Some commands have special behavior if the wake word is detected:
- Play command - Stop the song if a song is currently playing
  - If a song is stopped, the bot will stop listening for commands (you have to say the wake word again for a command)
- Radio command - Stop the radio if the radio is currently playing
  - If the radio is stopped, the bot will stop listening for commands (you have to say the wake word again for a command)
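
The interrupt behavior above amounts to a small decision rule. The sketch below is a hypothetical Python illustration: the state names `"song"`, `"radio"` and `"idle"` and the function `handle_wake_word` are assumptions, not identifiers from the bot.

```python
def handle_wake_word(player_state):
    """Return (new_player_state, listen_for_command) after a wake word.

    player_state is one of "song", "radio" or "idle" (assumed names).
    """
    if player_state in ("song", "radio"):
        # Playback is stopped and the bot does not listen for a command;
        # the user has to say the wake word again.
        return "idle", False
    # Nothing is playing, so the bot listens for a command as usual.
    return player_state, True
```

Saying the wake word once during playback therefore only stops the audio, and a second wake word is needed before a command is heard.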
Voice and text commands are handled through the same command handler. This means that any voice command can also be a text command. For example, you can say 'Jarvis. Play music best hits' or simply type ';;play music best hits' to achieve the same result.
Name | Description | Example |
---|---|---|
join | Have the bot join the voice channel of the user who typed the command | ;;join |
leave | Have the bot leave the current voice channel | ;;leave |
play | Stream the audio for a YouTube video to the channel | ;;play music best hits |
radio | Generate and play a radio station based on a song query. The bot will check Spotify for similar songs based on your query, and queue them up in the song queue. The music channel will show the current radio station and the song currently playing | ;;radio linkin park in the end |
who, what, when, where, why, how, is, do, was, will, would, can, could, did, should, whose, which, whom, are | Ask a question to the bot. The bot will search Google and speak the snippet to the channel | ;;what is the largest tech company in the world |
freegames | Search for free games by scanning RSS feeds from popular game sites. A list of current articles on free games will be posted in the channel that the command was typed in. After 60 seconds, the post is deleted | ;;freegames |
birthday | Register your birthday. The bot will wish you a happy birthday on your birthday. The bot will also wish you a happy birthday if you join the server on your birthday | ;;birthday 12 25 |
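
Because voice and text input share one handler, both can be normalized to the same text before dispatch. The following Python sketch shows one way to arrange this with the command pattern. All class and function names are hypothetical, the command bodies are stand-ins that only return strings, and the real bot may be structured differently.

```python
from abc import ABC, abstractmethod

TEXT_PREFIX = ";;"
WAKE_WORD = "jarvis"
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "is", "do", "was", "will",
    "would", "can", "could", "did", "should", "whose", "which", "whom", "are",
}


class Command(ABC):
    @abstractmethod
    def execute(self, name, args):
        """Run the command; `name` is the word that triggered it."""


class PlayCommand(Command):
    def execute(self, name, args):
        return f"playing: {args}"


class QuestionCommand(Command):
    def execute(self, name, args):
        # The question word is part of the question itself.
        return f"searching: {name} {args}"


class CommandHandler:
    def __init__(self):
        self._commands = {}

    def register(self, names, command):
        for name in names:
            self._commands[name] = command

    def handle(self, text):
        """Dispatch on the first word; return None if nothing matches."""
        if not text:
            return None
        words = text.split()
        name = words[0].lower()
        command = self._commands.get(name)
        if command is None:
            return None
        return command.execute(name, " ".join(words[1:]))


def from_text_message(message):
    """Strip the ';;' prefix; return None for ordinary chat messages."""
    if not message.startswith(TEXT_PREFIX):
        return None
    return message[len(TEXT_PREFIX):].strip()


def from_voice_transcript(transcript):
    """Drop a leading wake word such as 'Jarvis.' if it was transcribed."""
    words = transcript.split()
    if words and words[0].strip(".,!?").lower() == WAKE_WORD:
        words = words[1:]
    return " ".join(words)


handler = CommandHandler()
handler.register(["play"], PlayCommand())
handler.register(QUESTION_WORDS, QuestionCommand())
```

In this arrangement, adding a command means writing one `Command` subclass and registering it under its trigger words, and it then works for both typed and spoken input.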