Kaldi Speechkit for Aimybox Android SDK

A speech-to-text engine developed by the Kaldi and Vosk projects. This module provides both speech-to-text and voice trigger components.

Example

Here is a working example of an Android voice assistant powered by this module.

How to start using

The Kaldi engine can work in both offline and online modes.

Offline mode

In offline mode, Kaldi uses a local model that is limited by its small size, so it can produce less accurate results.

  1. Download a model for your language from here
  2. Unzip the downloaded package into the assets folder of your Android project
  3. Add the dependencies to your module's build.gradle:
repositories {
    mavenCentral()
}

dependencies {
    implementation("com.just-ai.aimybox:core:${version}")
    implementation("com.just-ai.aimybox:dummy-api:${version}") // or any other Dialog API
    implementation("com.just-ai.aimybox:kaldi-speechkit:${version}")
}
  4. Pass the Kaldi Speechkit component to the Aimybox configuration object:
fun createAimybox(context: Context): Aimybox {
    val assets = KaldiAssets.fromApkAssets(context, "model")
    val speechToText = KaldiSpeechToText(assets)
    val textToSpeech = GooglePlatformTextToSpeech(context, Locale.getDefault()) // or any other TTS

    val dialogApi = DummyDialogApi() // or any other Dialog API

    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
}
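
If step 2 went wrong (for example, the package was unzipped into a differently named folder), it can help to detect that before building the component. The sketch below is a hypothetical helper and not part of Aimybox or Kaldi Speechkit. It only checks that a folder has at least one entry, and the `listEntries` parameter is an assumption standing in for whatever lists a folder, such as Android's `AssetManager.list(path)`.

```kotlin
import java.io.IOException

// Hypothetical helper, not part of Aimybox: reports whether a folder has any entries.
// `listEntries` stands in for a directory lister such as AssetManager.list(path).
fun hasModelFiles(dir: String, listEntries: (String) -> Array<String>?): Boolean {
    val entries = try {
        listEntries(dir)
    } catch (e: IOException) {
        null // treat an unreadable folder the same as a missing one
    }
    return !entries.isNullOrEmpty()
}
```

On Android this could be called as `hasModelFiles("model") { path -> context.assets.list(path) }` before creating the speech-to-text component. It does not verify that the folder contents are a valid model.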

Online mode

In online mode, Kaldi connects to a remote host running a Kaldi websocket server. The model does not have to be small in this case, so it can produce more accurate results.

You don't have to download or serve any model data in this case.

  1. Run a Kaldi server as described here
  2. Add the dependencies to your module's build.gradle as described above
  3. Pass the Kaldi Speechkit component to the Aimybox configuration object:
fun createAimybox(context: Context): Aimybox {
    val textToSpeech = GooglePlatformTextToSpeech(context, Locale.getDefault()) // or any other TTS
    val speechToText = KaldiWebsocketSpeechToText("your Kaldi server URL here") // or use wss://api.alphacephei.com/asr/en/ for testing purposes

    val dialogApi = DummyDialogApi() // or any other Dialog API

    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
}
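
Because the server address is passed as a plain string, a mistyped scheme may only surface at connection time. The sketch below is a hypothetical helper and not part of Aimybox. It assumes the server is reached through a `ws://` or `wss://` address and uses only `java.net.URI` to check the string before it is handed to `KaldiWebsocketSpeechToText`.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hypothetical helper, not part of Aimybox:
// true when `url` is an absolute ws:// or wss:// address with a host.
fun isWebsocketUrl(url: String): Boolean {
    val uri = try {
        URI(url.trim())
    } catch (e: URISyntaxException) {
        return false // e.g. the string contains spaces
    }
    val scheme = uri.scheme?.lowercase()
    // A malformed value such as "wss:https://host/" parses as an opaque URI with no host.
    return (scheme == "ws" || scheme == "wss") && !uri.host.isNullOrEmpty()
}
```

A caller could use `require(isWebsocketUrl(serverUrl))` before creating the component. The check is purely syntactic and does not confirm that a server is listening at that address.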

Voice trigger

To use this engine as a voice trigger, download a model as described here, then initialize the voice trigger:

fun createAimybox(context: Context): Aimybox {
    val assets = KaldiAssets.fromApkAssets(context, "model")
    val voiceTrigger = KaldiVoiceTrigger(assets, listOf("listen", "hey"))

    val textToSpeech = GooglePlatformTextToSpeech(context, Locale.getDefault()) // or any other TTS
    val speechToText = GooglePlatformSpeechToText(context, Locale.getDefault()) // or any other STT

    val dialogApi = DummyDialogApi() // or any other Dialog API

    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi) {
        this.voiceTrigger = voiceTrigger
    })
}
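
When the trigger phrases come from user settings or resources rather than a hard-coded list, it may be worth tidying them first. The sketch below is a hypothetical helper and not part of Aimybox. It assumes the model's vocabulary is lowercase, which you should confirm for the model you downloaded.

```kotlin
// Hypothetical helper, not part of Aimybox: cleans up a list of trigger phrases.
// Assumption: the model's vocabulary is lowercase, so phrases are lowercased.
fun normalizeTriggers(phrases: List<String>): List<String> =
    phrases
        .map { it.trim().lowercase().replace(Regex("\\s+"), " ") } // collapse inner whitespace
        .filter { it.isNotEmpty() }                                 // drop blank entries
        .distinct()                                                 // drop duplicates, keep order
```

The result can be passed as the phrase list, for example `KaldiVoiceTrigger(assets, normalizeTriggers(userPhrases))`, where `userPhrases` is a placeholder name for your own list.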

Documentation

The full Aimybox documentation is available here.