Speech-to-text engine built on the Kaldi and Vosk projects. This module provides both speech-to-text and voice trigger components.
Here is a working example of an Android voice assistant powered by this module.
The Kaldi engine can work in both offline and online modes.
In offline mode, Kaldi uses a local model that is limited by its small size and may therefore produce less accurate results.
- Download a model for your language from here
- Unzip the downloaded package into the assets folder of your Android project
- Add the dependencies to your module's build.gradle:
```groovy
repositories {
    mavenCentral()
}

dependencies {
    implementation("com.just-ai.aimybox:core:${version}")
    implementation("com.just-ai.aimybox:dummy-api:${version}") // or any other dialog API
    implementation("com.just-ai.aimybox:kaldi-speechkit:${version}")
}
```
- Provide the Kaldi speechkit component to the Aimybox configuration object:
```kotlin
fun createAimybox(context: Context): Aimybox {
    val assets = KaldiAssets.fromApkAssets(context, "model")
    val speechToText = KaldiSpeechToText(assets)
    val textToSpeech = GooglePlatformTextToSpeech(context, Locale.getDefault()) // or any other TTS
    val dialogApi = DummyDialogApi() // or any other dialog API
    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
}
```
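Once the configuration is built, the returned `Aimybox` instance can drive a simple push-to-talk flow. A minimal sketch (the `Activity` wiring, layout, and button id here are assumptions for illustration, not part of this module):

```kotlin
class AssistantActivity : AppCompatActivity() {

    private lateinit var aimybox: Aimybox

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_assistant) // hypothetical layout
        aimybox = createAimybox(this)
        // Start or stop listening when the user taps the microphone button
        findViewById<Button>(R.id.mic_button).setOnClickListener {
            aimybox.toggleRecognition()
        }
    }
}
```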
In online mode, Kaldi connects to a remote host running a Kaldi websocket server. In this case the model doesn't have to be small, so recognition can be more accurate.
You also don't have to download or bundle any model data in this case.
- Run a Kaldi server as described here
- Add the dependencies to your module's build.gradle as described above
- Provide the Kaldi speechkit component to the Aimybox configuration object:
```kotlin
fun createAimybox(context: Context): Aimybox {
    val textToSpeech = GooglePlatformTextToSpeech(context, Locale.getDefault()) // or any other TTS
    val speechToText = KaldiWebsocketSpeechToText("your Kaldi server URL here") // or use wss://api.alphacephei.com/asr/en/ for testing purposes
    val dialogApi = DummyDialogApi() // or any other dialog API
    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi))
}
```
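The first step above mentions running a Kaldi server yourself. One possible way (an assumption on my part: using a Docker image published by the Vosk project for vosk-server, which listens on port 2700) is:

```shell
# Run a ready-made English vosk-server container (pick the image for your language)
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
# The Kaldi server URL for KaldiWebsocketSpeechToText is then ws://<host>:2700
```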
To use this engine as a voice trigger, download a model as described here and then initialize the voice trigger:
```kotlin
fun createAimybox(context: Context): Aimybox {
    val assets = KaldiAssets.fromApkAssets(context, "model")
    val voiceTrigger = KaldiVoiceTrigger(assets, listOf("listen", "hey"))
    val textToSpeech = GooglePlatformTextToSpeech(context, Locale.getDefault()) // or any other TTS
    val speechToText = GooglePlatformSpeechToText(context, Locale.getDefault()) // or any other STT
    val dialogApi = DummyDialogApi() // or any other dialog API
    return Aimybox(Config.create(speechToText, textToSpeech, dialogApi) {
        this.voiceTrigger = voiceTrigger
    })
}
```
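The trigger phrases are plain strings recognized by the local model; when one of them is heard, Aimybox starts speech recognition automatically. Swapping in your own wake words is just a matter of changing the list (the phrases below are hypothetical examples and must be words the downloaded model actually knows):

```kotlin
// Any phrase the offline model can recognize may serve as a wake word
val voiceTrigger = KaldiVoiceTrigger(assets, listOf("ok assistant", "wake up"))
```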
Full Aimybox documentation is available here