Hi, I'm the co-founder of https://snips.ai and we are building a 100% on-device Voice AI platform that runs on the Raspberry Pi 3.
It is free to use for makers, and we will start open-sourcing the components a few weeks from now.
We are using Rust because it is safe and very fast!
We would love to tell you more about what we are building if you have questions.
The whole platform runs on-device, which makes it ideal for privacy and cost, and lets it keep working when there is no network.
We are available in English, French, and German, with Japanese and Korean coming soon, and we are working on other European languages!
We would love to see what you build with our platform so we can feature it on our website.
This sounds very interesting. Being capable of working unconnected is the only way to get true privacy from these devices. Having them open-sourced (hopefully 100%, though I can understand the reasons for keeping some parts closed) would be even better. I have always been strongly opposed to networked voice assistants because the risk of them being used as spying tools by design is simply too high, and no Google, Amazon, or Apple assistant will ever enter my home. But this one is a different beast. Keep up the good work!
I'm Italian. There are applications in the automotive field for a mostly disconnected device such as this one. If it can deal with a noisy environment like a car, that would make it even more interesting.
Do you use deep learning, HMMs, or something else for your voice recognition? And how did you make it fast on the RPi3? Can I use your tech on an RPi2 as well (for controlling a humanoid robot)?
Yes, it's a deep learning model, for both the ASR and the wake-word recognition! We are doing some low-level optimizations to make it fast.
For the Raspberry Pi 2 the model is a bit too large, but we will support using them as satellites: a Raspberry Pi 2 in each room will detect the wake word and stream the voice to a "hub" Raspberry Pi 3, which will do the speech recognition.
I signed up for an account and plan on trying it out later, and I just want to say: thank you for explaining what your newsletter is, and for making it opt-in, instead of opt-out.
Well, to be fair, Rust's toolchain "speaks" cross-compilation natively. I haven't found any other toolchain with such a well-designed cross-compilation experience.
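As a sketch of what that experience looks like, targeting a Raspberry Pi from an x86 machine is roughly the following (assuming rustup is installed and you are targeting Raspbian's ARMv7 userland; the linker binary name depends on your distro's cross-toolchain package):

```shell
# Add the ARMv7 hard-float target (Raspberry Pi 2/3 on Raspbian).
rustup target add armv7-unknown-linux-gnueabihf

# Point Cargo at a cross-linker, in .cargo/config.toml:
#   [target.armv7-unknown-linux-gnueabihf]
#   linker = "arm-linux-gnueabihf-gcc"

# Cross-compile the crate; the binary lands in
# target/armv7-unknown-linux-gnueabihf/release/.
cargo build --release --target armv7-unknown-linux-gnueabihf
```

The target triple and linker configuration are the only moving parts; the standard library for the target is fetched by rustup itself.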
The piece seems to have been a bit over-simplified for non-technical readers, though.
So you're taking the parameters from something like TensorFlow and exporting them to be executed in Rust? And that is more efficient than the C++ backend of TensorFlow?
I used to train my parser (before switching to TensorFlow) using Caffe, dumped the parameters with a small program, loaded them into Go slices, and applied the network using simple C BLAS operations. This works fine, especially when you are using simpler networks. As a bonus, you don't have the overhead of TensorFlow session runs.
It does become a bit of a drag when you are building more complex networks (e.g. with multiple RNN layers, batch normalization, etc.). In that case there are two straightforward options for Rust. There is a TensorFlow Rust binding against the TensorFlow C API [1] with which you can just load and run a frozen TensorFlow graph. This is the approach I am currently using, though I am running graphs on workstations/servers.
Another option is compiling the graph with Tensorflow's XLA AOT compilation, which compiles the network to C++ classes (that you could bind from Rust).
This only supports a small subset of ops. It pretty much corresponds to the first option that I mentioned: train with TensorFlow, extract the parameters, and provide implementations of the ops in your native language.
I really love the idea of an offline voice assistant; however, most of my tech friends in France and I feel that Snips (which is very visible in the press here) still has to deliver something tangible. This might be the first time I want to try one of their products, though.
They had announced a keyboard app for iOS and Android which seemingly did the same thing as existing swipe keyboards.
Their AI assistant seems really interesting, and I would really love to give it a try, but I don't want to invest energy if there is a high risk that the service shuts down soon.
I remember the Tranquilien app [0], which was also developed by Snips. There was a lot of media coverage, but it was quickly shut down :(.
Yeah, but that's only until some library makes a backwards-incompatible change and the whole thing stops building. Sure, it's better than a completely proprietary service, but support and active maintenance are very important too.
The Rust ecosystem and build system are designed to prevent this. If you're using a library and want your code to keep compiling in 10 years, you can (and should) just lock the dependencies.
Since the libraries are all mirrored on crates.io and, by design, cannot be removed once they are published, you should be good, at least until the day the owners of crates.io stop being able to pay the server costs.
I believe that you can also host local mirrors, but I haven't checked that.
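Concretely, the locking workflow looks like this (Cargo.lock and the `--locked` flag are standard Cargo features; the exact commands below are just a minimal sketch):

```shell
# Cargo.lock pins every transitive dependency to an exact version and
# checksum. Commit it for applications so builds are reproducible:
git add Cargo.lock

# Refuse to build if the lockfile would need to change, i.e. fail loudly
# instead of silently picking up a newer (possibly breaking) release:
cargo build --locked
```

With the lockfile committed and crates.io keeping published versions immutable, a build today and a build in 10 years resolve the same dependency set.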
I think that they shifted dramatically from experimenting around `smart cities` to being an AI startup focused on building voice assistants.
On TechCrunch [0]:
> The team didn’t have a product or business model in mind. It experimented for a while, launching projects around smart cities, such as Tranquilien, a service that predicts how busy your train is going to be, or a smart empty parking spot prediction service.``
Could you expand on why CMUSphinx [0] / Julius [1] were not used?
I'm all for using Rust for the speed and safety it allows, and then FFI-ing to CMUSphinx (this is exactly what I want to do, given more free time in the future). Could you explain why the available open-source libraries couldn't be used? CMUSphinx is just one; there's also Julius.
Have you used Sphinx? Remember in the 90s and early 00s, when speech recognition was laughable and barely worked, and it seemed like one of those things that would never really work?
Yes, I've used Sphinx, though not recently. When I last used it (which was years ago), it was trivial for an amateur to set up a working English install with the pre-provided models. From there it's only a matter of improving the model (which is obviously a pretty hard problem).
Are you saying that Sphinx is currently at the same level as the state of the art in the 90s? Surely that's hyperbole.
Also, using AI to generate even better models doesn't seem like a bad idea if you're looking to improve it -- the fundamentals of speech recognition haven't changed, so why build a completely separate open-source product instead of contributing to something already well understood and accessible?
As far as I can tell, CMU Sphinx is still based on HMMs, which were the previous state of the art before neural networks brought a breakthrough in model performance. So it is likely that CMU Sphinx is currently not much better than what was possible in the 90s. When I last looked into this, I found a mailing-list message by one of the maintainers, where he explicitly recommended using Kaldi if you want better results.
Kaldi does support neural network models in addition to good old HMMs, but it is very "researchy": everything is set up so that you can replace any step in the pipeline with your own components (and then publish a paper about your results). But that also means that you pretty much have to be an expert to correctly assemble a working product from the available components, and it's pretty much assumed that you will be training your own models, which can be difficult when you don't have access to lots of data.
So yes, the existing landscape for open-source speech recognition leaves something to be desired, and the focus of existing projects doesn't necessarily lend itself to turning them into what you want.
Sorry, I wasn't clear -- what I meant to say is that even within the limitations of HMMs, it's absurd to imply that CMUSphinx has made no progress since the 90s. A greatly improved methodology and approach to HMMs is hard-won, and it's unfair to minimize the effort that produced progress in an old method just because a new method has been developed.
Neural network models have their own downsides, the biggest of which is training -- why throw the baby out with the bathwater instead of taking the approach Kaldi has taken, possibly building a neural network model alternative inside Sphinx?
Let CMU plug away at making HMMs better, while you plug away at making neural networks better, but interoperate so everyone gets both benefits? You might even make some headway with the neural network bootstrapping problem with some help from the progress CMU has already made.
I understand that the project is using AI, but why not feed that learning into sphinx, or some other tool? Couldn't this product have just been essentially an extension to make one of those other open source, research-backed efforts smarter?
How does any other project benefit from the models you build? Or is that the business model -- produce open source software that no one else can really extend or use with anything else, but hopefully people will then buy into your modelling strategy + tooling?
I do realize that you have absolutely no obligation to any other voice recognition effort, but I wonder how easy it is for anyone else to use the model you're building.
Thanks so much for being open about it, I see why you didn't go with trying to extend it.
Again, I want to express that you don't owe me anything (and it was entitled of me to imply that you did) -- but I wanted to know. Maybe in the future writing that thing that can enrich other models is possible.
Does the voice recognition AI use all of the Pi's processing power while running? I mean, can the Pi do anything else while the voice recog is running? Or does it context switch after a voice command? Sorry if this is a silly question...
Take a look at what some people have built with it: https://github.com/snipsco/awesome-snips
and a few tutorials to get you started: https://medium.com/snips-ai/building-a-voice-controlled-home...