Snips Uses Rust to Build an Embedded Voice Assistant (blog.mozilla.org)
209 points by albi_lander on Feb 22, 2018 | 56 comments



Hi, I'm the co-founder of https://snips.ai and we are building a 100% on-device Voice AI platform which runs on the Raspberry Pi 3. It is free to use for makers, and we will start open-sourcing the components a few weeks from now.

We are using Rust because it is safe and very fast!

We would love to tell you more about what we are building if you have questions.

The whole platform runs on-device, which makes it ideal for privacy and cost, and allows it to run when there is no network.

We are available in English, French, German, and soon Japanese and Korean, and we are working on other European languages!

We would love to see what you build with our platform so we can feature it on our website.

Take a look at what some people have built with it: https://github.com/snipsco/awesome-snips

and a few tutorials to get you started: https://medium.com/snips-ai/building-a-voice-controlled-home...


This sounds very interesting. Being capable of working unconnected is the only way to get true privacy from these devices. Having them open sourced (hopefully 100%, though I can understand the reasons for keeping some parts closed) would be even better. I have always been strongly opposed to networked voice assistants because the risk of them being used as spying tools by design is simply too high, and certainly no Google, Amazon, Apple etc. assistant will ever enter my home, but this one is a different beast. Keep up the good work!

ps. Please add other European languages soon :)


We are working on this! What language would you want?


I'm Italian. There are applications in the automotive field for a mostly disconnected device such as this one. If it can deal with a noisy environment like a car, that would make it even more interesting.


We are also working on car environments, and we expect to start on Italian this year :) Subscribe to our newsletter!


Do you use Deep Learning, HMM or something else for your voice recognition? If so, how did you make it fast on RPi3? Can I use your tech on RPi2 as well (for controlling a humanoid robot)?


Yes, these are DL models, for both the ASR and the wake word recognition. We are doing some low-level optimizations.

For the Raspberry Pi 2 the model is a bit too large, but we will be able to handle them as satellites: your Raspberry Pi 2s in multiple rooms will detect the wake word and stream the voice to a "hub" Raspberry Pi 3, which will do the speech recognition.


What tools do you use for building/training the DL models? How do you connect them to Rust?


Judging by their employees' posts to the kaldi-help group, they most likely use the Kaldi toolkit.


+1 for fully on-device.

Where, generally, does a company like this get training data? Do you just hire a team to tag TV clips and YouTube videos?


We do a mix of having people generate data and gathering data from public and commercial sources.


And another +1 for transparency. This is a product I can get behind.


Rust's algebraic data types and pattern matching allow you to write mathematically complete decision flows. Do you use them this way?
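
For example, something like this (purely illustrative; the intent enum and handler are made up, not from Snips):

    // Illustrative only: a hypothetical intent type for a voice assistant.
    enum Intent {
        TurnOnLight { room: String },
        SetTemperature { celsius: f32 },
        Unknown,
    }

    fn handle(intent: Intent) -> String {
        // The compiler refuses to build this match if a variant is left
        // unhandled, so every decision path is covered by construction.
        match intent {
            Intent::TurnOnLight { room } => format!("Turning on the light in {}", room),
            Intent::SetTemperature { celsius } => format!("Setting the temperature to {} degrees", celsius),
            Intent::Unknown => "Sorry, I didn't understand that".to_string(),
        }
    }

    fn main() {
        println!("{}", handle(Intent::TurnOnLight { room: "kitchen".to_string() }));
    }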


I signed up for an account and plan on trying it out later, and I just want to say: thank you for explaining what your newsletter is, and for making it opt-in, instead of opt-out.


Are you aware of mycroft.ai? Your goals are similar, but the implementation is different. Any possibility of collaboration?


Hope you still exist in Jan 2021 when I'm (hopefully) getting my PhD ;) so I can apply.


Amazing! Do you have any timelines for Portuguese? (And, quite specifically, Brazilian Portuguese?)

Thanks!


Hi! We have quite a few languages on our plate, but it's very likely we'll work on Brazilian Portuguese before the end of the year.


Nice to have projects linked. Any live demo?



>How can Snips embed all the code for a voice assistant onto a single device? They wrote it using the Rust systems programming language.

And that's what we call a "non sequitur".


And it taints the whole piece.

"Instead of having to write and then rewrite for each platform, he used Rust’s cross-compilation capability" also does not make a lot of sense.


Well, to be fair, Rust's toolchain "speaks" cross-compilation natively. I haven't found any other toolchain with such a well-designed cross-compilation experience.

The piece seems to have been a bit over-simplified for non-technical readers, though.


Go has superb support for cross-compilation too.


Great project, definitely a worthy goal. I know several people who are skeptical of voice assistants because they aren't fully on device.

The blog article was very lean on technical details apart from some basics about Rust's safety and portability features.

Could you elaborate on where the gains from Rust come into play? Is your model built on a DL framework that is written in Rust?


The DL framework is classical, but the models can then be reimplemented in pure Rust for efficiency


So you're taking the parameters from something like TensorFlow and then exporting them to be executed in Rust? And that is more efficient than the C++ backend of TensorFlow?


It is useful for very small devices where you don't want to ship a bigger C++ lib


Neat, it would be great to have a technical blog post on what your team has built. It sounds really interesting.


Hi! Keep an eye on our Medium: https://medium.com/snips-ai We have a few technical blog posts out there already, and a few more coming.


And also removing all of the overhead of the machine learning abstraction. Although I am skeptical that anyone would ever actually do that.


I used to train my parser (before switching to Tensorflow) using Caffe, dumped the parameters using a small program, loaded them up in Go slices, and applied the network using simple C BLAS operations. This works fine, especially when you are using simpler networks. As a bonus, you don't have the overhead of Tensorflow session runs.
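
In the simplest case that hand-rolled forward pass is just a few matrix-vector products over the exported weights. A toy sketch in Rust (since that's the topic here; the sizes and values are made up, and a real implementation would call into a BLAS crate rather than use naive loops):

    // Toy fully-connected layer applied to exported weights.
    // `w` is row-major with shape (outputs, inputs); `b` has length `outputs`.
    fn dense(w: &[f32], b: &[f32], input: &[f32], outputs: usize) -> Vec<f32> {
        let inputs = input.len();
        assert_eq!(w.len(), outputs * inputs);
        (0..outputs)
            .map(|o| {
                let row = &w[o * inputs..(o + 1) * inputs];
                let sum: f32 = row.iter().zip(input).map(|(wi, xi)| wi * xi).sum();
                (sum + b[o]).max(0.0) // ReLU
            })
            .collect()
    }

    fn main() {
        // Pretend these were dumped from a trained model.
        let w = vec![0.5, -0.2, 0.1, 0.8, 0.3, -0.7]; // 2 outputs x 3 inputs
        let b = vec![0.1, -0.1];
        let x = vec![1.0, 2.0, 3.0];
        println!("{:?}", dense(&w, &b, &x, 2));
    }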

It does become a bit of a drag when you are building more complex networks (e.g. with multiple RNN layers, batch normalization, etc.). In that case there are two straightforward options for Rust. There is a Tensorflow Rust binding against the tensorflow C API [1] with which you can just load and run a frozen Tensorflow graph. This is the approach that I am currently using, though I am running graphs on workstations/servers.
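
For reference, loading and running a frozen graph with that binding looks roughly like this (written from memory, and the crate's API has changed across versions, so treat the exact calls as approximate; the "input"/"output" op names are whatever you exported):

    use std::fs;
    use tensorflow::{Graph, ImportGraphDefOptions, Session, SessionOptions, SessionRunArgs, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Load the frozen GraphDef produced by TensorFlow's freeze_graph step.
        let proto = fs::read("frozen_model.pb")?;
        let mut graph = Graph::new();
        graph.import_graph_def(&proto, &ImportGraphDefOptions::new())?;
        let session = Session::new(&SessionOptions::new(), &graph)?;

        // Feed a dummy input tensor and fetch the output op.
        let input = Tensor::new(&[1, 3]).with_values(&[1.0f32, 2.0, 3.0])?;
        let mut args = SessionRunArgs::new();
        args.add_feed(&graph.operation_by_name_required("input")?, 0, &input);
        let output_token = args.request_fetch(&graph.operation_by_name_required("output")?, 0);
        session.run(&mut args)?;

        let output: Tensor<f32> = args.fetch(output_token)?;
        println!("{:?}", &output[..]);
        Ok(())
    }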

Another option is compiling the graph with Tensorflow's XLA AOT compilation, which compiles the network to C++ classes (that you could bind from Rust).

[1] https://github.com/tensorflow/rust


There seems to be a third option provided by someone, kali, working at Snips: https://github.com/kali/tensorflow-deploy-rust


This only supports a small subset of ops. It pretty much corresponds to the first option that I mentioned: train with Tensorflow, extract the parameters, and provide implementations of the ops in your native language.


Not exactly sure what you mean. After the model is trained there isn't much "machine learning abstraction".

You can serve the model using TensorFlow Serving with the exported parameters. It's exactly what you want for serving the model.


Have you seen RLSL? It's a subset of Rust targeting SPIR-V.

https://github.com/MaikKlein/rlsl


I really love the idea of an offline voice assistant; however, most of my tech friends in France and I feel that Snips (which is very visible in the press here) still has to deliver something tangible. This might be the first time I want to try one of their products, though.

They had announced a keyboard app for iOS and Android which seemingly did the same thing as the swipe keyboard.


I agree.

Their AI assistant seems really interesting, and I would really love to give it a try, but I don't want to invest energy if there is a high risk that the service shuts down soon.

I remember the Tranquilien app [0], which was also developed by Snips. There was a lot of media coverage about it, but it was quickly shut down :(.

[0] https://www.digital.sncf.com/store/applications/tranquilien-...


Actually, the good thing is that if Snips disappears, you will still be able to run your Snips assistant, as everything runs locally.


Yeah, but that's only until some library makes a backwards-incompatible change and the whole thing stops building. Sure, it's better than a completely proprietary service, but support and active maintenance are very important too.


The Rust ecosystem and build system are designed to prevent this. If you're using a library and want your code to keep compiling in 10 years, you can (and should) just lock the dependencies.

Since the libraries are all mirrored on crates.io and, by design, cannot be removed once they are published, you should be good, at least until the day the owners of crates.io stop being able to pay the server costs.

I believe that you can also host local mirrors, but I haven't checked that.


I think they shifted dramatically from experimenting with "smart cities" to being an AI startup focused on building voice assistants.

From TechCrunch [0]:

> The team didn't have a product or business model in mind. It experimented for a while, launching projects around smart cities, such as Tranquilien, a service that predicts how busy your train is going to be, or a smart empty parking spot prediction service.

[0] https://techcrunch.com/2015/06/26/snips-grabs-6-3-million-to...

[1] https://www.crunchbase.com/organization/snips-2


Could you expand on why CMUSphinx [0] or Julius [1] were not used?

I'm all for using Rust for the speed and safety it allows, and then FFI-ing to CMUSphinx (this is exactly what I want to do, given more free time in the future), but could you explain why the available open-source libraries (CMUSphinx is just one; there's also Julius) couldn't be used?

[0] - http://cmusphinx.github.io/

[1] - https://github.com/julius-speech/julius


Have you used Sphinx? Remember in the 90s and early 00s when speech recognition was laughable and barely worked, and it seemed like one of those things that would never really work?

Sphinx is like that.


Yes, I've used Sphinx, though not recently; when I last used it (years ago), it was trivial to set up a working English install with the pre-provided models, as an amateur. From there it's only the problem of improving the model (which is obviously a pretty hard problem).

Are you saying that Sphinx is currently at the same level as the state of the art in the 90s? Surely that's hyperbole.

Also, using AI to generate even better models doesn't seem like a bad idea if you're looking to improve it. The fundamentals of speech recognition haven't changed, so why build a completely separate open-source product instead of contributing to something already well understood and accessible?


As far as I can tell, CMU Sphinx is still based on HMMs, which were the previous state of the art before neural networks brought a breakthrough in model performance. So it is likely that CMU Sphinx is currently not much better than what was possible in the 90s. When I last looked into this, I found a mailing-list message by one of the maintainers, where he explicitly recommended using Kaldi if you want better results.

Kaldi does support neural network models in addition to good old HMMs, but it is very "researchy": everything is set up so that you can replace any step in the pipeline with your own components (and then publish a paper about your results). But that also means that you pretty much have to be an expert to correctly assemble a working product from the available components, and it's pretty much assumed that you will be training your own models, which can be difficult when you don't have access to lots of data.

So yes, the existing landscape for open-source speech recognition leaves something to be desired, and the focus of existing projects doesn't necessarily lend itself to turning them into what you want.


Sorry, I wasn't clear. What I meant to say is that even within the limitations of HMMs, it's absurd to imply that CMUSphinx has made no progress over the state of the art of the 90s. A greatly improved methodology/approach to HMMs is hard-won, and it's unfair to minimize the effort that produced progress in an old method just because a new method has been developed.

Neural network models have their own downsides, the biggest one being training. Why throw the baby out with the bathwater instead of taking the approach Kaldi has taken, possibly building a neural-network alternative inside Sphinx?

Let CMU plug away at making HMMs better, while you plug away at making neural networks better, but interoperate so everyone gets both benefits? You can even maybe make some headway with the neural network bootstrapping problem with some help from the progress CMU has already made.


We are using deep-learning models, which have much better accuracy for speech recognition.


I understand that the project is using AI, but why not feed that learning into Sphinx, or some other tool? Couldn't this product have just been essentially an extension to make one of those other open-source, research-backed efforts smarter?

How does any other project benefit from the models you build? Or is that the business model: produce open-source software that no one else can really extend or use with anything else, but hopefully people will then buy into your modelling strategy and tooling?

I do realize that you have absolutely no obligation to any other voice recognition effort, but I wonder how easy it is for anyone else to use the model you're building.


Sphinx has its own models; it is not easy to extend it with the frameworks we are using.

We will be open-sourcing more of the platform over time and giving back to the community, starting with the NLU in the coming weeks.


Thanks so much for being open about it, I see why you didn't go with trying to extend it.

Again, I want to express that you don't owe me anything (and it was entitled of me to imply that you did), but I wanted to know. Maybe in the future, writing something that can enrich other models will be possible.


> deep-learning models

What kind of models do you use?


Does the voice recognition AI use all of the Pi's processing power while running? I mean, can the Pi do anything else while the voice recog is running? Or does it context switch after a voice command? Sorry if this is a silly question...


No, the Pi can still do computations while the speech recognition runs


This sounds amazing. Looking forward to trying it out on my rpi


> It had the traits Snips needed

Well done.



