Snips Uses Rust to Build an Embedded Voice Assistant (blog.mozilla.org)
209 points by albi_lander on Feb 22, 2018 | 56 comments



Hi, I'm the co-founder of https://snips.ai and we are building a 100% on-device Voice AI platform which runs on the Raspberry Pi 3. It is free to use for makers, and we will start open-sourcing the components a few weeks from now.

We are using Rust because it is safe and very fast!

We would love to tell you more about what we are building if you have questions.

The whole platform runs on-device, which makes it ideal for privacy and cost, and allows it to run when there is no network.

We are available in English, French, German, and soon Japanese and Korean, and we are working on other European languages!

We would love to see what you build with our platform so we can feature it on our website.

Take a look at what some people have built with it: https://github.com/snipsco/awesome-snips

and a few tutorials to get you started: https://medium.com/snips-ai/building-a-voice-controlled-home...


This sounds very interesting. Being capable of working unconnected is the only way to get true privacy from these devices. Having them open sourced (hopefully 100%, though I can understand the reasons for keeping some parts closed) would be even better. I have always been strongly opposed to networked voice assistants because the risk of them being used as spying tools by design is simply too high, and certainly no Google, Amazon, Apple etc. assistant will ever enter my home, but this one is a different beast. Keep up the good work!

ps. Please add other European languages soon :)


We are working on this! What language would you want?


I'm Italian. There are applications in the automotive field for a mostly disconnected device such as this one. If it can deal with a noisy environment like a car, that would make it even more interesting.


We are also working on car environments, and we expect to start on Italian this year :) Subscribe to our newsletter!


Do you use Deep Learning, HMM or something else for your voice recognition? If so, how did you make it fast on RPi3? Can I use your tech on RPi2 as well (for controlling a humanoid robot)?


Yes, these are DL models, for both the ASR and the wake word recognition. We are doing some low-level optimizations.

For the Raspberry Pi 2 the model is a bit too large, but we will be able to handle them as satellites: your Raspberry Pi 2s in multiple rooms will detect the wake word and stream the voice to a "hub" Raspberry Pi 3, which will do the speech recognition.


What tools do you use for building/training the DL models? How do you connect them to Rust?


Judging by their employees' posts to the kaldi-help group, they most likely use the Kaldi toolkit.


+1 for fully on-device.

Where, generally, does a company like this get training data? Do you just hire a team to tag TV clips and YouTube videos?


We do a mix of having people generate data and gathering data from public and commercial sources.


And another +1 for transparency. This is a product I can get behind.


Rust's algebraic data types and pattern matching allow you to write mathematically complete decision flows. Do you use them this way?
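
For example, something like this (purely illustrative; the intent enum and handler are made up, not from Snips):

    // Illustrative only: a hypothetical intent type for a voice assistant.
    enum Intent {
        TurnOnLight { room: String },
        SetTemperature { celsius: f32 },
        Unknown,
    }

    fn handle(intent: Intent) -> String {
        // The compiler refuses to build this match if a variant is left
        // unhandled, so every decision path is covered by construction.
        match intent {
            Intent::TurnOnLight { room } => format!("Turning on the light in {}", room),
            Intent::SetTemperature { celsius } => format!("Setting the temperature to {} degrees", celsius),
            Intent::Unknown => "Sorry, I didn't understand that".to_string(),
        }
    }

    fn main() {
        println!("{}", handle(Intent::TurnOnLight { room: "kitchen".to_string() }));
    }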


I signed up for an account and plan on trying it out later, and I just want to say: thank you for explaining what your newsletter is, and for making it opt-in, instead of opt-out.


Are you aware of mycroft.ai? Your goals are similar, but the implementation is different. Any possibility of collaboration?


Hope you still exist in Jan 2021 when I'm (hopefully) getting my PhD ;) so I can apply.


Amazing! Do you have any timelines for Portuguese? (And, quite specifically, Brazilian Portuguese?)

Thanks!


Hi! We have quite a few languages on our plate, but it's very likely we'll work on Brazilian Portuguese before the end of the year.


Nice to have projects linked. Any live demo?



>How can Snips embed all the code for a voice assistant onto a single device? They wrote it using the Rust systems programming language.

And that's what we call a "non sequitur".


And it taints the whole piece.

"Instead of having to write and then rewrite for each platform, he used Rust’s cross-compilation capability" also does not make a lot of sense.


Well, to be fair, Rust's toolchain "speaks" cross-compilation natively. I haven't found any other toolchain with such a well-designed cross-compilation experience.

The piece seems to have been a bit over-simplified for non-technical readers, though.


Go has superb support for cross-compilation too.


Great project, definitely a worthy goal. I know several people who are skeptical of voice assistants because they aren't fully on device.

The blog article was very lean on technical details apart from some basics about Rust's safety and portability features.

Could you elaborate on where the gains from Rust come into play? Is your model built on a DL framework that is written in Rust?


The DL framework is classical, but the models can then be reimplemented in pure Rust for efficiency


So you're taking the parameters from something like TensorFlow and then exporting them to be executed in Rust? And that is more efficient than the C++ backend of TensorFlow?


It is useful for very small devices where you don't want to ship a bigger C++ lib


Neat, it would be great to have a technical blog post on what your team has built. It sounds really interesting.


Hi! Keep an eye on our Medium: https://medium.com/snips-ai We have a few technical blog posts out there already, and a few more coming.


And also removing all of the overhead of the machine learning abstraction. Although I am skeptical that anyone would ever actually do that.


I used to train my parser (before switching to Tensorflow) using Caffe, dumped the parameters using a small program, loaded them up in Go slices, and applied the network using simple C BLAS operations. This works fine, especially when you are using simpler networks. As a bonus, you don't have the overhead of Tensorflow session runs.
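
In the simplest case that hand-rolled forward pass is just a few matrix-vector products over the exported weights. A toy sketch in Rust (since that's the topic here; the sizes and values are made up, and a real implementation would call into a BLAS crate rather than use naive loops):

    // Toy fully-connected layer applied to exported weights.
    // `w` is row-major with shape (outputs, inputs); `b` has length `outputs`.
    fn dense(w: &[f32], b: &[f32], input: &[f32], outputs: usize) -> Vec<f32> {
        let inputs = input.len();
        assert_eq!(w.len(), outputs * inputs);
        (0..outputs)
            .map(|o| {
                let row = &w[o * inputs..(o + 1) * inputs];
                let sum: f32 = row.iter().zip(input).map(|(wi, xi)| wi * xi).sum();
                (sum + b[o]).max(0.0) // ReLU
            })
            .collect()
    }

    fn main() {
        // Pretend these were dumped from a trained model.
        let w = vec![0.5, -0.2, 0.1, 0.8, 0.3, -0.7]; // 2 outputs x 3 inputs
        let b = vec![0.1, -0.1];
        let x = vec![1.0, 2.0, 3.0];
        println!("{:?}", dense(&w, &b, &x, 2));
    }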

It does become a bit of a drag when you are building more complex networks (e.g. with multiple RNN layers, batch normalization, etc.). In that case there are two straightforward options for Rust. There is a Tensorflow Rust binding against the tensorflow C API [1] with which you can just load and run a frozen Tensorflow graph. This is the approach that I am currently using, though I am running graphs on workstations/servers.
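
For reference, loading and running a frozen graph with that binding looks roughly like this (written from memory, and the crate's API has changed across versions, so treat the exact calls as approximate; the "input"/"output" op names are whatever you exported):

    use std::fs;
    use tensorflow::{Graph, ImportGraphDefOptions, Session, SessionOptions, SessionRunArgs, Tensor};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Load the frozen GraphDef produced by TensorFlow's freeze_graph step.
        let proto = fs::read("frozen_model.pb")?;
        let mut graph = Graph::new();
        graph.import_graph_def(&proto, &ImportGraphDefOptions::new())?;
        let session = Session::new(&SessionOptions::new(), &graph)?;

        // Feed a dummy input tensor and fetch the output op.
        let input = Tensor::new(&[1, 3]).with_values(&[1.0f32, 2.0, 3.0])?;
        let mut args = SessionRunArgs::new();
        args.add_feed(&graph.operation_by_name_required("input")?, 0, &input);
        let output_token = args.request_fetch(&graph.operation_by_name_required("output")?, 0);
        session.run(&mut args)?;

        let output: Tensor<f32> = args.fetch(output_token)?;
        println!("{:?}", &output[..]);
        Ok(())
    }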

Another option is compiling the graph with Tensorflow's XLA AOT compilation, which compiles the network to C++ classes (that you could bind from Rust).

[1] https://github.com/tensorflow/rust


There seems to be a third option provided by someone, kali, working at Snips: https://github.com/kali/tensorflow-deploy-rust


This only supports a small subset of ops. It pretty much corresponds to the first option that I mentioned: train with Tensorflow, extract the parameters, and provide implementations of the ops in your native language.


Not exactly sure what you mean. After the model is trained there isn't much "machine learning abstraction".

You can serve the model using TensorFlow Serving with the exported parameters. It's exactly what you want for serving the model.


Have you seen RLSL? It's a subset of Rust targeting SPIR-V.

https://github.com/MaikKlein/rlsl


I really love the idea of an offline voice assistant; however, most of my tech friends in France and I feel that Snips (which is very visible in the press here) still has to deliver something tangible. This might be the first time I want to try one of their products, though.

They had announced a keyboard app for iOS and Android which seemingly did the same thing as the swipe keyboard.


I agree.

Their AI assistant seems really interesting, and I would really love to give it a try, but I don't want to invest energy if there is a high risk that the service shuts down soon.

I remember the Tranquilien app [0], which was also developed by Snips. There was a lot of media coverage about it, but it was quickly shut down :(.

[0] https://www.digital.sncf.com/store/applications/tranquilien-...


Actually, the good thing is that if Snips disappears, you will still be able to run your Snips assistant, as everything runs locally.


Yeah, but that's only until some library makes a backwards-incompatible change and the whole thing stops building. Sure, it's better than a completely proprietary service, but support and active maintenance are very important too.


The Rust ecosystem and build system are designed to prevent this. If you're using a library and want your code to keep compiling in 10 years, you can (and should) just lock the dependencies.

Since the libraries are all mirrored on crates.io and, by design, cannot be removed once they are published, you should be good, at least until the day the owners of crates.io stop being able to pay the server costs.

I believe that you can also host local mirrors, but I haven't checked that.


I think they shifted dramatically from experimenting with "smart cities" to being an AI startup focused on building voice assistants.

From TechCrunch [0]:

> The team didn't have a product or business model in mind. It experimented for a while, launching projects around smart cities, such as Tranquilien, a service that predicts how busy your train is going to be, or a smart empty parking spot prediction service.

[0] https://techcrunch.com/2015/06/26/snips-grabs-6-3-million-to...

[1] https://www.crunchbase.com/organization/snips-2


Could you expand on why CMUSphinx [0] or Julius [1] were not used?

I'm all for using Rust for the speed and safety it allows, and then FFI-ing to CMUSphinx (this is exactly what I want to do, given more free time in the future), but could you explain why the available open-source libraries (CMUSphinx is just one; there's also Julius) couldn't be used?

[0] - http://cmusphinx.github.io/

[1] - https://github.com/julius-speech/julius


Have you used Sphinx? Remember in the 90s and early 00s when speech recognition was laughable and barely worked, and it seemed like one of those things that would never really work?

Sphinx is like that.


Yes, I've used Sphinx, though not recently; when I last used it (years ago), it was trivial to set up a working English install with the pre-provided models, as an amateur. From there it's only the problem of improving the model (which is obviously a pretty hard problem).

Are you saying that Sphinx is currently at the same level as the state of the art in the 90s? Surely that's hyperbole.

Also, using AI to generate even better models doesn't seem like a bad idea if you're looking to improve it. The fundamentals of speech recognition haven't changed, so why build a completely separate open-source product instead of contributing to something already well understood and accessible?


As far as I can tell, CMU Sphinx is still based on HMMs, which were the previous state of the art before neural networks brought a breakthrough in model performance. So it is likely that CMU Sphinx is currently not much better than what was possible in the 90s. When I last looked into this, I found a mailing-list message by one of the maintainers, where he explicitly recommended using Kaldi if you want better results.

Kaldi does support neural network models in addition to good old HMMs, but it is very "researchy": everything is set up so that you can replace any step in the pipeline with your own components (and then publish a paper about your results). But that also means that you pretty much have to be an expert to correctly assemble a working product from the available components, and it's pretty much assumed that you will be training your own models, which can be difficult when you don't have access to lots of data.

So yes, the existing landscape for open-source speech recognition leaves something to be desired, and the focus of existing projects doesn't necessarily lend itself to turning them into what you want.


Sorry, I wasn't clear. What I meant to say is that even within the limitations of HMMs, it's absurd to imply that CMUSphinx has made no progress over the state of the art of the 90s. A greatly improved methodology/approach to HMMs is hard-won, and it's unfair to minimize the effort that produced progress in an old method just because a new method has been developed.

Neural network models have their own downsides, the biggest one being training. Why throw the baby out with the bathwater instead of taking the approach Kaldi has taken, possibly building a neural-network alternative inside Sphinx?

Let CMU plug away at making HMMs better, while you plug away at making neural networks better, but interoperate so everyone gets both benefits? You can even maybe make some headway with the neural network bootstrapping problem with some help from the progress CMU has already made.


We are using deep-learning models, which have much better accuracy for speech recognition.


I understand that the project is using AI, but why not feed that learning into Sphinx, or some other tool? Couldn't this product have just been essentially an extension to make one of those other open-source, research-backed efforts smarter?

How does any other project benefit from the models you build? Or is that the business model: produce open-source software that no one else can really extend or use with anything else, but hopefully people will then buy into your modelling strategy and tooling?

I do realize that you have absolutely no obligation to any other voice recognition effort, but I wonder how easy it is for anyone else to use the model you're building.


Sphinx has its own models; it is not easy to extend it with the frameworks we are using.

We will be open-sourcing more of the platform over time and giving back to the community, starting with the NLU in the coming weeks.


Thanks so much for being open about it, I see why you didn't go with trying to extend it.

Again, I want to express that you don't owe me anything (and it was entitled of me to imply that you did), but I wanted to know. Maybe in the future, writing something that can enrich other models will be possible.


> deep-learning models

What kind of models do you use?


Does the voice recognition AI use all of the Pi's processing power while running? I mean, can the Pi do anything else while the voice recog is running? Or does it context switch after a voice command? Sorry if this is a silly question...


No, the Pi can still do computations while the speech recognition runs


This sounds amazing. Looking forward to trying it out on my rpi


> It had the traits Snips needed

Well done.



