This project is in its infancy. We are barely able to walk!
As a proof of concept, quantized MNIST is happily running on Mbed-enabled MCUs. Upcoming work will include: reference counting, memory abstraction, a tensorflow-to-mbed exporter, and more ops.
These should be enough for us to run most DL models out there.
We started with the idea of putting AI everywhere and helping people build cooler things.
Inputs and collaborations are welcome.
Neil Tan, Kazami Hsieh, Dboy Liao, Michael Bartling
Hi, one of the biggest issues we have consistently had with TF is serialization. How do you export from training and then take that and deploy it on hardware?
Could you share your training scripts? I would love to look at the piece that does quantization/SavedModel/FreezeGraph.
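(Not the authors' actual script, but a minimal TF 1.x sketch of the freeze step being asked about; the checkpoint and node names below are placeholders, and the project's real exporter and quantization pass may differ.)

    import tensorflow as tf
    from tensorflow.python.framework import graph_util

    # Assumes a trained MNIST checkpoint and an output node named "y_pred";
    # both names are placeholders, not the project's real ones.
    with tf.Session() as sess:
        saver = tf.train.import_meta_graph("mnist_model.meta")
        saver.restore(sess, "mnist_model")

        # Fold variables into constants so the graph is self-contained.
        frozen = graph_util.convert_variables_to_constants(
            sess, sess.graph_def, ["y_pred"])

        # Write the frozen GraphDef; a separate quantization pass (e.g. the
        # graph_transforms quantize_weights transform) would run on this file.
        tf.train.write_graph(frozen, ".", "mnist_frozen.pb", as_text=False)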
The Embedded Learning Library is good, but this project aims to integrate with TensorFlow. If you need to run machine learning on an MCU, either could be a good choice.
Holy cow, an F767. I thought the 'I' postfixes were for somewhere around 512K-1M, though? Anyways, I guess you'd want a nice chip for this sort of thing, but could this also work on a cheaper F303CC?
Edit: Oh, 256 KB RAM. Never mind, although 2Mb RAM modules are pretty cheap and have as little as 32 pins...
You are right that the F767ZI has 512 KB of RAM. However, the MLP code has been tested on boards with 256 KB of RAM. Changes are on the way to pull that number significantly lower.
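For a rough sense of why 256 KB can be enough, here is my own back-of-envelope, assuming a small 784-128-64-10 MLP with 8-bit quantized weights (the actual topology isn't stated here):

    # Hypothetical quantized MNIST MLP; the 784-128-64-10 topology is an
    # assumption, not the project's actual model.
    layers = [784, 128, 64, 10]
    weights = sum(a * b for a, b in zip(layers, layers[1:]))  # 109,184 params
    biases = sum(layers[1:])                                  # 202 params
    kb_8bit = (weights + biases) / 1024                       # 1 byte per param
    print(f"~{kb_8bit:.0f} KB of weights")                    # ~107 KB, under 256 KB

Activations and scratch buffers add to that, which is presumably what the memory-abstraction work mentioned above is meant to address.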
Neat! Have you looked at the upcoming H7 lines yet? I think they were running into production issues or problems with the smaller 40nm process or something, but apparently they can clock up to 400MHz and have a whole MB.
Am I right in understanding that this implementation is aimed at running the network (vs. training one)? It sounds very exciting to be able to train a system and then apply it to a hardware-based problem without needing to do so much porting of the NN code.
Haven't got an Mbed to test with, but if it does work, it's a game changer! There are many applications that could benefit (e.g., medical imaging in remote areas) where speed is not essential.
Thanks man!
You summed it up nicely! This project is about making the trade-off between speed and cost. There is greatness at either end of the spectrum.
And when I took the operating systems lab class, we had block time access to a PDP-11 with half that amount of total memory and a keypad for entering the bootloader -- a big step up from toggle switches. One of the reasons old grey-beards like me can find a happy home doing embedded software is that we don't get that deer-in-headlights reaction to memory counted in KBytes and single-digit MBytes.
It is not specialized for MNIST; it will be a general-purpose framework aimed at inference. We will add more ops so developers can create more models on device, trained with TensorFlow. This project is still ongoing.