Free TPU is the free version of a commercial TPU design for deep learning edge inference, which can be deployed on FPGA devices such as the Xilinx Zynq-7020 or Kintex7-160T (both good choices for production). It is not only a TPU logic design: Free TPU also includes the EEP Accelerating Framework, which supports all Caffe layers and can run on any CPU (such as the ARM A9 of the Zynq-7020, or an Intel/AMD processor). The TPU and CPU cooperate under the schedule of the Deep Learning Inference Framework, alternating in any order. Then you can do anything you want with it, for free (Free TPU is released under the MIT LICENSE). After comprehensive stress testing, we are finally ready to formally release FREE-TPU with commercial quality. Enjoy it!
For more details, please visit https://www.embedeep.com
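The following is only a conceptual sketch of this co-working model. It is NOT the actual EEP Accelerating Framework or Deep Learning Inference Framework code, and every type and function name in it is made up for illustration: the framework walks the network layer by layer and dispatches each layer either to the TPU or to the CPU.

```c
/*
 * Conceptual sketch only -- NOT the actual EEP framework code.
 * All types and function names below are hypothetical; the sketch just
 * illustrates a scheduler dispatching each layer to the TPU or the CPU,
 * in any order of alternation.
 */
#include <stdio.h>

typedef enum { RUN_ON_TPU, RUN_ON_CPU } target_t;

typedef struct {
    const char *name;   /* layer name, e.g. "conv1" */
    target_t    target; /* decided by the framework when the model is loaded */
} layer_t;

/* Placeholder execution paths for the two devices. */
static void run_layer_on_tpu(const layer_t *l) { printf("TPU : %s\n", l->name); }
static void run_layer_on_cpu(const layer_t *l) { printf("CPU : %s\n", l->name); }

int main(void) {
    /* A toy network: layers may alternate between TPU and CPU freely. */
    layer_t net[] = {
        { "conv1",   RUN_ON_TPU },
        { "custom1", RUN_ON_CPU },   /* e.g. a layer handled by the CPU path */
        { "conv2",   RUN_ON_TPU },
        { "softmax", RUN_ON_CPU },
    };

    /* The scheduler walks the network and dispatches each layer in turn. */
    for (size_t i = 0; i < sizeof(net) / sizeof(net[0]); ++i) {
        if (net[i].target == RUN_ON_TPU)
            run_layer_on_tpu(&net[i]);
        else
            run_layer_on_cpu(&net[i]);
    }
    return 0;
}
```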
 | FREE-TPU | EEP-TPU-FS020 | EEP-TPU-FS035-152 | EEP-TPU-FS035-272 | EEP-TPU-FS035-528 | EEP-TPU-FS035-1040 |
---|---|---|---|---|---|---|
Binding Hardware | No | EEP-TPU-M020 | EEP-TPU-EV-S Kits | EEP-TPU-EV-S Kits | EEP-TPU-EV-S Kits | EEP-TPU-EV-S Kits |
Target Device | Xilinx xc7z020-1 | Xilinx xc7z020-1 | Xilinx xc7z035-2 | Xilinx xc7z035-2 | Xilinx xc7z035-2 | Xilinx xc7z035-2 |
FPGA utilization | LUT: 62.96% FF: 56.08% BRAM: 80.00% DSP: 73.18% |
LUT: 66.12% FF: 56.92% BRAM: 80.00% DSP: 73.18% |
LUT: 26.54% FF: 23.34% BRAM: 37.40% DSP: 17.89% |
LUT: 26.52% FF: 23.35% BRAM: 37.40% DSP: 31.89% |
LUT: 28.89% FF: 24.02% BRAM: 37.40% DSP: 59.67% |
LUT: 69.85% FF: 40.90% BRAM: 40.60% DSP: 100% |
TPU Layer set | LS-V1.0 | LS-V2.0 | LS-V2.0 | LS-V2.0 | LS-V2.0 | LS-V2.0 |
FP16 MACs | 152 | 152 | 152 | 272 | 528 | 1040 |
Frequency | 100 MHz | 100 MHz | 200 MHz | 200 MHz | 200 MHz | 200 MHz |
AXI port | 1 | 1 | 2 | 2 | 2 | 2 |
Off-chip DDR bandwidth | 12.8 Gbps (Shared) | 12.8 Gbps (Shared) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) |
On-chip Memory | 256 KByte | 256 KByte | 512 KByte | 512 KByte | 512 KByte | 512 KByte |
OS | Linux | Linux | Linux | Linux | Linux | Linux |
Compiler | NO | YES | YES | YES | YES | YES |
EEP Accelerating Framework | YES | YES | YES | YES | YES | YES |
EEP NNAPI | YES | YES | YES | YES | YES | YES |
Android NNAPI | NO | Release in future | Release in future | Release in future | Release in future | Release in future |
License | MIT | Commercial | Commercial | Commercial | Commercial | Commercial |
From the user's point of view, Free-TPU and EEP-TPU have the same functionality but different inference times. If the inference time of Free-TPU is NOT good enough for your applications, you can contact us at any time. We will be happy to share our experience on algorithms, software, or hardware with you.
- We plan to release two implementations targeting two different FPGA chips: Xilinx Zynq-7020 and Xilinx Kintex7-160T. For now, the implementation for the Xilinx Zynq-7020 has been released.
- FREE-TPU users who do not have an FPGA board with a Xilinx Zynq-7020 chip are free to choose an FPGA board from the supported board list in the Appendix section of Free-TPU-OS, or to choose our official $99 EEP-TPU-M020 board.
- Users who want to try any other NN network can send the corresponding **.prototxt and **.caffemodel to us for BIN generation; alternatively, our official $99 EEP-TPU-M020 board, which includes the EEP-TPU-Compiler software, is ready for you.
- For advanced users who intend to modify the logic design in the FPGA, such as adding new IP on the PL side of the Zynq (e.g. a video DMA), using a PCIe-based hardware scheme, or configuring the EEP-TPU with different parameters, we provide three kinds of $999 EEP-TPU-EV-S/A/H Kits that include a Soft IP which CAN be merged into the user design as encrypted code.
- For advanced users who intend to modify the hardware but have limited experience, we provide $1999/year technical support that includes an EEP-TPU hardware customization service.
- Please visit https://www.embedeep.com for more information, including the technical details of FREE TPU, the definition of the Layer set, and the EMBEDEEP products that could be helpful for your research and development.
 | 667 MHz ARM A9 with NEON | FREE-TPU on EEP-TPU-M020 | EEP-TPU-FS020 on EEP-TPU-M020 | EEP-TPU-FS035-152 on EEP-TPU-EV-S | EEP-TPU-FS035-272 on EEP-TPU-EV-S | EEP-TPU-FS035-528 on EEP-TPU-EV-S |
---|---|---|---|---|---|---|
FP16 computing resource | NA | 30.4 GOPS | 30.4 GOPS | 60.8 GOPS | 108.8 GOPS | 211.2 GOPS |
on-chip memory | NA | 256 KByte | 256 KByte | 512 KByte | 512 KByte | 512 KByte |
off-chip DDR bandwidth | 33 Gbps (Shared) | 12.8 Gbps (Shared) | 12.8 Gbps (Shared) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) |
lenet-5 | 4.6ms | 1.359ms | 1.345ms | 0.703ms | 0.657ms | 0.619ms |
Mobilenet-V1 | 768ms | 74.754ms | 74.739ms | 40.731ms | 30.196ms | 25.633ms |
Mobilenet-V1 with mergeBN | 679.6ms | 66.571ms | 66.564ms | 34.391ms | 23.883ms | 19.317ms |
Mobilenet-V2 | 810ms | 83.896ms | 83.867ms | 48.468ms | 41.337ms | 38.470ms |
Mobilenet-V2 with mergeBN | 660ms | 69.540ms | 69.541ms | 37.290ms | 30.153ms | 27.255ms |
Squeezenet-v1.1 | 416.4ms | 46.216ms | 46.235ms | 25.356ms | 17.890ms | 14.895ms |
Resnet-50 with mergeBN | 4753ms | 367.696ms | 367.674ms | 188.413ms | 115.299ms | 82.900ms |
Inception-V3 | 21493.7ms | 516.362ms | 516.539ms | 265.305ms | 154.225ms | 102.281ms |
Mobilenet-YOLOV3 | 2058.6ms | 207.849ms | 179.422ms | 91.311ms | 59.827ms | 44.578ms |
ICNet | 3347.8ms | 1342.354ms | 228.263ms | 122.257ms | 83.348ms | 64.173ms |
- Both FREE-TPU and EEP-TPU use FP16 models and achieve the same accuracy as the original FP32 models.
- The inference time includes the time to fetch an image from HOST memory, the computation itself, and exporting the result to HOST memory.
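- For reference, the FP16 computing resource corresponds to MACs × frequency × 2 operations per cycle (one multiply plus one accumulate), e.g. 272 MACs × 200 MHz × 2 = 108.8 GOPS; this relationship is consistent across all columns of the tables above.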
The Free TPU SDK imports a trained model from CAFFE (**.prototxt and **.caffemodel) directly to generate the BIN file; no further re-training or fine-tuning is necessary. We have NOT released the SDK for FREE TPU yet. Instead, we provide BIN files of typical NN networks for you. Users who want to try other NN networks can send the corresponding **.prototxt and **.caffemodel to us for BIN generation.
By using the EEP NNAPI, users can develop their own apps with TPU inference capability.
We will release the EEP NNAPI in an upcoming update.
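Since the EEP NNAPI has not been released yet, its real interface is not documented here. The sketch below only illustrates the kind of load-BIN, run-inference, read-result flow an application built on it could follow; every type, function name, and file name in it is hypothetical (and stubbed so that the sketch compiles), not the actual EEP NNAPI.

```c
/*
 * Hypothetical sketch -- the real EEP NNAPI has not been released, so all
 * names below (eep_net_t, eep_net_load, eep_net_run, "mobilenet_v1.bin")
 * are made up and stubbed, only to show a typical load -> infer -> read flow.
 */
#include <stdio.h>
#include <string.h>

typedef struct { const char *bin_path; } eep_net_t;   /* hypothetical handle */

static eep_net_t *eep_net_load(const char *bin_path) {
    static eep_net_t net;         /* stub: a real API would parse the BIN file */
    net.bin_path = bin_path;
    return &net;
}

static int eep_net_run(eep_net_t *net, const float *input, size_t in_elems,
                       float *output, size_t out_elems) {
    (void)net; (void)input; (void)in_elems;
    memset(output, 0, out_elems * sizeof *output);     /* stub result */
    return 0;                                          /* 0 = success */
}

int main(void) {
    eep_net_t *net = eep_net_load("mobilenet_v1.bin"); /* hypothetical BIN name */
    if (!net) { fprintf(stderr, "failed to load network\n"); return 1; }

    static float input[224 * 224 * 3];  /* a preprocessed image would go here */
    float scores[1000] = { 0 };

    if (eep_net_run(net, input, 224 * 224 * 3, scores, 1000) != 0) {
        fprintf(stderr, "inference failed\n");
        return 1;
    }
    printf("class 0 score: %f\n", scores[0]);
    return 0;
}
```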
FPGA device | BIN | Version |
---|---|---|
Xilinx Zynq-7020 | boot.BIN | v0.6.0-r28-p16 |
FREE TPU does NOT use any pins on the PL side of the Zynq chip. Hence, in general, you can use any board with a Xilinx Zynq-7000 series chip to run FREE TPU. If you are using another FPGA chip, please let us know through the issues; we will be happy to release a corresponding BIT file if possible.
- Clone this repository: `git clone https://github.com/embedeep/Free-TPU`
- Launch Linux OS (please refer to Free-TPU-OS).
- Connect to the system through SSH or UART, and execute the demo application as follows (for more detail about the demo application, please refer to Runtime_Software): `eepdemo_arm --bin BIN_file --image IMG_file`
- If everything goes right, you will see the result in the terminal or in the saved image. Enjoy!
MIT LICENSE
Questions can be emailed to us or left as issues in the repository; we will be happy to answer them.
Luo ([email protected])
Zhou ([email protected])
He ([email protected])