Free TPU is the free version of a commercial TPU design for deep learning edge inference, which can be deployed on FPGA devices such as the Xilinx Zynq-7020 or Kintex7-160T (both good choices for production). It is not only a TPU logic design: Free TPU also includes the EEP Accelerating Framework, which supports all Caffe layers and can run on any CPU (such as the ARM A9 of the Zynq-7020, or an Intel/AMD processor). The TPU and CPU cooperate under the schedule of the Deep Learning Inference Framework, alternating in any order. Then you can do anything you want with it, for free (Free TPU is released under the MIT LICENSE). After comprehensive stress testing, we are finally ready to formally release FREE-TPU with commercial quality. Enjoy it!
For more details, please visit https://www.embedeep.com
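The following is only a conceptual sketch of this co-working model. It is NOT the actual EEP Accelerating Framework or Deep Learning Inference Framework code, and every type and function name in it is made up for illustration: the framework walks the network layer by layer and dispatches each layer either to the TPU or to the CPU.

```c
/*
 * Conceptual sketch only -- NOT the actual EEP framework code.
 * All types and function names below are hypothetical; the sketch just
 * illustrates a scheduler dispatching each layer to the TPU or the CPU,
 * in any order of alternation.
 */
#include <stdio.h>

typedef enum { RUN_ON_TPU, RUN_ON_CPU } target_t;

typedef struct {
    const char *name;   /* layer name, e.g. "conv1" */
    target_t    target; /* decided by the framework when the model is loaded */
} layer_t;

/* Placeholder execution paths for the two devices. */
static void run_layer_on_tpu(const layer_t *l) { printf("TPU : %s\n", l->name); }
static void run_layer_on_cpu(const layer_t *l) { printf("CPU : %s\n", l->name); }

int main(void) {
    /* A toy network: layers may alternate between TPU and CPU freely. */
    layer_t net[] = {
        { "conv1",   RUN_ON_TPU },
        { "custom1", RUN_ON_CPU },   /* e.g. a layer handled by the CPU path */
        { "conv2",   RUN_ON_TPU },
        { "softmax", RUN_ON_CPU },
    };

    /* The scheduler walks the network and dispatches each layer in turn. */
    for (size_t i = 0; i < sizeof(net) / sizeof(net[0]); ++i) {
        if (net[i].target == RUN_ON_TPU)
            run_layer_on_tpu(&net[i]);
        else
            run_layer_on_cpu(&net[i]);
    }
    return 0;
}
```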
 | FREE-TPU | EEP-TPU-FS020 | EEP-TPU-FS035-152 | EEP-TPU-FS035-272 | EEP-TPU-FS035-528 | EEP-TPU-FS035-1040 |
---|---|---|---|---|---|---|
Binding Hardware | No | EEP-TPU-M020 | EEP-TPU-EV-S Kits | EEP-TPU-EV-S Kits | EEP-TPU-EV-S Kits | EEP-TPU-EV-S Kits |
Target Device | Xilinx xc7z020-1 | Xilinx xc7z020-1 | Xilinx xc7z035-2 | Xilinx xc7z035-2 | Xilinx xc7z035-2 | Xilinx xc7z035-2 |
FPGA utilization | LUT: 62.96% FF: 56.08% BRAM: 80.00% DSP: 73.18% |
LUT: 66.12% FF: 56.92% BRAM: 80.00% DSP: 73.18% |
LUT: 26.54% FF: 23.34% BRAM: 37.40% DSP: 17.89% |
LUT: 26.52% FF: 23.35% BRAM: 37.40% DSP: 31.89% |
LUT: 28.89% FF: 24.02% BRAM: 37.40% DSP: 59.67% |
LUT: 69.85% FF: 40.90% BRAM: 40.60% DSP: 100% |
TPU Layer set | LS-V1.0 | LS-V2.0 | LS-V2.0 | LS-V2.0 | LS-V2.0 | LS-V2.0 |
FP16 MACs | 152 | 152 | 152 | 272 | 528 | 1040 |
Frequency | 100 MHz | 100 MHz | 200 MHz | 200 MHz | 200 MHz | 200 MHz |
AXI port | 1 | 1 | 2 | 2 | 2 | 2 |
Off-chip DDR bandwidth | 12.8 Gbps (Shared) | 12.8 Gbps (Shared) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) |
On-chip Memory | 256 KByte | 256 KByte | 512 KByte | 512 KByte | 512 KByte | 512 KByte |
OS | Linux | Linux | Linux | Linux | Linux | Linux |
Compiler | NO | YES | YES | YES | YES | YES |
EEP Accelerating Framework | YES | YES | YES | YES | YES | YES |
EEP NNAPI | YES | YES | YES | YES | YES | YES |
Android NNAPI | NO | Release in future | Release in future | Release in future | Release in future | Release in future |
License | MIT | Commercial | Commercial | Commercial | Commercial | Commercial |
From the user's point of view, Free-TPU and EEP-TPU have the same functionality but different inference times. If the inference time of Free-TPU is NOT good enough for your applications, you can contact us at any time. We will be happy to share our experience on algorithms, software, or hardware with you.
- We plan to release two implementations targeting two different FPGA chips: Xilinx Zynq-7020 and Xilinx Kintex7-160T. For now, the implementation for the Xilinx Zynq-7020 has been released.
- FREE-TPU users who do not have an FPGA board with a Xilinx Zynq-7020 chip are free to choose an FPGA board from the supported board list in the Appendix section of Free-TPU-OS, or to choose our official $99 EEP-TPU-M020 board.
- Users who want to try any other NN network can send the corresponding **.prototxt and **.caffemodel to us for BIN generation; alternatively, our official $99 EEP-TPU-M020 board, which includes the EEP-TPU-Compiler software, is ready for you.
- For advanced users who intend to modify the logic design in the FPGA, such as adding new IP on the PL side of the Zynq (e.g. a video DMA), using a PCIe-based hardware scheme, or configuring the EEP-TPU with different parameters, we provide three kinds of $999 EEP-TPU-EV-S/A/H Kits that include a Soft IP which CAN be merged into the user design as encrypted code.
- For advanced users who intend to modify the hardware but have limited experience, we provide $1999/year technical support that includes an EEP-TPU hardware customization service.
- Please visit https://www.embedeep.com for more information, including the technical details of FREE TPU, the definition of the Layer set, and the EMBEDEEP products that could be helpful for your research and development.
 | 667 MHz ARM A9 with NEON | FREE-TPU on EEP-TPU-M020 | EEP-TPU-FS020 on EEP-TPU-M020 | EEP-TPU-FS035-152 on EEP-TPU-EV-S | EEP-TPU-FS035-272 on EEP-TPU-EV-S | EEP-TPU-FS035-528 on EEP-TPU-EV-S |
---|---|---|---|---|---|---|
FP16 computing resource | NA | 30.4 GOPS | 30.4 GOPS | 60.8 GOPS | 108.8 GOPS | 211.2 GOPS |
on-chip memory | NA | 256 KByte | 256 KByte | 512 KByte | 512 KByte | 512 KByte |
off-chip DDR bandwidth | 33 Gbps (Shared) | 12.8 Gbps (Shared) | 12.8 Gbps (Shared) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) | 51.2 Gbps (Dedicated) |
lenet-5 | 4.6ms | 1.359ms | 1.345ms | 0.703ms | 0.657ms | 0.619ms |
Mobilenet-V1 | 768ms | 74.754ms | 74.739ms | 40.731ms | 30.196ms | 25.633ms |
Mobilenet-V1 with mergeBN | 679.6ms | 66.571ms | 66.564ms | 34.391ms | 23.883ms | 19.317ms |
Mobilenet-V2 | 810ms | 83.896ms | 83.867ms | 48.468ms | 41.337ms | 38.470ms |
Mobilenet-V2 with mergeBN | 660ms | 69.540ms | 69.541ms | 37.290ms | 30.153ms | 27.255ms |
Squeezenet-v1.1 | 416.4ms | 46.216ms | 46.235ms | 25.356ms | 17.890ms | 14.895ms |
Resnet-50 with mergeBN | 4753ms | 367.696ms | 367.674ms | 188.413ms | 115.299ms | 82.900ms |
Inception-V3 | 21493.7ms | 516.362ms | 516.539ms | 265.305ms | 154.225ms | 102.281ms |
Mobilenet-YOLOV3 | 2058.6ms | 207.849ms | 179.422ms | 91.311ms | 59.827ms | 44.578ms |
ICNet | 3347.8ms | 1342.354ms | 228.263ms | 122.257ms | 83.348ms | 64.173ms |
- Both FREE-TPU and EEP-TPU use FP16 models and achieve the same accuracy as the original FP32 models.
- The inference time includes the time to fetch an image from HOST memory, the computation itself, and exporting the result to HOST memory.
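- For reference, the FP16 computing resource corresponds to MACs × frequency × 2 operations per cycle (one multiply plus one accumulate), e.g. 272 MACs × 200 MHz × 2 = 108.8 GOPS; this relationship is consistent across all columns of the tables above.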
The Free TPU SDK imports a trained model from CAFFE (**.prototxt and **.caffemodel) directly to generate the BIN file; no further re-training or fine-tuning is necessary. We have NOT released the SDK for FREE TPU yet. Instead, we provide BIN files of typical NN networks for you. Users who want to try other NN networks can send the corresponding **.prototxt and **.caffemodel to us for BIN generation.
By using the EEP NNAPI, users can develop their own apps with TPU inference capability.
We will release the EEP NNAPI in an upcoming update.
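Since the EEP NNAPI has not been released yet, its real interface is not documented here. The sketch below only illustrates the kind of load-BIN, run-inference, read-result flow an application built on it could follow; every type, function name, and file name in it is hypothetical (and stubbed so that the sketch compiles), not the actual EEP NNAPI.

```c
/*
 * Hypothetical sketch -- the real EEP NNAPI has not been released, so all
 * names below (eep_net_t, eep_net_load, eep_net_run, "mobilenet_v1.bin")
 * are made up and stubbed, only to show a typical load -> infer -> read flow.
 */
#include <stdio.h>
#include <string.h>

typedef struct { const char *bin_path; } eep_net_t;   /* hypothetical handle */

static eep_net_t *eep_net_load(const char *bin_path) {
    static eep_net_t net;         /* stub: a real API would parse the BIN file */
    net.bin_path = bin_path;
    return &net;
}

static int eep_net_run(eep_net_t *net, const float *input, size_t in_elems,
                       float *output, size_t out_elems) {
    (void)net; (void)input; (void)in_elems;
    memset(output, 0, out_elems * sizeof *output);     /* stub result */
    return 0;                                          /* 0 = success */
}

int main(void) {
    eep_net_t *net = eep_net_load("mobilenet_v1.bin"); /* hypothetical BIN name */
    if (!net) { fprintf(stderr, "failed to load network\n"); return 1; }

    static float input[224 * 224 * 3];  /* a preprocessed image would go here */
    float scores[1000] = { 0 };

    if (eep_net_run(net, input, 224 * 224 * 3, scores, 1000) != 0) {
        fprintf(stderr, "inference failed\n");
        return 1;
    }
    printf("class 0 score: %f\n", scores[0]);
    return 0;
}
```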
FPGA device | BIN | Version |
---|---|---|
Xilinx Zynq-7020 | boot.BIN | v0.6.0-r28-p16 |
FREE TPU does NOT use any pins on the PL side of the Zynq chip. Hence, in general, you can use any board with a Xilinx Zynq-7000 series chip to run FREE TPU. If you are using another FPGA chip, please let us know through the issues; we will be happy to release a corresponding BIT file if possible.
- Clone this repository: `git clone https://github.com/embedeep/Free-TPU`
- Launch Linux OS (please refer to Free-TPU-OS).
- Connect to the system through SSH or UART, and execute the demo application as follows (for more detail about the demo application, please refer to Runtime_Software): `eepdemo_arm --bin BIN_file --image IMG_file`
- If everything goes right, you will see the result in the terminal or in the saved image. Enjoy!
MIT LICENSE
Questions can be emailed to us or left as issues in the repository; we will be happy to answer them.
Luo ([email protected])
Zhou ([email protected])
He ([email protected])