sunnycs/Deep-Learning-Processor-List

 
 

AI/ML/DL ICs and IPs

Editor T.S.
WeChat official account: StarryHeavensAbove

Latest updates


Shortcut


| Category | Companies | Count |
| --- | --- | --- |
| IC Giants | Intel, Qualcomm, Nvidia, AMD, Apple, Xilinx, IBM, STMicroelectronics, HiSilicon | 9 |
| Cloud/HPC | Google, Amazon_AWS, Microsoft, Aliyun, Tencent Cloud, Baidu, Baidu Cloud, HUAWEI Cloud, Fujitsu | 9 |
| IP Vendors | ARM, Synopsys, Imagination, CEVA, Cadence, VeriSilicon | 6 |
| Startups in China | Cambricon, Horizon Robotics, DeePhi, Bitmain, Chipintelli | 5 |
| Startups Worldwide | Cerebras, Wave Computing, Graphcore, PEZY, KnuEdge, Tenstorrent, ThinCI, Koniku, Adapteva, Knowm, Mythic, Kalray, BrainChip, AImotive, DeepScale, LeapMind, Krtkl, NovuMind, REM, TERADEEP, DEEP VISION, Groq, KAIST DNPU | 23 |

I. IC Giants


Nervana

Intel purchased Nervana Systems, which was developing a GPU/software approach in addition to its Nervana Engine ASIC. How its performance compares is unclear. Intel is also planning to integrate the technology into the Phi platform via the [Knights Crest](https://www.nextplatform.com/2016/11/22/inside-intels-strategy-integrate-nervana-deep-learning-assets/) project. [NextPlatform](https://www.nextplatform.com/2016/08/08/deep-learning-chip-upstart-set-take-gpus-task/) suggested the 2017 target on 28nm may be 55 TOPS for some operand width. Intel has scheduled a NervanaCon for December, so perhaps we'll see the first fruits then.

[Intel® Nervana™ Neural Network Processors (NNP) Redefine AI Silicon](https://www.intelnervana.com/intel-nervana-neural-network-processors-nnp-redefine-ai-silicon/)

> As our Intel CEO Brian Krzanich discussed earlier today at Wall Street Journal’s D.Live event, Intel will soon be shipping the world’s first family of processors designed from the ground up for artificial intelligence (AI): the [Intel® Nervana™ Neural Network Processor family](https://newsroom.intel.com/editorials/intel-pioneers-new-technologies-advance-artificial-intelligence/) (formerly known as “Lake Crest”). This family of processors is over 3 years in the making, and on behalf of the team building it, I’d like to share a bit more insight on the motivation and design behind the world’s first neural network processor.

Mobileye EyeQ

> Mobileye is currently developing its fifth generation SoC, the [EyeQ®5](https://www.mobileye.com/our-technology/evolution-eyeq-chip/), to act as the vision central computer performing sensor fusion for Fully Autonomous Driving (Level 5) vehicles that will hit the road in 2020. To meet power consumption and performance targets, EyeQ® SoCs are designed in most advanced VLSI process technology nodes – down to 7nm FinFET in the 5th generation.

Movidius

> [Myriad 2](https://pdfs.semanticscholar.org/32d5/405ac92a13d7f38e2313574dfd6238125a94.pdf) is a multicore, always-on system on chip that supports computational imaging and visual awareness for mobile, wearable, and embedded applications. The vision processing unit incorporates parallelism, instruction set architecture, and microarchitectural features to provide highly sustainable performance efficiency across a range of computational imaging and computer vision applications, including those with low latency requirements on the order of milliseconds.

Myriad™ X is the first VPU to feature the Neural Compute Engine - a dedicated hardware accelerator for running on-device deep neural network applications. Interfacing directly with other key components via the intelligent memory fabric, the Neural Compute Engine is able to deliver industry leading performance per Watt without encountering common data flow bottlenecks encountered by other architectures.

FPGA

Intel [FPGA OpenCL](https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html) and [Solutions](https://www.altera.com/solutions/technology/artificial-intelligence/solutions.html).

Loihi

Intel's Loihi test chip is the [First-of-Its-Kind Self-Learning Chip](https://newsroom.intel.com/editorials/intels-new-self-learning-chip-promises-accelerate-artificial-intelligence/). > The Loihi research test chip includes digital circuits that mimic the brain’s basic mechanics, making machine learning faster and more efficient while requiring lower compute power. Neuromorphic chip models draw inspiration from how neurons communicate and learn, using spikes and plastic synapses that can be modulated based on timing. This could help computers self-organize and make decisions based on patterns and associations.

Qualcomm has been fussing around with ML for a while with the Zeroth SDK and the Snapdragon Neural Processing Engine (NPE). The NPE certainly works reasonably well on the Hexagon DSP that Qualcomm uses. The Hexagon DSP is far from a very wide parallel platform, and Yann LeCun has confirmed that Qualcomm and Facebook are working together on a better way, as reported in Wired's "The Race To Build An AI Chip For Everything Just Got Real":

> And more recently, Qualcomm has started building chips specifically for executing neural networks, according to LeCun, who is familiar with Qualcomm's plans because Facebook is helping the chip maker develop technologies related to machine learning. Qualcomm vice president of technology Jeff Gehlhaar confirms the project. "We're very far along in our prototyping and development," he says.

Perhaps we'll see something soon beyond the Kryo CPU, Adreno GPU, Hexagon DSP, and Hexagon Vector Extensions. It is going to be hard to be a start-up in this space if you're competing against Qualcomm's machine learning.

A recent Qualcomm blog post, "We are making on-device AI ubiquitous", lays out its AI roadmap.

GPU

Nvidia's latest GPU can do 15 TFlops of single precision (SP) or 120 TFlops with its new Tensor Core architecture, which performs an FP16 multiply with an FP32 accumulate to suit ML.
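Why accumulate in FP32 when the inputs are FP16? The sketch below is a software illustration only, not a model of Nvidia's hardware. It emulates FP16 and FP32 rounding with Python's `struct` module and compares a dot product whose running sum is kept in FP16 with one whose running sum is kept in FP32. The helper names (`to_fp16`, `to_fp32`, `dot`) are made up for this example.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot(a, b, round_accumulator):
    """Dot product of FP16 inputs; the running sum is rounded after every add."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)  # FP16 operands, product kept exact
        acc = round_accumulator(acc + product)
    return acc

ones = [1.0] * 4096
fp16_sum = dot(ones, ones, to_fp16)  # accumulator limited to FP16
fp32_sum = dot(ones, ones, to_fp32)  # accumulator kept in FP32

print("FP16 accumulate:", fp16_sum)
print("FP32 accumulate:", fp32_sum)
```

With an FP16 accumulator the sum stalls at 2048, because 2049 is not representable in half precision and rounds back to 2048. The FP32 accumulator reaches the correct 4096.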

Nvidia packs 8 boards into its DGX-1 for 960 Tensor TFlops.

"Nvidia Volta - 架构看点" (Nvidia Volta: Architecture Highlights) gives some insight into the Volta architecture.

SoC

At the edge, Nvidia provides NVIDIA DRIVE™ PX, [The AI Car Computer for Autonomous Driving](https://www.nvidia.com/object/drive-px.html), and the JETSON TX1/TX2 MODULE, "[The embedded platform for autonomous everything](https://www.nvidia.com/object/embedded-systems-dev-kits-modules.html)".

NVDLA

Nvidia announced "XAVIER DLA NOW OPEN SOURCE" at GTC 2017. We have not seen the Early Access version yet. Hopefully, the general release will be available in September as promised. For more analysis, you may want to read [从Nvidia开源深度学习加速器说起 (Starting from Nvidia's Open-Source Deep Learning Accelerator)](https://mp.weixin.qq.com/s/XEb5xNeSV_oPs08kDgQg8Q).


Now the open source DLA is available on Github and more information can be found here.

The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome.

The soon-to-be-released [AMD Radeon Instinct MI25](https://instinct.radeon.com/en-us/product/mi/radeon-instinct-mi25/) is promising 12.3 TFlops of SP or 24.6 TFlops of FP16. If your calculations are amenable to Nvidia's Tensor Cores, then AMD can't compete. Nvidia also offers nearly twice the bandwidth, with 900 GB/s versus AMD's 484 GB/s.
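To put the quoted figures side by side, the short sketch below computes the Nvidia-to-AMD ratio for each headline number. It uses only values stated in this list (15 and 120 TFlops and 900 GB/s for Nvidia; 12.3 and 24.6 TFlops and 484 GB/s for AMD). The dictionary keys and the `ratio` helper are illustrative names, and peak figures say little about sustained performance on real workloads.

```python
# Peak figures as quoted in this list; real workloads rarely reach them.
NVIDIA_VOLTA = {"sp_tflops": 15.0, "ml_tflops": 120.0, "bandwidth_gbs": 900.0}
AMD_MI25 = {"sp_tflops": 12.3, "ml_tflops": 24.6, "bandwidth_gbs": 484.0}

def ratio(metric):
    """Nvidia figure divided by the AMD figure for the same metric."""
    return NVIDIA_VOLTA[metric] / AMD_MI25[metric]

ratios = {metric: round(ratio(metric), 2) for metric in NVIDIA_VOLTA}
for metric, value in ratios.items():
    print(f"{metric}: Nvidia/AMD = {value}x")
```

On these numbers the bandwidth gap is about 1.86x and the single-precision gap about 1.22x. The Tensor Core figure is nearly 5x the MI25's FP16 figure.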

AMD has put a very good X86 server processor into the market for the first time in nine years, and it also has a matching GPU that gives its OEM and ODM partners a credible alternative for HPC and AI workload to the combination of Intel Xeons and Nvidia Teslas that dominate hybrid computing these days.

Tesla is reportedly developing its own processor for artificial intelligence, intended for use with its self-driving systems, in partnership with AMD. Tesla has an existing relationship with Nvidia, whose GPUs power its Autopilot system, but this new in-house chip reported by CNBC could potentially reduce its reliance on third-party AI processing hardware.

Apple unveiled the new processor powering the new iPhone 8 and iPhone X - the A11 Bionic. [The A11](https://en.wikipedia.org/wiki/Apple_A11) also includes dedicated neural network hardware that Apple calls a "neural engine", which can perform up to 600 billion operations per second.
[Core ML](https://developer.apple.com/machine-learning/) is Apple's current solution for machine learning applications.

Xilinx provides "Machine Learning Inference Solutions from Edge to Cloud" and naturally claims in one of its white papers that its FPGAs are best for INT8.

Whilst performance per Watt is impressive for FPGAs, the vendors' larger chips have long had earth-shatteringly high prices. Finding a balance between price and capability is the main challenge with FPGAs.

[TrueNorth](https://www.research.ibm.com/articles/brain-chip.shtml) is IBM's neuromorphic CMOS ASIC developed in conjunction with the DARPA [SyNAPSE](https://en.wikipedia.org/wiki/SyNAPSE) program.

> It is a manycore processor network on a chip design, with 4096 cores, each one simulating 256 programmable silicon "neurons" for a total of just over a million neurons. In turn, each neuron has 256 programmable "synapses" that convey the signals between them. Hence, the total number of programmable synapses is just over 268 million (2^28). In terms of basic building blocks, its transistor count is 5.4 billion. Since memory, computation, and communication are handled in each of the 4096 neurosynaptic cores, TrueNorth circumvents the von-Neumann-architecture bottlenecks and is very energy-efficient, consuming 70 milliwatts, about 1/10,000th the power density of conventional microprocessors. [Wikipedia](https://en.wikipedia.org/wiki/TrueNorth)
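The neuron and synapse totals follow directly from the per-core figures quoted above. A quick check in Python, with constant names chosen for this example:

```python
# Figures taken from the TrueNorth description above.
CORES = 4096
NEURONS_PER_CORE = 256
SYNAPSES_PER_NEURON = 256

neurons = CORES * NEURONS_PER_CORE
synapses = neurons * SYNAPSES_PER_NEURON

print(f"{neurons:,} neurons, {synapses:,} synapses")
print("synapses == 2**28:", synapses == 2**28)
```

This gives 1,048,576 neurons ("just over a million") and 268,435,456 synapses, which is exactly 2^28.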

[ST preps second neural network IC](https://www.eenewseurope.com/news/st-preps-second-neural-network-ic-0)

> STMicroelectronics is designing a second iteration of the neural networking technology that the company reported on at the International Solid-State Circuits Conference (ISSCC) in February 2017.

"ISSCC2017 Deep-Learning Processors文章学习 (一)" (ISSCC 2017 Deep-Learning Processors paper study, part 1) is a reference.

**Kirin for Smart Phone**
HiSilicon's [Kirin 970 Processor](https://consumer.huawei.com/minisite/worldwide/huawei-ifa2017-global-launch-event-presentation/) was announced featuring a dedicated Neural-network Processing Unit (NPU). [This article](https://www.anandtech.com/show/11815/huawei-mate-10-and-mate-10-pro-launch-on-october-16th-more-kirin-970-details) has more details about the NPU in the Kirin 970.

Mobile Camera SoC
According to a brief data sheet for the Hi3559A V100ES ultra-HD Mobile Camera SoC, it has:

Dual-core CNN@700 MHz neural network acceleration engine

II. Internet/Cloud/HPC Giants


Google's original TPU had a big lead over GPUs and helped power DeepMind's AlphaGo victory over Lee Sedol in a Go tournament. The original 700MHz TPU is described as delivering 92 TOPS for 8-bit calculations or 23 TOPS for 16-bit whilst drawing only 40W. This was much faster than GPUs on release. It is now slower than Nvidia's V100, though not on a per-watt basis. The new TPU2 is referred to as a TPU device with four chips and can do around 180 TFlops. Each chip's performance has been doubled to 45 TFlops for 16-bits. You can see the gap to Nvidia's V100 is closing. You can't buy a TPU or TPU2. Google is making them available for use in its cloud, with TPU pods containing 64 devices for up to 11.5 PetaFlops of performance.
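The TPU2 device and pod figures are consistent with simple multiplication of the per-chip number. A small sketch using only the values quoted above (constant names are illustrative):

```python
# TPU2 figures as quoted above.
CHIPS_PER_DEVICE = 4
TFLOPS_PER_CHIP = 45
DEVICES_PER_POD = 64

device_tflops = CHIPS_PER_DEVICE * TFLOPS_PER_CHIP
pod_pflops = DEVICES_PER_POD * device_tflops / 1000  # 1 PFlops = 1000 TFlops

print(f"TPU2 device: {device_tflops} TFlops")
print(f"TPU pod: {pod_pflops} PFlops")
```

Four chips at 45 TFlops give the 180 TFlops device figure, and 64 such devices give 11.52 PFlops, which matches the "up to 11.5 PetaFlops" claim.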

Pixel Visual Core is Google’s first custom-designed co-processor for consumer products. It’s built into every Pixel 2, and in the coming months, we’ll turn it on through a software update to enable more applications to use Pixel 2’s camera for taking HDR+ quality pictures.

Other references are:
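Google TPU 揭密 (Google TPU Demystified)

Google的神经网络处理器专利 (Google's Neural Network Processor Patents)

脉动阵列 - 因Google TPU获得新生 (Systolic Arrays: Reborn Thanks to Google TPU)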

Should We All Embrace Systolic Arrays?

> [Amazon EC2 F1](https://aws.amazon.com/ec2/instance-types/f1/?nc1=h_ls) is a compute instance with field programmable gate arrays (FPGAs) that you can program to create custom hardware accelerations for your application. F1 instances are easy to program and come with everything you need to develop, simulate, debug, and compile your hardware acceleration code, including an [FPGA Developer AMI](https://aws.amazon.com/marketplace/pp/B06VVYBLZZ) and [Hardware Developer Kit](https://github.com/aws/aws-fpga) (HDK). Once your FPGA design is complete, you can register it as an Amazon FPGA Image (AFI), and deploy it to your F1 instance in just a few clicks. You can reuse your AFIs as many times, and across as many F1 instances as you like.

Wired did a nice story on the MSFT use of FPGAs too, "[Microsoft Bets Its Future on a Reprogrammable Computer Chip](https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/)".

[Inside the Microsoft FPGA-based configurable cloud](https://channel9.msdn.com/Events/Build/2017/B8063) is also a good reference if you want to know Microsoft's vision for FPGAs in the cloud.

[This article, "智慧云中的FPGA" (FPGAs in the Intelligent Cloud)](https://mp.weixin.qq.com/s/Ti6N1SJ7UDRSQtl869Qvlg), gives an overview of FPGAs used for AI acceleration in the cloud.

[Drilling Into Microsoft’s BrainWave Soft Deep Learning Chip](https://www.nextplatform.com/2017/08/24/drilling-microsofts-brainwave-soft-deep-leaning-chip/) shows more details based on Microsoft's presentation at Hot Chips 2017.

[FPGA Cloud server](https://cn.aliyun.com/product/ecs/fpga) (Beta) is a compute instance with a field-programmable gate array (FPGA) that lets users create an FPGA design in minutes and build custom, dedicated hardware accelerators based on the Alibaba Cloud Elastic Computing Framework.

> Deep learning is a multi-layer computational model that can model complex inputs and has achieved results in image classification, speech recognition, and natural language processing. Thanks to their fine-grained parallel hardware, FPGA instances are well suited to small-batch deep learning inference and are known for low power, low latency, and high performance. Taking the AlexNet model as an example, image classification on an FPGA compute instance is 8 to 15 times faster than on an ordinary CPU-only instance.

Tencent Cloud introduces an [FPGA instance](https://cloud.tencent.com/product/fpga) (Beta), with three different specifications based on the Xilinx Kintex UltraScale KU115 FPGA. They will provide more choices equipped with Intel FPGAs in the future.

> We’ve written much over the last few years about the company’s emphasis on streamlining deep learning processing, most notably with GPUs, but [Baidu has a new processor](https://www.nextplatform.com/2017/08/22/first-look-baidus-custom-ai-analytics-processor/) up its sleeve called the XPU. For now, the device has just been demonstrated in FPGA, but if it continues to prove useful for AI, analytics, cloud, and autonomous driving the search giant could push it into a full-bore ASIC.

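[FPGA Cloud Compute](https://cloud.baidu.com/product/fpga.html) is open for beta test.

> Inside Baidu, FPGAs have been used since 2013 in many typical deep learning models, such as DNN, RNN, CNN, and LSTM, covering a wide range of applications including speech recognition, natural language processing, recommendation algorithms, and image recognition. Baidu's FPGA cloud server offers an FPGA-based deep convolutional neural network acceleration service. A single card provides 3 TOPS of fixed-point compute and supports typical deep convolutional network operators such as convolution, deconvolution, pooling, concatenation, and slicing, effectively accelerating typical network structures such as VggNet, GoogLeNet, and ResNet. Based on our FPGA deep learning hardware, we have deeply customized and optimized mainstream deep learning platforms such as Caffe, so users can switch their deep learning workloads directly to the FPGA platform without having to consider the underlying hardware details.

[FPGA Accelerated Cloud Server](https://www.hwclouds.com/product/fcs.html), a high performance FPGA instance, is open for beta test.

> The FPGA cloud server provides a direct PCIe interconnect of up to 100 Gbps between the CPU and the FPGAs, 8 Xilinx VU9P FPGAs per node, and a dedicated optical mesh interconnect of up to 200 Gbps between FPGAs, so that your application acceleration needs are no longer limited by hardware.

> This [DLU that Fujitsu is creating](https://www.nextplatform.com/2017/08/09/fujitsu-bets-deep-leaning-hpc-divergence/) is done from scratch, and it is not based on either the Sparc or ARM instruction set and, in fact, it has its own instruction set and a new data format specifically for deep learning, which were created from scratch.

> Japanese computing giant Fujitsu, which knows a thing or two about making a very efficient and highly scalable system for HPC workloads, as evidenced by the K supercomputer, does not believe that the HPC and AI architectures will converge. Rather, the company is banking on the fact that these architectures will diverge and will require very specialized functions.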

III. Traditional IP Vendors


[DynamIQ](https://pages.arm.com/dynamiq-technology.html) is the embedded IP giant's answer to the AI age. It may not be a revolutionary design but is important for sure.

ARM also provides an open source [Compute Library](https://developer.arm.com/technologies/compute-library) that contains a comprehensive collection of software functions implemented for the Arm Cortex-A family of CPU processors and the Arm Mali family of GPUs.

[DesignWare EV6x Embedded Vision Processors](https://www.synopsys.com/company/newsroom/mnr/ev6x-processors-news-release.html)

[处理器IP厂商的机器学习方案 - Synopsys (Machine Learning Solutions from Processor IP Vendors - Synopsys)](https://mp.weixin.qq.com/s/Y4BvzmH67OaTWc_2SXIVGg)

> [Imagination reveals PowerVR Neural Network Accelerator (NNA) with 2x the performance and half the bandwidth of nearest competitor](https://www.imgtec.com/news/press-release/imagination-reveals-powervr-neural-network-accelerator-nna-with-2x-the-performance-and-half-the-bandwidth-of-nearest-competitor/)

PowerVR Series2NX Neural Network Accelerator

[CEVA-XM6 Fifth-generation computer vision and deep learning embedded platform](https://www.ceva-dsp.com/product/ceva-xm6/)

[处理器IP厂商的机器学习方案 - CEVA (Machine Learning Solutions from Processor IP Vendors - CEVA)](https://mp.weixin.qq.com/s/rosyXJew4B0NvzY73uHz5w)

[Tensilica Vision DSPs for Imaging, Computer Vision, and Neural Networks](https://ip.cadence.com/vision&CMP=TIP_BB_CDN_Vis_0501_C5_PP)

[VeriSilicon’s Vivante VIP8000 Neural Network Processor IP Delivers Over 3 Tera MACs Per Second](https://www.verisilicon.com/newsdetail_499_VivanteVIP8000.html)

[神经网络DSP核的一桌麻将终于凑齐了 (The Mahjong Table of Neural Network DSP Cores Is Finally Full)](https://mp.weixin.qq.com/s/1W8mAMR9xaljZPLyEW0Xmw)

IV. Startups in China


[Cambricon](https://www.cambricon.com/en/) successfully [raised a record $100M round A](https://www.chinamoneynetwork.com/2017/08/18/chinas-state-development-investment-corp-leads-100m-round-in-ai-chip-maker-cambricon) recently, providing solutions covering IP licensing, Chip Service, Smart Card and Intelligent Platform.

[Horizon Robotics](https://www.horizon.ai/) has a [Brain Processing Unit](https://www.horizon.ai/programme.php) (BPU) in the works.

> Chinese startup Horizon Robotics joined forces with Intel to showcase an Advanced Driver Assistance System (ADAS) at CES 2017 which is a single camera that can see stuff and identify it. They also teamed up with Midea and launched an intelligent [air conditioner](https://www.midea.com/global/products/air_conditioning/residential_air_conditioner/ac_split/201403/t20140309_70731.shtml) that no doubt solves loads of first world problems including home security. No news yet on when we can expect their Brain Processing Unit (BPU) to debut but it was supposed to be “[in early 2017](https://www.scmp.com/tech/article/1921396/chinese-start-track-deliver-artificial-intelligence-chip)”. The founder and CEO of Horizon Robotics, Dr. Kai YU, used to be the Head of Baidu Institute of Deep Learning so we’re expecting some great things from this startup which has taken in an undisclosed amount of funding so far.

[DeePhi Tech](https://www.deephi.com/) has cutting-edge technologies in deep compression, compiling toolchains, deep learning processing unit (DPU) design, FPGA development, and system-level optimization. This Next Platform article, ["FPGA Startup Gathers Funding Force for Merged Hyperscale Inference"](https://www.nextplatform.com/2017/05/22/fpga-startup-gathers-funding-force-merged-hyperscale-inference/), gives more information about the company.

Bitcoin Mining Giant [Bitmain](https://www.bitmain.com/) is developing processors for both training and inference tasks. > [Bitmain’s newest product, the Sophon, may or may not take over deep learning](https://qz.com/1053799/chinas-bitmain-dominates-bitcoin-mining-now-it-wants-to-cash-in-on-artificial-intelligence/). But by giving it such a name Zhan and his Bitmain co-founder, Jihan Wu, have signaled to the world their intentions. The Sophon unit will include Bitmain’s first piece of bespoke silicon for a revolutionary AI technology. If things go to plan, thousands of Bitmain Sophon units soon could be training neural networks in vast data centers around the world.

[Chipintelli's](https://www.chipintelli.com/?_l=en) first IC, [CI1006](https://www.chipintelli.com/CI1006?_l=en), is designed for automatic speech recognition applications.

V. Startups Worldwide


[Cerebras](https://www.cerebras.net/) is notable due to its backing from Benchmark and that its founder was the CEO of SeaMicro. It appears to have [raised $25M](https://techcrunch.com/2016/12/19/a-stealthy-startup-called-cerebras-raised-around-25-million-to-build-deep-learning-hardware/) and remains in stealth mode.

Wave’s Compute Appliance can run TensorFlow at 2.9 PetaOPS on its 3RU appliance. Wave refers to its processors as DPUs, and an appliance has 16 DPUs. Wave uses processing elements it calls Coarse Grained Reconfigurable Arrays (CGRAs). It is unclear what bit width the 2.9 PetaOPS figure refers to. Some details can be found in their [white paper](https://wavecomp.ai/technology/).

After Hot Chips 2017, more details were discussed in the Next Platform article "[First In-Depth View of Wave Computing’s DPU Architecture, Systems](https://www.nextplatform.com/2017/08/23/first-depth-view-wave-computings-dpu-architecture-systems/)".

[Graphcore](https://www.graphcore.ai/) raised $30M of Series-A late last year to support the development of their Intelligence Processing Unit, or IPU. Recently, co-founder and Chief Technology Officer Simon Knowles was invited to give [a talk](https://youtu.be/T8DvHnb3Y9g) at the 3rd Research and Applied AI Summit (RAAIS) in London, showing [interesting ideas](https://www.graphcore.ai/posts/how-to-build-a-processor-for-machine-intelligence-part-2) behind their processor.

[解密又一个xPU:Graphcore的IPU (Demystifying Yet Another xPU: Graphcore's IPU)](https://mp.weixin.qq.com/s/CH9h8dUtoNK_2ZfkK5YU0g) gives some analysis of its IPU architecture.

Pezy-SC and Pezy-SC2 are the 1024 core and 2048 core processors that [Pezy](https://pezy.co.jp/en/index.html) develops. The Pezy-SC 1024 core chip powered the top 3 systems on the Green500 list of supercomputers back in 2015. The [Pezy-SC2](https://en.wikichip.org/wiki/pezy/pezy-sc2) is the follow-up chip that is meant to be delivered by now, but details are scarce yet intriguing:

> "PEZY-SC2 HPC Brick: 32 of PEZY-SC2 module card with 64GB DDR4 DIMM (2.1 PetaFLOPS (DP) in single tank with 6.4Tb/s"

It will be interesting to see what 2,048 MIMD MIPS Warrior 64-bit cores can do. In the [June 2017 Green500 list](https://www.top500.org/green500/list/2017/06/), an Nvidia P100 system took the number one spot and there is a Pezy-SC2 system at number 7. So the chip seems alive but details are thin on the ground. [Motoaki Saito](https://wired.jp/special/2016/motoaki-saito/) is certainly worth watching.

[KnuEdge's product page](https://www.knuedge.com/) has been missing in action since June 2016. It is not clear what they are up to with the $100M they put into their [MIMD architecture](https://www.hpcwire.com/2016/12/15/knupath-hermosa-chip-expected-first-half-2017/). It was described at the time as having 256 tiny DSP, or tDSP, cores on each ASIC along with an ARM controller suitable for sparse matrix processing in a 35W envelope.

> Since KnuEdge “emerged from stealth” last year, the company has gone quiet and not offered up any additional information about what they’ve been up to. According to an [article in VentureBeat](https://venturebeat.com/2016/06/06/former-nasa-chief-unveils-100-million-neural-computing-chip-company-knuedge/), we know that KnuEdge has already been generating revenue and that they were considering raising more funding this year in addition to the $100 million in “patient money” they have already raised. Their website contains next to no information aside from employee profiles. At an [Xconomy conference](https://www.xconomy.com/san-diego/2017/05/02/smarr-others-talk-healthtech-ai-at-xconomys-impact-of-innovation/) a few weeks ago, the Company talked about “cloud-based machine intelligence as a service” that is “supposed to be rolled out sometime this year”.

[Tenstorrent](https://tenstorrent.com/index.html) is a small Canadian start-up in Toronto claiming an order of magnitude improvement in efficiency for deep learning, like most. There are no real public details, but they are on the [Cognitive 300 list](https://www.cogniteventures.com/the-cognitive-computing-startup-list/).

[ThinCI](https://thinci.com/index.html) is [developing vision processors](https://venturebeat.com/2016/10/06/thinci-teams-with-denso-to-create-vision-processors-with-100x-performance-improvements/) from Sacramento with employees in India too. They claim to be at the point of first silicon, Thinci-tc500, along with benchmarking and winning of customers already happening. Apart from "doing everything in parallel" we have little to go on.

> Founded in 2010, Eldorado Hills, California startup ThinCI has taken in an undisclosed amount of funding to develop a technology that will bring vision processing to all devices. The ability for smart devices to have functionality like computer vision that doesn’t require regular communication to the cloud is referred to as “edge computing” or “fog computing”. That’s where ThinCI wants to play.

[Koniku's web site](https://koniku.io/) is counting down to "your new reality". They have raised very little money and after watching their Youtube clip embedded in this [Forbes page](https://www.forbes.com/sites/federicoguerrini/2016/05/22/neuron-based-chips-will-soon-become-commonplace-this-startup-founder-says/2/#299122784f1f), you too will likely not be convinced, but you never know. Harnessing biological cells is certainly different. It sounds like [a science project](https://motherboard.vice.com/en_us/article/d7ypqw/komiku-neuron-computer-agabi), but then there is this:

> "We are a business. We are not a science project," Agabi, who is scheduled to speak at the Pioneers Festival in Vienna, next week, says, "There are demands that silicon cannot offer today, that we can offer with our systems."

> The core of the Koniku offer is the so-called neuron-shell, inside which the startup says it can control how neurons communicate with each other, combined with a patent-pending electrode which allows to read and write information inside the neurons. All this packed in a device as large as an iPad, which they hope to reduce to the size of a nickel by 2018.

[Adapteva](https://www.adapteva.com/): "[Adapteva tapes out Epiphany-V: A 1024-core 64-bit RISC processor](https://meanderful.blogspot.jp/2016/10/adapteva-tapes-out-epiphany-v-1024-core.html)." Andreas Olofsson taped out his 1024 core chip late last year and we await news of its performance. [Epiphany-V](https://www.parallella.org/docs/e5_1024core_soc.pdf) has new instructions for deep learning and we'll have to see if this memory-controller-less design with 64MB of on-chip memory will have appropriate scalability. The impressive efficiency of Andreas's design and build may make this a chip we can all actually afford, so let's hope it performs well.

[Knowm](https://knowm.org/) is actually set up as a .ORG but they appear to be pursuing a for-profit enterprise. The New Mexico startup has taken in an undisclosed amount of seed funding so far to develop a new computational framework called [AHaH Computing](https://knowm.org/ahah-computing/) (Anti-Hebbian and Hebbian). The gory details can be found in [this publication](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085175), but the short story is that this technology aims to reduce the size and power consumption of intelligent machine learning applications by up to 9 orders of magnitude.

A battery powered neural chip from [Mythic](https://www.mythic-ai.com/technology/) with 50x lower power. > Founded in 2012, Texas-based startup Mythic (formerly known as Isocline) has taken in $9.5 million in funding with Draper Fisher Jurvetson as the lead investor. Prior to receiving any funding, the startup has taken in [$2.5 million in grants](https://techcrunch.com/2017/03/22/mythic-launches-a-chip-to-enable-computer-vision-and-voice-control-on-any-device/). Mythic is developing an AI chip that “puts desktop GPU compute capabilities and deep neural networks onto a button-sized chip – with 50x higher battery life and far more data processing capabilities than competitors“. Essentially, that means you can give voice control and computer vision to any device locally without needing cloud connectivity.

Despite many promises, [Kalray](https://www.kalrayinc.com/) has not progressed their chip offering beyond the 256 core beast covered back in 2015, "[Kalray - new product meander](https://meanderful.blogspot.jp/2015/06/kalray-new-product-meander.html)." Kalray is advertising their product as suitable for embedded self-driving car applications. Kalray has a [Kalray Neural Network](https://www.eenewseurope.com/news/kalray-turns-neural-networks) (KaNN) software package and claims better efficiency than GPUs with up to 1 TFlop/s on chip.

Kalray's NN fortunes may improve with an imminent product refresh, and just this month Kalray completed a new funding round that raised $26M. The new [Coolidge processor](https://www.eenewseurope.com/news/kalray-turns-neural-networks) is due in mid-2018 with 80 or 160 cores along with 80 or 160 co-processors optimised for vision and deep learning.

BrainChip Inc (CA, USA) was the first company to offer a [Spiking Neural processor](https://www.brainchipinc.com/technology), which was patented in 2008 (patent US 8,250,011). The current device, called the BrainChip Accelerator, is a chip intended for rapid learning. It is offered as part of the BrainChip Studio software. BrainChip is publicly listed as part of BrainChip Holdings Ltd.

[This BDTI article](https://www.bdti.com/InsideDSP/2017/07/27/AImotive) gives some information on the aiWare IP from [AImotive](https://aimotive.com/what-we-do/#aiware).

> Speaking of chips, AImotive and partner VeriSilicon are in the process of designing a 22 nm FD-SOI test chip, which is forecast to come out of GlobalFoundries' fab in Q1 2018 (Figure 4). It will feature a 1 TMAC/sec aiWare core, consuming approximately 25 mm² of silicon area; a Vivante VIP8000-derivative processor core will inhabit the other half of the die, and between 2-4 GBytes of DDR4 SDRAM will also be included in the multi-die package. The convolution-tailored LAM in this test chip, according to Feher, will have the following specifications (based on preliminary synthesis results):
>
> - 2,048 8x8 MACs
> - Logic area (including input/output buffering logic, LAM control and MACs): 3.45 mm²
> - Memory (on-chip buffer): in the range of 5-25 mm² depending on configuration (10-50 Mbits).

Another interesting activity of AImotive is the [Neural Network Exchange Format (NNEF)](https://www.khronos.org/nnef).
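The quoted MAC count and throughput target imply a clock rate, which the back-of-the-envelope sketch below works out. It assumes every MAC unit completes one multiply-accumulate per cycle at full utilization. That assumption is ours, not something the BDTI article states, and the constant names are illustrative.

```python
# Figures from the quoted test chip description.
MAC_UNITS = 2048
TARGET_MACS_PER_SECOND = 1e12  # 1 TMAC/sec

# Assumption: one MAC per unit per clock cycle, 100% utilization.
implied_clock_hz = TARGET_MACS_PER_SECOND / MAC_UNITS

print(f"Implied clock: {implied_clock_hz / 1e6:.1f} MHz")
```

Under that assumption the core would need to run at roughly 488 MHz to reach 1 TMAC/sec.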

[DeepScale raises $3 million for perception AI to make self-driving cars safe](https://techcrunch.com/2017/03/21/deepscale-raises-3-million-for-perception-ai-to-make-self-driving-cars-safe/)

> [LeapMind](https://www.leapmind.io/products.php) is carrying out research on original chip architectures in order to implement Neural Networks on a circuit enabling low power DeepLearning

> While it is not actually possible to pick a worse name for your startup than “[krtkl](https://krtkl.com/)”, at least the product name is manageable. Snickerdoodle is “reconfigurable hardware for building intelligent systems” (think Raspberry Pi). A crowdfunding effort for Snickerdoodle raised $224,876 and they’re currently shipping. If you pre-order one, they’ll deliver it by summer. The palm-sized unit uses the Zynq “System on Chip” (SoC) from Xilinx.

> NovuMind combines big data, high-performance, and heterogeneous computing to change the Internet of Things (IoT) into the Intelligent Internet of Things (I²oT).

[This video](https://www.youtube.com/watch?v=TGQGStPoNu4) gives a description and demos of the NovuMind FPGA AI Accelerator.

REM

[Reduced Energy Microsystems](https://www.remicro.com/) are developing lower power asynchronous chips to suit CNN inference. REM was Y Combinator's first ASIC venture according to [TechCrunch](https://techcrunch.com/2017/03/16/reduced-energy-microsystems-pits-startup-chip-chops-against-industry-giants/).

TERADEEP

[TeraDeep](https://www.teradeep.com/) is building an AI Appliance using its deep learning FPGA acceleration. The company claims a 2X performance advantage over large GPUs for image recognition on AlexNet, while consuming 5X less power. Compared to Intel’s Xeon processor, TeraDeep’s Accel technology is claimed to deliver 10X the performance while consuming 5X less power.
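Taken at face value, a speedup and a power reduction combine multiplicatively into a performance-per-watt gain. The sketch below applies that to the vendor's claims. It reads "5X less power" as one fifth of the baseline power, which is an interpretation rather than a stated fact, and the function name is made up for this example.

```python
def perf_per_watt_gain(speedup, power_reduction):
    """Gain in performance per watt, given a speedup and a power reduction factor.

    Assumes "N times less power" means power = baseline / N.
    """
    return speedup * power_reduction

vs_gpu = perf_per_watt_gain(speedup=2, power_reduction=5)
vs_xeon = perf_per_watt_gain(speedup=10, power_reduction=5)

print(f"Claimed perf/W gain vs large GPUs: {vs_gpu}x")
print(f"Claimed perf/W gain vs Xeon: {vs_xeon}x")
```

On that reading the claims amount to 10x the performance per watt of large GPUs and 50x that of a Xeon. These are vendor claims, not independent measurements.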

DEEP VISION

[Deep Vision](https://deepvision.io/) is building low-power chips for deep learning. Perhaps one of these papers by the founders has clues: "[Convolution Engine: Balancing Efficiency & Flexibility in Specialized Computing](https://csl.stanford.edu/~christos/publications/2013.convolution.isca.pdf)" [2013] and "[Convolution Engine: Balancing Efficiency and Flexibility in Specialized Computing](https://csl.stanford.edu/~christos/publications/2015.convolution_engine.cacm.pdf)" [2015].

Groq

[Groq](https://www.cnbc.com/2017/04/20/ex-googlers-left-secretive-ai-unit-to-form-groq-with-palihapitiya.html) was founded by ex-Googlers who designed the Google TPU.

KAIST DNPU

[Face Recognition System “K-Eye” Presented by KAIST](https://www.kaist.ac.kr/_prog/_board/?code=ed_news&mode=V&no=65402&upr_ntt_no=65402&site_dvs_cd=en&menu_dvs_cd=)

[从ISSCC Deep Learning处理器论文到人脸识别产品 (From ISSCC Deep Learning Processor Papers to Face Recognition Products)](https://zhuanlan.zhihu.com/p/28328046)

Reference


  1. FPGAs and AI processors: DNN and CNN for all
  2. 12 AI Hardware Startups Building New AI Chips
  3. Tutorial on Hardware Architectures for Deep Neural Networks
  4. Neural Network Accelerator Inference

About

A list of ICs and IPs for AI, Machine Learning and Deep Learning.
