A guide to AI-chip-related articles (on my WeChat blog)

Shan Tang
5 min read · Mar 12, 2019

AI chips/IPs, specialized hardware supporting AI (ML/DL) applications, became a hot topic after we saw Google's TPU. I have written 40+ articles about them on my WeChat blog, StarryHeavensAbove. Although the articles are in Chinese, I think Google Translate can help you get the basic ideas. The following is a guide to help you find something you may like. Enjoy!

Industry Background

AI芯片0.5与2.0: I started writing AI-hardware-related articles at ISSCC 2017, and it has been two years now. At ISSCC 2019, the AI chip was still a hot topic, and several sessions were related to AI hardware. At the same time, at the Compilers for Machine Learning Workshop of the CGO 2019 conference, a variety of ML compilers were presented. From the perspective of the whole industry, the software and hardware of the first-generation AI chips are basically mature and the competitive landscape is gradually becoming clearer; the technology is ready for large-scale application, which can be called version 0.5 of the AI chip. At the ISSCC meeting, Professor Yann LeCun laid out the requirements for future AI chips in his talk, which opened up our thinking about a new architecture (AI Chip 2.0).

从AI Chip到AI Chiplet: Recently, the concept of the chiplet has heated up. From DARPA's CHIPS program to Intel's Foveros, the chiplet is regarded as an important technology for future chips. To put it simply, chiplet technology is like building with basic blocks: pre-produced dies are assembled into a system chip using advanced integration technologies (such as 3D integration), and these basic dies are the chiplets. In this sense, the chiplet is a new mode of IP reuse. In the future, chips integrated in chiplet mode will be "super" heterogeneous systems that bring more flexibility and new opportunities to AI computing.

AI芯片开年:This article was written at the beginning of 2018 with some observations and predictions about AI chips.

2017 • AI芯片元年: We believe that 2017 was the starting point of a new era of AI chips (dedicated hardware for AI, machine learning, and deep learning). This article is a summary of what happened in 2017.

中国初创公司在AI芯片(IP)领域的机会: This article was written in mid-2017 and talks about the opportunities for AI chip startups in China. Some information may not be up to date, but I think most of the ideas still hold.

黄金时代: This article is about the golden age of computer architecture.

“全栈”开源的VTA会给AI芯片产业带来什么?: The full-stack open-source TVM/VTA solution for AI software and hardware is very interesting work. I discuss its possible impact on the industry in this article.

AI Inference芯片 ∙ 血战开始: Competition in AI inference hardware is fierce. Players, from giants to startups, announce new products one after another. This article shows some interesting examples.

AI芯片在5G中的机会: AI chips/IPs may be used in 5G ecosystems in different ways. I discuss two of them in this article.

Summary of Hot Chips 30

The 2018 Hot Chips conference was the 30th edition and a big stage for AI chips. The following three articles cover some interesting observations from it:

Hot Chips 30,黄金时代的缩影: The epitome of the golden age

Hot Chips 30 — 机器学习: Machine learning

Hot Chips 30 — 巨头们亮“肌肉”: Giants

AI chip/IP classification and basic technology

I think AI/ML/DL hardware can be roughly classified by target domain as below.

From the AI function point of view, they can be divided into training and inference. If we look at the application scenarios, they can be divided into “Cloud / Data Center”, “Edge”, and “End-User Equipment”. From one target domain to another, the requirements and constraints may change a lot. To meet the needs of AI applications, a lot of exploration has been done on AI hardware. For details, please refer to the following articles:
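As a rough illustration, this taxonomy can be written down as a small lookup table. The priorities and constraints below are my own simplified assumptions for illustration, not figures from any vendor:

```python
# Toy sketch of the AI hardware taxonomy: (function, scenario) -> design focus.
# The priorities/constraints are illustrative assumptions, not measured data.

TAXONOMY = {
    ("training", "cloud/data center"): {
        "priority": "throughput",
        "typical_constraint": "power and cost per rack",
    },
    ("inference", "cloud/data center"): {
        "priority": "throughput and latency",
        "typical_constraint": "latency targets, power",
    },
    ("inference", "edge"): {
        "priority": "latency",
        "typical_constraint": "power envelope, form factor",
    },
    ("inference", "end-user equipment"): {
        "priority": "energy per inference",
        "typical_constraint": "battery, cost, always-on power",
    },
}

def design_focus(function: str, scenario: str) -> dict:
    """Return the (assumed) design priorities for one target domain."""
    return TAXONOMY[(function, scenario)]

print(design_focus("inference", "edge")["priority"])  # latency
```

The point of the table is simply that the same pair of axes (function × deployment scenario) leads to very different hardware trade-offs.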

深度神经网络的模型·硬件联合优化: A summary of optimization techniques used in AI hardware.

AI会给芯片设计带来什么?: With AI as a huge driving force for the semiconductor industry, how can chip design methodology be improved?

从ISCA论文看AI硬件加速的新技巧: At ISCA 2018, I saw some new approaches to designing AI hardware accelerators.

AI Chip list

Starting from 2016, we have seen a lot of AI chips/IPs. As a reference, I maintain a list on GitHub: AI/ML/DL ICs and IPs.

AI Chip in Cloud

AI acceleration is being deployed in the cloud for both training and inference. For people interested in the technology in this area, the following articles may help.

如何设计一颗40PFLOPS量级的AI芯片?: Is it possible to design a “single chip” that provides 40 PFLOPS of machine-learning compute?

Google TPU3 看点: About Google TPU3

Google TPU 揭密: What I learned from Google’s paper about TPU

脉动阵列 — 因Google TPU获得新生: More discussion of the systolic array in Google's TPU
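Since the systolic array comes up repeatedly in TPU discussions, here is a minimal sketch of the idea (my own illustration, not TPU code): weights stay fixed in a grid of cells, activations stream across the rows, and partial sums flow down the columns, so each multiply-accumulate result is handed to a neighboring cell rather than written back to memory:

```python
# Weight-stationary systolic matmul sketch (illustrative only, not TPU code).
# A real systolic array pipelines these operations in time across many cycles;
# here the loop nest is annotated with the role each loop plays in the array.

def systolic_matmul(A, W):
    n = len(A)       # A is n x k (activations)
    k = len(W)       # W is k x m (weights, resident in the k x m cell grid)
    m = len(W[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):                 # each input row streams through the grid
        partial = [0] * m              # partial sums entering the top of each column
        for r in range(k):             # activation A[i][r] reaches grid row r
            a = A[i][r]
            for c in range(m):         # cell (r, c) holds weight W[r][c]
                partial[c] += a * W[r][c]   # MAC, then pass the sum downward
        C[i] = partial                 # results emerge from the bottom edge
    return C

A = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
print(systolic_matmul(A, W))  # [[19, 22], [43, 50]]
```

The key property the comments point at: no cell ever reads or writes a shared memory during the computation; data only moves between neighbors, which is what makes the structure so efficient in silicon.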

从Nvidia开源深度学习加速器说起: Some thoughts about Nvidia's open-source NVDLA

解密又一个xPU:Graphcore的IPU: About Graphcore's AI chip

Graphcore AI芯片:更多分析: More about Graphcore's AI chip

Another interesting option in the cloud is FPGA acceleration. The following article can give you some basics.

智慧云中的FPGA: FPGA in the intelligent cloud

ISSCC 2017 Deep Learning Processor

The ISSCC 2017 Deep-Learning Processors session covered many interesting topics on how to design AI accelerators. The following articles capture what I learned.

“A 2.9TOPS/W Deep Convolutional Neural Network SoC in FD-SOI 28nm for Intelligent Embedded Systems”

ISSCC2017 Deep-Learning Processors文章学习 (一)

“DNPU: An 8.1TOPS/W Reconfigurable CNN-RNN Processor for General-Purpose Deep Neural Networks”

ISSCC2017 Deep-Learning Processors文章学习 (二)

“A 28nm SoC with a 1.2GHz 568nJ/Prediction Sparse Deep-Neural-Network Engine with >0.1 Timing Error Rate Tolerance for IoT Applications”

ISSCC2017 Deep-Learning Processors文章学习 (四)

“A Scalable Speech Recognizer with Deep-Neural-Network Acoustic Models and Voice-Activated Power Gating”

分析一下MIT的智能语音识别芯片

“ENVISION: A 0.26-to-10TOPS/W Subword-Parallel Dynamic-Voltage-Accuracy-Frequency-Scalable Convolutional Neural Network Processor in 28nm FDSOI”

ISSCC2017 Deep-Learning Processors文章学习 (三)

“A 0.62mW Ultra-Low-Power Convolutional-Neural-Network Face-Recognition Processor and a CIS Integrated with Always-On Haar-Like Face Detector”

ISSCC2017 Deep-Learning Processors文章学习 (七)

“A 288μW Programmable Deep-Learning Processor with 270KB On-Chip Weight Storage Using Non-Uniform Memory Hierarchy for Mobile Intelligence”

ISSCC2017 Deep-Learning Processors文章学习 (五)

Software Stack

In the field of deep learning, in addition to the optimization of the hardware architecture, the software stack plays a crucial role in achieving good performance and efficiency. The following article is about the competition over IRs (Intermediate Representations).

Deep Learning的IR“之争”: The IR “war” in deep learning
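To make concrete why the IR layer matters, here is a toy sketch, entirely my own and unrelated to any real framework's IR, of a compute graph held in a flat IR, plus a constant-folding pass of the kind an ML compiler runs before targeting hardware:

```python
# Toy graph IR and a constant-folding pass (an illustrative sketch only;
# real ML IRs such as ONNX graphs or TVM's Relay are far richer).

from dataclasses import dataclass

@dataclass
class Node:
    op: str             # "const", "add", or "mul"
    inputs: tuple       # indices of earlier nodes in the graph list
    value: float = 0.0  # payload, meaningful only for "const" nodes

def fold_constants(graph):
    """Replace any op whose inputs are all constants with a precomputed const."""
    folded = []
    for node in graph:
        ins = [folded[i] for i in node.inputs]
        if node.op != "const" and all(n.op == "const" for n in ins):
            vals = [n.value for n in ins]
            result = vals[0] + vals[1] if node.op == "add" else vals[0] * vals[1]
            folded.append(Node("const", (), result))  # fold to a constant
        else:
            folded.append(node)
    return folded

# x = 2, y = 3, z = x * y  ->  the mul node folds to a single constant 6
graph = [Node("const", (), 2.0), Node("const", (), 3.0), Node("mul", (0, 1))]
print(fold_constants(graph)[2].value)  # 6.0
```

Passes like this are exactly what is at stake in the IR competition: whichever IR the ecosystem settles on is where such optimizations, and the hardware back ends that consume them, get written.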

AI Chip Requirements and Benchmarking

浅析图像视频类AI芯片的灵活度: Discusses the flexibility we need when designing AI chips for image/video applications.

语音及文本类AI芯片的需求分析: Discusses the requirements of AI chips for voice/text applications.

从NNVM和ONNX看AI芯片的基础运算算子: Discusses the fundamental compute operators in AI chips, using ONNX as a reference.

给DNN处理器跑个分 — 指标篇: DNN processor benchmark metrics.

给DNN处理器跑个分 — 设计篇: How to design DNN processor benchmarks.

如何评测AI系统?: How to evaluate AI systems.

Other Discussions

AI芯片架构的争论真有意义吗?: What is the most important ingredient for a successful AI chip?

Hot (AI) Chips 2017: A summary of Hot Chips 2017.

“传说中”的异步电路是否能在AI芯片中异军突起?: Will the “legendary” asynchronous (clock-less) circuits stand out in AI chips?

Application Specific Processor

The following articles cover different aspects of designing application-specific processors and the domain-specific design methodology.

当我们设计一个专用处理器的时候我们在干什么?(上): What are we doing when we design an application-specific processor? (Part 1)

当我们设计一个专用处理器的时候我们在干什么?(指令集): What are we doing when we design an application-specific processor? (Instruction set)

当我们设计一个专用处理器的时候我们在干什么?(微结构): What are we doing when we design an application-specific processor? (Microarchitecture)

专用处理器设计方法&工具: Design methods and tools for application-specific processors

当我们设计一个专用处理器的时候我们在干什么?(风险): What are we doing when we design an application-specific processor? (Risks)

自己动手设计专用处理器!: Design an application-specific processor yourself!

Disclaimer:

  1. These articles were written on the WeChat blog platform. Please respect the copyright.
  2. These articles do not reflect the opinions of my employer or any organization I am affiliated with.
  3. These articles are based on information in the public domain and are accurate and true to the best of my knowledge, but there may be omissions, errors, or mistakes.
  4. These articles are only for informational purposes and shouldn’t be seen as any kind of advice.


Shan Tang

Since 2000, I have worked as an engineer, architect, or manager on different types of IC projects. In mid-2016, I started working on hardware for deep learning.