AI Chipsets: where China still needs to prove itself

Zhu Jia
9 min read · Dec 13, 2018

Much has been written about China’s potential global dominance in AI tech. Some of it hype. Some of it hearsay. And much of it generalizations of what is happening on the ground in the country of 1.3 billion people. But any country aiming to be an AI leader needs to develop industries around both programming and hardware, as well as practical applications. China has yet to prove itself capable of doing so.

The country has thus far relied on imported supercomputer chips. But in 2015, the US government banned Intel, Nvidia, and AMD from supplying high-end chips to parts of China’s supercomputer industry, including defense and government sectors.

This has taught China that while it may have trailed other countries in the development of computer and mobile phone chips, it has to lead the game in AI processors. After all, some of its largest industries, including agriculture, accommodation, food, and manufacturing, are among those most ready for AI disruption, being largely driven by routine, programmable tasks. The McKinsey Global Institute estimates that 50 percent of work activities in the country can be automated.

To usher in a new era of intelligent machines, Chinese companies are trying to develop AI-optimized chipsets. Tech giants like Baidu and Huawei, and startups like Horizon Robotics and Cambricon Technologies, are among the early players in this field.

And it’s not just private companies — universities are also driving R&D. Tsinghua University, for instance, is opening an AI research centre and has hired Google’s AI Chief Jeff Dean as an advisor.

But what does this landscape really look like for China?

In this article, we’ll look at the complexity of AI chipset development and the different efforts by Chinese companies. In our next one, we’ll dig into the key opportunities and trends in this space.

Applications of AI

Deep learning is now disrupting industries as different as agriculture and enterprise software.

Let’s look at two companies in the Vertex Ventures Israel portfolio. Taranis is a crop management solutions provider. The company works directly with farms to improve their yields and crop management. It employs algorithms that use deep learning to continually improve accuracy in pest and disease prediction. Kryon Systems uses patented visual and deep learning technologies to power intelligent robotic process automation systems for businesses.

Both companies merge hardware and software to apply AI in ways that transform their industries, both incrementally and expansively. This is the new norm.

This new norm has created ripple effects. AI-based applications are built on hardware and software that may reside in the cloud, on edge devices, or in a hybrid environment. Naturally, chip requirements change with the environment, leading to a proliferation of chip designs, each shaped by the setting it serves.

Source: https://www.slideshare.net/yanaioron1/vertex-perspectives-ai-optimized-chipsets-part-i

On the cloud, AI can automate functions like ad targeting, online marketing, and content recommendation. Such tasks are starkly different from those of edge residents, such as autonomous vehicles, delivery drones, robots, unmanned systems, and other sensor-equipped devices.

In a hybrid environment, AI applications include grid control for autonomous trucks and medical diagnostics based on information gathered from medical imaging devices and surgical robots. Water flow control, warehouse automation, and personal assistants also live in this space.

All these applications will produce an inconceivable amount of data. In about an hour and a half of driving, an autonomous vehicle could generate 4TB of data from its cameras, radar, sonar, GPS, and LIDAR.

Satellites also produce large data troves. Descartes Labs uses deep learning to process satellite imagery for agricultural forecasts. To come up with evidence-based predictions, they process more than 5TB of new data every day, and comb through a library of 3PB of archival satellite images.

The ADAC loop

AI programs capable of deep learning, not just conventional machine learning, are built to be trainable with data at every stage. Such programs move beyond tasks like image and voice recognition to perform decision-making, complex analysis, and prediction.

With the growing number of deep-learning machines and IoT devices, coupled with 5G network adoption, we can expect in the near future not only large volumes of data, but also a wide variety and unprecedented velocity.

This reflects the ADAC loop — applications, data, algorithms, and computing hardware. Applications and IoT devices generate data. More data improves algorithms, leading to larger and more complex neural nets that require more computing power. As computing power improves, more applications become possible — and so the loop continues.

Source: https://www.slideshare.net/yanaioron1/vertex-perspectives-ai-optimized-chipsets-part-i

As machines keep being trained through deep learning, neural nets will become larger and more sophisticated, endowing computers with capabilities we have only dreamt of. An artificial neural network (ANN) is an information processing paradigm that imitates the human brain, both in the interconnectivity of its neurons and in the way it processes information.
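
To make that idea concrete, here is a minimal sketch of an ANN in Python with NumPy (my illustration, not code from any company mentioned in this article): two layers of weighted connections stand in for synapses, and a forward pass turns an input into class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny feed-forward ANN: 4 inputs -> 8 hidden "neurons" -> 3 outputs.
# The weight matrices play the role of the synaptic connections.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def relu(x):
    return np.maximum(0.0, x)

def forward(x):
    """Propagate one input through the network, layer by layer."""
    hidden = relu(x @ W1 + b1)        # hidden-layer activations
    logits = hidden @ W2 + b2         # raw output scores
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()            # softmax: probabilities over 3 classes

x = rng.normal(size=4)                # a single example with 4 features
print(forward(x))                     # three probabilities that sum to 1
```

Training is the process of adjusting those weight matrices until the outputs match labeled examples; everything that follows in this article is about doing that math faster.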

But ANNs will eventually accomplish tasks at speeds no single human brain can match. For example, AI will eventually learn to write software for automating business processes, software so complex that no human programmer could write it alone.

Obviously, current general-purpose processors are not the best choice to perform training and inference at these levels, as they lack efficiency in terms of both cost and power.

New architectures, then, have to be designed to handle the deep learning tasks of training and inference. Deep learning comes with requirements that the general-purpose computing applications of the past didn't, such as the following (sketched in code after this list):

  • Running algorithms in parallel
  • Splitting data between different processing units
  • Efficient connections within the computing pipeline
  • Significant data transfer between memory and processors
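
Here is that sketch: a toy, illustrative Python/NumPy version of data-parallel training (all names and numbers are mine). A batch is split across simulated processing units, each computes a gradient on its shard, and the gradients are averaged, which exercises exactly the parallel execution, data splitting, and memory traffic listed above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear model y = X @ w with squared-error loss.
n_units = 4                            # simulated parallel processing units
X = rng.normal(size=(64, 10))          # one training batch
y = rng.normal(size=64)
w = np.zeros(10)

def shard_gradient(X_shard, y_shard, w):
    """Gradient of the mean squared error on one data shard."""
    err = X_shard @ w - y_shard
    return 2.0 * X_shard.T @ err / len(y_shard)

for step in range(100):
    # Split the batch between the processing units ...
    shards = zip(np.array_split(X, n_units), np.array_split(y, n_units))
    grads = [shard_gradient(Xs, ys, w) for Xs, ys in shards]
    # ... then aggregate. This all-reduce step is the "significant
    # data transfer" that AI-optimized interconnects try to accelerate.
    w -= 0.05 * np.mean(grads, axis=0)

print("final training loss:", np.mean((X @ w - y) ** 2))
```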

Chips tailored to AI applications

AI-optimized chips are capable of carrying out deep-learning computations with speed and efficiency, thus accelerating deep learning training and/or inference.

In China, startups have already unveiled AI chips for vertical sectors. Rokid, for example, this year announced an AI chip and solution for smart speakers, enabling efficient speech recognition and understanding.

Processors enhanced for AI

Strategies for chip optimization run the gamut from enhancing existing processors to adopting new paradigms. They depend on the use (training, inference, or both) and the environment (cloud, edge, or hybrid).

Edge residents will especially benefit from AI-optimized chips, because these would allow computation to be done offline, enhancing speed and reducing cybersecurity risks.

Graphics processing units (GPUs) use the popular parallel programming framework, CUDA. Originally built for image processing, they are now being adapted to AI thanks to their capability to process thousands of threads at the same time. They may, however, face problems with efficiency and scalability.
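
As a loose analogy in Python (this is not GPU or CUDA code, just an illustration of why wide parallelism wins): replacing an element-by-element loop with a single vectorized NumPy call moves the work into optimized native kernels, and a GPU pushes the same idea much further, across thousands of hardware threads.

```python
import time
import numpy as np

x = np.random.rand(10_000_000)

t0 = time.perf_counter()
y_loop = [v * 2.0 + 1.0 for v in x]    # one element at a time, in Python
t1 = time.perf_counter()
y_vec = x * 2.0 + 1.0                  # the whole array in one call
t2 = time.perf_counter()

print(f"element-by-element: {t1 - t0:.2f}s")
print(f"vectorized:         {t2 - t1:.3f}s")   # typically far faster
```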

Field-programmable gate arrays (FPGAs) are reconfigurable and capable of handling constantly evolving workloads. However, they're difficult to program, offer lower performance than GPUs, and lack native support from the major AI frameworks.

Tencent currently offers an FPGA solution to support inference on its cloud service.

Application-specific integrated circuits (ASICs) are even more efficient in terms of cost and energy consumption, and are fully customizable. They fall short, though, in their inflexibility, because once they enter production, they cannot be altered. They also require a long development cycle and are practical only for high-volume operations.

Beijing-based Bitmain, which sells ASICs for cryptocurrency mining, is entering the AI chipmaking space.

GPUs and FPGAs are capable of handling AI training, which requires highly precise calculations and the capability to perform many concurrent tasks. AI algorithms use significant amounts of data to learn and be ‘trained’. This typically happens on servers and clouds.

FPGAs, ASICs, and Digital Signal Processors can be used for inference — the interpretation of new data to produce accurate results. Inference is typically done on edge environments. With IoT, more devices are expected to perform inference locally, such as on smartphones or autonomous vehicles.

In such situations, speed and energy efficiency are more important than precision — it’s important to let the car know instantly when it rains, but not to compute how many droplets are pouring per second. That’s why even CPUs are capable of supporting inference tasks in certain environments.
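
One common way to exploit this tolerance is quantization, storing weights as 8-bit integers instead of 32-bit floats. The sketch below (illustrative Python/NumPy, not tied to any chip named in this article) quantizes a weight vector and shows that the precision lost is tiny compared with the 4x memory saving.

```python
import numpy as np

rng = np.random.default_rng(2)
weights = rng.normal(size=1000).astype(np.float32)  # trained float32 weights

# Symmetric linear quantization to int8: scale so that the
# largest-magnitude weight maps to 127, then round.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)       # 1 byte per weight

dequantized = q.astype(np.float32) * scale          # what inference "sees"
max_error = np.abs(weights - dequantized).max()

print(f"memory: {weights.nbytes} -> {q.nbytes} bytes (4x smaller)")
print(f"max per-weight error: {max_error:.5f}")     # tiny vs. the weight range
```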

More AI chip solutions from China

Other AI hardware includes neural processing units (NPUs), microprocessors built to accelerate machine learning algorithms. Kneron, one of several AI chipmakers backed by the Alibaba Entrepreneurs Fund, focuses on edge solutions, including AI NPUs. Alibaba's own Ali-NPU is designed to handle video analysis and similar AI tasks.

Beijing’s Horizon Robotics, which is backed by Vertex Ventures China and Intel, has AI-tailored processors meant for smart vehicles and smart cameras. It is partnering with Chinese retailer Belle Holdings and aims to use smart cameras to gauge shoppers’ interest in purchasing a product based on their facial expressions. Smart cameras are also used in China’s streets to analyze surveillance videos.

One unit Horizon Robotics is developing is the Brain Processing Unit (BPU), which uses a Multiple Instruction, Multiple Data (MIMD) computation system. MIMD allows several processing units to function asynchronously and independently, improving performance without sacrificing energy efficiency. The BPU is designed specifically for inference tasks.

Cambricon Technologies developed the MLU-100 chip, which delivers 64 teraflops in standard mode and supports cloud-based machine learning. Its Cambricon-1A chip can process 16 billion virtual neurons per second and provides the intellectual property behind the Huawei Kirin 970 smartphone chipset. Its Cambricon-1M processor comes with three processor cores.

Semiconductor maker MediaTek has built the Helio P60, an on-device intelligence chip featuring a multi-core AI processing unit (mobile APU) and aimed at mid-range smartphones. Applications include face recognition and smart video.

Perhaps one of the most innovative chips is Thinker, a neural network processing chip developed by Beijing’s Tsinghua University. Thinker is designed to adapt its computing and memory requirements to whatever AI task a piece of software requires. This makes it useful for various devices, from smartphones to robots. Adding to its advantage is its power efficiency — eight AA batteries are enough to keep it running for a year.

Future developments for AI-optimized chips

Going forward, we are likely to see Federated Learning: a multi-faceted infrastructure where learning happens both on the edge and in the cloud. This setup can drive the development of smarter models with the lower latency and power consumption that edge applications require.

Source: https://www.slideshare.net/vertexventures/vertex-perspectives-ai-optimized-chipsets-part-ii

Federated Learning would allow edge devices to perform some training and learning rather than just inference. We already see this in some sensor designs for autonomous vehicles. For example, Innoviz, an Israeli company also backed by Vertex, makes LIDAR units capable of detecting, classifying, and tracking objects within the vehicle's sight, and of distinguishing lanes from objects on the road.

Federated learning is also expected to improve individual data privacy and personalized experiences.
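
The canonical algorithm here is federated averaging. Below is a deliberately minimal Python/NumPy sketch (the data, function names, and learning rates are my assumptions, not anything from the companies above): each simulated device trains briefly on data that never leaves it, and the cloud averages only the resulting model weights.

```python
import numpy as np

rng = np.random.default_rng(3)

# Each "edge device" holds private data that never leaves it.
devices = [(rng.normal(size=(32, 5)), rng.normal(size=32)) for _ in range(8)]
global_w = np.zeros(5)

def local_update(w, X, y, lr=0.05, steps=10):
    """A few steps of local training on one device's private data."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * 2.0 * X.T @ (X @ w - y) / len(y)
    return w

for communication_round in range(20):
    # Devices train locally, then upload only their model weights.
    local_weights = [local_update(global_w, X, y) for X, y in devices]
    # The cloud averages the weights into the new global model.
    global_w = np.mean(local_weights, axis=0)

print("global model after federated averaging:", global_w.round(2))
```

Note that only the weight vectors ever cross the network; each device's raw data stays local, which is where the privacy benefit comes from.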

If things go according to China’s three-stage Next Generation Artificial Intelligence Development Plan, the country will be mass-producing neural-network processing chips by 2020.

Incentives for China

Chinese manufacturers have plenty of incentives to develop their own computer chips. Funding and regulatory support are abundant, with the government hoping to reduce reliance on around US$200 billion worth of annual semiconductor imports.

Plus, the use of AI to automate work can help counter the potential loss of productivity brought on by China’s aging population. The government also sees AI as a key part of the solution to security, health, environment, and other public issues.

With so much for the country to gain, it’s easy to see why China aims to lead the world in AI adoption. Companies that supply major components, such as AI-optimized chipsets, are blazing the trail for growth. And it remains to be seen how this space will evolve in the Middle Kingdom.
