The Evolution of AI Frameworks and MindSpore’s Vision

Huawei MindSpore · Nov 9, 2020

Author: Xuefeng Jin (jinxuefeng@huawei.com)

Since MindSpore was open sourced at the end of March, the team has been busy with feature optimization and with promoting adoption both inside and outside Huawei. Now I finally have some time to pause and summarize some thoughts on AI frameworks based on the MindSpore team's practice. I hope this will help you understand MindSpore better.

My plan is to publish a series of articles. The preliminary topic plan includes:

· AI Framework Evolution Trend and MindSpore Vision

· Analysis of IR layer of the AI framework

· Unification of dynamic and static graphs

· How to unify the device-edge-cloud framework?

· Design of graph-computing fusion

· Is automatic operator generation feasible?

· What should an AI programming language look like?

· How to become a distributed parallel native AI framework?

· How to combine AI and scientific computing?

· How does the AI framework enable AI responsibility?

…and many more

There is a lot of content, and it takes a long time to write it all. I hope I will have enough patience to stick to it.

This article is an overall introduction; it mainly analyzes the development trends of AI frameworks and introduces the ideas behind MindSpore.

1. What is the future development trend of AI frameworks?

The development of AI frameworks can be roughly divided into 3 stages:

AI framework development

The representatives of the first stage are Torch, Theano, and Caffe, which laid the foundation for the basic design ideas: Python-based front ends, automatic differentiation, and computational graphs.

In the second stage, TensorFlow and PyTorch became widely used in industry thanks to their distributed training and deployment capabilities. Both also provide dynamic graph capabilities, whose flexibility attracted a large number of researchers and algorithm engineers.

But what is the direction of the third stage? Even Google has not settled on a fixed path; it is exploring multiple technical routes, including TF2.0, JAX, MLIR, and Swift for TF. Some combine dynamic and static graphs, some build a unified IR infrastructure, and some explore new forms of expression. In short, many approaches are blooming at once.

Whatever direction AI frameworks take in the future, the driving forces behind them are relatively clear. We believe there are mainly four:

Future development directions of AI frameworks

· Application + Big Data: The application is the AI model and algorithm, and big data is the data required by the model.

· Chip: AI chips, representing the development of AI computing power

· Developer: AI algorithm researchers and algorithm engineers

· Enterprise: AI deployment and AI responsibility

Now, let’s discuss the development direction of the AI framework by analyzing the drivers of the AI framework.

Challenges of AI frameworks from an Application and Data Perspective

· Continuous growth in the scale and complexity of models and data.

Increase in scale of deep learning models over time

In May this year, OpenAI released the GPT-3 model, with 175 billion parameters and a 45 TB dataset (before processing); its training cost is claimed to be close to US$5 million. Ultra-large models not only challenge algorithms and computing power, they also pose great challenges to the AI framework: a performance wall (memory, communication, and computing utilization), an efficiency wall, and a precision wall:

1. Performance wall: A large model cannot fit on a single GPU or NPU card (generally 32 GB of memory); a back-of-the-envelope estimate of the memory gap follows this list. Traditional data parallelism alone is insufficient, so technologies such as memory overcommitment and hybrid parallelism (data parallel / model parallel / pipeline parallel) will become the norm. It is also difficult to choose a good hybrid-parallel split policy: different splits generate very different communication traffic and computing utilization. For example, pipeline parallelism puts pressure on computing utilization, while operator-splitting parallelism puts pressure on communication traffic. All of this requires strong support from the AI framework. In addition, in a large-scale parallel training system, data preprocessing becomes a bottleneck as performance requirements rise; for example, during the performance optimization of ResNet-50, a single step takes only 17 to 18 ms, and host-side data processing struggles to keep up.

2. Efficiency wall: If the hybrid parallel policy has to be determined manually by algorithm engineers, the barrier to entry is high: one must understand both algorithms and systems.

3. Precision wall: Large-scale model training is inherently large-batch training. Achieving the required convergence and precision is a big challenge.

In addition to the preceding three points, extra-large-scale training faces other challenges, such as availability, tail latency, and dynamic scheduling of large-scale clusters.
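To make the memory part of the performance wall concrete, here is a back-of-the-envelope estimate. The numbers are illustrative: the 175-billion parameter count is GPT-3's published size, and plain FP32 weights are assumed.

```python
# Rough estimate of why a GPT-3-scale model cannot fit on a single 32 GB card.
params = 175e9                      # GPT-3 parameter count
bytes_per_param = 4                 # assuming plain FP32 weights
weights_gib = params * bytes_per_param / 2**30
print(f"~{weights_gib:.0f} GiB for the weights alone, vs. ~32 GiB per card")
# Gradients, optimizer states, and activations multiply this further, which is
# why hybrid data/model/pipeline parallelism becomes unavoidable.
```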

· The framework's workload is evolving from single deep learning models to universal differentiable tensor computing.

Currently, there are three main directions:

1. DNNs combined with traditional machine learning, such as deep probabilistic learning and graph neural networks. This is already supported by almost all AI frameworks in the industry.

2. Combination of AI and scientific computing. The industry is exploring three directions. The first is AI modeling, in which AI models replace traditional computational models; this direction has just started and there is not much progress yet. The second is AI solving, where the model is still a traditional scientific computing model but neural network methods are used to solve it; here there has been real progress, and many basic scientific computing equations already have corresponding AI solvers, such as PINNs and PINN-Net, although precision and convergence remain big challenges. If we want to solve scientific computing models on an AI framework, the biggest challenge lies in the front-end expression of high-performance, high-order differentiation (see the sketch after this list). The third is using the framework to accelerate equation solving: the models and methods of scientific computing stay unchanged, but the same framework used for deep learning is treated as a distributed tensor computing engine to speed up the solution.

3. Languages and frameworks for differentiable computer graphics, such as Taichi, which provide differentiable physics engines and differentiable rendering engines.
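To show what the front-end expression of high-order differentials looks like in practice, here is a minimal, framework-agnostic sketch written with JAX (mentioned earlier in this article). The toy function `u`, the target equation, and the sample point are all made up for illustration; a real PINN would use a neural network and minimize the residual over many collocation points.

```python
import jax
import jax.numpy as jnp

# Toy surrogate: in a real PINN this would be a neural network u(x; theta).
def u(x):
    return jnp.sin(x)

# Second derivative u''(x), obtained by nesting automatic differentiation.
u_xx = jax.grad(jax.grad(u))

# Residual of the toy equation u''(x) + sin(x) = 0.
def residual(x):
    return u_xx(x) + jnp.sin(x)

print(residual(0.3))   # ~0.0 for this analytic example
```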

Challenges to the AI Framework from the Trend of AI Chips

AI chips can be classified into two types: SIMT-based GPUs and SIMD-like NPUs. With the release of the NVIDIA A100, the architectures of the two types of chips are borrowing from each other and converging.

1. Large computing power depends on SIMD (although SIMT is more flexible, its chip area and power consumption costs are high). Notably, the scale of Tensor Cores keeps increasing.

2. High-speed on-chip and inter-chip interconnects

3. Multi-die, large-memory packaging, several times larger than before (especially for inference chips).

The continuous evolution of AI chips poses many key challenges to AI frameworks:

1. Optimization is becoming more tightly coupled with hardware, and graph compilation and operator compilation are being integrated: optimization within the graph layer alone has largely converged, so it needs to be optimized jointly with the operator layer. The boundary between sub-graphs and operators is breaking down, and whole-graph tuning-based optimization is a hot topic.

2. Diversified execution modes: whole-graph sinking (offloading the graph to the chip) and single-operator invocation are mixed, and different scenarios may use different execution modes.

3. Greater programmability challenges: the large number of SIMD acceleration instructions makes heterogeneous programming more difficult.

Trends in AI Frameworks from the Perspective of Algorithm Engineers and Developers

· New AI programming languages attempt to break through the limits of Python

Currently, Julia and Swift for TensorFlow are the most representative.

Julia has a long history and a solid application base in scientific computing. Building on that experience, it is gradually entering the AI and deep learning fields. It claims to combine the flexibility of dynamic languages (like Python) with the performance of static languages (like C):

1. MATLAB-like tensor-first expression

2. Polymorphism + dynamic type inference/specialization

Polymorphic capabilities in Julia

3. IR reflection mechanism

The IR reflection mechanism means that Julia exposes its IR so that developers can perform secondary processing on it. Take Julia's machine learning library Flux plus Zygote as an example:

Use of Julia’s IR for machine learning with Flux and Zygote

Flux is an extension library built on Julia. It defines basic operators such as conv and pooling, so developers can define machine learning models on top of it.

Zygote implements source-to-source automatic differentiation based on the IR reflection mechanism provided by Julia. In this process, Julia itself is not changed.

AD using Zygote

Compared with Julia, Swift for TensorFlow takes a different approach. It tackles differentiable programming from an industrial-grade development perspective, emphasizing static typing, high performance, and easy deployment to devices.

Although Julia and Swift each have unique features, it will be difficult for them to shake Python's position in the AI field in the short term.

· AI Compilers Become the Competition Focus of AI Frameworks

AI compilers are developing in three directions. The first is AI JIT capability aimed at unifying dynamic and static graphs, such as TorchScript and JAX. The second is chip-oriented compilation and optimization, such as XLA and TVM. The third is infrastructure for AI compilers: MLIR hopes to provide a meta-IR as the basis for building AI compilers, while Relay/TVM wants to open up compiler interfaces to support third-party frameworks. However, all three directions still face major challenges.

1. AI JIT: Python is too flexible and dynamic. Dynamic shapes are relatively easy to handle, but dynamic types are difficult (not to mention Python's many flexible data structures). Switching seamlessly from a dynamic graph to a static graph is not easy (a small illustration follows this list).

2. Compilation acceleration: Current approaches rely on pattern-based operator fusion and template-based operator generation, and the bottleneck lies in operator generation. Because there are far too many fusion combinations, template-based enumeration cannot cover them all, so the generalization capability of compilation acceleration needs to improve. The next technology to tackle is template-free automatic operator generation.

3. AI compiler infrastructure: MLIR is advanced and ambitious in its design. Through Dialect extensions it aims to support compilers in many domains, including AI compilers at both the graph layer and the operator layer. Judging from current progress, MLIR has moved fastest in TF Lite, where it is mainly used as a model conversion tool. It is worth asking what benefits MLIR actually brings. Personally, I think MLIR by itself improves neither framework performance nor usability; its value lies in reuse and normalization, such as the CFG+BB form and the many optimizations in the LLVM infrastructure. The question is whether this reuse benefits graph-layer and operator-layer compilation for AI. MLIR+LLVM fits the operator layer, but not necessarily the graph layer: although LLVM unifies the compilers of many programming languages, its strength is static compilation, and it has no obvious advantage in the JIT and VM fields. Infrastructure such as CFG+BB may not suit graph-layer tasks like automatic differentiation and JIT compilation.
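As a small illustration of the AI JIT direction, the sketch below uses TorchScript (named above) to turn a Python function with data-dependent control flow into a static graph. The explicit type annotations hint at exactly the difficulty described above: without pinning down the types, the compiler cannot tame Python's dynamism. The function itself is a made-up example.

```python
import torch

@torch.jit.script
def clip_by_norm(x: torch.Tensor, max_norm: float) -> torch.Tensor:
    # The data-dependent branch is captured in the TorchScript IR only because
    # the annotations fix the types that plain Python leaves dynamic.
    norm = float(x.norm())
    if norm > max_norm:
        return x * (max_norm / norm)
    return x

print(clip_by_norm(torch.randn(8), 1.0))
print(clip_by_norm.graph)   # inspect the captured static graph
```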

Challenges of AI Framework from AI Deployment

From the perspective of AI deployment, we can see three trends:

1. Deployment of large models, especially language models, on the device side.

2. Device-cloud synergy is gradually being applied. There are two typical scenarios:

a. Training on the cloud side, followed by online fine-tuning on the device side for incremental learning

b. Federated learning

3. AI is everywhere. AI models are even deployed on IoT devices with limited resources.

Currently, AI frameworks face two challenges:

1. Can the AI framework be unified across the cloud and the device? The unification mainly refers to the IR format: only then can models trained on the cloud be incrementally learned on the device and conveniently used in federated learning (a sketch of exporting to a unified IR follows this list).

2. Should the AI framework be big or small? For example, can the framework's footprint be shrunk to the KB level so that it fits into IoT devices?
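As a rough sketch of what device-cloud IR unification looks like from the user's side, the snippet below exports a network to MindSpore's unified MindIR format so that the device-side runtime can load the same model. The network is just a stand-in, and the call follows roughly the MindSpore 1.x `export` signature, which may differ in other versions.

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.train.serialization import export

net = nn.Dense(16, 2)                                     # stand-in for a trained network
dummy_input = Tensor(np.zeros((1, 16), dtype=np.float32))

# Export to the unified MindIR format shared by the cloud and device runtimes.
export(net, dummy_input, file_name="tiny_net", file_format="MINDIR")
```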

Challenges of AI Framework from the Perspective of AI Responsibilities

AI’s responsibilities cover a wide range of issues, including security, privacy, fairness, transparency, and interpretability.

AI responsibility

As the carrier of AI services, the AI framework must be able to enable AI responsibility. Current frameworks need to address the following challenges:

1. There is a lack of universal analysis methods and measurement systems covering all aspects of AI responsibility, and a lack of scenario-aware, automated measurement methods.

2. Model robustness, privacy-protection technologies, and confidential (encrypted) AI significantly affect model performance in real scenarios.

3. AI interpretability still lacks theoretical and algorithmic support, and it is difficult to provide human-friendly explanations of inference results.

2. MindSpore’s vision

MindSpore is an emerging framework, and a question people often ask is: what differentiates it?

Based on the driving forces and challenges of the AI framework, we hope MindSpore will lead the evolution of AI frameworks in the following five directions:

· Beyond AI: Evolution from a Deep Learning Framework to a Universal Differentiable Tensor Computing Framework

MindSpore will provide a more general-purpose AI compiler, making it possible to support more types of applications.

· Distributed parallel native (Scale out): Evolution from manual parallelism to automatic parallelism

MindSpore not only aims to deliver high performance and scalability; it also wants to lower the threshold of large-scale training and make distributed training simpler, as sketched below.
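As a minimal sketch of what lowering the threshold can look like, the snippet below asks the framework to search the operator-splitting strategy instead of hand-crafting a hybrid scheme. It follows roughly the MindSpore 1.x context API (names may differ across versions), and the device count is arbitrary.

```python
from mindspore import context
from mindspore.context import ParallelMode

# Run in graph mode on the target hardware (Ascend is used here as an example).
context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")

# Let the framework search operator-splitting strategies automatically instead of
# hand-writing a data/model/pipeline hybrid parallel scheme.
context.set_auto_parallel_context(parallel_mode=ParallelMode.AUTO_PARALLEL,
                                  device_num=8,
                                  gradients_mean=True)
```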

· In-depth graph-computing fusion (Scale up): Evolution from separate graph and operator optimization to joint graph-operator optimization

MindSpore provides key technologies such as joint graph-operator optimization, automatic operator generation, and deep graph optimization to fully unleash the computing power of AI chips; for example, the fusion switch sketched below.
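For example, in MindSpore 1.x the graph-kernel fusion path can be turned on with a single context flag. This is only a sketch of the user-facing switch; the flag name and behavior may vary across versions and hardware backends.

```python
from mindspore import context

# Enable graph-kernel fusion so that graph-layer and operator-layer optimization
# are performed jointly instead of on separate sides of a fixed boundary.
context.set_context(mode=context.GRAPH_MODE,
                    device_target="Ascend",
                    enable_graph_kernel=True)
```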

· All-Scenario AI: Evolution from a Separate Device-Cloud Architecture to a Unified Device-Cloud Architecture

MindSpore unifies its cloud-side and device-side frameworks to facilitate cloud-based training, device-side fine-tuning, and federated learning with device-cloud collaboration.

· Enterprise-level AI trustworthiness: Evolution from consumer-level AI to enterprise-level AI

MindSpore has built-in capabilities for adversarial training, differential privacy, confidential AI, federated learning, and explainable AI.

Of course, software architectures keep evolving, and few techniques remain exclusive for long. MindSpore simply hopes to move in the right direction together with other frameworks in the industry.

In addition, the MindSpore community has released 10 topics and invited developers to participate in the innovation process. For details, see the following link:

https://github.com/mindspore-ai/community/tree/master/working-groups/research

Finally, a brief introduction to MindSpore’s high-level design:

MindSpore’s high-level design

MindSpore is divided into four layers:

1. MindSpore Extend: the extension package layer of MindSpore. The number of extension packages is still relatively small, and I hope more developers will contribute and build more packages together in the future.

2. MindSpore expression layer: MindExpress is a Python-based frontend. In the future, we plan to provide other frontends such as C/C++ and Java. MindSpore is also considering Cangjie, a self-developed frontend programming language that is currently in the pre-research phase. At the same time, we are working on interoperating with third-party frontends (such as Julia) to bring in more third-party ecosystems (a minimal frontend example appears after this overview).

3. MindSpore compilation and optimization layer: MindCompiler is the core compiler of the graph layer. Based on the unified device-cloud MindIR, it implements three kinds of functions: hardware-independent optimization (such as type inference, automatic differentiation, and expression simplification), hardware-related optimization (such as automatic parallelism, memory optimization, graph-kernel fusion, and pipelined execution), and deployment-related optimization (quantization and pruning). MindAKG is MindSpore's automatic operator generation compiler and is still being improved.

4. MindSpore full-scenario runtime: cloud, device, and smaller IoT devices.

At the same time, MindSpore provides MindArmour for AI responsibility, and MindData for data processing, visualization, and interpretation.
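To give a feel for the MindExpress Python frontend mentioned above, here is a minimal, purely illustrative network definition in the 1.x style; `construct` plays the role that `forward` plays in other frameworks.

```python
import mindspore.nn as nn

class TinyNet(nn.Cell):
    """A tiny fully connected network, purely for illustration."""
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc1 = nn.Dense(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Dense(32, 2)

    def construct(self, x):
        # In graph mode, construct() is compiled by MindCompiler into MindIR.
        return self.fc2(self.relu(self.fc1(x)))
```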

MindSpore is a new open source project that was only opened at the end of March this year. The concepts described in this article reflect MindSpore's plans more than its current state: some functions are not yet complete, many are not yet easy to use, and some are even still in the pre-research stage. But we hope developers will join the MindSpore community, raise more questions and suggestions, and build the future together.

MindSpore official website: https://www.mindspore.cn/

MindSpore Forum: https://bbs.huaweicloud.com/forum/forum-1076-1.html

Gitee: https://gitee.com/mindspore/mindspore

GitHub: https://github.com/mindspore-ai/mindspore
