Machine Learning

How We at Intel Optimized TensorFlow 2.5…

A principal engineer’s approach to optimizing TensorFlow performance on Intel® Xeon® processors

Intel Tech


Author: Ramesh AG, Intel Principal Engineer

Photo: Team of computer engineers working on machine learning neural network technology development (Intel)

I am a principal engineer at Intel, where I have worked for more than 25 years, largely on software optimizations. For the last four years my primary responsibility has been working with Google engineers to optimize the TensorFlow artificial intelligence (AI) framework so it runs as fast as possible on Intel Xeon CPUs.

When we started this effort four years ago, much of the talk about machine learning (ML), and deep learning in particular, focused on GPUs. Intel Xeon CPUs were often, mistakenly, not seen as a viable option for deep learning. Frankly, one reason was that the most popular machine learning framework, TensorFlow, was not running as well as it should have on Intel Xeon CPUs.

Over the last three years, working with Google, our group at Intel has contributed many optimizations to TensorFlow, and those improvements are now part of the TensorFlow 2.5 release. You can read more about these optimizations and what we did in the Intel Analytics Software blog post, Leverage Intel Deep Learning Optimizations in TensorFlow.
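As a concrete example, in the stock TensorFlow 2.5 builds these oneDNN-based optimizations are opt-in and can be switched on with an environment variable. A minimal sketch, assuming a Linux x86 build of TensorFlow 2.5:

```python
import os

# Opt in to the Intel oneDNN optimizations in stock TensorFlow 2.5.
# Set the variable before TensorFlow is imported so the runtime picks it up.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"

import tensorflow as tf

print(tf.__version__)  # e.g. 2.5.0
```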

Developers now recognize that machine learning performs very well on multi-core CPUs. With millions of Intel Xeon CPUs already in operation¹, it is good for AI in general that these algorithms run well on a multi-core CPU: people do not have to go out and buy an expensive GPU to get started. We are helping make that possible by making TensorFlow and ML applications run faster on Intel hardware, which is already widely available in data centers, in colleges, and in many other places.

The second thing I would like to emphasize is that Intel has added new instructions for AI processing with Intel® Deep Learning Boost (Intel® DL Boost). Intel DL Boost provides instructions that accelerate workloads using data types smaller than the traditional 32-bit floating point of machine learning. What the AI industry has found is that machine learning generally does not require 32-bit floating-point numbers and can tolerate smaller data types, such as 8-bit and 16-bit formats. We have specific instructions for matrix multiplications using 8-bit data types. These smaller data types are becoming popular, and Intel DL Boost addresses that need.
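To illustrate how a smaller data type is used in practice, here is a minimal sketch of running a Keras model with 16-bit (bfloat16) math through TensorFlow's mixed-precision API. The layer sizes are illustrative only, and 8-bit inference normally goes through a separate quantization step rather than this policy:

```python
import tensorflow as tf

# Compute in bfloat16 while keeping variables in float32. On Xeon processors
# with bfloat16 support, the oneDNN-optimized kernels can use hardware
# acceleration for these 16-bit matrix operations.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
    # Keep the final layer in float32 for a numerically stable softmax.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```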

Applications run much faster on Intel Xeon processors when they use the optimizations we have added to TensorFlow. We have demonstrated that with benchmark testing, publishing the results for Intel Xeon processors, which over the past two years have been the most widely deployed processors in the cloud, in the Leverage Intel Deep Learning Optimizations in TensorFlow blog post. Depending on the SKU, these Xeon processors can have up to 40 cores per socket, with two sockets per system, for up to 80 cores per system. One important aspect of our optimizations is making sure that workloads are balanced across the number of available cores.

That was an important part of our optimization effort: making sure the machine learning operations are properly partitioned over all of the available cores. The matrix math can be broken down into smaller pieces so that each piece is computed on a separate core, and the partial results are then combined.
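How that parallelism is spread across cores can also be steered from user code. A minimal sketch, using TensorFlow's threading configuration and the OpenMP settings commonly suggested for Xeon tuning; the core counts are hypothetical (a two-socket, 40-core-per-socket system) and should match your machine:

```python
import os

# OpenMP / oneDNN settings often used when tuning TensorFlow on Xeon.
# Set them before TensorFlow is imported so the OpenMP runtime sees them.
os.environ.setdefault("OMP_NUM_THREADS", "80")  # hypothetical: 2 sockets x 40 cores
os.environ.setdefault("KMP_BLOCKTIME", "1")
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")

import tensorflow as tf

# Intra-op threads: how many cores cooperate on a single op (e.g. one large
# matrix multiplication split into tiles). Inter-op threads: how many
# independent ops may run concurrently.
tf.config.threading.set_intra_op_parallelism_threads(80)
tf.config.threading.set_inter_op_parallelism_threads(2)
```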

These hardware and software optimizations are continuing, with our latest Intel Xeon Scalable processors as a prime example on the hardware side, and our TensorFlow work with Google, along with Intel DL Boost, as good examples on the software side. This work is necessary because AI models are becoming much more complex, and Intel is continuing to innovate to address these and future complexities.

Notices and Disclaimers

¹Sources:

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.

Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
