Facebook Uses This Library to Help Data Scientists Write Scalable Machine Learning Code

Tensor Comprehensions powers the scalable infrastructure for many machine learning solutions at Facebook.

Jesus Rodriguez
Nov 6, 2020 · 5 min read

I recently started a new newsletter focused on AI education. TheSequence is a no-BS (meaning no hype, no news, etc.) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

Machine learning projects in the real world regularly suffer from friction between data scientists and developers. That friction is mostly due to the challenge of translating a machine learning model, expressed mostly in mathematical terms, into code that can scale across several GPUs or CPUs. How many times have we experienced the following scenario:

1. A data scientist or researcher writes a machine learning algorithm using a framework with highly mathematical notation such as PyTorch (which is becoming a favorite of researchers).

2. An engineer takes the model and translates it into more production-ready code, perhaps using a different framework such as TensorFlow or Caffe2, and starts using performance libraries such as cuBLAS to optimize the execution of the model for different GPUs.

3. After some time, the transformations on the model are so numerous that the data scientist can barely understand it. In other words, knowledge of the model shifts completely from the data scientists to the engineers.

The scenario described above is a consequence of the high levels of complexity required to optimize machine learning models using popular high performance libraries such as cuBLAS, MKL, and cuDNN. Those libraries are essential for optimizing the performance of a model, but they introduce many low-level routines for memory management, concurrency, and instrumentation, and those modifications make the model almost incomprehensible to the original researchers. The problem becomes an order of magnitude more complex in large data science teams.


Enter Tensor Comprehensions

The friction between researchers and engineers is an omnipresent challenge in any large-scale data science operation. Large internet powerhouses such as Google, Amazon, or Facebook encounter many forms of this challenge in any machine learning problem they decide to tackle. Last year, the Facebook artificial intelligence (AI) lab released the first version of Tensor Comprehensions, an open source library that helps bridge the gap between researchers and engineers. Conceptually, Tensor Comprehensions provides a mathematical language that allows researchers to model problems in a form that can be easily translated into high performance code. The ideas behind Tensor Comprehensions were captured in a research paper published by Facebook AI Labs last year.

The first release of Tensor Comprehensions includes five fundamental components:

· A high-level language to express tensor computations arising in ML with a syntax generalizing the Einstein notation

· An end-to-end compilation flow capable of lowering tensor comprehensions to efficient GPU code. It delivers strong baseline performance for custom operators and remains competitive with vendor libraries on standard ones.

· A collection of polyhedral compilation algorithms with a specific domain and target orientation. Unlike general-purpose parallelizing compilers, Tensor Comprehensions primarily optimizes for reduced launch and synchronization overhead through kernel fusion, and also favors multi-level parallelism and promotion to deeper levels of the memory hierarchy.

· An auto-tuning framework that takes advantage of Just-In-Time (JIT) compilation and code caching.

· Integration into common ML frameworks such as PyTorch and Caffe2(the core of Facebook’s ML stack).
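To make the Einstein-style notation concrete, here is a plain-Python sketch (hypothetical illustration, no Tensor Comprehensions required) of what a comprehension such as C(m, n) +=! A(m, k) * B(k, n) means: indices named on the left-hand side enumerate the output, while any index that appears only on the right-hand side is implicitly reduced over.

```python
def matmul_comprehension(A, B):
    """Plain-Python semantics of C(m, n) +=! A(m, k) * B(k, n).

    Indices named on the left (m, n) enumerate the output; the index
    that appears only on the right (k) is summed over, which is what
    the +=! reduction operator expresses.
    """
    M, K = len(A), len(A[0])
    N = len(B[0])
    # +=! means: initialize C to zero, then accumulate into it.
    C = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            for k in range(K):  # implicit reduction index
                C[m][n] += A[m][k] * B[k][n]
    return C

# A 2x2 example: standard matrix multiplication falls out of the notation.
result = matmul_comprehension([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

The point of the notation is that the loop nest above never has to be written by hand; the compiler derives it (and a much faster schedule) from the one-line comprehension.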

Tensor Comprehensions builds on ideas from other high performance computing frameworks such as Halide. In fact, Tensor Comprehensions uses the Halide compiler as a library. Specifically, the framework relies on Halide's intermediate representation (IR) and analysis tools and pairs them with polyhedral compilation techniques, so that developers can write layers using a similar high-level syntax without needing to specify explicitly how the code will run.

Source: https://research.fb.com/downloads/tensor-comprehensions/

Using Tensor Comprehensions in deep learning frameworks is fairly simple, as shown in the following PyTorch code:

import tensor_comprehensions as tc
import torch

# Define the matmul operator in the Tensor Comprehensions language.
lang = """
def matmul(float(M, K) A, float(K, N) B) -> (C) {
    C(m, n) +=! A(m, r_k) * B(r_k, n)
}
"""
matmul = tc.define(lang, name="matmul")
mat1, mat2 = torch.randn(3, 4).cuda(), torch.randn(4, 5).cuda()
out = matmul(mat1, mat2)

The previous code automatically produces a program that is optimized to run on a CUDA GPU architecture.

Tensor Comprehensions relies on a technique called polyhedral compilation to bridge the impedance mismatch between the logical layout of high-level tensor operations (dimension ordering) and the data format the polyhedral code generator expects. Polyhedral compilation allows Tensor Comprehensions to schedule the computation of individual tensor elements on demand for each new network.
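One optimization mentioned above, kernel fusion, is easy to illustrate with a hypothetical plain-Python sketch: instead of making two separate passes over the data (two "kernel launches"), the fused version computes both operations in a single pass, cutting memory traffic and launch overhead.

```python
def relu_then_scale_unfused(x, alpha):
    # Two separate passes over the data: one per "kernel".
    y = [max(v, 0.0) for v in x]      # kernel 1: ReLU
    return [alpha * v for v in y]     # kernel 2: scale

def relu_then_scale_fused(x, alpha):
    # One fused pass: each element is read once and written once.
    return [alpha * max(v, 0.0) for v in x]
```

On a GPU the unfused variant would also pay a launch and synchronization cost per kernel, which is exactly the overhead the Tensor Comprehensions scheduler tries to eliminate.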

Another important contribution of Tensor Comprehensions is a multi-GPU auto-tuning library, based on evolutionary search techniques, that generates and evaluates thousands of implementation alternatives and selects the best-performing ones.
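The shape of that evolutionary search can be sketched in a few lines of plain Python. This is a hypothetical toy, not the library's API: the `cost` function stands in for actually running a generated GPU kernel and measuring its runtime, and the candidates stand in for kernel configurations such as tile sizes.

```python
import random

def evolutionary_autotune(cost, candidates, generations=10, seed=0):
    """Minimal evolutionary-search sketch of an auto-tuner.

    `cost` maps a configuration (e.g. a tile size) to a runtime.
    In Tensor Comprehensions the cost is measured by compiling and
    running the candidate kernel; here it is just a function.
    """
    rng = random.Random(seed)
    pool = list(candidates)            # start from every candidate once
    keep = max(1, len(pool) // 2)
    for _ in range(generations):
        pool.sort(key=cost)            # evaluate and rank configurations
        survivors = pool[:keep]        # elitism: keep the fastest half
        # Refill the population with randomly mutated configurations.
        pool = survivors + [
            rng.choice(candidates) for _ in range(len(candidates) - keep)
        ]
    pool.sort(key=cost)
    return pool[0]

# Synthetic cost surface: the "runtime" is lowest at tile size 32.
best_tile = evolutionary_autotune(lambda t: abs(t - 32), [8, 16, 32, 64, 128])
```

The real tuner evaluates candidates in parallel across multiple GPUs and caches the winning kernels, so the (expensive) search cost is paid once per operator and shape.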

Source: https://research.fb.com/downloads/tensor-comprehensions/

Initial tests showed that Tensor Comprehensions can already match and, in many cases, surpass the performance of native high performance libraries. The following bar chart illustrates the performance gains Facebook observed when comparing kernels produced automatically by Tensor Comprehensions against existing alternatives in Caffe2 and ATen (which use vendor library implementations such as cuDNN).

Source: https://research.fb.com/downloads/tensor-comprehensions/

Tensor Comprehensions is available as an open source release on GitHub. Future releases might extend the framework to support other popular deep learning libraries such as TensorFlow or MXNet. Even if you don't use the framework, the ideas behind Tensor Comprehensions provide a lot of insight into how to mitigate the friction between researchers and engineers working on large machine learning projects.

The Startup

Medium's largest active publication, followed by +752K people. Follow to join our community.
