Graphcore Announces Production Release of PyTorch for IPU

Dec 9, 2020


Author: Matt Fyles, SVP Software, Graphcore

Today we are introducing our first production release of PyTorch for IPU — PopTorch — combining the performance of the Graphcore IPU-M2000 system with the developer-ready accessibility of PyTorch. This will enable the fast-growing PyTorch developer community to make new breakthroughs in machine intelligence with Graphcore IPU systems, while maintaining the dynamic PyTorch experience.

Founded in 2016, Graphcore co-designed the Intelligence Processing Unit (IPU) and Poplar® software from the ground up for AI, to let innovators make new breakthroughs in machine learning, and we are now shipping production MK2 IPU-based systems to customers around the world. PyTorch’s user base has grown significantly over the same period, with developers citing its feature set and user-friendly interface as key factors when choosing a framework. We listened to this feedback and acted on it, putting in place a dedicated development team at Graphcore to build PyTorch for IPU.

Since the beginning of 2020 we have been working on directly integrating PyTorch with Poplar to support the use of IPU for both training and inference. Our goal has been to let users simply move their PyTorch applications to run on IPU platforms. Over the course of the year, we rolled out an extensive preview program, getting feedback from our customers on its usability and running standard workloads to prove out its capability.

The result is a tightly coupled and elegant solution: standard PyTorch interfaces to our Poplar® software and the IPU via the PopTorch connecting library.

To run a standard PyTorch program on the IPU, you simply need to change a couple of lines of code and include PopTorch:

import torch
import poptorch


Our PyTorch interface library is a simple wrapper for PyTorch programs that lets developers easily run models for both training and inference, directly on IPUs, with a few additional lines of code.
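As a minimal sketch of that workflow (the `poptorch.Options` and `poptorch.inferenceModel` names are assumptions based on Graphcore's PopTorch documentation; the fallback branch lets the sketch run on a plain CPU install of PyTorch when the poptorch package is unavailable):

```python
import torch
import torch.nn as nn

# A small stand-in model; any torch.nn.Module works the same way.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

try:
    import poptorch
    # Assumed API per Graphcore's docs: wrap the unchanged PyTorch
    # model so it compiles for and executes on the IPU.
    opts = poptorch.Options()
    inference_model = poptorch.inferenceModel(model, options=opts)
except ImportError:
    inference_model = model  # no IPU/poptorch available: run on CPU

x = torch.randn(4, 784)
out = inference_model(x)
print(out.shape)  # torch.Size([4, 10])
```

The key point is that the model itself is untouched; only the wrapping call changes between CPU and IPU execution.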

When using multiple IPUs, developers wrap individual layers in an IPU helper to designate which IPU should be targeted. The Poplar Advanced Runtime, PopART, then parallelises the model over the chosen number of IPUs. PopART passes the PyTorch model down through the software stack to the Graph Compiler which optimises and schedules workloads and associated memory operations for execution on one or more IPUs.
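A sketch of that layer-wrapping pattern follows. The `poptorch.BeginBlock` helper and its `ipu_id` parameter are assumptions based on Graphcore's PopTorch documentation; the guard lets the model still run on CPU where poptorch is not installed:

```python
import torch
import torch.nn as nn

try:
    import poptorch
except ImportError:
    poptorch = None  # no IPU system available; the model still runs on CPU

class ShardedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(784, 256)
        self.head = nn.Linear(256, 10)
        if poptorch is not None:
            # Assumed helper per Graphcore docs: pin each stage of the
            # model to a specific IPU so PopART can parallelise across them.
            self.encoder = poptorch.BeginBlock(self.encoder, ipu_id=0)
            self.head = poptorch.BeginBlock(self.head, ipu_id=1)

    def forward(self, x):
        return self.head(torch.relu(self.encoder(x)))

model = ShardedNet()
out = model(torch.randn(2, 784))
print(out.shape)  # torch.Size([2, 10])
```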

Graphcore’s Poplar software stack


We are open sourcing PopTorch, with code available on GitHub, following the open-source principles we have already established for our Poplar Libraries (PopLibs™), PopART™ and our TensorFlow for IPU port.

We strongly believe that embracing an open-source approach will ultimately accelerate innovation in the field of machine intelligence. Allowing developers to look ‘under the hood’ of Graphcore’s software builds familiarity not just with our Poplar software, but with the workings of our system hardware.

Those same developers — through their community contributions — will help to build, refine, and optimise our platform faster than we ever could on our own. It also allows developers working with PyTorch in highly specialised areas of AI research to bring their implementations to the community.

Developers wishing to submit code contributions can do so via standard GitHub pull requests, after accepting our Contributor Licence Agreement.


PyTorch for IPU is designed to require minimal manual alterations to PyTorch models.

The following code example shows how to perform inference on the IPU using a standard pre-trained BERT PyTorch model, with the IPU-specific changes highlighted. As you can see, there aren’t very many; the full example is linked in the resources below.


The PyTorch for IPU interface library supports popular features developers will be familiar with from other hardware platforms, with some additional capabilities:

· Support for inference and training

· Data-parallel and model-parallel support, with model replication across up to 64 IPUs

· Optimised host dataloader support

· FP32.32, FP16.32 and FP16.16 precision modes, with FLOAT32, FLOAT16, INT32, INT64 and BOOL data types

· Support for popular optimizers (SGD, RMSprop, AdamW, LAMB) and features to support float16 models, such as loss scaling

· Broad PyTorch loss function coverage, with support for arbitrary loss functions

· Multi-convolution support

· Ability to implement custom optimised operators

· PyTorch for IPU Docker containers

· Full support within Graphcore’s PopVision analysis tools

· Examples and tutorials (linked in the resources below)
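Several of these features come together in a minimal training sketch. The `poptorch.DataLoader`, `poptorch.trainingModel` and `poptorch.optim.AdamW` names are assumptions based on Graphcore's PopTorch documentation, and the fallback branch keeps the sketch runnable on a plain CPU install of PyTorch:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

try:
    import poptorch
except ImportError:
    poptorch = None  # no IPU system available: fall back to CPU below

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, target=None):
        out = self.fc(x)
        if target is None:
            return out
        # PopTorch convention (assumed): return the loss from forward()
        return out, self.criterion(out, target)

dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
model = Classifier()

if poptorch is not None:
    # Assumed API: optimised host dataloader plus an IPU-side AdamW.
    opts = poptorch.Options()
    loader = poptorch.DataLoader(opts, dataset, batch_size=8, shuffle=True)
    trainer = poptorch.trainingModel(
        model, options=opts,
        optimizer=poptorch.optim.AdamW(model.parameters(), lr=1e-3))
    for x, y in loader:
        out, loss = trainer(x, y)  # one compiled training step per batch
else:
    # CPU fallback so the sketch runs anywhere.
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    for x, y in loader:
        optimizer.zero_grad()
        out, loss = model(x, y)
        loss.backward()
        optimizer.step()

print(loss.item())
```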


We are publishing new benchmarks for our IPU-M2000 system today too, including some PyTorch training and inference results. We also provide reference implementations for a range of models on GitHub. In most cases, the models require very few code changes to run on IPU systems. We will be regularly adding more PyTorch-based performance results to our website and as code examples on GitHub, so please keep checking in.

The following results show the performance advantage you can see for both training and inference for ResNet50 and EfficientNet using PyTorch.

Graphcore MK2 performance results, December 2020

This is just the start of the PyTorch and Graphcore story. We are looking forward to seeing the breakthroughs you make.


Graphcore developer portal
PyTorch for the IPU: User Guide
Graphcore on GitHub
VIDEO: PyTorch for the IPU — running a basic model for training and inference
CODE EXAMPLE: BERT inference using PopTorch
TUTORIAL: Simple PyTorch for IPU

With thanks to Anthony Barbier, Stephen McGroarty, David Norman, Phil Brown, Bence Tilk and Alex Cunha.

Matt Fyles is a computer scientist with over 20 years of proven experience in the design, delivery and support of software and hardware, and is currently SVP of Software at Graphcore.



