Google Summer of Code 2021 — FOSSi Foundation — OpenPiton

Guillem López
Aug 20, 2021


Introduction

This post describes my journey through GSoC '21 over the last 10 weeks of this summer. My name is Guillem and I am a first-year PhD student at the Polytechnic University of Catalonia (UPC), living in Barcelona. This summer, GSoC '21 has given me the opportunity to work on open-source projects that I have used in the past, such as Verilator and OpenPiton. It has also been a great experience for my PhD: exploring new ideas, developing them, and talking with my mentors.

Summary

My project has consisted of exploring the possibility of partitioning the tiles of a NoC into separate processes using MPI, in order to speed up simulations of tens of cores. It all started with Brian's work in the Metro-MPI repo, which demonstrated that MPI communication within RTL simulation was possible. Hence, the first weeks were devoted to designing a protocol for how the different tiles would communicate through MPI in this sub-project, Metro-MPI. This included testing its viability in the RTL simulator (Verilator); designing the messages to be passed every cycle and the synchronization of all the processes; and creating a user library for these messages. Once the protocol was designed, I implemented the MPI communication in OpenPiton with the help of Brian and Jon. Finally, we tested this implementation in scenarios with up to 128 cores.

Tools

In this ambitious project, I have mainly used three open-source tools and projects:

  • OpenPiton
  • Verilator
  • OpenMPI

OpenPiton is an open-source manycore processor. It is a tiled manycore framework, connected by networks-on-chip (NoCs), that scales from one core to half a billion cores. It is highly configurable in both core and uncore components, and all the Verilog and infrastructure is ready to use on GitHub.

Verilator compiles Verilog/SystemVerilog into a much faster optimized and optionally thread-partitioned model, which is in turn wrapped inside a C++/SystemC module. Verilator reads the code, performs linting checks, and optionally inserts assertion checks and coverage-analysis points. It outputs single- or multi-threaded .cpp and .h files.
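
As a small example of what driving a Verilated model from C++ looks like, here is a minimal sketch; the module name `Vtile` and its `clk`/`rst_n` ports are hypothetical, not OpenPiton's actual interface.

```cpp
#include <verilated.h>
#include "Vtile.h"  // header Verilator generates for a hypothetical "tile" module

int main(int argc, char** argv) {
    Verilated::commandArgs(argc, argv);  // forward plusargs to the model
    Vtile* top = new Vtile;              // instantiate the compiled model

    top->rst_n = 0;                      // hypothetical active-low reset port
    for (int cycle = 0; cycle < 100 && !Verilated::gotFinish(); ++cycle) {
        if (cycle > 4) top->rst_n = 1;   // release reset after a few cycles
        top->clk = 0; top->eval();       // evaluate falling edge
        top->clk = 1; top->eval();       // evaluate rising edge
    }

    top->final();                        // run final blocks and cleanup
    delete top;
    return 0;
}
```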

Open MPI is an open-source Message Passing Interface (MPI) implementation that is developed and maintained by a consortium of academic, research, and industry partners.

My Contributions

To improve the RTL simulation of big NoCs, the work can be divided into three phases:

First phase

  • Design of the protocol, prototyped on toy examples, to be used later with many tiles
  • Design of the toy examples: a sender module and a receiver module (see the sketch after this list)
  • First experiments with these setups in the Metro-MPI repo
  • Implementation of a user library with the main functions to be used later
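
To give a flavor of those toy examples, below is a minimal sketch of a sender and a receiver exchanging one payload per simulated cycle over MPI; the tag, payload, and barrier-based synchronization are illustrative choices, not the actual Metro-MPI protocol.

```cpp
#include <mpi.h>
#include <cstdint>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int kCycles = 10;  // number of simulated cycles (illustrative)
    const int kTag = 0;      // single message tag (illustrative)

    for (int cycle = 0; cycle < kCycles; ++cycle) {
        if (rank == 0) {         // sender module
            uint64_t flit = 0xC0FFEE00ULL | cycle;  // this cycle's payload
            MPI_Send(&flit, 1, MPI_UINT64_T, 1, kTag, MPI_COMM_WORLD);
        } else if (rank == 1) {  // receiver module
            uint64_t flit = 0;
            MPI_Recv(&flit, 1, MPI_UINT64_T, 0, kTag, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("cycle %d: received 0x%llx\n", cycle,
                   (unsigned long long)flit);
        }
        MPI_Barrier(MPI_COMM_WORLD);  // keep every process cycle-synchronized
    }

    MPI_Finalize();
    return 0;
}
```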

Second Phase

  • Adaptation of OpenPiton to have independent tile and chipset modules. This work was mainly done by Jonathan Balkind, one of my mentors and a creator of OpenPiton, since it involved digging into OpenPiton's internals.
  • Adaptation and implementation of the protocol developed in the first phase, this time in the tile and chipset modules (a simplified sketch follows this list)
  • First experiments and debugging with one tile and one chipset connected through MPI
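
Conceptually, the per-cycle handshake in a tile process looks roughly like the sketch below; the module name `Vtile`, the port names `noc_out_data`/`noc_in_data`, and the single-flit lock-step exchange are my own simplifications, not the real OpenPiton interface.

```cpp
#include <mpi.h>
#include <cstdint>
#include "Vtile.h"  // Verilated tile model (name hypothetical)

// Advance the tile by one cycle and exchange NoC traffic with the chipset
// process. Port names and the single-flit, lock-step exchange are simplified
// assumptions for illustration.
void tick_and_exchange(Vtile* top, int chipset_rank) {
    top->clk = 0; top->eval();
    top->clk = 1; top->eval();

    uint64_t out_flit = top->noc_out_data;  // hypothetical output port
    uint64_t in_flit = 0;
    // A paired send/receive keeps the two processes in lock-step each cycle.
    MPI_Sendrecv(&out_flit, 1, MPI_UINT64_T, chipset_rank, 0,
                 &in_flit,  1, MPI_UINT64_T, chipset_rank, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    top->noc_in_data = in_flit;             // hypothetical input port
}
```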

Third Phase

  • Design of simple functions to map MPI ranks to the core and tile IDs the protocol needs to route its messages
  • Scaling of the NoC simulation up to 8x8 cores as a first try, then up to 16x8 tiles (128 cores)
  • Implementation of four MPI optimizations, reducing the number of messages between tiles from six to one (see the sketch below)
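
To illustrate both ideas, here is a hypothetical sketch of the rank/tile mapping and of packing several per-cycle messages into one send; the grid layout, the struct fields, and reserving rank 0 for the chipset are assumptions for illustration.

```cpp
#include <mpi.h>
#include <cstdint>

// Hypothetical mapping between MPI ranks and tile coordinates on a
// 16x8 grid, with rank 0 reserved for the chipset process.
constexpr int X_TILES = 16;

int rank_from_tile(int x, int y) { return 1 + y * X_TILES + x; }
int tile_x_from_rank(int rank)   { return (rank - 1) % X_TILES; }
int tile_y_from_rank(int rank)   { return (rank - 1) / X_TILES; }

// Instead of several separate MPI messages per cycle (e.g., one per NoC
// channel), pack everything into one struct and send it in a single call.
struct PackedFlits {
    uint64_t noc_data[3];   // payloads for NoCs 1-3 (illustrative widths)
    uint8_t  noc_valid[3];  // valid bit per channel
};

void send_packed(const PackedFlits& msg, int dest_rank) {
    // One MPI_Send of raw bytes replaces six smaller per-cycle sends.
    MPI_Send(&msg, sizeof(msg), MPI_BYTE, dest_rank, 0, MPI_COMM_WORLD);
}
```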

Most of the work can be seen in two PRs:

State of the project and future work

The current plan is to continue scaling up to 1024 cores while we work on multi-tile processes. Instead of having one core per MPI process, the idea is to simulate a small group of cores (a pair, or a 3x3 NoC) in the same MPI process, to explore whether simulation time can be sped up even more. Apart from that, Brian is making a great effort on a similar MPI-based approach for OpenPiton using VCS.

Challenges and Learnings

Challenges:

  • C++ and MPI can be tricky
  • Debugging is extremely hard in RTL, even more so in multi-core environments
  • The OpenPiton codebase is big
  • Designing carefully enough up front that the implementation goes smoothly afterwards
  • MPI learning curve

Learnings:

  • Design before coding
  • Learn from Pull Request revisions
  • Keep it simple
  • Try very simple models first
  • Improved at organizing and working remotely
  • Keep track of weekly meetings

Finally, these are some links to the project and my fork on GitHub:

Extras

The next image shows the first hello world with 10 MPI processes!
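
For reference, a minimal MPI hello world like the one in that first test looks roughly like the classic example below.

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes
    printf("Hello world from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

Built with `mpic++` and launched with `mpirun -np 10`, it prints one greeting per process.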
