Neural Network on the Tip of a Pencil
Written by Daniel Hensley, Blayne Kettlewell, Lina A. Colucci, and Sidney Primas at Edge Analytics
CPUs compute along one dimension: sequentially in time. Algorithms are broken down into instructions that are loaded and executed one after another. In the future, computation in two dimensions (2D) will be the norm, via hardware (HW) accelerators that execute in parallel across space. This will let developers exploit the structure of an algorithm and the structure of the hardware together, for faster and more efficient solutions.
We’ve seen this trend partially realized with the rise of GPUs that support 2D computation. This has led to huge performance increases in many “embarrassingly” parallel applications.
However, GPUs only accelerate specific algorithms in specific situations. In the future, interconnected CPUs and various HW accelerators will work together seamlessly, so that each part of an application runs on the hardware best suited to it. FPGAs, which allow extreme customization of 2D computation by programming the hardware fabric, will be a major part of this future. For a deeper dive into the new era of HW/SW co-design, we highly recommend this talk by Chris Lattner.
In this blog post, we show how to map a concrete neural network for sleep tracking onto an FPGA. More importantly, we demonstrate key tools for tailoring hardware to an algorithm today and discuss how we’ll get to seamless heterogeneous compute tomorrow.
Sleep Tracker: Neural Network on the Tip of a Pencil
We made a wearable FPGA-based sleep tracker. In the process, we built a pipeline that maps a neural network originally described in Python (Keras) onto silicon fabric (an FPGA). The entire sleep tracker, from data acquisition to neural network predictions, runs on a tiny FPGA with no processor in the loop.
As you can see in the video, the user’s sleep state is classified directly on the device.
We built on a peer-reviewed implementation of algorithms developed at the University of Michigan (Walch et al., Sleep, 2019); it is the first open-source sleep dataset and corresponding algorithm repository of its kind.
We validated our FPGA neural network (NN) core against labeled data from this project. The network is a multilayer perceptron that takes accelerometry, heart rate, and circadian rhythm data as input and predicts wake, REM sleep, or non-REM sleep with 91.3% overall accuracy.
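To make the model concrete, here is a minimal float reference for a multilayer perceptron forward pass in plain Python. The layer sizes, weights, single-value inputs, and class order are hypothetical toy values, not the trained network from Walch et al. The sketch only shows the dense, ReLU, and argmax structure that a hardware core has to reproduce.

```python
import math

# Hypothetical label order; the real model's encoding may differ.
CLASSES = ("wake", "NREM", "REM")


def dense(x, weights, biases):
    """Fully connected layer: weights[j][i] connects input i to output j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(features, layers):
    """ReLU after every hidden layer, softmax on the output; return (label, probabilities)."""
    x = features
    for k, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if k < len(layers) - 1:
            x = relu(x)
    probs = softmax(x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Toy 3-input -> 2-hidden -> 3-output network with made-up weights.
# The inputs stand in for motion, heart rate, and circadian phase features.
toy_layers = [
    ([[1.0, 0.0, 0.0], [0.0, 1.0, -1.0]], [0.0, 0.0]),
    ([[2.0, 0.0], [-1.0, 1.0], [0.0, 0.5]], [0.0, 0.0, 0.0]),
]
label, probs = mlp_predict([0.8, 0.2, 0.1], toy_layers)
print(label)  # wake
```

Softmax preserves the ordering of its inputs, so the argmax of the raw output scores picks the same class as the argmax of the probabilities. A hardware implementation that only needs the predicted label could therefore plausibly skip the softmax.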
In this blog post, we introduce our open-source version of these Python algorithms deployed on wearable FPGA hardware. You can see the full technical details and source code here.
The parametrically defined FPGA NN core we built is vendor-independent and reusable beyond this application, because our pipeline makes it easy to update model shapes and parameters, within certain constraints.
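Any float-to-FPGA pipeline has to convert trained weights into fixed-point integers that the fabric can store. The sketch below assumes a signed 16-bit format with 8 fractional bits and emits one two's-complement hex word per line, which is the text format Verilog's `$readmemh` reads. The word width, fraction bits, rounding mode, and file layout are all assumptions and do not necessarily match what our core uses.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Scale by 2**frac_bits, round to nearest (Python's round ties to even), and saturate."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(value * (1 << frac_bits))))


def to_hex_word(q, total_bits=16):
    """Two's-complement hex string with a fixed number of digits."""
    digits = (total_bits + 3) // 4
    return format(q & ((1 << total_bits) - 1), f"0{digits}x")


def weights_to_mem_lines(weights, frac_bits=8, total_bits=16):
    """Flatten a weight matrix row by row into one hex word per line."""
    return [
        to_hex_word(to_fixed(w, frac_bits, total_bits), total_bits)
        for row in weights
        for w in row
    ]


lines = weights_to_mem_lines([[0.5, -1.0], [1000.0, -0.004]])
print("\n".join(lines))  # 0080, ff00, 7fff (saturated), ffff, one per line
```

With saturation instead of wraparound, an out-of-range weight is clipped to the largest representable value rather than flipping sign. That usually degrades accuracy more gracefully.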
The Future of 2D Algorithms on Adaptable Accelerators
Deploying adaptable accelerators such as FPGAs is high-friction and time-consuming today. We’ll describe three aspects of FPGA development, covering what we did in this project and how each aspect will improve in the future.
2D algorithms will be described at a high level (e.g., Python) and automatically deployed
Writing FPGA code is an arcane task that requires different expertise than what is typical for data scientists and most software engineers. This can be a barrier for teams that would otherwise greatly benefit from 2D FPGA-based acceleration. The ability to describe FPGA-targeted algorithms in familiar high-level languages such as Python is critical to democratizing FPGA use. FPGA experts will also benefit from the major efficiency gains with this infrastructure.
To deploy a new sleep tracker network in our application, a user only needs to run a script and lightly modify a couple of files. No hardware knowledge is required and there is no need to write new SystemVerilog code.
This works because we only allow a highly constrained set of models. In the future, more general High-Level Synthesis (HLS) tools such as Xilinx’s Vitis HLS and Google’s XLS will allow users to provide generic, high-level descriptions of the algorithms they want deployed to adaptable accelerators.
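A constrained flow like this can be sketched in two steps. First, reject any model outside the supported set. Second, emit the shape parameters the hardware needs as a SystemVerilog package. Everything here is hypothetical: the `DenseSpec` description, the limits in `MAX_UNITS` and `SUPPORTED_ACTIVATIONS`, and the `sleep_nn_pkg` name are illustrations, not our repository's actual files or constraints.

```python
from dataclasses import dataclass

# Hypothetical limits for an illustrative fixed-function core.
MAX_UNITS = 64
SUPPORTED_ACTIVATIONS = {"relu", "linear"}


@dataclass(frozen=True)
class DenseSpec:
    """One fully connected layer, as it might be read from a Keras model."""
    units: int
    activation: str


def check_supported(input_size, layers):
    """Return reasons the model falls outside the supported set (empty if supported)."""
    problems = []
    if not 1 <= input_size <= MAX_UNITS:
        problems.append(f"input size must be in 1..{MAX_UNITS}, got {input_size}")
    if not layers:
        problems.append("model has no layers")
    for i, layer in enumerate(layers):
        if not 1 <= layer.units <= MAX_UNITS:
            problems.append(f"layer {i}: units must be in 1..{MAX_UNITS}, got {layer.units}")
        if layer.activation not in SUPPORTED_ACTIVATIONS:
            problems.append(f"layer {i}: unsupported activation {layer.activation!r}")
    return problems


def render_sv_package(name, input_size, layers, data_width=16, frac_bits=8):
    """Render a SystemVerilog package of localparams describing the network shape."""
    sizes = [input_size] + [layer.units for layer in layers]
    size_list = ", ".join(str(s) for s in sizes)
    return "\n".join([
        f"package {name};",
        f"  localparam int NUM_LAYERS = {len(layers)};",
        f"  localparam int DATA_WIDTH = {data_width};",
        f"  localparam int FRAC_BITS = {frac_bits};",
        f"  localparam int LAYER_SIZES [{len(sizes)}] = '{{{size_list}}};",
        "endpackage",
    ]) + "\n"


model = [DenseSpec(16, "relu"), DenseSpec(3, "linear")]
problems = check_supported(3, model)
if problems:
    raise ValueError("; ".join(problems))
print(render_sv_package("sleep_nn_pkg", 3, model))
```

With the shape captured in a generated package, the hardware modules can read their sizes from parameters. Changing the network then means regenerating a file rather than editing SystemVerilog by hand.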
2D algorithms will be efficiently tested and debugged in languages like Python
Simulation, validation, and debugging are critical parts of the design process for FPGA applications. These processes will also see major improvements from high level interfaces and tools.
There is already great progress. For example, although we wrote all of our components directly in SystemVerilog, we used Cocotb for all of our off-device validation and test benches (for each module and for the sleep application as a whole) without ever leaving Python. Cocotb gives us the cycle-accurate simulation that FPGA validation depends on, while letting us stay in the Python ecosystem where developers work most efficiently.
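A common Cocotb pattern is to drive stimulus into the simulated design, read back its outputs, and compare them with a Python reference model. Below is a sketch of such a reference for a single fixed-point neuron. The 16-bit width, 8 fractional bits, truncating shift, saturation, and ReLU placement describe a hypothetical datapath, and `find_mismatches` is an illustrative helper, not part of Cocotb's API.

```python
def saturate(value, total_bits=16):
    """Clamp to the range of a signed total_bits-wide integer."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, value))


def neuron_fixed(inputs, weights, bias, frac_bits=8, total_bits=16, relu=True):
    """Bit-accurate model of one fixed-point neuron under an assumed datapath.

    Inputs, weights, and bias are integers with frac_bits fractional bits.
    Products carry 2 * frac_bits fractional bits, so the bias is shifted up to
    match, and the wide accumulator is shifted back down before saturating.
    """
    acc = bias << frac_bits
    for x, w in zip(inputs, weights):
        acc += x * w
    out = saturate(acc >> frac_bits, total_bits)  # Python's >> floors, like an arithmetic shift
    return max(0, out) if relu else out


def find_mismatches(observed, vectors, **datapath):
    """Indices where simulated outputs differ from the golden model.

    observed: outputs read back from the simulated design.
    vectors: (inputs, weights, bias) tuples that were driven into it.
    """
    return [
        i
        for i, (got, (xs, ws, b)) in enumerate(zip(observed, vectors))
        if got != neuron_fixed(xs, ws, b, **datapath)
    ]


# 1.0 * 0.5 + 0.5 * (-1.0) + 0.25 = 0.25, which is 64 with 8 fractional bits.
print(neuron_fixed([256, 128], [128, -256], 64))  # 64
```

Inside a Cocotb test, a check such as `assert not find_mismatches(observed, vectors)` would flag any simulated result that diverges from the model. Keeping the reference bit-accurate, rather than float, means rounding and saturation differences show up as real failures instead of being hidden by a tolerance.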
Rust will be the glue that holds heterogeneous systems together
A common scenario for embedded engineers is hooking up communication between a HW accelerator, such as an FPGA, and a host CPU. This work is notoriously tedious and buggy.
We built the FPGA sleep app’s device driver and a higher-level Session API in Rust. The driver implements our custom packet protocol, and we used the Session API to build various programs that interact with the FPGA sleep app. Rust is a great fit because its type system and static checks make it much easier to write safe low-level code and ergonomic higher-level APIs. The second half of this talk describes some of these features in detail.
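Our driver is written in Rust, and its actual packet format is in the repository. The Python sketch below illustrates the kind of framing such a driver handles, using a hypothetical layout: a sync byte, a message type, a little-endian payload length, the payload, and an 8-bit additive checksum. The `SYNC` value, field widths, and checksum are all assumptions. A production link would more likely use a CRC.

```python
import struct

SYNC = 0xA5  # hypothetical start-of-frame marker
HEADER = "<BBH"  # sync, message type, payload length (little-endian u16)
HEADER_LEN = struct.calcsize(HEADER)  # 4 bytes


def checksum(data: bytes) -> int:
    """8-bit additive checksum over type, length, and payload."""
    return sum(data) & 0xFF


def encode_packet(msg_type: int, payload: bytes) -> bytes:
    """Frame a payload: header, payload, then one checksum byte."""
    header = struct.pack(HEADER, SYNC, msg_type, len(payload))
    return header + payload + bytes([checksum(header[1:] + payload)])


def decode_packet(frame: bytes):
    """Return (msg_type, payload), or raise ValueError for a malformed frame."""
    if len(frame) < HEADER_LEN + 1:
        raise ValueError("frame too short")
    sync, msg_type, length = struct.unpack_from(HEADER, frame)
    if sync != SYNC:
        raise ValueError("bad sync byte")
    if len(frame) != HEADER_LEN + length + 1:
        raise ValueError("length field does not match frame size")
    if checksum(frame[1:-1]) != frame[-1]:
        raise ValueError("checksum mismatch")
    return msg_type, frame[HEADER_LEN:-1]


frame = encode_packet(0x02, b"\x01\x02")
print(frame.hex())  # a5020200010207
print(decode_packet(frame))  # (2, b'\x01\x02')
```

Here every malformed frame surfaces as a `ValueError`, and nothing forces a caller to handle it. In Rust, the equivalent decoder would return a `Result` with a dedicated error enum, and the compiler warns when a `Result` is ignored. That is one concrete way the type system reduces driver fragility.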
We believe Rust is the best choice to glue together heterogeneous compute systems. In this role, Rust will provide safety in low-level communications, reduce driver fragility, and provide ergonomic APIs for algorithms to communicate across HW boundaries.
We published our open-source repository with additional technical details here.
The FPGA work here was certainly a team effort by the Edge Analytics team! A major thank you to Blayne Kettlewell, Andrew Weitz, and Vasiliy Nerozin for all their help building the tools and code behind this work.
Edge Analytics is a company that specializes in data science, machine learning, and algorithm development both on the edge and in the cloud. We provide end-to-end support throughout a product’s lifecycle, from quick exploratory prototypes to production-level AI/ML algorithms. We partner with our clients, who range from Fortune 500 companies to innovative startups, to turn their ideas into reality. Have a hard problem in mind? Get in touch with us.