Today, we are open sourcing some scripts to cross compile TensorFlow for the RaspberryPi.
Since its release in 2015, TensorFlow has become one of the most powerful tool for doing machine learning. Our upcoming on-device AI platform makes heavy use of it, so we’ve been working hard on making it run on small devices such as Raspberry Pis!
In particular, the code that calls the TensorFlow inference is written in Rust. As there is a semi-official tensorflow crate available, all you need to do is add it to your cargo.toml, run cargo build, go grab a cup of coffee and, with a bit of luck, see it work out of the box. But the next time you do a cargo clean, you will have to wait another 15 minutes for it to compile your dependencies, including TensorFlow.
Looking at the source code of the tensorflow-sys crate, we noticed that it calls pkg-config and looks for a tensorflow_c library. If successful, the code will use it instead of compiling everything from scratch, saving quite a lot of build time. Our first experiment thus consisted in extracting the dynamic library after a successful build, put it somewhere safe, and writing a simple pkg-config descriptor file. Finally, by setting the PKG_CONFIG_PATH environment variable to point to that descriptor file, we were able to significantly reduce the build time.
But our upcoming product (yes, I’m being vague on purpose) needs first-class support for ARM environments (such as Raspberry Pis), meaning we needed to build the tensorflow_c library for it.
Unfortunately, the integrated cargo build on the Pi doesn’t cut it. First, it takes a while to build. And that’s assuming it can build in the first place! Turns out the Pi doesn’t have enough RAM to link TensorFlow, forcing us to add swap on disk or flash. We also needed to patch the TensorFlow code in a few places, a fairly difficult task with the integrated build. And none of the pre-built libraries we could find lying around worked for us, mostly due to version mismatch, manifesting as missing symbols.
Another option would be to cross compile the library from a more capable machine, enabling faster iterations. Some people having successfully built TensorFlow on the Raspberry Pi report hours of build, while a decent laptop can do it in minutes. Unfortunately, cross compilation is often a messy business.
What we need is a cross-platform toolchain for the Pi, running on the build machine (a Linux or Mac laptop). A cross-platform toolchain typically contains a compiler and linker tweaked to run on one given architecture while generating an executable for another. It also features a minimal image of the libraries that will be available on the target system so that the executable can properly link to them. Needless to say, many stars have to align for this to work: from the micro-processor architecture to the kernel version and ABI, C and C++ standard library…
Although the mobile ecosystem has released excellent cross-platforms toolchains as part of the Android SDK and XCode, things are a bit less streamlined for the Raspberry Pi . There are some efforts to deliver ready to use toolchains (e.g. Linaro), but for some reason macOS isn’t supported as a build platform, meaning most of our team wouldn’t be able to use it.
There is another tool called crosstool-ng that tries to make things easier, but unfortunately isn’t quite there yet. It requires configuring things in a “menuconfig” (very reminiscent of the Linux kernel one), then waiting a few minutes hoping the selected options will yield a working compiler. If successful, we can then test a simple “hello world” program by compiling on our laptop before copying and running it on the Pi. If this works, then we can try to use our toolchain to link a Rust test program. Keep in mind that even though Rust can natively compile to multiple architectures (including the Raspberry Pi), we still need the cross-platform toolchain to link the code and make it executable. So far, we managed to make this work on both Linux and macOS.
Now, on to the main event: cross-building TensorFlow. Once again, many things have to go right. TensorFlow is pretty aggressive in terms of optimization. Even without CUDA, it contains bits of assembly that have to be activated to match the target platform. Then there is quite a little bit config work for bazel, the TensorFlow build tool, to make it work with our toolchain. Unfortunately, we didn’t manage to build a toolchain (on either macOS or Linux) that is able to cross-compile a working TensorFlow library for the Raspberry Pi. Instead, we ended up using the one provided by Raspberry Pi, which only works on Linux.
But thanks to Rust’s magnificent cross-compilation support, cross-compiling a Rust executable is easy one we have a working toolchain and built TensorFlow library. We simply used rustup to download the rust standard library for Raspberry Pis and added a line in the .cargo/config file to tell it to use our toolchain!
We encountered one last problem: by default, the Rust pkg-config crate forbids cross compilation. A little bit of digging in the pkg-config crate source code showed that setting PKG_CONFIG_ALLOW_CROSS=1 would disable that restriction.
Finally, we made some changes to Dinghy, our homemade tool to easily run Rust programs on other devices during development. It only supported iOS and Android, so added support for SSH, enabling us to automatically cross-compile and run tests on the Pi, using a single cargo command.
Overall, this has been quite a journey, which ended up with a set of useful scripts to simplify the TensorFlow cross-building process. We made them all available on github, including some precompiled debs for those who don’t want to do everything from scratch ;). The repository is also a homebrew tap, containing a tensorflow_c formula (for macOS to macOS, not cross-compilation capable).
There are still plenty of ways to improve this (for instance the generated .so is quite heavy), but at least we now have a good base on top of which we can start doing machine learning on-device!
If you want to work on AI + Privacy, check our jobs page!