Thoughts on Google’s Trax library
Intended audience: software engineers, machine learning engineers, aspiring deep learning researchers (warning: it’s a small niche)
Epistemic status: “cold take” distillation, as I’ve been exploring the codebase in off-hours and coffee breaks for months
Introduction
The Trax library intends to be an easy on-ramp to advanced deep learning.
As its opening paragraph notes, an implementation of the Reformer network (a modified, more efficient Transformer) lives in the codebase. It also gives you easy access to all of the nice datasets and compute engines you can handle.
In this blog post, I want to touch on the two selling points for my fellow Machine Learning Engineers (MLEs), Software Engineers in Machine Learning (SWEMLs), and Research Software Engineers (RSEs/RSWEs) who are constantly pushed to be jacks-of-all-trades, masters of none. I have some good news for you. Trax is easy to learn, makes deep learning easy to grok, and frees you up to do more research and less typy-type. It achieves this by:
- Giving you instructions as code for deep learning
- Demonstrating that good software practice really does help
Let’s dive in.
Instructions as code, or “how SWEs like to get down”
There’s an old joke on the internet that all a programmer needs is three buttons on their keyboard. Sometimes this joke is a nightmare reality, but it touches on a truth: SWEs like to write code, learn from code, and read good code. Who needs documentation when the code tells the whole story?
Quick side note: I’m over-generalizing, because I have met more SWE generalists who feel this way than not. This is honestly not my preference. We MLEs, and RSEs in particular, tend to be cut from a different cloth.
I’ve been a software engineer for five years, and getting thrown into a new codebase has never been my favorite part of the job. It’s not terrible, but I don’t relish that space; I’m more of a “greenfield guy”. I might have had more fun working with the team that produced the Trax codebase than with the team that is currently maintaining it. If those turn out to be the same group, I would be stunned, given the thousands of engineers at Google, but I digress.
The quick start tutorial shows you just where to jump in and play, with clearly outlined comments and slight hints as to what to avoid. It’s already wired up, so it’s really up to you to figure out how to plug and play:
- See which hyperparameters have a significant impact on training
- See what kinds of problems the Transformer language model can learn.
- Colab has made it easier than ever to actually work with code, as they’ve recently implemented Ctrl/Cmd + Left Click and expanded the usefulness of the file browser.
Run through the tutorial really quickly (it’s three notebook cells). Note: just hitting “run” three times isn’t learning, and the model you’ll train is not perfect. Far from it, even. Just supply a negative number and watch the predictions go nuts.
On the positive side, the tutorial code is well organized and serves as a starting point for learning. As mentioned, it’s very clear where they want you to start fiddling, and it’s easy to slip into that model-iteration mindset. It’s fun, engaging, and illuminating if you haven’t done a deep dive before. That’s what tutorials are supposed to do: get you going, with a smile on your face.
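For a flavor of what that fiddling looks like, here’s a rough sketch of a Trax training setup in the spirit of the quick start. To be clear, this is not the tutorial’s code: the toy `copy_stream` generator and the tiny hyperparameters are my own stand-ins, and exact signatures can shift between Trax versions.

```python
import os

import numpy as np
import trax
from trax import layers as tl
from trax.supervised import training

# Toy data generator -- my own stand-in for the tutorial's task.
# Each batch is (inputs, targets, loss weights); here the model simply
# has to reproduce a short sequence of positive integers.
def copy_stream(batch_size=16, length=8, vocab_size=32):
  while True:
    x = np.random.randint(1, vocab_size, size=(batch_size, length))
    yield (x, x, np.ones_like(x, dtype=np.float32))

# A deliberately tiny Transformer language model; these keyword arguments
# are exactly the kind of hyperparameters the tutorial invites you to tweak.
model = trax.models.TransformerLM(
    vocab_size=32, d_model=64, d_ff=128, n_layers=2, n_heads=2, mode='train')

train_task = training.TrainTask(
    labeled_data=copy_stream(),
    loss_layer=tl.CrossEntropyLoss(),
    optimizer=trax.optimizers.Adam(0.01),
)
eval_task = training.EvalTask(
    labeled_data=copy_stream(),
    metrics=[tl.CrossEntropyLoss(), tl.Accuracy()],
)

# The training Loop handles checkpointing and periodic evaluation for you.
loop = training.Loop(
    model, train_task, eval_tasks=[eval_task],
    output_dir=os.path.expanduser('~/trax_copy_demo'))
loop.run(n_steps=200)
```

Swap the model’s keyword arguments around and rerun; that’s the whole iteration loop the tutorial is nudging you toward.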
Google’s tutorial also gives us a nice example of well-formatted logging, something engineers and scientists alike value, because when things break you want them working again ASAP. Hats off for good logging! 🎩👋
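That habit is worth copying into your own projects. The snippet below isn’t Trax’s logging code, just a tiny illustration of what I mean by well-formatted: timestamped, one event per line, fields in a fixed order, so it’s easy to skim while training and easy to grep when something breaks.

```python
import logging

logging.basicConfig(
    format='%(asctime)s %(levelname)s %(name)s: %(message)s',
    level=logging.INFO)
log = logging.getLogger('train')

# Fixed field order and explicit step numbers keep the log greppable.
for step, loss in [(100, 1.92), (200, 0.74), (300, 0.31)]:  # dummy values
  log.info('step %5d | train loss %.4f', step, loss)
```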
Best practices pay off, or “a gold mine of clean code”
Searching around Trax, you will find the (aforementioned) legitimate implementation of the Reformer architecture, implementations of attention from first principles, and an over-powered Python decorator that dynamically creates Python classes for neural network Layers. That’s just getting started. The code is deliberately well organized around SOLID architecture and design principles.
- Things that change for the same reasons live together. Guess where you’ll find the introduction to the Layers API?
- Code is kept concise (yet not terse), which makes it hard to get lost in the sauce. See layers/rnn.py for a brilliant example of this.
- The library’s internal modules (plus JAX) make the code read a lot more like the math it implements than typical Python/NumPy/TensorFlow does (there’s a small sketch of this just after the list).
- The project was designed to be expressive and easy to learn from. Just look at this implementation of BERT.
- There are unit tests galore, living right next to the code they test.
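To make the “reads like the math” point concrete, here is a small sketch built from combinators that do exist in Trax (`tl.Fn`, `tl.Serial`, `tl.Embedding`, and friends); exact signatures may vary by version, and the toy classifier below is mine, not a snippet lifted from the library.

```python
from trax import layers as tl

# tl.Fn turns a plain function into a Layer object -- the functional
# cousin of the class-generating decorator mentioned above.
Double = tl.Fn('Double', lambda x: 2.0 * x)

# A small classifier composed the way you'd write it on a whiteboard:
# embed -> average over time -> project to two classes -> log-probabilities.
model = tl.Serial(
    tl.Embedding(vocab_size=8192, d_feature=256),
    tl.Mean(axis=1),   # average the embeddings across the sequence
    tl.Dense(2),       # two output classes
    tl.LogSoftmax(),   # log-probabilities over those classes
)
print(model)  # printing a model shows its structure, layer by layer
```

When composition reads that cleanly, the gap between a paper’s equations and the code that implements them shrinks a lot.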
We can all grow a bit complacent and let these practices slip. But here they stand. Well worth your time to dig in.
Another quick note: a lot of the complex model implementations are oriented around language modeling rather than computer vision (CV) stuff. If you want CV tutorials, TensorFlow proper has you covered. I’ve been looking for more legible NLP code, so this is a huge win for me.
Conclusion
In spending time reading the code, I’ve gotten a feel for how Google Brain (which has been tremendously successful in pushing ML forward) thinks about its work; I’ve found some great, yet concise, papers that have pushed sub-fields forward; and I’ve seen a new perspective on how to write code for someone else to read. (Maybe I’ll write about that next.)
So if Trax turns out to be of use to you, that’s great! If, 8 months from now when you’re starting at Google AI, it’s the one resource that you needed, maybe remember who helped you out 😄 If you’re not impressed with the code base and NLP isn’t your thing, I appreciate you reading this far.
Till then, be well.