Machine learning for engineering teams: from algorithms to ecosystems
Machine learning is all the rage and, sure, it’s fun to be a tourist at ICML, drink in all the exciting new developments, then come home and tinker with the magic new techniques. However, with apologies to Thomas Edison, the life of ML in the enterprise is “1% algorithm and 99% perspiration”. And successful implementation of ‘ML inside’ products takes a team of committed professionals, not a couple of lone data scientists.
This is why, most Wednesday lunchtimes over the last four months, you would have found me teaching a ‘Machine Learning Learning’ course to my good friends and colleagues in software engineering. I’d like to say this was altruistic behaviour on my part but it really wasn’t. For one thing, I thoroughly enjoyed it. For another, as someone fervently interested in building and deploying ML pipelines to make products more personalised, more engaging and more useful, I know my life will be a whole lot easier if more engineers understand the practical basics of the machine learning lifecycle. And it doesn’t hurt that my engineering colleagues are smart and fun and learn like lightning.
Because this was all being done on magic time, I put some thought into how to achieve the most engagement and education for the least effort. The course was well received and the spin-offs are already significant, so I wanted to share our approach.
Watch at home, discuss in class
As the mother of three, I’ve long been a fan of Sal Khan’s Khan Academy and followed the research into the effectiveness of the ‘flipped classroom’. Given an audience of people used to ‘learning by doing’, I thought this approach could work well for us: the participants would watch video lectures during the week and we would then discuss the content in class, brought to life with examples drawn directly from existing and emerging products in the company that we were all familiar with. Lucky for us, Google had recently launched their Machine Learning Crash Course and it looked like a good content match.
We advertised internally and the course quickly grew through word of mouth. Each week I would post the lectures and assignments to be covered as pre-work, and each Wednesday we would congregate and discuss. An active Slack channel helped learning and engagement. We had a mix of folk with some experience and complete newbies, and we’re lucky to have an honest and open culture where it’s quite OK to say you don’t know something, so there were a lot of questions, quite a few answers and a lot of crowd-sourced content as people shared additional blogs and YouTube videos they had found useful while covering the material. I think the peer support (pressure?) of a visible and active learning group helped a lot, as most people were completing this learning on their own time: lots of evenings and weekends.
Before class each week, I would pull together discussion points for all the concepts I thought were hard to get on first viewing and we would talk about those.
- Like the fact that y = w₁x₁ + w₂x₂ + w₃x₃² is still a linear model: linear in the weights, even though it’s quadratic in x₃.
- Or what it is exactly that you are minimising when you perform gradient descent.
- Or how to interpret a 300D word embedding.
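To make the first two of those points concrete, here is a small NumPy sketch (the data and numbers are invented purely for illustration): the model y = w₁x₁ + w₂x₂ + w₃x₃² is quadratic in x₃ but still linear in the weights, and what gradient descent is minimising is the mean squared error as a function of those weights.

```python
import numpy as np

# Illustrative data only: the model y = w1*x1 + w2*x2 + w3*x3**2 is
# nonlinear in x3 but *linear in the weights* w1, w2, w3 -- so plain
# gradient descent on mean squared error fits it without any trouble.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])

# Build the feature matrix [x1, x2, x3^2]; squaring x3 up front is
# exactly the "still linear in the weights" trick.
features = np.column_stack([X[:, 0], X[:, 1], X[:, 2] ** 2])
y = features @ true_w

w = np.zeros(3)          # initial guess for the weights
lr = 0.1                 # learning rate
for _ in range(500):
    pred = features @ w
    # Gradient of the MSE loss with respect to w -- this loss, as a
    # function of the weights, is the thing gradient descent minimises.
    grad = 2 * features.T @ (pred - y) / len(y)
    w -= lr * grad

print(np.round(w, 3))    # recovers weights very close to true_w
```

Because the loss is quadratic in the weights, the loss surface is a convex bowl and gradient descent converges to the true weights despite the curve in x₃.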
Most weeks we would divide the room into pairs and perform yet another flip:
“OK Participant A, you explain to Participant B why such and such is true. Five minutes. Go”.
That approach certainly flushed out those who hadn’t managed to get to the homework during the week and provided some good incentive not to wing it next week. A highlight from these discussions was one colleague illustrating the loss surface to another with a curtain cord.
Our 90-minute sessions never ran short: on the occasions where I mistimed (underprepared?) it and ran out of material, there were more than enough questions from the floor.
Going beyond the course work
In addition to making use of the excellent Google resources, we also customised discussions to suit ourselves. Given the developer audience, we spent half a session covering the advantages and pitfalls of machine learning frameworks. Given our company’s core business model, we dove deep into natural language processing. We also had some pretty spirited discussions about linear algebra and why uniform initialisations of neural net weights were a bad idea.
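On that last point, here is a minimal NumPy sketch of the symmetry argument, assuming ‘uniform initialisation’ in the sense of every weight starting at the same constant (the layer sizes, data and learning rate are all made up): identical hidden units receive identical gradients, so they never differentiate and the network behaves as if it had a single hidden unit. Random initialisation breaks this symmetry.

```python
import numpy as np

# Illustrative sketch: a tiny two-layer net where every weight starts
# at the same constant. All eight hidden units compute the same thing,
# so they get the same gradient at every step and stay identical forever.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))      # made-up inputs
y = rng.normal(size=(32, 1))      # made-up targets

W1 = np.full((4, 8), 0.5)         # every hidden weight identical
W2 = np.full((8, 1), 0.5)         # every output weight identical

lr = 0.01
for _ in range(100):
    h = np.tanh(X @ W1)           # hidden activations (all columns equal)
    pred = h @ W2
    err = pred - y
    # Backprop through both layers for the MSE loss
    grad_W2 = h.T @ err / len(y)
    grad_h = err @ W2.T * (1 - h ** 2)
    grad_W1 = X.T @ grad_h / len(y)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

# After training, every hidden unit's weight vector is still identical:
print(np.allclose(W1, W1[:, :1]))  # True -- the symmetry was never broken
```

Swapping the `np.full` lines for small random draws (e.g. `rng.normal(scale=0.1, size=(4, 8))`) gives each unit a different starting point and different gradients, which is why random initialisation is the default everywhere.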
Because of the flipped classroom approach, each session only took me about an hour to prepare — I wasn’t writing lectures, I was further illustrating interesting and tricky points from the week’s content with examples drawn from my own experience and where possible from internal systems and problems we were all familiar with.
We also leaned on the work of the brilliant educator Andrew Ng (lots of talks available for free on YouTube) and video lectures from a number of other researchers. In one session we spoke about ethics in AI, and watched excerpts from this excellent ICML 2017 keynote from Latanya Sweeney.
Things went well so we have a couple of next steps:
- Run the same course again for a fresh group of developers. Con: I haven’t figured out how to clone myself yet, and because the classroom sessions leaned heavily on past experience and anecdotal discussion, this format doesn’t lend itself to a ‘train the trainer’ approach.
- Run a shorter follow-up course on implementing ML at scale, particularly automating retraining and building robust inference pipelines.
Do you have other successful approaches to bringing software engineers up to speed on building and supporting ML pipelines? I’d love to hear your thoughts.