I want to develop mastery of core ML algorithms in TensorFlow, and to be able to quickly convert research papers into well-written code. What should I do over the next 3–6 months?
This prompted an interesting discussion, the summary of which I hope is useful to others here.
To get solid at developing machine learning algorithms (and perhaps later at researching them), I’d recommend spending time on the following:
Write a diverse range of models from scratch
For truly understanding how something works, nothing beats getting down to the nuts and bolts (it’s why many Computer Science courses still teach assembly and hardware). Writing a model from scratch will help you to appreciate every design decision and library function employed. It also helps commit to memory how that model works.
Furthermore, debugging a model forces you to understand what it is (and is not) doing, why it’s doing that, what its limitations are and how to resolve common problems.
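To make the "from scratch" idea concrete, here is a minimal sketch of logistic regression written with nothing but the Python standard library, plus a numerical gradient check, which is one of the most useful debugging tools when you write your own backward pass. The function names, the toy AND-gate data and the hyperparameters are all my own assumptions for illustration; this is not taken from any particular library or paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(params, x):
    # params is laid out as [w_0, ..., w_{n-1}, bias] (an illustrative convention).
    *weights, bias = params
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def loss(params, xs, ys):
    # Mean binary cross-entropy; eps guards against log(0).
    eps = 1e-12
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(params, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)


def analytic_gradient(params, xs, ys):
    # For sigmoid + cross-entropy the per-example gradient is (p - y) * x,
    # and (p - y) for the bias term.
    grad = [0.0] * len(params)
    for x, y in zip(xs, ys):
        err = predict(params, x) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi
        grad[-1] += err
    return [g / len(xs) for g in grad]


def numerical_gradient(params, xs, ys, h=1e-5):
    # Central differences: slow, but independent of the hand-derived formula.
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + h] + params[i + 1:]
        down = params[:i] + [params[i] - h] + params[i + 1:]
        grad.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * h))
    return grad


def train(xs, ys, steps=2000, lr=0.5):
    # Plain full-batch gradient descent starting from all-zero parameters.
    params = [0.0] * (len(xs[0]) + 1)
    for _ in range(steps):
        grad = analytic_gradient(params, xs, ys)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params


# Toy, linearly separable data: the logical AND of two inputs.
XS = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
YS = [0, 0, 0, 1]
```

Comparing analytic_gradient with numerical_gradient at a few arbitrary parameter values is a cheap way to catch a sign error or a missing term before you spend an hour staring at a flat loss curve.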
I’d recommend picking a non-trivial data problem (e.g. not MNIST or Iris) so that you run into challenges that matter in the real world (for example, class imbalance, noise, intractability, different accuracy metrics, data clean-up and pre-processing).
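As a small illustration of why class imbalance forces you to think about metrics, the sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall). The dataset is made up: 95 negatives, 5 positives and a "model" that always predicts the majority class. The helper names are mine, not from any library.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recall, so every class counts equally
    # regardless of how many examples it has.
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced dataset and a degenerate majority-class predictor.
y_true = [0] * 95 + [1] * 5
always_negative = [0] * 100

plain = accuracy(y_true, always_negative)
balanced = balanced_accuracy(y_true, always_negative)
```

The always-negative predictor scores 95% accuracy but only 50% balanced accuracy, which is a much more honest summary of a model that never finds a single positive.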
If you really want to learn (and experience some pain!) pick problems that lack a tutorial/public solution. These have no easy short-cuts and will push your abilities.
So that you’re flexible and can employ a wide range of tactics, I’d suggest getting familiar with all the major ML architectures:
- Dense layers / Regression
- Convolutional neural networks
- Recurrent neural networks
- Reinforcement learning
- Embeddings (e.g. collaborative filtering, search)
- Bonus: Neural Turing machines
Build modular, testable, assertive/typed code
Machine learning libraries tend to encourage monolithic, hard-to-read, hard-to-test code. Fight this urge!
By writing more bullet-proof code you will be able to write working models faster.
You want code that is friendly to others, likely to work, and fails with an easy-to-understand error rather than a pesky zero percent accuracy.
- Build your models out of many small functions that tell a story
- Include as much static and runtime checking as possible (e.g. include assertions that tensors are the shape you think they will be, assert masks are indeed the right format)
- Include unit tests of sub-modules (e.g. does your memory read module correctly retrieve values? does the language tokenization reliably encode-decode values?)
- Use well-tested library functions instead of rolling your own when possible (e.g. explore your library’s utilities!)
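Here is a rough sketch of what the checklist above can look like in practice: a small shape assertion, a tiny function that uses it on the way in and on the way out, and a tokenizer whose encode/decode round trip is easy to unit test. Everything here (assert_shape, mean_pool, CharTokenizer) is a hypothetical helper working on nested Python lists so the example runs anywhere. In a real TensorFlow project you would lean on the library's own checks (the tf.debugging module has shape assertions) rather than these stand-ins.

```python
def shape_of(nested):
    # Infer the shape of a nested list by following the first element
    # at each level. Assumes the list is rectangular.
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def assert_shape(name, nested, expected):
    # expected is a tuple of sizes; None means "any size" for that axis.
    actual = shape_of(nested)
    ok = len(actual) == len(expected) and all(
        e is None or e == a for a, e in zip(actual, expected)
    )
    assert ok, f"{name}: expected shape {expected}, got {actual}"


def mean_pool(batch):
    # [batch, seq_len, features] -> [batch, features], averaging over seq_len.
    assert_shape("batch", batch, (None, None, None))
    pooled = [[sum(col) / len(seq) for col in zip(*seq)] for seq in batch]
    assert_shape("pooled", pooled, (len(batch), len(batch[0][0])))
    return pooled


class CharTokenizer:
    # Toy character-level tokenizer, small enough to test exhaustively.
    def __init__(self, alphabet):
        self.char_to_id = {c: i for i, c in enumerate(alphabet)}
        self.id_to_char = {i: c for c, i in self.char_to_id.items()}

    def encode(self, text):
        return [self.char_to_id[c] for c in text]

    def decode(self, ids):
        return "".join(self.id_to_char[i] for i in ids)


def test_tokenizer_round_trip():
    # The property worth testing: decode(encode(text)) == text.
    tokenizer = CharTokenizer("abc ")
    for text in ["", "a", "abc", "a cab"]:
        assert tokenizer.decode(tokenizer.encode(text)) == text
```

The point of the assertion message is that a wrong shape fails immediately with the tensor's name and both shapes, instead of surfacing three layers later as a mysterious broadcast or a model that quietly learns nothing.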
Try multiple tools and platforms
There are a lot of great tools and platforms out there now to speed up your work. For example:
- Try out all the different tabs of TensorBoard (the projection tab is really handy when computing embeddings! Try generating your own label dictionary for it)
- Try training in the cloud (e.g. with FloydHub, Cloud ML, SageMaker)
- Try a different machine learning library (e.g. PyTorch vs TensorFlow, check out Keras’s Model class)
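On the TensorBoard projection tab mentioned above: I'm assuming "label dictionary" means the projector's metadata file, a tab-separated file with one line per embedding row, in the same order as the embedding matrix. As far as I know, a single-column file has no header row, while a multi-column file starts with a header row of column names. The sketch below only builds the text of such a file; the function name and the example labels are made up, and wiring the file into your TensorBoard log directory is left out.

```python
def build_projector_metadata(rows, columns):
    # rows: one tuple per embedding row, in embedding order.
    # columns: the column names; only written when there is more than one.
    lines = []
    if len(columns) > 1:
        lines.append("\t".join(columns))
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} cells, got {len(row)}")
        cells = [str(cell) for cell in row]
        if any("\t" in cell or "\n" in cell for cell in cells):
            raise ValueError("cells must not contain tabs or newlines")
        lines.append("\t".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical vocabulary with an extra column to colour points by.
metadata = build_projector_metadata(
    rows=[("cat", "animal"), ("dog", "animal"), ("car", "vehicle")],
    columns=["word", "kind"],
)
```

Writing the returned string to a .tsv file next to your embedding checkpoint is usually all that is left to do; check the projector documentation for how your TensorFlow version expects the file to be registered.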
Read/implement ideas from research papers
First of all, getting into the habit of reading research papers is a great way to:
- Get good at reading maths and cutting edge computer science
- Hear about new ideas
- Find inspiration for projects
Twitter is currently a good place to discover new papers. Here are some ideas of handles to follow.
Next, try implementing things from papers. Even if implementing a whole research system is daunting, there are smaller ideas you can employ in your work. For example, I was struggling to find a learning rate that would successfully train an embedding model, and a five-line implementation of PercentDelta solved my issue.
Build a distributed training system
Many real-world problems/datasets are too big to run on your laptop in ten minutes. Distributed systems are a whole separate beast (and possibly harder to debug than deep learning models!).
Whilst building an enterprise-scale distributed training system is a large endeavor, smaller and more approachable projects are well within reach:
- Try using a distributed queue for training/prediction data (e.g. Kafka, RabbitMQ, Firebase)
- Try using multiple computers/instances for training (e.g. a Kubernetes cluster, AWS instances, your friends’ laptops)
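Before reaching for real infrastructure, it can help to see the core idea of multi-worker training in miniature. The sketch below simulates synchronous data-parallel training on one machine: the data is split into shards, each "worker" (a thread here, standing in for a separate computer) computes the gradient on its shard, and the results are combined into the full-batch gradient. The model is a one-parameter linear fit on synthetic data, and every name and hyperparameter is an assumption for illustration. Real systems add networking, failure handling and stragglers on top.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_gradient(w, shard):
    # Sum (not mean) of d/dw (w * x - y)^2 over the shard, plus the shard
    # size, so shards of different sizes can be combined correctly.
    total = sum(2.0 * (w * x - y) * x for x, y in shard)
    return total, len(shard)


def split(data, num_workers):
    # Round-robin split of the dataset into one shard per worker.
    return [data[i::num_workers] for i in range(num_workers)]


def distributed_gradient(w, data, num_workers=4):
    # Each worker handles one shard; the coordinator sums the partial
    # results and divides by the total example count.
    shards = [s for s in split(data, num_workers) if s]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda s: shard_gradient(w, s), shards))
    total = sum(g for g, _ in results)
    count = sum(n for _, n in results)
    return total / count


def full_gradient(w, data):
    # Single-process reference implementation to compare against.
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def train(data, steps=100, lr=0.5, num_workers=4):
    w = 0.0
    for _ in range(steps):
        w -= lr * distributed_gradient(w, data, num_workers)
    return w


# Synthetic data generated from y = 3x, so the ideal weight is 3.0.
DATA = [(0.1 * i, 3.0 * 0.1 * i) for i in range(1, 11)]
```

Checking that distributed_gradient agrees with full_gradient is exactly the kind of test that pays for itself once the workers really do live on different machines.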
Personally, I got into building distributed training to help with a genetic algorithms/neural Turing machine experiment.
Run a model on a limited device
Often real-world deployments have limited resources (e.g. it must run in a browser, or must compute answers very quickly for users).
Try one of the following:
- Get a model you trained predicting on a phone
- Get a model you trained predicting in a browser
- Make it possible to train a model on a phone/old computer
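One common trick for squeezing a model onto a small device is storing weights as 8-bit integers instead of 32-bit floats. The following is a minimal sketch of affine (min/max) quantization on a plain list of weights, under my own assumptions about the scheme; it is not how any specific mobile or browser runtime implements it, and the example weights are invented.

```python
def quantize(weights, num_bits=8):
    # Map floats in [min, max] onto the integers 0 .. 2^num_bits - 1.
    levels = 2 ** num_bits - 1
    lo, hi = min(weights), max(weights)
    # If all weights are equal the range is zero; fall back to scale 1.0.
    scale = (hi - lo) / levels or 1.0
    quantized = [round((w - lo) / scale) for w in weights]
    return quantized, scale, lo


def dequantize(quantized, scale, lo):
    # Approximate reconstruction of the original floats.
    return [lo + scale * q for q in quantized]


# Hypothetical layer weights.
weights = [-1.2, -0.4, 0.0, 0.05, 0.7, 2.3]
quantized, scale, lo = quantize(weights)
restored = dequantize(quantized, scale, lo)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each restored weight is off by at most half a quantization step (scale / 2), which is the number to keep an eye on when you measure how much accuracy the smaller model gives up.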
Write about the things you learn
Writing and teaching force you to shine light on the gaps in your knowledge. They are both valuable skills, as well as rewarding and fun.
Here are a few common formats you can try:
- A lab report of what you did and what the results were
- A short technical QA (e.g. how to solve a common pitfall, get around a bug)
- A tutorial
- An explanation of a complex concept
- A presentation of a new finding
Practice finding interesting problems
Spotting a good research title/problem is a skill in itself. You want to find something that is:
- Novel (has not already been solved/done)
- Interesting to others
- Possible with the time, skills and resources you have
It can be hard to find things that fulfill all of those criteria (often increasing your skills and resources is very helpful). But with each project, you can reflect on how it went and then hone your skill.
Have fun! It’s enjoyment and passion that will drive you through all this, particularly if you’re doing it in your own free time. Find activities, problems and technical architectures that excite you and follow the rabbit hole!