What I learned from spending a year as a solo machine learning researcher

Published in Octavian · 18 min read · Aug 14, 2019

I recently found myself needing a new challenge in my life. Bored of what my day job had become, I struck out on my own to pursue machine learning research.

The year was challenging and rewarding. There were many failures and a bunch of unexpected successes. It took an entirely different path than I imagined it would.

Here I’ll tell you about the experience: What I did, what worked, what failed, what I learned, what I would do differently. If you’re hankering to move into machine learning, or wondering what it takes to be a researcher, hopefully I can answer some of your questions here.

Where I started

The learning and challenges at my work had tapered over time. Where before there were great unknowns and endless skills to learn, now there was a regular pattern I could execute on each day. I get most excited about problems with no known solutions.

I was increasingly spending my free time reading academic papers and sketching out solutions to open problems.

Sitting on a tree stump in a forest, waiting for a friend to return, it hit me: I had to quit. Such moments of clarity are strange and rare, and never come on command. The next morning I told my team I was leaving.

At university I studied for an undergraduate degree in Computer Science and a Master’s in the Foundations of Computer Science and Mathematics. I’ve always had a hunger to study further, although I never found an environment that appealed to me.

At the time I quit, I was fortunate enough to have some savings that made it less scary to give up full-time employment. I did paid work two days a week and reduced my outgoings so that my finances would break even. My partner helped support me. I worked many of my weekends.

Warm-up exercises

Prior to quitting my job, I’d started to work on miniature research projects — things that were interesting, could be written up online, and would not take too many hours to complete.

This provided a big boost to my confidence in pursuing this new path. By conducting and publishing them (here’s one of them) I showed myself that I had the basic skills of research.

I’d definitely encourage anyone who is considering getting into research to find some small projects and see them through. It’s fun and lets you experience the whole process.

Getting started

The first thing I did after moving to full-time research was to set up Octavian.ai: a place on the web for me to write. I put together a Webflow website, a Medium publication, Facebook and Twitter handles and a Discord chatroom. Whilst I spent more time on the aesthetic than necessary (I also love graphic design), the website has been a very helpful focal point over the last year. I often send people to it when they’re interested in my work.

I also found having a brand useful to tie together all our work and various speaking engagements. I believe the brand has helped us develop our community as it helps the organization look more coherent and professional.

I didn’t raise any money for this effort. It was a vehicle for personal exploration and it would have been inappropriate to raise.

Research is not magic

One of the earliest barriers I needed to overcome was the feeling that performing research is a mystical activity that only a select elite can perform.

I’ve now heard the same feelings from a number of students, researchers and engineers. Even published researchers occasionally admit on Twitter to feeling like imposters, that they are not yet “real researchers”.

Whilst I’m still an early student in the field of research, I’ve heard so many people feel negatively on this point that I want to put out a statement: Research is not magic. It is the process of taking a problem, defining it, listing solutions, trying them, seeing what works, and documenting this. Any intelligent, hard-working individual can pursue this path.

Furthermore, it is easy to be paralyzed by the fear that you don’t know enough, or are not worthy: instead begin the practice of research and give it time to develop.

It takes a long time to produce research

One of my biggest take-aways from the year was just how much time, effort and resources it takes to produce research. Specifically, I mean to produce successful results: an approach that performs better than others, or an insight into an existing problem that helps others.

I’m well aware of the planning fallacy from my professional life:

“A phenomenon in which predictions about how much time will be needed to complete a future task display an optimism bias and underestimate the time needed.”

However, it took working on research projects to make this real to me. I found each project went through an enthusiasm cycle (not unlike the startup hero’s journey):

1. Honeymoon: Intense excitement, curiosity and optimism for the problem and potential solutions
2. Getting down to work: Breaking open the code editor, marshaling the datasets, sketching out the experimental architecture. Satisfying linear progress
3. First road bumps: Solving this problem is harder than initial enthusiasm implied. The data needs more work. The first ideas turned out to be bad ones.
4. The long drive/Trough of sorrow: Persistence becomes the sustaining force. More bugs need fixed. More test-cases need written. More variations need tried. There might be success over the horizon. Or not.
5. It. Finally. Works.: A moment of joy amidst being totally done with this project. You may never reach this point, or you might thoroughly prove that your approach will never work. If so, return to step -1
6. Just get it out: The push to write up the work and hit publish. By this point, your only motivation is to be free of this project. You hit Publish and walk out of the office.

In my mental estimation, I usually account for only steps 1 through 3 of the project. My memory has blanked out the trough of sorrow onwards, probably to protect my capability to be enthusiastic.

I’m now much more cautious about the scope of the projects I take on. I’ve a sort of scoring system:

New dataset? +2 points
Dataset too big to fit in one machine’s memory? +1 point
Implementing from a paper without code? +1 point
Architecture that doesn’t fit neatly into a library? +1 point
Training on multiple GPUs? +1 point
Training on a cluster? +3 points
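
As a rough illustration, the scoring system above could be sketched in Python like this. Only the point values come from the list; the names `RISK_POINTS` and `project_risk_score` and the factor keys are hypothetical, chosen just for this example.

```python
# Hypothetical sketch: point values come from the list above;
# every name here is illustrative, not part of any library.
RISK_POINTS = {
    "new_dataset": 2,
    "dataset_exceeds_one_machine_memory": 1,
    "paper_without_code": 1,
    "architecture_outside_library": 1,
    "multi_gpu_training": 1,
    "cluster_training": 3,
}


def project_risk_score(factors):
    """Sum the points for every risk factor that applies to a project."""
    unknown = set(factors) - set(RISK_POINTS)
    if unknown:
        raise ValueError(f"Unknown risk factors: {sorted(unknown)}")
    # Use a set so a factor listed twice is only counted once.
    return sum(RISK_POINTS[name] for name in set(factors))


if __name__ == "__main__":
    score = project_risk_score(
        ["new_dataset", "paper_without_code", "cluster_training"]
    )
    print(score)
```

A higher score suggests a project whose scope deserves more caution. The list does not give a cut-off, so the sketch does not assume one.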

I spent a lot of time building data and training infrastructure. For example, the majority of one project was converting a model to run on a TPU cluster (that was the only cloud resource I had then) so I could test out my variations of an attention function. After a lot of hard work, I proved a negative result.

I chose to forgo the academic paper format, instead blogging on Medium. This was a double-edged sword: it sacrificed a level of rigor for having the time to work on more projects. Debating the pros and cons with my friends, one remarked “if you write a paper most people read the blog article anyway”, which settled the matter.

Thinking beyond my own personal time, I now vividly appreciate the resources needed to run a research organisation. Even to write one paper, you ideally want a team of collaborators, a few months, a lot of GPU time, then a few people to help with write-up.

Like startups and the book industry, ML research is a hit-based model: a small number of papers will garner the most interest. Research is like a raffle: some tickets will have prizes, but you don’t know which ones. The best strategy is to buy a lot of tickets. As a solo researcher, I had to carefully choose a couple of tickets.

Training without money

The rising barriers to entry in Machine Learning research

The AI winter was partially thawed thanks to the heat of GPUs — and many of them.

This graph from OpenAI shows a clear, exponentially increasing trend in training resources; in other words, producing state-of-the-art results is getting increasingly expensive:

At the start of the year I gathered up what resources I could: a stash of Google Cloud credits, plus some of my own money for hosting. This paled in comparison to what any research lab would have access to. Resources turned out not to be a bottleneck, for a few reasons:

It takes a lot of engineering time to build things that justify a lot of resources, which acted as a natural regulator on consumption
I’ve been careful to only use resources when they will make a difference
I picked problems that are more mathematical/theoretical in nature (as opposed to “scale the model bigger” under-fitting problems)
In some cases you can pilot with a smaller version of a model, and then spend the training budget once it’s been sufficiently debugged

As the year progressed some really kind people and organizations donated resources:

FloydHub: A really simple way to run your machine learning experiment on a cloud machine with GPUs
Google TPU Research Cloud: TPUs are now my favorite way to scale training really big. The programming model shards each training batch across machines, so to use more machines you increase the batch size.
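
To make the batch-sharding idea concrete, here is a minimal pure-Python sketch of data-parallel training. It is not TPU code and does not use any TPU library: the workers are simulated in a loop, the model is a toy one-parameter regression, and all function names are hypothetical. It only shows why growing the global batch is the natural way to use more machines, since each worker handles one equal shard and the per-shard gradients are averaged.

```python
# Illustrative sketch of data-parallel training: the global batch is split
# into equal shards, each (simulated) worker computes a gradient on its
# shard, and the gradients are averaged. All names are hypothetical.

def mse_gradient(w, examples):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in examples) / len(examples)


def shard_batch(batch, num_workers):
    """Split a batch into equal contiguous shards, one per worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("Batch size must be divisible by the number of workers")
    shard_size = len(batch) // num_workers
    return [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]


def data_parallel_gradient(w, batch, num_workers):
    """Average per-shard gradients, standing in for a cross-device reduction."""
    shards = shard_batch(batch, num_workers)
    return sum(mse_gradient(w, shard) for shard in shards) / num_workers


if __name__ == "__main__":
    per_worker_batch = 2
    num_workers = 4
    # Global batch size = per-worker batch size * number of workers.
    global_batch = [(float(x), 3.0 * x) for x in range(per_worker_batch * num_workers)]
    full = mse_gradient(0.5, global_batch)
    sharded = data_parallel_gradient(0.5, global_batch, num_workers)
    print(abs(full - sharded) < 1e-9)
```

With equal-sized shards, the averaged gradient matches the gradient computed on the whole batch, so adding workers at a fixed per-worker batch size simply means training on a larger global batch.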

I’m really grateful for the above support; it made a lot of work possible.

Recruiting without money

I really enjoy creating things that have a life of their own. As a kid I wrote multiplayer game engines, procedural worlds and built BEAM robots. As an adult, I started trying to create companies. It brings me a lot of joy that every day SketchDeck brings together a lot of great people to produce beautiful work.

As part of the Octavian.ai project, I wanted to create a community of collaborators. It turns out (unsurprisingly) it’s hard to convince people to work with you without money to pay them. They have bills to pay.

However, some people did step up and contribute:

Andrew Jefferson, a life-long friend, coded, reviewed, gave talks, published articles and slides, whilst working on his day-job
Ashwath Kumar Salimath, who joined for the summer, helped with a bunch of articles and published his own.

Thank you both!

Furthermore, as the year progressed, many people joined our chatroom. A community of people interested in machine learning on graphs has slowly emerged. We’ve had a lot of interesting discussions, and I’m really glad that people interested in this topic have found each other.

Marketing without money

As deep learning has grown from an obscure corner of academia into a billion-dollar industry, we’ve seen the curious intersection of academia and global marketing budgets. Everything got quite pretty. Many conferences and bootcamps emerged.

I’ve put zero dollars into marketing Octavian. I’ve found over the year that our best ways of reaching an audience have been:

Google searches bring in most of our Medium blog traffic
Our free machine learning on graphs course had a lot of sharing through HackerNews, Reddit and Twitter
Talking at Meetups and conferences brings in traffic and valuable interactions

Machine learning on graphs is pretty niche. I’m most interested in finding people in the world this really matters to, rather than trying to scale for eyeballs or likes.

Surviving without a supervisor and without a team

One thing I really undervalued at the start of the year was working with a supervisor and team. I sometimes joke that my year has been “the first year of a PhD, except without a supervisor, by myself, at home”.

Coming from life as a startup founder, I have a strong lone-wolf mentality. I generally don’t want to wait on external factors; instead, I believe that through perseverance I can get to wherever I want to be.

I was fortunate to have a few close colleagues, and over the year a growing community, to support my efforts. However, the bulk of my work was done in relative isolation.

I now think I could have progressed faster with a supervisor. This would have helped in a few ways:

Posing good questions for me to work on
Directing me to good solutions I wouldn’t have thought of
Pressure/inspiration to do better

Being part of a team would have helped with:

Exchanging and developing more ideas
Testing ideas faster
Learning the craft from each other
Having more bandwidth to try more things
Having more bandwidth to write more thorough reports

It is lonely working (mostly) alone. You need to be mentally resilient and strongly believe in the destination.

Finally, on a vulnerable note, it’s very hard to do this sort of work when your non-work life is troubled. Dealing with the twists and turns of life consumed so much energy that for a few months it was hard to keep up solo research.

Finding good questions

This is the biggest skill I’ve developed over the year. I now believe that this is the alpha in research and the secret to many individuals’ success.

Choosing questions to work on is an art. So many people have come to me over the year asking for help with picking their own research question that I wrote a list of suggestions.

At the start of the year, picking a question to work on was bewildering and fraught. Now I have a long list of areas I want to explore and ideas of what the metrics and experiments would be.

I think it necessarily takes time to get better at this skill. It’s the intuition (from past experience) of what questions would fit the research process well, what fits your resources, and what solutions are likely to unearth some success.

Here are some suggestions for developing this skill:

Competing on public datasets / known problems keeps you honest — you have models to beat and established metrics of success.
Creating your own dataset for a new type of problem is valuable too, although it’s easier to become complacent about whether your solution is truly novel, and it’s harder to get other people to take an interest in your work.
Read a lot of papers and see what problems they took on, how they tried to solve them, and how they measured success.
Talk to people.
Write research proposals and get feedback on them.
Look for problems with real-world value: if you solve these, there are naturally bigger consequences and more interest.
Listen to your curiosity: as a researcher your motivation is a valuable catalyst. If you feel like something is under-explored, go check it out.
If you are an expert in two disparate areas, look in their intersection. You may hold some keys not many other people have.
Working the full research project lifecycle will give you insight that cannot be gained through other means. Compare your initial thoughts to the final results.

Writing is lifeblood

100% of your un-communicated ideas will be ignored. And the best way to improve at writing is to write.

In writing (either ideas or results) your imprecise thoughts are laid bare before you. Whilst writing and editing, you shape the raw clay and give it structure and rigor.

My writing ability and research thought process developed a lot through writing many articles over the year.

Technical material is evergreen

A lot of my writing prior to Octavian had been of the viral-marketing sort: You write pieces, launch them, then some get shared widely and many die in obscurity.

Octavian’s articles have been the opposite model: You write a piece, launch it, not much happens, then it slowly builds up momentum. Each month Octavian receives 33,000 minutes of reading on Medium. With a couple of exceptions, none of those articles went viral, but many crept up to high rankings in Google and bring in a regular stream of visitors.

You do not know what will resonate with your audience

Another benefit of writing is it provides engagement statistics. I’m endlessly fascinated to see which articles do well, and which don’t. I’m also often surprised.

One of the first articles I wrote, “How to pick a learning rate”, currently has the most views of any Octavian article (66k). I never would have guessed that at the time of writing.

Reading through our statistics, I’ve found the following trends:

Teaching material is more popular than research material
Basic skills (e.g. choosing a learning rate, introduction to ML on graphs) are more popular than more niche skills (e.g. review prediction)
Non-technical articles generally got fewest reads (e.g. our mission, calls for research, skills to learn)

This all makes complete sense, with hindsight :)

Here are the most viewed articles, with statistics — notice the nice power law distribution:

On not having a PhD

Since my undergraduate studies I’ve wrestled with whether to do a PhD. I spent a summer as an undergraduate working with a team on a paper, and spent the last term of my masters working on a research paper for the course. Ultimately I never found the combination of environment, team and question I was looking for. The energy and vibrance I’ve found in the startup world appeals to me more.

I believe that not having a PhD has had a few downsides for me:

A PhD gives you time to read a lot of papers and practice research skills
A PhD gives you a supervisor and environment conducive to good research
A PhD is the expected entry-point to a range of jobs that now appeal to me

I’m still open-minded to pursuing a PhD. My cousin is well into her career and planning her second PhD, so I appreciate there are many paths through this.

The world desires to learn about machine learning

I spent a lot of last year exploring different ideas for another company. I really enjoy company building, and after starting SketchDeck I appreciate the importance of the initial idea. Creating a company is a marriage, to both your co-founders and to the idea itself.

I explored many different ideas, but ultimately didn’t find a clear business case that aligned with my passions. A lot of technology in the space of machine learning (on graphs) tends towards becoming open-source libraries, the opposite of a business.

At the end of the year I took a step back and listened to the opportunities that had grown around Octavian. The answer was clear: The biggest “market need” we’d uncovered was people’s desire to learn about machine learning on graphs. Whilst this wasn’t something I wanted to turn into a business, I find teaching incredibly satisfying. I wanted to share what I’d learned. To this end, we’ve done three things:

Published the top hit on Google for “Machine learning on graphs”, a comprehensive introduction to the different strands of practice and research
Spoken at a number of companies, universities and conferences about graph ML
Launched a free course on machine learning on graphs — the announcement of this went mildly viral, and over 1,000 people signed up for it (which was actually way more than I expected). We’ve been fortunate to have an active core of students work through the exercises and share answers together.

The joy of building a community

At the start of the year, Andy and I created a Discord chatroom to contain our conversations on graph ML. I posted the invite link publicly on the Octavian website.

We’ve slowly accumulated members, and whilst it’s a quiet chatroom, we do have regular interesting conversations. People share new papers, discuss the exercises and articles, and also talk more broadly about working in the field of research.

I’m really grateful that we’ve helped some people find each other, share and learn together.

What would I do differently?

Having completed the year, I can now reflect on the ways I might have done it differently.

First and foremost, I’ve no regrets. The greatest gift I can give myself is the gift of time. Having the space and time to follow intellectual curiosity and work hard on projects that matter to me is truly special, and I’m really fortunate I could do this.

For me this year started as a true adventure. The destination was unknown and the path would be guided by what I found along the way.

I had a number of loose goals for the year, some of which I achieved:

Prove to myself that I am capable of academic research. I think I achieved this. I learned a lot over the year about both the craft of building ML systems and also the craft of research. I also got a better picture of where I am along the path of research; I see myself around the first-year PhD student level: still figuring things out, getting the basic skills down. I was able to work through all the steps of research, and feel confident in directing them. I appreciate now the time and effort involved in creating a rigorous well-researched paper.
Create a machine-learning driven business. I failed on this goal. I explored a lot of ideas and talked to a lot of people, but never found the answer. This is partly because this goal is not focused on solving a real problem; I never found a problem that I badly wanted to solve.
Improve my skills at implementing machine learning systems. I definitely achieved this. Behind every published article (and a bunch of unpublished ones) lay a lot of code, reading papers and debugging models. I spent a ton of time on training and data infrastructure. My implementation skills improved immensely and I have a strong intuition for why systems don’t work and what to do about it.
Produce a piece of academic-grade research. I fell shy of this goal. By the end of the year I had built the skills and list of research questions to attempt this, but my final big project (adding positional information to a self-attention generative adversarial network to achieve non-local coordination of human forms) did not produce positive results. I ran out of time to try more solutions. It did however teach me a lot about scaling up training on TPUs. I’d love to spend more time to achieve this goal.

One piece of advice a friend gave me during the year was to seek out the best people in the world for my interests and learn from them. This is the one thing I’d push my earlier self to do. I’m still not sure how to go about this without committing to a PhD program, but that doesn’t mean it’s impossible. On the flip side, there is a lot to be said for striking out and doing something, instead of waiting for the perfect (external) circumstances. I had a happy and fulfilling year.

What I’ve been most proud of

Every so often I introspect on what my central goals are. One enduring answer is to “create things I am proud of”. I hold myself to a high standard, which rises as I learn more, and thus I’m prone to dismissing all my work as not good enough.

In the spirit of combatting this, here are things from the year’s efforts that I’m proud of:

We’re now the first or second Google result for a diverse range of topics such as “language and artificial general intelligence”, “machine learning on graphs”, “choosing a learning rate”, “learning multiple tasks with gradient descent”
We’ve spoken at two universities, two conferences (Connected Data London, O’Reilly Strata New York), a few corporations (Apple, Yelp) and a couple of meetups
We get 18,000 views per month of our 19 articles on Medium
We published 20 GitHub repositories with 344 stars across them
Our website is really pretty
We’ve spread learning in the world
We performed real small-scale research on a range of topics including: multi-task learning, learning rate choice, the limits of GANs, learning graph algorithms, translating between English and graph queries, the limits of neural attention, element-wise multiplication as an alternative to dense layers.

In aggregate, I’m really proud of the Octavian project. I’m glad I can pull up the website and see this rich and varied body of work.

What next?

I’m still working (with limited time) to publish the rest of our machine learning on graphs course. Our community and resources remain online for everyone (and will continue to do so).

I hope to find ways to keep this project alive, fueled by the passions of more people in the world. I get occasional emails from people looking to contribute, and hope some of them do start to publish work. I hope our writing continues to find an audience and benefit people.

Personally, I want to take all this learning and apply it to an important problem. If you’ve an interesting opportunity, let me know.

Acknowledgements

Thanks to everyone who helped me along this path, including Andrew Jefferson, Ashwath Kumar Salimath, Scott Dimond, my parents, Sandra Johnston, my friends, the community and the many people in industry and academia that lent a hand.

Octavian’s research

Octavian’s mission is to develop systems with human-level reasoning capabilities. We believe that graph data and deep learning are key ingredients to making this possible. If you are interested in learning more about our research or contributing, get in touch.