Lessons From Reading 125 Papers on Machine Learning

Patterns in the Papers

Viroshan Naicker
Published in The Spekboom · 7 min read · Jan 20, 2020

I’m a mathematician and, for a long time, I avoided writing code on computers. It wasn’t for me. I loved theoretical problems, syntax not so much.

For years, I got on fine writing research papers on networks and building models without needing to code. But, having flown the coop of academia and set myself up as a tech consultant, I decided three months ago to give myself an education in machine learning, data science and AI.

I didn’t start with a textbook or a set of notes. Instead, I went another route, working my way through a variety of research papers (like a graduate student) and logging them as summaries on science2innovation, a relatively new data science platform that aims to connect research to industry. In those three months, I have written 125 summaries, called Blitzcards, and read most of the underlying papers in depth. So, for what it’s worth, here are my observations and insights.

1. There are basically two things that you can do with data and machine learning.

There are essentially two things that you can do with data: make predictions about behavior, and classify patterns. Every paper I read that was not a theoretical advance was about either a prediction or a classification task. That’s pretty much all that machine learning algorithms are used for. Simple, isn’t it?

Prediction involves searching out relationships between a multitude of variables, in datasets that are just too big for us humans to comprehend, and then building tools that can project those relationships into the future.

We know, for example, that there is a relationship between commodity yields, weather, and profits on commodity futures markets. But, we have never before been able to analyze the relationship between historical data on coffee, weather, and the price on the New York Coffee exchange with as much depth as we can do now. Machine learning models learn to mimic relationships between the underlying variables, and given some of the variables, we can predict the values of others.
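As a toy sketch of this idea, here is a minimal least-squares fit that learns a relationship from data and uses it to predict an unseen value. The variable names and numbers are invented for illustration, not real market or weather data:

```python
# Toy sketch: learn a linear relationship between one "weather" variable
# and a "price" variable, then use it to predict. Illustrative data only.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

rainfall = [10, 20, 30, 40]          # hypothetical observations
price = [105, 110, 115, 120]         # perfectly linear toy responses
m, b = fit_line(rainfall, price)
predicted = m * 50 + b               # predict price at an unseen rainfall level
```

Real models replace the straight line with something far more flexible, but the shape of the task is the same: fit the relationship, then project it onto values you have not seen.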

Pattern classification is a little different: here the idea is that data has categories, and that these categories can be labeled. A classifier engine learns the labels, so that it can categorize new data according to its existing “knowledge” of what belongs where.

Take music, for example. We divide music into genres based on patterns in sounds. There are grey areas and nuances too, but the basic classification by ear has to do with rhythms, instruments, vocals, and harmony. In order to use machine learning to classify music by genre, what does an algorithm need to “know”? It needs to take sound waveforms (or parts of them) as input and then, using labeled datasets, learn which waveforms belong in which boxes. Then, given a new piece of music, it looks at the distance between this instance and the existing instances and puts the new one into the box it happens to be closest to. Pretty human, isn’t it? It’s how we classify things too.
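That nearest-neighbour idea fits in a few lines, assuming each song has already been reduced to a feature vector. The features and genre labels below are invented placeholders, not real audio descriptors:

```python
import math

# Minimal 1-nearest-neighbour sketch. Each "song" is a made-up feature
# vector (say, tempo and loudness); the labels are illustrative only.
labeled = [
    ((120, 0.8), "rock"),
    ((90, 0.3), "jazz"),
    ((128, 0.9), "rock"),
    ((85, 0.2), "jazz"),
]

def classify(features):
    # Put the new instance in the box of its closest labelled neighbour.
    return min(labeled, key=lambda item: math.dist(features, item[0]))[1]

genre = classify((125, 0.85))  # nearest to the "rock" examples
```

Real systems use far richer features and smarter distance measures, but “assign the label of whatever it sits closest to” is the core of the classification step.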

Despite this distinction into just two types of tasks, the secret sauce is in the use and interpretation of data, and in the questions being asked of the data by machine learning models. Read on for what’s happening in the real world.

2. Applications to real-world problems follow patterns.

Most of the research on machine learning is on the applications side, with relatively few researchers working on theoretical aspects of the field. Real-world applications range from understanding whale calls and identifying plants to trading Bitcoin, saving batteries in electric vehicles, and making medical diagnoses. There is a huge range of applicability, and the tools for working on applied problems in machine learning are very accessible. So, how does a real-world problem get solved using machine learning?

First, you need a data source; for obvious reasons, you cannot solve a problem using machine learning without data to learn from. Data sources for applications include audio, images, CSV files, network data, video, medical records, cellphone gyroscopic data, and words. I haven’t seen many papers that aren’t using one of these types of datasets. But, basically, if you can find a data source, then you can write a machine learning paper about it. (If you are starting out, Kaggle is also a good place to get to grips with different data sources and real-world applications.)

Next comes the decision: build a predictive model, or build a classifier? And what techniques can be used? There are plenty of articles on techniques for machine learning applications, so I’ll leave that alone here in order to focus on the process. Once these decisions are made, the basic theme is to train a machine learning model on part of the dataset, and thereafter to use the model to answer the relevant question. The remainder of the dataset is usually used to verify the accuracy of the model and to evaluate how well the question has been answered, i.e., classification or predictive success. Then, rinse and repeat for a new data source, a new dataset, or a new machine learning technique.
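The train-then-verify loop described above can be sketched in a few lines. The dataset here is synthetic and the “model” is deliberately trivial (a single learned threshold); real papers swap in real data and real techniques, but the shape of the process is the same:

```python
import random

# Synthetic one-dimensional dataset: label is 1 when x > 0.5, else 0.
random.seed(0)
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(100))]

# Train on 80% of the dataset, hold out the remainder for verification.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

# "Training": place the decision boundary midway between the two classes.
cut = (max(x for x, y in train if y == 0) +
       min(x for x, y in train if y == 1)) / 2

# Evaluation: classification success on the held-out portion.
accuracy = sum((x > cut) == bool(y) for x, y in test) / len(test)
```

Rinse and repeat: change `data` to a new source, or `cut` to a new technique, and the train/verify scaffolding stays the same.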

Most authors use multiple machine learning methods on the same dataset for the purposes of comparison. It is common for more than one technique to be used in a single paper, with the reported results including a comparison of which technique works best for the dataset. Then there are often discussions of data constraints, which arise when data sources are limited and datasets are noisy. Finally, authors conclude on how being able to predict or categorize is advantageous for solving the particular problem they chose.
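That comparison pattern also fits a small sketch: fit several models on the same training split, score each on the same held-out data, and report which wins. The two “models” and the data below are toy placeholders, not techniques from any particular paper:

```python
# Comparison pattern: several models, one dataset, one held-out score each.
def majority_class(train):
    # Baseline: always predict the most common training label.
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

def threshold_rule(train):
    # Crude alternative: use the mean of the inputs as a decision boundary.
    cut = sum(x for x, _ in train) / len(train)
    return lambda x: int(x > cut)

train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0), (0.8, 1), (0.9, 1)]
test = [(0.15, 0), (0.85, 1), (0.35, 0), (0.6, 1)]

results = {}
for name, builder in [("majority", majority_class),
                      ("threshold", threshold_rule)]:
    model = builder(train)
    results[name] = sum(model(x) == y for x, y in test) / len(test)

best = max(results, key=results.get)  # the technique a paper would report
```

The `results` table is essentially the comparison table you find near the end of most applied machine learning papers.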

As you might expect, it got very boring after a while, but a few papers really stood out.

3. There are very few outstanding papers.

Most scientific research is driven by a few people at the very upper end of the stratosphere. Usually, they make breakthroughs and everyone else tries to catch up quickly. In the machine learning community, there are papers just a few years old that have been cited thousands of times: what this means is that someone made a breakthrough and everyone else hopped on board to integrate it into their research. It’s not good or bad; it’s normal for science. A few people lead the way at the top.

The most interesting papers, in my opinion, are (1) those that have pushed the community forward from a theoretical viewpoint and (2) those that have found ways to create new data structures that pertain to real-world problems that are not obvious.

As an example of the former, the theoretical advances in deep learning made by Geoffrey Hinton and his co-workers are perhaps the most economically and politically important scientific breakthroughs in history. On the applications side, the work by Kira Radinsky on predictive modeling using storylines and the network relationships between words is truly remarkable: using news headlines over a 20-year period, a predictive model was created for anticipating catastrophe sequences, or storylines, such as droughts followed by floods followed by cholera.

But, there is good news too.

4. Humans are still needed.

At the moment, most machine learning research is being done by people who are finding new applications in related fields; so ecologists are finding applications in ecology, biologists in biology, and so forth. (Mainly, this is driven by the speed at which easy-to-use applied tools, a.k.a. coding libraries, are created and shared within the community.) While domain-specific knowledge is important, arguably the most important skill is being able to build and interpret models. Data is one thing, but the key to getting the most out of the machine learning “machinery” is to be able to ask questions that are useful and to translate real-world problems into mathematical models.

I concluded that my maths skills are still relevant.

After reading all those papers, it was evident that in order to use machine learning well, we still need to be sharp at figuring out relationships, because where there are causal relationships, models can be built to make sense of the impact of variables on each other. We also need to think laterally about what constitutes non-obvious data sources or combinations of data sources, because these tend to make the most interesting applications. The very good research papers are those that combine a new model with a new data source (because there are only so many deep learning imaging papers that one can write!).

Finally, and most of all, we need to trust ourselves, so that we can navigate the power of Artificial Intelligence with the noblest ethics of our humanity, because, yes, there is a lot of “security” research out there that can be used to abuse our rights. And the future will need lawyer/mathematician/computer science/politics graduates with a deep understanding of tech and people to keep us safe in this brave new world.

I am a freelance mathematician and data scientist working on problems relating to networks, tokenomics, and the fourth industrial revolution. This article is my own opinion.
