“Simplicity is the ultimate sophistication.” — Leonardo da Vinci

Data Science Minimalism: Less is More!

“Minimalism is a tool to rid yourself of life’s excess in favor of focusing on what’s important — so you can find happiness, fulfillment, and freedom” — theminimalists

Data science is the art of finding a balance between computer science, statistics and business. As if this wasn’t hard enough, new programming languages and frameworks are developed all the time; the market is filled with job postings with endless requirements and the hype around AI grows by the minute, making it really hard to keep up without making sacrifices. But can learning something new be considered a sacrifice? Well… yes!

The problem

As humans, we are creatures of adaptation. We can learn new things when we feel motivated, but we also forget what we don’t use. That makes the decision to learn something new much harder, as we have to weigh the benefit of learning one thing against the benefit of learning another, and both against the benefit of maintaining existing knowledge. Good judgement about how to spend our time is one of the most important skills a data scientist can have. Let’s start by analyzing some challenges that a data science learner has to face.

Debunking

The urge to learn something new can be driven by the promise of money and success, but it can also be driven by curiosity. The term machine learning doesn’t mean the same thing to someone taking their first steps in the field as it does to someone with years of experience in it.

Trying to demystify the hype is a great learning motivator. After understanding the concepts and getting used to doing the work, you realize that it is just a set of tools. Sure, you still enjoy the process, but the enjoyment comes from solving a problem, not from the hype around the specific tool.

Instant gratification

Learning happens in steps. You cannot understand low-level concepts right away (and you are not motivated to learn them); you have to start at a high level. You have to start by understanding the problem and then go deeper. The thing is, as you go deeper, things get harder.

The difficulty doesn’t necessarily come from the concept itself. It might be due to a gap in knowledge that has to be covered, or to the fact that those providing the information assume you are willing to put in the effort and fill in the missing pieces by yourself. These and other factors make it harder to go deeper. Learning on a superficial level, though, can provide a quick dose of satisfaction; you get to understand things with minimal effort and make progress faster. This makes you feel better in the short term, but going deeper will help you in the long term.

Unlearning

While learning a new tool or approach, our brain is “struggling” to unlearn an old one. If you have tried to park a small yacht you know exactly what I am talking about. It’s hard to coordinate your movements, as it requires the exact opposite steering to the one you would apply while parking a car. Your body is inclined to perform the same movements as if you were driving a car, to the point that if you had never learned how to drive you would probably perform better. The same applies to switching between keyboards, operating systems, programming languages and frameworks.

I like to think of the brain as a bucket of “machine” learning models. As the number of samples from a particular activity grows, your brain learns to predict what you should do, and its performance improves with the number of samples. If the training set starts changing, though, it takes time for the brain to adapt, because the positive feedback accumulated for the old behavior far outweighs the new negative feedback. Following this rationale, we can say that the more time we invest in a tool, the harder it gets to replace it.
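To make the analogy a bit more concrete, here is a toy sketch in Python. It is only an illustration under strong assumptions: the "brain" is reduced to a running average, every repetition of the old habit is scored as +1 and every repetition of the new habit as -1, and the function name `samples_to_adapt` is made up for this example.

```python
def samples_to_adapt(old_samples: int) -> int:
    """Count how many new-habit samples a running mean needs before it flips sign.

    Assumption for illustration: each old-habit sample is +1.0 and each
    new-habit sample is -1.0, and the "learner" is just their average.
    """
    total = float(old_samples)  # sum of all feedback seen so far
    count = old_samples         # number of samples seen so far
    new_samples = 0
    # Keep practising the new habit until the average prediction turns negative.
    while count == 0 or total / count >= 0.0:
        total -= 1.0
        count += 1
        new_samples += 1
    return new_samples


if __name__ == "__main__":
    for old in (10, 100, 1000):
        print(f"{old} old samples -> {samples_to_adapt(old)} new samples to adapt")
```

In this toy model the number of new samples needed is always one more than the number of old ones, so the longer the old habit was trained, the longer the switch takes.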

Job market

Besides our own demons, the job market also pushes us to learn new things on a superficial level. According to Dan Ariely, “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...”. This statement applies to more than just big data, though. Many firms fill their job postings with clutter to maintain a high-tech profile or to filter out weak candidates (which may have the opposite effect, but that is another story).

Social Media and Ads

Once you read one or two machine learning articles, ads start popping up regarding MOOCs on deep learning and computer vision, offering to provide you with all the skills you need to become a data scientist in one month. Even if you don’t click the ad you may still have second thoughts about whether you should know about this technique or another. Then you are tempted to start reading articles about it, watching tutorials on YouTube, etc.

Companies usually sell goods or services that are in demand, but that doesn’t mean you should stop working on your current goals and learning projects just because something is marketed as more promising. Chances are you will not remember most of this new knowledge after a day or two.

Solution

Understanding the source of the problem is half the solution. Accepting that you have limited resources at your disposal is vital to making the right decisions. Here are some ways you can optimize your data science learning strategy:

Stick to one programming language

While learning multiple programming languages might be necessary for a software developer, this is usually not the case for a data scientist. Your job is not to build the most optimized software. Your job is to provide strong evidence that an idea is worth pursuing, to create knowledge from data, to understand its business value and to find the best way of communicating it.

Being an expert in one programming language (personally I prefer Python) will give you the skills to write really good code within the constraints of that language, and will enable you to work well alongside a software engineer to get brilliant software out into the world.

Select your tools wisely

The same approach applies to all the tools you use to do your job. You don’t need to be an expert in five different plotting libraries. Having a simple one for fast day-to-day tasks (such as seaborn or matplotlib) and then something more elegant for turning your visualizations into beautiful dashboards (I highly recommend plotly and dash!) is a good way to go.

You will frequently deploy your applications on the cloud, so you will need to know how to use the terminal. Personally, I prefer using a MacBook, as I get to use the same commands and I don’t have to spend a lot of time installing the necessary software (e.g. using brew).

Optimize your code’s usage

You don’t need to write the most optimized code, but you do need to optimize its usage. Every minute spent rewriting the same code could be spent learning a new algorithm or studying your business domain.

Organize your code into functions and classes during each project. You might lose some time doing that in the beginning, but think of it as an investment. As time goes by you will develop your own code base and code style, and you will be amazed at how often you reuse the same code.
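As a rough illustration of what such a reusable building block might look like, here is a small Python sketch that uses only the standard library. The names `Summary` and `summarize` are hypothetical, chosen for this example; the point is simply that a snippet you would otherwise retype in every notebook becomes one documented function.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Iterable, Optional


@dataclass(frozen=True)
class Summary:
    """Basic descriptive statistics for one numeric column."""
    count: int
    mean: float
    median: float
    minimum: float
    maximum: float


def summarize(values: Iterable[Optional[float]]) -> Summary:
    """Summarize a numeric column, ignoring missing (None) entries."""
    cleaned = [float(v) for v in values if v is not None]
    if not cleaned:
        raise ValueError("summarize() needs at least one non-missing value")
    return Summary(
        count=len(cleaned),
        mean=mean(cleaned),
        median=median(cleaned),
        minimum=min(cleaned),
        maximum=max(cleaned),
    )


if __name__ == "__main__":
    # Hypothetical data: one missing reading in a small sample.
    print(summarize([1, 2, None, 3]))
```

Once a helper like this lives in your own module, every new project can import it instead of reimplementing it.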

Respect the pyramid of needs

Mastering the basic concepts of machine learning is really important for multiple reasons. These concepts are characterized by simplicity, interpretability and robustness. Moreover, these techniques are usually built on top of really strong but simple mathematical tools that are used irrespective of the domain or scientific field and are unlikely to go out of fashion. By understanding these tools you can use them independently to build your own algorithms and to form your intuition on how they work and when best to use them. This knowledge can then give you insights into the problem you are trying to solve. Finally, if you skip the introductory techniques in favor of the highly sophisticated ones, you risk using the latter in the wrong way.
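One example of such a simple but strong tool is ordinary least squares for a straight line. The article does not single out any particular technique, so take this choice as an assumption for illustration. The sketch below fits a line with plain Python, using the textbook formulas slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x); the function name `fit_line` is made up for this example.

```python
from statistics import mean
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Points that lie exactly on y = 2x + 1.
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

Writing it out once by hand makes it easier to see what a library call is doing for you, and when a straight line is the wrong model.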

Revise once in a while

Knowledge that is not used is going to be forgotten. Create tutorials or blog posts and share that knowledge with people that can gain from it. This way you contribute to the learning process of others, get to refresh your old (but valuable) knowledge and even develop a different point of view on the subject.

Find your balance between exploration and exploitation

During your first steps in data science make sure to explore before you specialize. Understand the basic concepts in depth and then apply them in different fields so that you get to know your preferences. Create many short term projects that can help you learn something new but are not so hard that you quit or lose valuable time drifting around.

Understand your strengths and study the market. Try to find the sweet spot between your strong points and what the market needs. For each piece of knowledge that you are willing to invest time in beyond the exploration point, make sure that it will be future-proof. At some point in the future you might need to shift from one domain to another or replace an old tool with a new one. Minimizing the time you invest in knowledge that is going to be replaced will have a good impact on your career.

Conclusion

While the points presented above are tailored to the data science domain, the truth is they apply everywhere, from science to our spending habits. Minimalism is about removing the clutter and focusing on the things, habits and knowledge that really matter. If you are easily distracted, you risk losing sight of the bigger picture. All you have to do is ask yourself one simple question: “Do I really need this?”
