Under-appreciated skills that are more important than Deep Learning

Melvin Varughese
Published in Data at Atlassian
8 min read · Aug 24, 2021

… or how to be a more effective Data Scientist!

Photo by Quino Al on Unsplash

This article is the second of a two-part series on deep learning. I discuss the limitations of deep learning in the first article of the series, “Nope, Deep Learning is not enough”.

Are you discouraged by the limited impact of your work — despite the sophistication of your analysis and the many hours of earnest toil?

Alternatively, are you thinking of a career in data science and believe that mastery of deep learning will inexorably lead to successful and rewarding work?

If you answered yes to either of these questions, then this article is addressed to you.

“To a man with a hammer, everything looks like a nail.” — often attributed to Mark Twain

Data science attracts curious, highly technical people who keenly follow the latest innovations. It is thus not surprising that many people implicitly assume that deploying a cutting-edge method will inevitably lead to a large impact. Admittedly, I have been guilty of placing various machine learning methods on a pedestal. I single out deep learning because I suspect it is the technique most often applied over-zealously. Such projects often fail, and in my experience numerous failed data science projects arise from data scientists focusing too much on methodology. I define a project to be methodology-led if the data scientist has decided to use their favourite machine learning method without giving proper consideration to the context of the chosen problem. To give your data science projects the best chance of success, you should instead be strategy-led: start from the business problem and its context, and only then choose a method.

The under-appreciated skills that I list below encourage the data scientist to take a strategy-led approach.

1. Business acumen

Photo by Ryoji Iwata on Unsplash

An effective data scientist knows when to develop a sophisticated solution and when a rough-and-ready approach is more appropriate. It is all too easy to get wrapped up in improving a target metric and to lose sight of whether the project will deliver a good business outcome. For example, if I wish to estimate the potential impact of a proposed team initiative, it doesn’t make sense to immediately create a user-engagement model.

Oftentimes, a simple analysis of the volume of targeted users interacting with a proposed touch-point will allow me to place an upper bound on its potential business impact. If that impact is not high enough, it won’t be worth my time investigating the area further and I can move on to another task. A nice byproduct of a simple analysis is that it is easy to communicate any findings to external partners. If your collaborators understand your analysis, it becomes much easier to convince them to make a decision that is guided by the data.
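As a rough illustration of this kind of back-of-the-envelope bound, here is a minimal Python sketch. The function name, the threshold and the numbers are hypothetical: it assumes the best case, in which users who reach the touch-point convert at a deliberately generous ceiling rate, so the result is an upper bound rather than a forecast.

```python
def impact_upper_bound(users_reaching_touchpoint: int,
                       max_conversion_rate: float,
                       value_per_conversion: float) -> float:
    """Optimistic ceiling on the value a touch-point could generate.

    Hypothetical helper: assumes users who reach the touch-point convert
    at the (deliberately generous) max_conversion_rate.
    """
    if not 0.0 <= max_conversion_rate <= 1.0:
        raise ValueError("max_conversion_rate must be between 0 and 1")
    return users_reaching_touchpoint * max_conversion_rate * value_per_conversion


# Illustrative numbers only: 2,000 users a month reach the touch-point,
# at most 10% could plausibly convert, and each conversion is worth $50.
ceiling = impact_upper_bound(2_000, 0.10, 50.0)
worth_pursuing = ceiling >= 25_000  # hypothetical bar for starting a deeper project
print(f"Upper bound: ${ceiling:,.0f}/month; worth pursuing: {worth_pursuing}")
```

Because every input is chosen optimistically, a ceiling that falls below the bar is strong evidence that the area can be deprioritised; a ceiling above it only says that a closer look might pay off.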

To be effective, you also need to have a good understanding of the key challenges that your business is facing. Developing this understanding will help you to identify what sorts of insights will be invaluable to the business and, thus, is a great starting point when brainstorming and prioritising potential data science projects. A great way of developing this business acumen is to build a richer collaboration between data scientists and other key business stakeholders. The data science team should be highly responsive to the needs of the business.

As a growth data scientist at Atlassian, I work closely with people in a variety of departments and functions. My team helps Atlassian customers collaborate more effectively by creating third-party integrations for Jira and Confluence. This work is spearheaded by a product manager, who has an intimate understanding of the market and directs strategic team initiatives. The product manager is often the first to identify risks to a project or a potentially lucrative opportunity, and I lean on his insight to tackle the right problems. My responsibilities as the team’s data scientist include producing visualisations of the third-party tools that our customers use, as well as profiles of the types of employees that use various integrations. I develop these dashboards in close collaboration with the product manager to ensure that they provide the right insight for formulating future strategies.

2. Exploratory Data Analysis (EDA)

Photo by Aaron Burden on Unsplash

Once an important business problem has been identified, it is useful to draw up an exhaustive set of hypotheses about the problem space, each of which, if true, would suggest a different course of action for the business. We then need to find an appropriate data source that can be used to decide which of the competing hypotheses is most likely to be true. If you are unfamiliar with the dataset, it is essential to perform Exploratory Data Analysis (EDA): that is, to investigate the nature of the dataset whilst setting aside your preconceived notions and assumptions about the problem space. This is not news to data scientists; EDA is a recognised component of any data science workflow. However, far too often EDA is performed in a perfunctory manner: a few superficial checks are undertaken to ensure that a particular statistical model can be fit to the data. This can be a big missed opportunity.

“Exploratory data analysis is an attitude, a state of flexibility, a willingness to look for those things that we believe are not there, as well as the things we believe might be there.” — John Tukey

Performing EDA proved extremely helpful for an A/B test I ran. Analysing A/B tests can, at times, be almost mechanical. However, for this particular experiment, some puzzling phenomena prompted me to run a deeper EDA. A combination of detective work and chats with the developer suggested that the instrumentation around some key events could be unreliable, potentially causing the impact of the new feature to be dramatically underestimated. The developer is still investigating the issue, but without the EDA these instrumentation issues would not have come to light, and we might have wrongly decided not to roll out the feature because of its apparently modest impact.
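The article does not say which checks surfaced the instrumentation problem. One cheap diagnostic for a related class of issue (a swapped-in technique, not necessarily what was done here) is a sample ratio mismatch (SRM) test: if users were meant to be split 50/50 but the logged counts per arm deviate far more than chance allows, assignment or exposure logging may be broken in one arm. It will not catch every fault, for example downstream events that are undercounted equally in both arms. The sketch below computes a chi-square goodness-of-fit test with one degree of freedom using only the standard library; the counts and the 0.001 threshold are illustrative assumptions.

```python
import math


def srm_p_value(control_users: int, treatment_users: int,
                expected_treatment_share: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-arm traffic split.

    With two arms there is one degree of freedom, so the chi-square
    survival function reduces to erfc(sqrt(chi2 / 2)).
    """
    total = control_users + treatment_users
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    return math.erfc(math.sqrt(chi2 / 2))


# Illustrative counts: a 50/50 experiment that logged noticeably fewer
# treatment users than expected.
p = srm_p_value(control_users=10_000, treatment_users=9_500)
if p < 0.001:  # a deliberately strict threshold, chosen for illustration
    print(f"Possible sample ratio mismatch (p = {p:.2e}); check instrumentation")
```

A very small p-value here does not say what is wrong, only that the data should not be analysed mechanically until the logging has been checked.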

Where the hypothesis you wish to test needs a complex model, EDA can be used to build simple model baselines against which to compare your more sophisticated models. Such baselines are important: if your model cannot convincingly beat a simple baseline, then something may well have been missed in the data science workflow. At the very least, the sophisticated model is not worth deploying.
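A minimal sketch of such a check, assuming a regression problem scored by mean absolute error (MAE): the simplest baseline predicts the training-set mean for every case, and a candidate model must beat it by some margin. The function names, the 5% margin and the toy data are hypothetical choices for illustration.

```python
from statistics import mean


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def beats_mean_baseline(y_train: list[float], y_test: list[float],
                        model_preds: list[float],
                        min_relative_gain: float = 0.05) -> bool:
    """True if the model's test MAE is at least min_relative_gain lower
    than that of a baseline predicting the training mean everywhere."""
    baseline_pred = mean(y_train)
    baseline_mae = mean_absolute_error(y_test, [baseline_pred] * len(y_test))
    model_mae = mean_absolute_error(y_test, model_preds)
    return model_mae <= (1 - min_relative_gain) * baseline_mae


# Toy data: the model is only slightly better than the mean baseline
# (MAE of about 1.77 versus about 1.83), so the check returns False.
y_train = [10.0, 12.0, 11.0, 13.0]
y_test = [11.0, 14.0, 9.0]
model_preds = [11.5, 11.7, 11.5]
print(beats_mean_baseline(y_train, y_test, model_preds))
```

The same pattern works with other baselines (the last observed value for a time series, the majority class for a classifier); the point is to fix the bar before looking at the sophisticated model's score.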

EDA helps the data scientist build an intuition for the dataset so that they can quickly identify when things have gone wrong. It is also useful for feature engineering and data preparation, as it can reveal outliers, heavy tails and missing values. It can also give us peace of mind: firstly, that we have chosen the right model for the data; secondly, that the results are valid; and, finally, that the conclusions apply to the business context that initially framed the problem.
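To make the data-preparation point concrete, here is a minimal standard-library sketch of three quick checks on a single numeric column: the share of missing values, points outside Tukey’s fences (1.5 IQRs beyond the quartiles) as candidate outliers, and excess kurtosis as a rough heavy-tail signal. The data and cut-offs are illustrative assumptions, and the sketch assumes at least two non-missing values; in practice you would also plot the distribution rather than rely on summary numbers alone.

```python
from statistics import mean, pstdev, quantiles
from typing import Optional


def profile_column(values: list[Optional[float]]) -> dict[str, float]:
    """Quick EDA profile of one numeric column (None marks a missing value)."""
    present = [v for v in values if v is not None]
    missing_share = 1 - len(present) / len(values)

    # Tukey's fences: flag points more than 1.5 IQRs outside the quartiles.
    q1, _, q3 = quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    # Excess kurtosis from population moments: about 0 for normal data,
    # large and positive when a few extreme values dominate.
    mu, sigma = mean(present), pstdev(present)
    if sigma == 0:
        excess_kurtosis = float("nan")  # undefined for a constant column
    else:
        excess_kurtosis = mean(((v - mu) / sigma) ** 4 for v in present) - 3

    return {
        "missing_share": missing_share,
        "n_outliers": float(len(outliers)),
        "excess_kurtosis": excess_kurtosis,
    }


# Illustrative session durations in minutes: two missing, one extreme value.
durations = [3.0, 4.5, 5.0, None, 4.0, 3.5, 6.0, None, 5.5, 120.0]
print(profile_column(durations))
```

A check like this takes seconds to run, but each number it reports is a prompt for a question (why is 20% missing? is the 120-minute session real?) rather than a verdict.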

EDA can also help to refine the business questions that need to be addressed. Business partners are, of course, not immune to misconceptions about the nature of the business. In such cases, the EDA can yield surprising insights that can be extremely valuable. They can, for example, point to new opportunities or reveal a fundamental misunderstanding in how the customers are using the product.

3. Communication

Photo by Yaroslav Shuraev from Pexels

Communicating with your key stakeholders is essential to maximising the impact of any project. A data science project shouldn’t be an academic exercise; rather, we need to convince stakeholders elsewhere in the business to make data-informed decisions. Whenever I give a data science presentation, I face the temptation to over-share the technicalities of the project. Partly, this stems from my finding many of the technical details interesting; partly, it may be a misplaced demonstration of rigour (to justify my conclusions); and partly, it may be a desire to showcase the amount of work that went into the project. However, your stakeholders will likely have different interests, and covering technical details will almost always detract from the key message.

Effective communication involves storytelling. As a presenter, you need to be able to empathise with your audience. Is your audience familiar with your work? Which elements of the project are relevant to them, and what actions do you wish to elicit from them? Your goal should be to make the audience feel smart and empowered to make the right decisions based on a shared understanding of the data. Restrict your story to a few succinct points. To this end, it helps to start with the key takeaway that you wish to impart to your audience and then build the logical steps leading to that takeaway, stripping away every detail that isn’t needed to tell the story. Use thoughtful visualisations to communicate succinctly. One excellent example of thoughtful graphics is Hans Rosling’s talk on the world population in 2100.

It’s not glamorous, but it’s worth it

As a rapidly evolving and high-profile field, data science has accumulated a diverse range of tools and techniques. Many of these tools are glamorous and elicit a lot of excitement. However, as data scientists, we must ensure that we also pay sufficient attention to the less glamorous aspects of the data science workflow. Many of these skills are critical to ensure that you maximise the impact of your work.

I am still excited by new data science developments and will eagerly devote time to understanding a new machine learning algorithm. At Atlassian, I’ve had the opportunity to work with some very talented people, from whom I’ve also learned some of the under-appreciated skills described above (finding a good mentor can be a huge help!). As a result, I’ve been able to increase the impact of my work. The ability to keep growing across an evolving set of skills is one of the marks of a good data scientist. Make sure your growth is not just restricted to your technical skills!

Helping to drive growth @ Atlassian. Visiting Academic at University of Western Australia & University of Cape Town. https://www.linkedin.com/in/melvinvarughese