Early stages of data science adoption

Raúl Vallejo
All The Data We Cannot See
3 min read · Jan 16, 2019


So many applications, so little time. These are the times when I wish days lasted longer.

Dealing with the early stages of data science adoption has completely messed up our daily priorities.

Currently, I’m scrambling between the following tasks:

  1. updating all our previous (manual, Excel-based) analyses
  2. getting started with the code for data processing and reviewing security policies for data access
  3. laying out relevant applications for current yearly and quarterly strategies
No wonder prioritizing tasks has been so difficult.

1. Updating all our previous (manual, Excel-based) analyses

This first task is important because it turns out that you can’t just drop everything and start coding right away. As annoying as it is, I’m back (hopefully for the last time) to manually updating reports on spreadsheets like we used to back in the day (4 months ago).

This work isn’t any less important now; the problem is the time spent producing the analysis.

Minimum Viable Product #1: Deploy a descriptive interactive dashboard.

With an end-to-end product in mind, the goal is to quickly set up an initial data pipeline, spanning from data extraction all the way to some sort of visualization, and scale it up iteratively.

The idea is for each iteration to cover a different analytical need that we used to handle manually.
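To make that concrete, here is a rough sketch of what the first iteration could look like. Everything in it is illustrative: the file, sheet, and column names (sales_report.xlsx, monthly_sales, date, region, revenue) are placeholders rather than our actual data, and pandas plus Plotly is just one reasonable stack for a quick descriptive dashboard.

```python
# Illustrative sketch of MVP #1: read the spreadsheet the manual process
# used to update, aggregate it, and show an interactive chart instead of
# a static Excel one. File and column names are hypothetical.
import pandas as pd
import plotly.express as px


def extract(path: str) -> pd.DataFrame:
    """Read the same workbook the manual report was built from."""
    return pd.read_excel(path, sheet_name="monthly_sales")


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate to the level the old manual report showed."""
    df["month"] = pd.to_datetime(df["date"]).dt.to_period("M").dt.to_timestamp()
    return df.groupby(["month", "region"], as_index=False)["revenue"].sum()


def visualize(df: pd.DataFrame) -> None:
    """Replace the static chart with an interactive one."""
    fig = px.line(df, x="month", y="revenue", color="region",
                  title="Monthly revenue by region")
    fig.show()  # or fig.write_html("dashboard.html") to share it


if __name__ == "__main__":
    visualize(transform(extract("sales_report.xlsx")))
```

Each later iteration would swap in a new transform/visualize pair for the next manual report on the list, while the extraction step stays the same.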

2. Getting started with the code for data processing

At this stage, the manual updating task is painfully slow, but it leaves pockets of time for preparing the environment needed to develop the code.

It’s important to mention that collaboration with the current IT teams is key. This KDnuggets article shows exactly what the data science team will run into. This is why the better the communication, the smoother the infrastructure development process will be for the data scientist.

Weekly meeting #1: Check in with IT team to review the code

The IT team doesn’t have to be filled in on all the data analytics; however, it can be very helpful for them to understand what the data pipeline will be used for. They are the experts on the data, so they will probably have very useful tips on how to build efficient code.

3. Laying out relevant applications for current strategies

Here is where the first wins for the data science team can be found. If we can use data science to generate solid value observable in quarterly KPIs, no one in the company will be able to refute the potential of data science.

This is paramount for the swift implementation of data science in a medium-sized company. Why? Because inter-departmental collaboration is key.

If data science can improve the reported KPIs of a certain department, all other departments will be looking to get in on that. Minimize the probability of failure by having multiple (but not too many) projects in the pipeline.

However, we are against the clock. Somehow the team has to sprint to be able to see the results in the quarterly reports.

Quick win #1: Find a KPI that can be broken down into a data science problem

Note that this quick win does not necessarily have to depend on the previous 2 tasks. This might be where other data science skills can be useful: web scraping, social media listening, A/B testing. Creativity will be important in getting this win.
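As one illustration of breaking a KPI down into a data science problem, a conversion-style metric can be framed as a two-sample A/B test. The counts below are made-up placeholders, and statsmodels is just one library that can run this comparison.

```python
# Hedged sketch of a quick win: test whether a variant moved a
# conversion-rate KPI. All numbers are placeholders, not real results.
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical outcomes for control vs. variant
conversions = [480, 530]       # successes in each group
visitors = [10_000, 10_000]    # sample size of each group

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.2f}, p-value = {p_value:.4f}")
# A small p-value suggests the variant shifted the KPI beyond random noise,
# which is the kind of evidence a quarterly report can point to.
```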
