You Don’t Have Big Data and You Don’t Need Data Scientists

Conventional wisdom tells us that you need data scientists in order to get value from your data. You need statistical algorithms, machine learning, Hadoop, Spark, and a whole list of other tools that you have no idea what to do with. But your leadership told you that you have to figure out this “big data” problem, and fast, because you’re becoming less competitive in the market by the minute. If any of this sounds familiar, chances are you don’t actually need data scientists and you don’t have big data.

A few years ago Dan Ariely famously said the following:

“Big data is like teenage sex: everyone talks about it, and no one really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it…”

While the statement is comical, it is sadly true. The same could be said of data science. Do the senior leaders in your organization actually know what value data scientists provide? In all likelihood, your organization is not yet at a point where it needs pure “data science,” and the reality of your situation is probably closer to one of the following:

  • You are collecting data that you don’t know what to do with and you need to get that data into the hands of business users.
  • You have business analysts who can’t figure out where to start in the sea of data they have.
  • You need dashboards that give executives the situational awareness to inform strategic decisions.
  • You don’t know what you don’t know about your data and you need assistance figuring that out.
  • The analytics and business intelligence systems you have in place do not currently provide proper context.

The question then becomes: how do we leverage new technologies to make data and context-driven analytics more accessible to analysts and decision makers? Traditional BI solutions do not give decision makers the flexibility to change the questions they ask as they drill into the data. What people want changes based on the results they get and what they learn. Once people know what is in the realm of the possible (once they know what they don’t know), they ask different, better, and more informed questions. This is the premise behind context-driven analytics.

To paraphrase a comment from Steve Yegge’s infamous Platforms Rant: it is impossible to reliably predict what people want and deliver it to them. Current analytics solutions aim to provide flexibility, but they require the user to become somewhat technically proficient in order to ask questions of the data, so the task is typically handed to a data scientist. Data science is analogous to applied statistics, and in statistics it is okay to be wrong sometimes. In some business contexts, being wrong is not an option, so how do we provide context-driven analytics to the analysts who need them? Agile Data Engineers.

The current definition of a “data engineer” is flawed. Typical definitions say that a data engineer’s purpose is to get data into the hands of data scientists, by providing infrastructure and data cleansing, so that the data scientists can perform analysis. That definition is incomplete at best. A data engineer’s purpose is to get the right data into the hands of an end user, regardless of who that user is. This requires data engineers to really understand both the data and the use case.

The responsibility of an Agile Data Engineer is to build data discovery and data analytics solutions based on domain-specific context provided by an end user through iterative exploration and discussion. This iterative process results in a usable solution that helps answer the types of questions that come up during discovery. The key point is that these solutions are domain specific. There is no one-size-fits-all solution; however, with the right framework of tools, the Agile Data Engineer can iterate rapidly with users and provide answers quickly.

While most businesses don’t necessarily need data science right now, that does not discount its effectiveness. Data science and machine learning techniques are game changers and will continue to change the way we do business as the techniques mature. But to answer the critical questions that businesses need answered now, something more flexible and definitive may be required. Make sure you are looking for the right solutions to the problems you are currently facing.

At Bogart Associates, we encourage all of our data engineers and data scientists to engage their customers as often as possible and to think from the customer’s perspective. The end result is usable solutions for business analysts and decision makers. If you’re interested in hearing how Bogart’s Agile Data Engineers can help you design a domain-specific solution for your data problems, feel free to contact us.


Originally published at bogartassociates.github.io on January 22, 2016.