Your Analytics Problem is Actually a Data Integration Problem.

Julia Geller
Published in Datalogue
Aug 3, 2020

Analytics are important for your enterprise. It’s why your business is likely spending a ton of money every year on self-service analytics software that cross-functional teams can use, hardware to run that software, and talent to wield it.

However, it’s just as likely that the returns on those investments and efforts aren’t where you want them to be.

It’s a common problem, and if you ask a CEO why that problem exists, they likely won’t know.

You know who would though?

Anyone working within a cross-functional analytics team.

That’s because they’re the ones waiting for weeks to get usable data into their self-service toolkits (data warehouses, data wrangling tools, and data analytics software). Once the data is in there, I’m sure they’d tell you, insights are almost never more than hours or a few days away.

So why does it take so long to integrate clean, usable data into self-service toolkits?

The real crux of the issue is an inefficient process.

While cross-functional analytics teams are generally expected and empowered to complete data-driven projects end to end, and while they are the ones who best know the data they need for analysis, they are almost never the ones charged with data integration.

In fact, in most organizations data integration is regarded as a squarely technical pursuit that lives in technical data teams, whether that be IT or a specialized central ingestion team. And when you look at data integration tools on the market today, that makes sense.

Legacy data integration tools from vendors such as Informatica, Oracle, and IBM are hard to use. Training an analyst with a myriad of other responsibilities and areas of focus to use them would take months, an investment of time most companies are unwilling to front-load. It’s also an investment that often needs to be repeated, as employee churn brings in new people who have yet to be trained in these tools.

There are also newer, very powerful tools on the market, such as Databricks, Spark, and Confluent, among many others, but they require deep technical expertise, such as distributed computing knowledge, to use.

The players that have sprung up as would-be disruptors, the self-service data integration providers of the world, while easier to use, lack the powerful data transformation capabilities and governance practices that the modern enterprise needs.

So data integration remains where it’s lived for years: in technical teams, relatively divorced from the data subject matter experts (SMEs) who reside in these cross-functional analytics teams and who know not only what data they need, but the formats and schemas they need it in.

So every time a cross-functional analytics team needs data, what does that process end up looking like?

Highly iterative, and not in the good way.

Ensuring the data is delivered to the analytics environment of choice in the exact format and schema it’s needed in requires repeated back-and-forth between technical and analytics teams, which drains time as well as both analytic and technical resources.

And by the way, if you’re wondering just why traditional data integration processes are so lengthy, check out our blog post that dives into the complex landscape of where data lives, and what formats it lives in, in a typical enterprise.

OK, data integration takes too long. So what?

The results of data integration being a bottleneck in data-driven initiatives and efforts are numerous, and best identified by posing a simple question: why is your enterprise running data-driven initiatives in the first place?

Is it to form product roadmaps based on concrete consumer feedback?

Is it to forge company-wide business strategies based on relevant market and company-wide KPIs?

Is it to identify ways to cut departmental and company-wide spending?

Whatever your reason for having analytics teams perform data analysis, those are the goals that are on the line when access to usable data is a bottleneck.

Now imagine that you could cut the time it takes to go from raw data to insight by 75% or more. You could answer those strategic questions you have posed and have time left over to run all the projects that are sitting on the back burner today.

Trust us, your cross-functional teams have the capacity to do more than you could imagine. That’s why you’re investing in these kinds of teams: their multidisciplinary skill sets allow them to take a project from its inception to its completion. They just need clean, usable data.

Take, for example, a college intern who figured out that aeronautical vehicles carried way more water than was generally needed on trips, saving the industry millions in fuel costs and helping the environment to boot.

How was she able to do that?

By creating a model that accurately predicts how much water an aeronautical vehicle needs depending on that vehicle’s route.

The only thing that intern needed to create that model was a creative approach to problem solving and usable data. Your enterprise has already hired the talent needed for creative problem solving. The last step is empowering them with usable data.

Insights and strategic decisions are key for a successful enterprise, but their ROI is generally hard to measure. Another, more concrete, consequence of expending precious and scarce technical resources on menial data integration tasks is the accrued cost of pipeline building and maintenance.

In short, you’re spending too much money, sacrificing too many insights, and compromising too much on your data engineering roadmap to keep doing what you’re doing.

It’s probably time to switch things up.

Read on for how we propose you reduce the time and energy spent on data integration so that your analytics teams can do more, faster.
