New O’Reilly Book: Accelerate Machine Learning with a Unified Analytics Architecture

Paige Roberts
Feb 16, 2022


Every company needs business intelligence (BI). Increasingly, across many industries, most companies also need machine learning (ML) or artificial intelligence (AI) if they want to stay competitive. But between 40 and 60% of ML projects fail, most at the point in the workflow between proof of concept and production. Even the ones that succeed often take months, a year, or even longer to get there. One day, it may be as easy for an organization to put an ML model into production as it is to put a new visualization in a BI report. The right data architecture design can be the key.

I have a remarkably broad view of how organizations work with data. Partly, this is from my job at Vertica hosting webinars with data architects, CDAOs, data engineers, etc. from companies in various industries. We discuss their data architectures, how they work, why they built them that way, and what benefits or challenges they’ve seen.

Add that to my previous twenty years or so as a data engineer, software trainer, technical writer, data integration consultant, technology evangelist, and product manager for various high-scale data pipeline software systems, and you can see why I have both a wide and long cross-industry view of data architectures.

Lately, I’ve been going to various AI, ML, data science, and data technology conferences, mostly virtually thanks to COVID, to talk about the challenges in data architecture and compare some architectures that folks have shared with me. Each individual story is a piece of the big picture. It illustrates the shifting trends in how different people often tackle the same challenges. The one challenge nearly everyone is dealing with is figuring out how to get the right data to both enable BI and take advantage of the extra insight from ML.

For folks who don’t travel to conferences, or live on Zoom, I wrote a short e-book, co-authored with a smart young engineer named Ben Epstein. These days, Ben is a Founding Software Engineer at a startup called Galileo, but before that he was a lead engineer at Splice Machine. He knows his stuff when it comes to MLOps, and in particular, what it takes to get machine learning projects into production. Between his writing skills, his knowledge, and mine, we wrote a heck of a nice little book, if I do say so myself.

The new O’Reilly report is Accelerate Machine Learning with a Unified Analytics Architecture — Deploy Machine Learning Models in Minutes, Not Months.

You can download it for free here, or find it on the O’Reilly website.

The central concept is the convergence of the data warehouse and data lake architectures into something new. Newer architectures tend to combine the strengths of both older architectures while compensating for their past weaknesses. I prefer the name Unified Analytics Architecture, but Data Lakehouse is probably the most popular and well-known term. (That’s one seriously tortured metaphor, but it seems to have caught on.) I’ve also seen SQL Lakehouse, Unified Data Analytics Platform (UDAP), Unified Analytics Warehouse (UAW), etc. This is the general idea in a single graphic:

Unified Analytics Architecture

Regardless of what name you prefer, the data lake and the data warehouse are merging, or unifying, over time. Stacks that people used to think of as pure data lake now have ANSI SQL querying, ACID compliance, and structured data storage. Technology stacks that used to be just for data warehousing now have unlimited distributed scale, semi-structured and streaming data support, and Python clients. Some, like Vertica, even have built-in ML functions and algorithms.
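
To make the idea concrete, here is a minimal sketch of what "one store, two kinds of workload" can look like. It is only an illustration, not anything taken from the report: it uses Python's built-in SQLite as a stand-in for a unified analytics platform, and the `orders` table, its columns, and both function names are made up for the example. The same table answers a BI-style aggregate and feeds a tiny least-squares model, with no copy of the data moved to a separate ML system.

```python
import sqlite3

# Stand-in for a unified analytics store. SQLite is an assumption chosen so
# the sketch runs anywhere; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, ad_spend REAL, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("north", 1.0, 3.0),
        ("north", 2.0, 5.0),
        ("south", 3.0, 7.0),
        ("south", 4.0, 9.0),
    ],
)


def bi_revenue_by_region(conn):
    """BI-style workload: total revenue per region, straight from SQL."""
    cursor = conn.execute(
        "SELECT region, SUM(revenue) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor.fetchall())


def fit_revenue_model(conn):
    """ML-style workload: fit revenue = slope * ad_spend + intercept
    by ordinary least squares, reading from the same table."""
    data = conn.execute("SELECT ad_spend, revenue FROM orders").fetchall()
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    print(bi_revenue_by_region(conn))
    print(fit_revenue_model(conn))
```

In a real unified architecture the model would usually be trained and scored by the platform's own ML functions or by a Python client at much larger scale, but the shape is the same: analysts and data scientists query one governed copy of the data.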

I’ve seen this happening all over the world, and in a wide variety of industries. Moneysupermarket.com in the UK is doing it, as is EOITEK in China. So are Taboola in Israel, and Catch Media and Uber in California. Domo, which provides analytics as a service in the cloud, and Lumenore, which uses natural language processing for augmented analytics interfaces, are both unifying their architectures, for a reason that makes sense across industry boundaries: you need both BI and ML to make your company hum.

Just because you’re doing something new and cool and cutting edge with ML or AI doesn’t mean you can forget lessons learned from decades of BI.

Smart companies see the writing on the wall. They’re unifying old and new to create something that combines the best technical capabilities of both and serves the needs of both business analysts and data scientists. It’s even enabling a new concept that is, itself, a bit of a convergence: the citizen data scientist.

Read the book to learn more about this movement, why it’s happening, and how it can make the difference between ML project failure and success. See how you can get your ML models into production in minutes, not months.

Paige Roberts

27 yrs in data mgmt. Co-author of O’Reilly’s “Accelerate Machine Learning,” “97 Things Every Data Engineer Should Know,” and “Up and Running with Aerospike.”