A lack of trust will dramatically impact your efforts to become data driven unless you proactively limit the spread of mistrust from data quality incidents.

It only takes a small problem to shake someone’s trust in data, but it takes a lot of deliberate effort to convince them it was just one problem, not a larger issue. Even mature data organizations run this risk, as it is impossible to eliminate data quality issues entirely. Left unchecked, mistrust can spread rapidly throughout your organization, unless you adapt some lessons from the world’s efforts to flatten the curve of the COVID-19 pandemic.


Data Mistrust is already Endemic

Although nearly 98% of organizations are making major investments in their ability to become data driven, data quality still costs the average organization $15M per year in bad decisions according to Gartner, and impacts 90% of companies according to Experian. While I covered a few data quality horror stories in a prior article, it isn’t common for a data quality problem to bring down a whole company. Additionally, there are now many modern tools (SodaData, ToroData, Trifacta) and practices (primarily DataOps) that make applying data quality best practices much easier than they once were. …


A lack of trust will dramatically impact your efforts to become data driven unless you proactively limit the Blast Radius of data quality incidents.

It only takes a small problem to shake someone’s trust in data, but it takes a lot of deliberate effort to make them realize it was just one problem, not a larger issue. The difference in impact is the “Blast Radius” of the problem, and even the most mature data organizations can do a better job minimizing it.

Image Source: US Air Force (via Wikimedia)

Although nearly 98% of organizations are making major investments in their ability to become data driven, data quality still costs the average organization $15M per year in bad decisions according to Gartner, and impacts 90% of companies according to Experian. While I covered a few data quality horror stories in a prior article, it isn’t common for a data quality problem to bring down a whole company. …


Simple, highly effective, and it fades into the background of your life

If this is your general working life, then Krisp will make it better without getting in the way. Image Source: PxFuel

Two ways to learn about AI Product Design

At last year’s AWS re:Invent conference, I attended a roundtable session with a group of thought leaders focused on the effects of User Experience design on AI systems. It was the most profound experience I had that week in Vegas, and I’m already looking forward to getting back together with that group next year. The discussion covered everything from the practical (how to design for failure-to-recognize scenarios in Natural Language Understanding products like chatbots) to the philosophical (what level of surveillance should we accept to get the benefits of ambient computing). …


Just as in chemistry, a data catalyst can dramatically speed up the rate at which your data ends up the way you want it

Catalyzing the decomposition of (lots of) Hydrogen Peroxide created this massive foam blob. Photo courtesy of Vlog Squad. Original Video on YouTube.

Catalyzing a reaction accelerates it

In chemistry, a catalyst is a substance that accelerates a chemical reaction without being consumed by it. Without a catalyst, the reaction will still happen eventually, but adding the catalyst makes it happen (often dramatically) faster. Even better, because the catalyst emerges unchanged, it can be reused again and again. The picture above shows a massive-scale version of the classic Elephant’s Toothpaste demonstration, in which the addition of a catalyst leads to an explosive speed-up of a reaction that would otherwise proceed very slowly. …
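
For readers who want the analogy made precise, here is a quick sketch of the underlying chemistry (my own illustration, not part of the original post). The catalyst lowers the activation energy E_a, and because E_a sits in the exponent of the Arrhenius equation, even a modest reduction multiplies the reaction rate many times over:

% Decomposition of hydrogen peroxide (the Elephant's Toothpaste reaction)
2\,\mathrm{H_2O_2} \xrightarrow{\ \text{catalyst}\ } 2\,\mathrm{H_2O} + \mathrm{O_2}

% Arrhenius equation: rate constant k as a function of activation energy E_a
k = A\, e^{-E_a/(RT)}

That exponential sensitivity is the whole point of the analogy: remove a little friction, and the “reaction” runs orders of magnitude faster.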


A detailed look at the missing Data Owner role that keeps organizations from becoming data driven.

In a recent discussion with a colleague, I learned about an interesting paradox. Despite the massive rise in the amount of data generated, captured, stored, and analyzed (IDC claims we will have 175 trillion gigabytes in 5 years), and the multi-trillion-dollar valuations that Gartner & McKinsey place on the analytics market, business leaders report every year that their organizations are less and less data-driven. Executives consistently cite people and process issues as the primary blocker, with only a small percentage citing technology. This raises the question: what is stopping organizations from using technology to enable the people and process changes that would make them more data-driven?

Your data & analytics team is probably running like a recycling plant

Image Source: US Air Force

The challenges facing organizations as they seek to become data driven are in many ways similar to the problems that have faced the recycling industry in recent years. The cost of producing something usable is extremely high because of the effort required to clean up the mixed mess of bagged plastic, cardboard, trash, and metals that gets dumped at the recycling plant. When you assume that a business can take whatever you give it and remain consistently profitable, you are setting yourself up for failure. …


Leveraging data tests and safe environments makes data quality an everyday activity for everyone who touches your data

Image Source: Pixabay

The cost of Bad Data

The costs of poor data quality are so high that many have trouble believing the stats. Gartner estimated that the average organization takes a $15M hit due to poor data quality every year. For some organizations, it can even be fatal. I’m often reminded of a story told by my Data Science Innovation Summit co-presenter, Dan Enthoven from Domino Data Labs, about Knight Capital, a high-frequency trading firm that deployed a faulty update to its trading algorithm without testing its effect. …
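
To make the idea of data tests from the title concrete, here is a minimal sketch in Python with pandas. This is my own illustration: the trades table and its column names are hypothetical, and a real pipeline would likely use a dedicated testing framework rather than bare asserts.

import pandas as pd

def test_trades(trades: pd.DataFrame) -> None:
    # Fail fast, before bad records reach downstream consumers.
    assert trades["order_id"].is_unique, "duplicate order ids"
    assert trades["price"].gt(0).all(), "non-positive prices found"
    assert trades["timestamp"].is_monotonic_increasing, "records out of order"
    assert trades["symbol"].notna().all(), "missing ticker symbols"

Wired into CI or run as a gate between pipeline stages, checks like these turn data quality from an occasional audit into an everyday activity.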


How your organization can adopt the (non-technical) practices in DataOps to improve Data Governance outcomes

A few months ago I wrote a post on the coming rise of DataOps, in which I predicted that the world of Data Governance would see some of the same shakeups that IT Operations experienced during the rise of DevOps. In this post, I’ll share some practical tips for how your organization can get started down the path to leveraging the new set of practices that underlie the DataOps movement. The goal is to let your organization derive value while adopting new roles and processes.

Adopt an analytics lifecycle


Your organization should think of the development of analytics pipelines as a three-stage process, with a different focus at each…


There is an art to planning a schedule for a conference this large. This guide reduces it to a science with a step-by-step process that keeps you in sessions and off of shuttles.

The Las Vegas Strip is a big place. It takes a looooong time to get between the MGM and Venetian.

Are you headed to re:Invent this year? For the first time? Feeling overwhelmed by the 500+ sessions available or the 6+ hotels? Then this guide is for you. There are several other guides out there to bring you up to speed on logistics, parties, or relevant course tracks. This post will focus on an opinionated approach that I use to build my schedule to maximize the value I get out of the event.

I found that starting in 2017, re:Invent became really hard to navigate because it had grown so big. Last year, there were 55,000+ attendees, and it will only grow bigger every year. You have to have a plan, or you’re unlikely to get to do anything. Some of my clients actually left early that year because they were just waiting in lines and traveling between hotels. …


The same tech that’s beating world champions in games will soon revolutionize anything that can be simulated (and because everything is physics at its core, over time that means everything)

I turned this article into a 20-minute talk, so if you prefer to watch, here you go:

I was at the AWS re:MARS conference in Las Vegas last week, and the theme of the week was how the combination of Machine Learning, Automation, and Robotics (sometimes in Space) will shape the future. Many people may think that the star of the show was Robert Downey Jr., but in my mind it was Simulation & Reinforcement Learning, which showed up in nearly every keynote at the conference:

Day 1: Using Reinforcement Learning, Boston Dynamics robots have learned to do backflips, jump up onto ledges, and lift boxes. Disney Imagineering has taken this to the next level with humanoid robots performing death-defying stunts. …
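
Since the excerpt cuts off here, a deliberately tiny sketch may help show what learning in simulation actually means. This is my own toy Q-learning example, not anything demonstrated at re:MARS; the “simulator” is a six-state corridor in which the agent earns a reward for reaching the far end:

import random

N_STATES, ACTIONS, GOAL = 6, (0, 1), 5  # actions: 0 = step left, 1 = step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}  # value table

def step(state, action):
    # The "simulator": deterministic movement, reward only at the goal.
    nxt = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

for episode in range(500):
    state, done = 0, False
    while not done:
        if random.random() < 0.1:  # epsilon-greedy: sometimes explore...
            action = random.choice(ACTIONS)
        else:  # ...otherwise exploit the value table, breaking ties at random
            action = max(ACTIONS, key=lambda a: (q[(state, a)], random.random()))
        nxt, reward, done = step(state, action)
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        # Standard Q-learning update (learning rate 0.5, discount 0.9).
        q[(state, action)] += 0.5 * (reward + 0.9 * best_next - q[(state, action)])
        state = nxt

The loop (act in the simulator, observe the reward, update the value estimates) is the same whether the simulator is a six-state corridor or a full physics engine driving a robot; only the scale changes.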


A deep dive on how Amazon used AWS, ML, RL, and Simulation to power the Go Store’s “Just Walk Out” experience.

I live in Chicago, where we are lucky enough to have 4 Amazon Go stores, the futuristic convenience stores where you simply walk in, grab what you want, and walk back out. Within a few minutes, you get an accurate receipt for everything you took. It really feels like magic, but in reality it’s a whole lot of machine learning innovation: it took Amazon 3 years to build the proof of concept and another 3 years to bring the Go Store to production. Today at their re:MARS conference, they gave the first technical deep dive on the vast array of innovations required to provide the “Just Walk Out” experience. …

About

Ryan Gross

Machine Learning leader at Pariveda Solutions | Interested in how people & machines learn, and how to bring them together.
