Tom Redman, Ph.D., “the Data Doc,” helps companies improve data quality. Those that follow his innovative approaches enjoy the many benefits of far better data, including far lower costs, improved revenues, and increased productivity. Tom helps start-ups and multinationals, along with senior executives, Chief Data Officers, and leaders buried deep in their organizations, chart their courses to data-driven futures, with special emphasis on quality and analytics. Tom’s most important article is “Data’s Credibility Problem” (Harvard Business Review, December 2013). He has a Ph.D. in Statistics and two patents. Tom has also published various other articles and books on the topic.
What is your current day-to-day like and is there anything exciting that you’re working on?
Currently, I’m working on a bunch of exciting things. I spend half of my time as an advisor to companies that are trying to advance their data programs, mostly their data quality programs. That’s really exciting to me because we’re at an interesting juncture. Many years ago, most organizations did not recognize they had a data quality problem, but almost everybody recognizes it today. Now it’s really time to address this problem.
Over the past two years, I’ve convened a number of study groups to really understand why progress in the data space has been so slow. In one case we’ve dug into why it’s so hard to establish a common language across a company and in another case, we’ve looked at the structural problems in managing data science, and how to instill better ways to connect data with business teams. I’ve also done some really interesting work into answering the question, what is it going to take to really manage data properly, to get out of this pandemic?
Why don’t you tell me a little bit more about your background? How did you get into tech and what interested you about the industry?
It’s interesting because I do not consider myself a technologist at all. I think too many companies have confused data and technology, and they are two very different kinds of assets, in the same way that a movie is a different kind of asset from a movie theater or a streaming device platform. Too many companies make this mistake and conflate the two.
Perhaps they think that data is in the computer and therefore it’s a tech problem, but conflating the two leads to bad data management, and it really slows down the progress of tech. This is one of the really important points that I’m pushing with people. Almost everybody sees it when you point it out, but their natural reaction is that data belongs with tech.
Your real question was how I got into it. I’m trained as a statistician, and my first job was at Bell Labs. If you were lucky at Bell Labs, you got to pull on a thread and see where it led, and mine led to data quality. At the time, AT&T was creating more data every day than anyone else.
For a long time, I had my perfect job, which was one foot in these enormous AT&T problems and another foot in a laboratory environment, and my career springboarded from there. I feel so lucky to have gotten in really early on something that is in the process of changing the entire world. My frustrations aside, it’s going to happen. It’s going to be hard, and there are going to be some bad things that come about, like how we’re fighting misinformation now, but it’s going to make the world a better place.
What are some of the different data-driven trends you’re noticing?
I think “trends” might be a little strong of a word; what I’m seeing are more like threads. One of the problems that we see is that the vast majority, maybe north of eighty percent, of data science projects fail, and they fail for a bunch of reasons. Sometimes the people leading them don’t really understand the business problem, and sometimes they underestimate the resistance to change.
The first thread is simply this notion that data is a team sport. If you’re serious about it, you have to get a lot of people involved, and you have to think end to end from the very beginning: from the quality of the data we’re going to use, to how we’re going to figure something out, to how we’re going to implement it, to how it’s going to affect people. You have to build that team from the very beginning. The other thread is this idea of data as human empowerment.
The first thing that struck me as I started working with people was that anybody who completed an improvement project had to learn new skills, had to step a little bit outside their comfort zone, and had to change things. People who have had to do those things are forever changed. It’s kind of subtle, but doing the work in data is really empowering for people. It’s certainly empowering to go from having to put up with other people’s errors to getting to the root cause of things. It’s empowering to use a bunch of data to figure out how to make your work processes better. I think it’s a “many hands make light work” kind of thing, and smart organizations are going to see this opportunity.
You touched briefly on data teams. What are some of the biggest challenges for data teams within the enterprise right now?
Every organization is different, and every person is different, but the challenge remains that data is both exciting and threatening. Some say, “Well, this is going to change everything!” and everybody in the back of their mind goes, “Okay, well, how is it going to change things for me? How is it going to benefit me?”
A lot of organizations are stuck at this crossroads. They understand the qualities of the problem and they see that data science is going to benefit them, but they’re going to have to make some changes. There’s a saying that goes, “All change is bottom-up, all change is top-down.” What it reflects is that new ideas come into organizations at the bottom and the middle, where people are looking for ways to solve problems. In the data space, the things I follow mostly are data science and data quality. There have been plenty of those bottom-up exercises, and they’ve proven that this stuff works. I can’t say it often enough: the potential is enormous. But with all the fear running around and so forth, we need that top-down component to kick in.
It’s really hard for leaders who didn’t grow up with data, and who may themselves be threatened by it, to figure out, okay, well, how do I get in front of this? How do I lead it? What do I tell people?
What is your current biggest challenge? What is keeping you up at night?
There’s no way to sugarcoat it. The forces of misinformation, disinformation, and no information are powerful. Just pick up the front page of any newspaper and you’ll know exactly what I’m talking about.
Fifteen years ago, the biggest problem in the data space was apathy. Now, the biggest problem is the active misinformation, chaos, and the kinds of forces that are rising up. The question is, will the collective “we,” and by that I mean everybody, seek truth, seek insights, and come out more on the side of science? I’m extremely optimistic about the answer, but it’s not one battle, it’s a series of battles, and it’s going to go on for a long time.
Let’s talk about some of the benefits and drawbacks of using a data lake. What is needed to improve them?
Four or five years ago, I started saying most data lakes or datasets are cesspools, and so far nobody has pushed back on that. The notion was that you were going to be able to take all this raw, unfiltered stuff and dump it into a lake, and then later somebody was going to come by, sort it all out, clean it up, and find meaning in it. I think that notion doesn’t have legs. It doesn’t mean it’s never going to happen, but I think there is too much focus on just getting more data, storing it, and keeping it so that if we ever need it we will have it, and not enough on two questions: what is the most important stuff that we really need, and how do we make sure that data is of high quality?
Let me build on that in two ways. The quality of most data is pretty bad. We’ve published some statistics on that. Whatever the most important stuff is that you have, you’ve got to focus on getting it to be of high quality. The other important problem is when there is data you need that you don’t have, and I think that the pandemic revealed this for many companies. Managers were guided by the seat of their pants, again just due to a lack of understanding of what was going on.
Personally, I think the first thing organizations need to do is focus on getting the data they need to be of really high quality and address the gaps in what they don’t have. Those two things are absolutely essential, and if that happens, it will build trust in data and people will start asking new questions.
How, in your view, will emerging tech play a key role in the development and evolution of a major industry (e.g., the financial industry)?
I generally see technology for its productivity gains, and there are a lot of them to be had in finance, but it is a two-step effort. You can’t automate a process that doesn’t work. Some major work in finance has to go on to really get those processes in shape, and then automate the heck out of them, drive unit cost down, and increase scale.
Then the other major flow is the disruptive flow. As far as I can tell, in finance the disruptors, the new companies that have less technical debt, have less to lose, in a sense, by trying new things. These fintech companies are eating away at the edges and attacking some of the inefficiencies in the larger finance space, and they stand to be very disruptive.
Do you have any last words or closing remarks or do you think there are any topics that the public needs more awareness around?
I spend a lot of time thinking about this, and I’ve come to the conclusion that the thing that we need most in the data space and in the tech space is courage. And we need courage at all levels. It’s going to take courage for leaders to commit their organizations to a path that fully embraces data. It’s going to take courage for people to see data as a resource and to empower their staff, and it’s going to take courage for people to figure out how to deal with all this disruption and do so in an inclusive way. Mistakes are inevitable, and you simply can’t get better unless you have the courage to admit that you made one.