Data is Like Fish

No, data is not the new oil. Data is nothing like oil. If someone likes to use the data-is-the-new-oil analogy in conversations about the value of data, chances are that they don’t understand why data is valuable, or how to extract value from data. If you like to use this analogy yourself, this article is written for your consideration.

Many business leaders today believe that their organization is sitting on a gold mine of data, and they just need to find a way to monetize it. That wouldn’t be a problem if data were truly like oil. Not a single oil baron in the world needs to figure out how to turn their oil into money. In fact, if word gets out that you recently came into ownership of an untapped oil field, your phone will probably start ringing. People will beat a path to your door to make money with your oil for you.

Why is data not like oil? Mainly because the utility of data is far more diverse and far less established than the utility of oil. Oil comes in only a few forms, and the processes of turning oil into consumable products are well established. With this highly coherent alignment between the raw material and the end products, oil can be considered something with intrinsic value. That alignment is elusive for data.

Data comes in a large variety of formats, each of which can in turn be used to achieve various business outcomes. The illusion that data is like oil occurs when you only pay attention to the most valuable examples among these business outcomes, and believe that a singular type of resource called “data” can lead you to these valuable outcomes.

Data is Like Fish

To better illustrate the value of data, I’d like to propose an alternative analogy: data is like fish. To actualize the value of fish, you need to know what kinds of fish are suitable to be made into what kinds of seafood. Within each kind of seafood, there are further varieties in recipes, preparation techniques, cooking methods, and so on.

Take sushi for example: not all fishes can be eaten uncooked, and certainly not all are sushi-grade. Among fishes suitable for sushi, each requires a different set of preparation techniques. A good sushi chef understands the nature of the fishes they use: why, when, how, and for whom they are delicious, and prepares them accordingly. Similarly, a good data analyst understands the differences in nature between different types of data: how and why they behave differently, what and who they are good for, and how to analyze them accordingly.

Most of the time, even the best sushi chef in the world cannot make you a wonderful sushi dinner with whatever fish is left in your freezer. For the same reason, it’s usually not ideal to start a data science project with a bunch of data you happen to have lying around. Have you ever had a sudden craving for a certain dish, but opened your fridge only to be disappointed? Don’t get me wrong: combining whatever ingredients you happen to have can be a wonderful culinary experience sometimes, if you don’t have any expectations. But if you want a specific dish, you need to have the recipe first, and then go to the market and buy the right ingredients. Similarly, a successful data science project usually starts with a business goal: to inform certain decisions, to automate certain processes, to drive certain strategies, etc., and then the right data are collected to enable the right analytical methods that can achieve the goal. If you make a good data scientist work on whatever data you have, they may be able to whip up something for you, and the outcome may even be rather impressive, but that outcome is not very likely to be what you actually need.

Another aspect that makes data and fish alike is that they are both sensitive to time. Just like fish, data is perishable. If you want to make gasoline, last year’s oil is just as good as this year’s oil. If you want to understand market trends, however, you need fresh data. Last year’s data can only tell you about market history. A data-rich organization is not one with a lot of data in storage, but rather one with the capability to capture the right data in huge volume. Big data is not measured in terabytes, but rather terabytes per second.

A Complete Analogy

I’m usually very cautious about taking a good analogy too far. Many fallacies in business and social sciences were created that way. However, extending a good analogy to adjacent concepts can also be illuminating if we can stay true to the principles that made the analogy work in the first place. To me, the data-is-like-fish analogy keeps on giving when thinking about data-related processes and roles.

Data analytics is like making sushi. If data is like fish, making sushi becomes a pretty good analogy for data analytics. Just like making sushi, 90% of the labor in data analytics goes into preparing the ingredients, so that the remaining 10% can feel like pure magic.

Just like a sushi master who can tailor a multi-course sushi meal to their customers’ tastes, a great data analyst understands what insights their audience needs, and designs analyses that can provide those insights. A sushi master painstakingly cleans and prepares specific fishes with their sushi recipes in mind, and a great data analyst cleans and shapes specific data that can enable their analysis plan. To procure the right ingredients, a sushi master gets up early in the morning, visits their local fish market, and communicates with their trusted fishmongers, who are also incredibly knowledgeable about the fishes they sell. A great data analyst communicates with their trusted database administrators and data engineers, so the right data can be properly collected, transformed, stored, and queried for downstream preparation, analysis, and modeling.

A data team is like a fishing village. The term “Data Science” is incredibly ill-defined. The popularity of this term reflects how clueless most organizations are about the true value of data. It takes a village (a fishing village) to make great sushi: you need fishermen to get the fishes out of the sea, fishmongers to manage and sell the fishes on the market, and sushi chefs to turn the fishes into sushi.

Similarly, actualizing the value of data usually requires collaboration between a few different types of data specialists: you need data engineers to build databases and pipelines, database administrators to manage data storage and flow, data analysts to distill insights from the data, and sometimes machine learning engineers to train models with the data. Can one specialist titled “Data Scientist” do all of the above? Yes, some data scientists can, and unfortunately, some of them do. Why is that unfortunate? For the same reasons that sushi masters spend most of their time in the kitchen rather than fishing on the high seas, and that very few sushi lovers have an appetite for sushi made by fishing ship captains.

Machine learning is like canning fish. Have you ever wondered why machine learning (ML) specialists are considered data professionals? Why is artificial intelligence (AI), a field often associated with Skynet and Terminators, so closely tied to Data Science, a field frequently represented by spreadsheets and pie charts? That’s because AI/ML systems are fundamentally data interpreters. They take in one type of data, and turn it into a different, more consumable type of data: e.g. computer vision systems turn visual signals into conceptual labels, scores, and categories; speech recognition systems turn audio signals into editable text; predictive analytics systems turn quantitative descriptions of current reality into quantitative descriptions of future reality. In that sense, machine learning is much like canning fish: raw fish in, more consumable fish out, on a massive scale.

Even though the capabilities of today’s AI/ML systems have far exceeded many expectations, we have yet to figure out how to teach them general human knowledge. As a result, many complex cognitive tasks are still beyond the reach of AI/ML. Even on the tasks AI/ML systems can effectively complete today, they usually cannot outperform humans in accuracy. The main value most of today’s AI/ML systems provide is therefore not superior quality, but superior quantity: the scalability that comes with automation.

In-depth data analysis requires many complex cognitive functions today’s AI/ML systems still lack: deep understanding of decision-makers’ needs and concerns, relevant subject matter knowledge, the ability to model human behavior in quantitative terms, the ability to tell a story with analytical results, etc. We sure can produce canned tuna in the millions now, but the time of robotic sushi masters may still be far away. So as valuable as AI/ML can be, data-savvy organizations are probably not replacing their data analysts with machine learning engineers any time soon. Instead, they foster collaboration between their analysts and engineers to automate the processes that can be automated, and enhance the processes that cannot be.

Concluding Remarks

There you have it, a complete breakdown of why data is like fish. It might not sound as punchy as “data is the new oil”. That’s because this data-is-like-fish analogy is intended to help you understand, rather than sell you anything. Next time someone claims something to be the new oil, beware that their true expertise might be in snake oil.
