Why does being data-driven matter?

From a data engineering manager’s point of view, why do we do data work every day?

Binlong Li
Technology Hits
6 min read · Jan 24, 2022

Photo by Markus Spiske on Unsplash

We are data engineers, data analysts, and data scientists. We work with data every day and enjoy the excellent pay and the excitement that come with it. Our companies also spend a great deal of money on data pipelines and machine learning models. But have you ever asked why? Why do we do this work, besides the paycheck? What benefits does data bring to the company? How do we help the company by doing all these data jobs? This short article aims to help you understand the underlying goal, so that you feel more motivated to move forward.

To answer that question, I would like to start with an argument I had today with a senior, well-respected developer. The argument was about adding application identification info to an existing platform column.

I suggested creating a separate column to hold the application identifiers, because applications and platforms are different things and mixing them would make the column inconsistent.

However, the other developer held a firm opinion and used data to argue that it was not a big deal, since the system already had 100+ distinct platforms.

If logic is not persuasive enough, data will help. So I queried the system and showed that we currently have more than 1,600 distinct applications. Mixing them into the platform column would significantly pollute the existing data.

Once I showed this data, the other developer accepted my view, and we moved on to discussing the implementation details of my suggestion.
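The check in this story amounts to a couple of distinct-value counts. Here is a minimal sketch, assuming a hypothetical `events` table with `platform` and `application_id` columns. The table, the column names, the sample rows, and the use of SQLite are all illustrative assumptions, not the real system.

```python
import sqlite3


def distinct_counts(conn, table, columns):
    """Return {column: number of distinct non-null values} for a table.

    The table and column names are interpolated into the SQL text, so this
    helper is only meant for trusted, hard-coded names.
    """
    counts = {}
    for col in columns:
        row = conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        counts[col] = row[0]
    return counts


# Tiny in-memory stand-in for the real system (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (platform TEXT, application_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("ios", "app-1"),
        ("ios", "app-2"),
        ("android", "app-3"),
        ("web", "app-1"),
    ],
)

counts = distinct_counts(conn, "events", ["platform", "application_id"])
print(counts)
```

Comparing the two counts side by side is what turns "it is not a big deal" into a number both sides can look at.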

System Two

From the above example, we can see the power of data. Data lets us engage System Two, the slow and deliberate mode of thinking described in Daniel Kahneman’s book “Thinking, Fast and Slow”, by giving us easy access to information that reflects reality.

For a problem with long-term impact, we should use System Two to do a more rational analysis before making a final decision.

In the daily working environment, it is hard to tell whether a question has a long-term or a short-term impact. We often treat long-term questions as short-term ones and decide by instinct. The reason is that we lack easily accessible data and we are too impatient to wait.

Especially for software engineers, each design and implementation decision seems small and short-term. Developers want to make a quick decision and move on. But each piece of code is part of a larger software system; once released, it can affect the customer experience for a long time and potentially damage the company. From that perspective, I tend to believe that all software decisions are long-term and should be guided by rational analysis and data.

People may object that this would make every software change too tedious and software development almost impossible. How do we resolve the conflict between efficiency and rationality?

That is where AB testing comes in.

AB Testing

AB testing, or controlled experimentation, is a tool for measuring the effect of a change while holding all other variables constant. With enough data, we can test whether our change brings a benefit or a regression.

AB testing is widely used for ad campaigns, UI designs, and other feature development. This article will not go deep into AB testing or how to implement it, but will answer why we use it.
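To make "comparing a change with everything else held constant" concrete, here is a minimal sketch of one common way to compare two groups: a two-proportion z-test on conversion counts. The function name and the numbers are made up for illustration, and a real experimentation platform would do considerably more (randomization, multiple metrics, corrections for repeated looks).

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns (z, p_value), where p_value is two-sided and based on the
    normal approximation with a pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up example: 100 of 1000 users convert in A, 130 of 1000 in B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the groups is unlikely to be noise alone. What threshold counts as "small" is a decision to make before the test runs, not after.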

Provide an improvement direction

Suppose you line up a group of people in an open park, have them hold on to each other, and let them walk forward blindfolded. After a while, you will find them walking in circles; sometimes the line even connects head to tail.

For a project to be successful, it needs a clear target that provides a direction.

AB testing quantifies that direction. Before starting the project, you first define one metric or a series of metrics. Your goal is to move them in the favorable direction, such as reducing memory usage or increasing frame rate.

As you keep working on your project and run AB tests over multiple iterations, you get continuous, quantified feedback on whether you are moving in the right direction. Once you achieve the desired improvement, the metrics also give you a clear definition of success. That is the practical meaning of data-driven development.

Prevent you from damaging other features

Software systems today are complicated. While you are excited about pushing your own metrics north, you may accidentally regress other components.

AB testing can help you prevent that.

With a basket of metrics accumulated from previous feature development, you need to ensure that your own metric moves north and that the rest do not move south.

All the previous metrics act like test locks. They ensure that subsequent feature work does not regress earlier features, so that past optimizations and benefits are kept and inherited even after a long period of product development. The whole organization keeps moving forward instead of running in circles.
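The "lock" idea can be sketched as a list of guardrail metrics, each with a direction and a tolerance, checked against the control and treatment results of an experiment. Everything below (the `Guardrail` class, the metric names, the tolerances, the numbers) is a hypothetical illustration. It also compares point values only, whereas a real system would account for statistical noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    name: str
    higher_is_better: bool
    tolerance: float  # allowed relative regression, e.g. 0.01 means 1%


def find_regressions(guardrails, control, treatment):
    """Return the names of guardrail metrics that regressed beyond tolerance.

    Assumes every control value is non-zero, since changes are relative.
    """
    regressed = []
    for g in guardrails:
        base = control[g.name]
        new = treatment[g.name]
        change = (new - base) / base
        # Normalize so that a positive value always means "got better".
        improvement = change if g.higher_is_better else -change
        if improvement < -g.tolerance:
            regressed.append(g.name)
    return regressed


guardrails = [
    Guardrail("framerate", higher_is_better=True, tolerance=0.01),
    Guardrail("memory_mb", higher_is_better=False, tolerance=0.02),
]
control = {"framerate": 60.0, "memory_mb": 500.0}
treatment = {"framerate": 61.0, "memory_mb": 530.0}

print(find_regressions(guardrails, control, treatment))
```

In this made-up run the frame rate improves, but memory usage grows by 6%, which exceeds its 2% tolerance, so the change would be flagged before it ships.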

Great for 1–100, but not valid for 0–1

Even though an AB testing system can give us a proper direction, it rests on a strong assumption: enough data.

We make all these decisions based on statistics. For statistics to work, we must have enough evidence, that is, data, to support our choices. Therefore, this system is mainly helpful for an established business moving from 1 to 100. It is of little use for a start-up project moving from 0 to 1.
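To get a feel for what "enough data" means, here is a minimal sketch of a textbook sample-size estimate for comparing two conversion rates, using the normal approximation. The function name, the default significance level and power, and the example rates are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per group to detect p_base -> p_target.

    Uses the normal-approximation formula for a two-sided test on two
    proportions with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    n = (z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2
    return math.ceil(n)


# Made-up example: detecting a lift from a 10% to a 12% conversion rate.
n_two_points = sample_size_per_group(0.10, 0.12)
# A smaller lift (10% to 11%) needs considerably more users.
n_one_point = sample_size_per_group(0.10, 0.11)
print(n_two_points, n_one_point)
```

Under these assumptions, even a two-point lift needs a few thousand users in each group. A product that has not yet found its first users cannot supply that, which is the sense in which AB testing suits 1 to 100 rather than 0 to 1.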

To get from 0 to 1, we need to lean more on project management skills. That is a topic for another article.

Data Matters

Data-driven, or even big data, is not a hot concept anymore. Even machine learning is not as sexy as it used to be. But I always like to think about things from first principles: why did we need them in the first place?

From a profit-oriented company’s point of view, a technology is worthwhile only if it brings business impact by improving productivity. Once you provide value to your customers by making them more productive, you earn a share of their profits. That is how high-tech companies deliver value to society as a whole.

From a tech perspective, we will never be short of hot concepts: mobile, deep learning, computer vision, big data, machine learning, autopilot, blockchain, metaverse, and so on. People are busy creating all kinds of fancy words to raise money and attract attention. However, I always keep Buffett’s words in mind:

Only when the tide goes out do you discover who’s been swimming naked.

— Warren Buffett

For all those fancy words to create business value, they must deliver value to end customers. If they are merely fancy, they might float for a while, but eventually they will crash. Value creation is the gravity of the tech world. For an enterprise, value creation means becoming more efficient and productive.

After all these years, I genuinely believe data can bring business value to companies: not by selling customer data, but by improving productivity.

Coming back to the little story at the beginning: even though it is tiny, it is a powerful one. It demonstrates how easily accessible and accurate data can speed up an argument among rational people. Organizations can save the time otherwise spent on debate by simply checking reality. Reasonable people can readily see a decision’s potential pros and cons and predict its future impact. System Two leads us to a higher chance of success.

Rational people will naturally unite under the flag of quantification and march into the future.

If data helps people make decisions faster, they save more time for implementation, and the organization becomes more efficient. If data helps people make decisions more accurately, the organization becomes more responsive. If data is easy to access and quick to query, the company becomes more productive. If data provides a clear direction, the organization has a better chance of success.

All of these, efficiency, accurate decisions, responsiveness, productivity, and direction, ultimately help a company become more profitable. That is what shareholders care about. That is what business is really about. That is the power of data.

I’m a software engineering manager and an MBA student at UIUC. I write about business, management, career development, and technology.