The power of data
One of the hottest topics in the technology world is BIG DATA. And it makes sense.
I’ve been living this reality closely with my teams and I’d like to share a little bit about our experience. Why did we start talking about this? Where did we start? What effects are we seeing? Where do we want to go?
A brief story
We changed our architecture three years ago to use microservices, and at that time the motivation was purely technical: we had to scale our applications for high throughput. But we always knew about the other advantages of an API-based architecture, like technology agnosticism and, above all, a mindset change in the development teams.
And yes, our mindset changed. We stopped thinking about projects and started thinking about products. We pronounced the word "accountability" every single day to make it clear that, from that moment on, we were the product owners. But how could we make the product's prospects even clearer? How could we encourage the emergence of new ideas? How could we foster this internally?
A common scenario in many companies around the world is making decisions based on gut feeling.
— We need to go this way.
— But, why?
— Because I think this way is better.
The situation above is exactly the kind of situation we don't want to live with anymore.
I work for the biggest e-commerce company in LatAm. To stay competitive and keep moving forward, we must use technology in our favor, and one way to achieve this is by using data to answer questions. We should use data to find the best direction to go. Data is like a map: having it is not enough; you also need to know how to use it.
And data access was the answer to many questions, including how to make my teams more engaged and creative about our products.
We already had a data science team and some projects using machine learning, but in a certain way this wasn't part of my teams' culture. In this context, we merely provided data through our APIs for decision making, but I realized that we could go beyond that.
So I set four steps with my teams for this journey:
- We need to collect and process data;
- We must understand data;
- We must make practical decisions based on our analysis;
- We’ll automate decisions through machine learning.
We started with step one, and the most important thing was defining which questions we needed to answer, and only then how to collect and process the data.
We cataloged all the questions we needed to answer. The most interesting thing about this process was that new questions, and consequently new ideas, emerged from it, but we stayed focused on what we had defined at the beginning.
The Buybox case
One of the products my team is responsible for is the Buybox system. Buybox is a product recommendation tool for our customers, and it considers many variables: price, shipping time, shipping cost, seller info, etc. Buybox's goal is to choose the best offer for our customers, given that many sellers can sell the same product, and we measure Buybox's performance through a conversion rate metric.
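To make the idea concrete, offer selection of this kind can be sketched as a scoring function over the variables listed above. This is only an illustration under assumed weights and fields, not our actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class Offer:
    seller: str
    price: float          # item price
    shipping_cost: float  # shipping cost charged to the customer
    shipping_days: int    # estimated delivery time
    seller_rating: float  # seller reputation, 0.0 to 5.0

def score(offer: Offer) -> float:
    """Lower total cost and faster delivery are better; a higher
    seller rating is better. The weights here are purely illustrative."""
    total_cost = offer.price + offer.shipping_cost
    return -1.0 * total_cost - 2.0 * offer.shipping_days + 10.0 * offer.seller_rating

def pick_buybox(offers: list[Offer]) -> Offer:
    """Choose the winning offer among sellers of the same product."""
    return max(offers, key=score)

offers = [
    Offer("seller_a", price=100.0, shipping_cost=10.0, shipping_days=5, seller_rating=4.5),
    Offer("seller_b", price=105.0, shipping_cost=0.0, shipping_days=2, seller_rating=4.8),
]
winner = pick_buybox(offers)  # seller_b wins: cheaper in total and faster
```

In a real system the weights themselves are exactly the kind of thing you want data, not gut feeling, to decide.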
In Buybox we had some ideas for new features that could theoretically increase the accuracy of our algorithm. But how could we model these features? Which metrics should we consider? Which variable dimensions should we base them on? We started collecting data to help us answer these questions.
As we wanted to keep our focus on data analysis, we chose to use some of AWS's managed services. The (big) advantage of using these services is that we didn't waste time learning the best way to manage a Spark cluster, for instance.
We already had an event collector running client-side, so we just had to plug this collector into a Kinesis stream, and from there we defined the big picture of our technical solution. As we were just getting started with this subject and wouldn't have a big volume of data to analyze for this case, we opted for a very simple architecture using a few AWS components.
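Plugging a collector into a stream can be as small as the sketch below, using boto3's Kinesis client. The stream name, event fields, and partition-key choice are assumptions for illustration, not our real setup:

```python
import json

def build_event(user_id: str, event_type: str, payload: dict) -> bytes:
    """Serialize one collector event as a JSON record for the stream."""
    return json.dumps({
        "user_id": user_id,
        "type": event_type,
        "payload": payload,
    }).encode("utf-8")

def send_event(stream_name: str, user_id: str, event_type: str, payload: dict) -> None:
    """Put one record on a Kinesis stream. The partition key spreads
    events from different users across shards."""
    import boto3  # AWS SDK; requires credentials configured in the environment
    kinesis = boto3.client("kinesis")
    kinesis.put_record(
        StreamName=stream_name,
        Data=build_event(user_id, event_type, payload),
        PartitionKey=user_id,
    )

# Example (hypothetical stream and event names):
# send_event("clickstream", "user-42", "buybox_impression", {"sku": "123"})
```

From the stream, downstream consumers (Firehose, Lambda, or a Spark job) can pick records up for storage and analysis.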
Unfortunately I can't give too many details about the features we're analyzing and their results, but I can say that through data we improved the Buybox algorithm, and most importantly: my team brings new ideas every sprint based on the metrics we have.
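Judging whether a change helped comes down to comparing the conversion rate metric across variants. A minimal sketch, with made-up counts for illustration:

```python
def conversion_rate(purchases: int, impressions: int) -> float:
    """Fraction of Buybox impressions that led to a purchase."""
    return purchases / impressions if impressions else 0.0

# Hypothetical counts from an experiment comparing the current
# algorithm (control) against a candidate feature (variant).
control = conversion_rate(purchases=120, impressions=4000)  # 0.03
variant = conversion_rate(purchases=150, impressions=4000)  # 0.0375

lift = (variant - control) / control  # relative improvement: 0.25 (25%)
```

In practice you would also want a significance test before declaring a winner, but the principle is the same: the metric, not opinion, settles the argument.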
When we have doubts and don't yet have the data to resolve them, we go back to the beginning of our "data-informed process".
The more you learn about data, the more fascinated you become with the possibilities it opens up for new ideas and solutions.
Last week I watched a TV show about sports, with a very interesting episode about football injuries. The Brazilian football confederation asked the football clubs for some data and then started to map correlations between injuries and aspects like weather, game time, whether the team was the visitor, etc. Many ideas immediately came to my mind, and I discussed them with my team.
Fortunately, it looks like non-tech companies have also realized the power of data and its importance for making (precise) decisions.
Now we're facing the challenge of democratizing data across the whole company. Every area must be able to make decisions based on data. Better decisions, in fact.
As software engineers, our mission is to create and provide a data platform that is reliable, stable, secure, and easy to consume, whether by humans or by big data pipelines.
Data and culture must walk side by side. It's impossible to create a data-driven company if you don't consider data part of the company culture. Airbnb is doing great and admirable work in this area, and they are sharing it with the community.
In the next post, I will discuss the technical aspects of our big data architecture.