What does a data team really do?

A guide to making the most out of your data.

Michal Szczecinski
GOGOX Technology

--

Are those data guys playing with “big data”, complex math, cool code and fancy visualizations for fun?

Hmm, probably… :) Chances are we do have fun while working, but more importantly we are obsessed with improving things, solving hard problems that are worth solving, and making a real impact.

In this article I will:

  • explain what our data team does
  • demonstrate why and how it does it
  • show opportunities for creating a highly effective data-driven environment

It is organized as a checklist for reference. If you do analytics work or are considering how your organization can best benefit from data, you might find the following points particularly useful.

  1. Purpose — what is the purpose of our work?
  2. First principles — what fundamentally matters?
  3. Impact — what does impact mean to us?
  4. Data Driven Framework — how to systematize and scale that impact?
  5. Analytics toolset — what are the ways we can deliver value?
  6. Utilization — how to make the best use of analytics?
  7. Responsibility — what should we do to make a whole organization better?
  8. Review — how can we make sure we are doing our best?
  9. Vision — where are we going, what’s next?

Purpose

At a high level, analytics (for simplicity, in this article I put all data-related work such as business intelligence, product analytics, data science and data engineering into one big “analytics” bucket) is a powerful toolset that enables us to improve any aspect of the business.

GOGOVAN’s mission is “move with simplicity”. And our data team is here to make sure that whenever you need to move something from point A to B you have the best experience.

We can do that by helping team members and systems across the organization make decisions and take actions that make us better. At GOGOVAN our data team works across all areas, including operations, finance, marketing, product, customer service, engineering and strategy, often partnering closely with those functional teams to help them make a difference.

Our purpose is to make a real impact by facilitating smarter decisions across the whole organization.

First principles

According to the Collins English Dictionary, first principles are “The fundamental concepts or assumptions on which a theory, system or method is based.”

When we work with our teams, it helps to understand what the underlying value is from the perspective of our business and what we want to accomplish.

In our company’s case, we focus on the core elements of on-demand logistics so that we can provide the best possible results to our customers, partners and business stakeholders.

It’s pretty simple really:

  • Customers want their goods delivered quickly, cheaply and reliably.
  • Drivers want to earn money by spending their time efficiently on completing jobs.
  • The business wants to grow and retain its customer base, increase revenue and streamline costs.

As an example, let’s take the service we provide to customers and break it down. What matters to our clients is:

  • Price
  • Quality
  • Time

Price is quite straightforward: the cheaper the better. Quality relates to how the service is carried out, particularly the reliability of our partners, trust in the way we handle goods, communication, support and the UX of our products. Time can be broken down into response time, arrival time and delivery time.

So if we are able to improve any of those components, our service becomes better, which should lead to happier clients and consequently to business growth.

For us first principles thinking means focusing on things that fundamentally matter.

Impact

Analytics is all about making an impact on the business. Once we identify what matters, the key question is how we can affect it. There are many ways we can have an effect on the business, but let me explain with one example from our operations.

One of the core competencies of our platform is matching orders with drivers.

Example activity — drivers-orders matching

In order to improve our service to customers, our work should be focused on developing capabilities enabling us to systematically improve all of its components like price, quality and time.

So how can we make that one activity, “driver-order matching”, better?

Below is an example from our Singapore operations that we spotted a long time ago using an interactive data exploration tool we built. You can see that in this particular case the orders could have been accepted by drivers who were available and much closer to the order at that very moment.

Example of order assignment (visualized by interactive data tool).

  • By matching a driver who is closer to the pickup location, arrival and delivery will be faster, the cost for the driver will be lower and the utilization of the driver’s time will be higher; consequently, the driver will be able to complete more orders and earn more.
  • By consolidating orders and designing an optimal route, we could offer a better price to customers and at the same time provide higher total value for the designated drivers.
  • By recommending for each order a driver who is a) best suited to that particular order, b) most likely to accept it and c) likely to complete it successfully (with a high rating for completing that kind of order), we can also ensure the best quality of service.
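To make the first point concrete, here is a minimal Python sketch of picking the closest available driver for a pickup location. It is only an illustration, not our production logic: the `Driver` fields, the function names and the use of straight-line (haversine) distance instead of real road distance or travel time are all assumptions made for this example.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Driver:
    # Hypothetical fields, chosen only for this illustration.
    driver_id: str
    lat: float
    lon: float
    available: bool


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_available_driver(pickup_lat, pickup_lon, drivers):
    """Return (driver, distance_km) for the closest available driver, or None."""
    candidates = [
        (haversine_km(pickup_lat, pickup_lon, d.lat, d.lon), d)
        for d in drivers
        if d.available
    ]
    if not candidates:
        return None
    # Compare on distance only, so ties never try to compare Driver objects.
    distance, driver = min(candidates, key=lambda pair: pair[0])
    return driver, distance
```

In practice distance is only one input. Acceptance likelihood, ratings and route consolidation, as described in the other two points, would need to be weighed alongside it.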

At that time, apart from building interactive tools in which you can scroll through time and monitor operations, we also did deep-dive analyses and created scripts for highlighting outliers. We continue to work on various automated data-driven approaches to keep improving that aspect of our operations.
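As a rough idea of what an outlier-highlighting script could look like, here is a small Python sketch. It assumes each order record already carries two hypothetical fields, `accepted_km` (pickup distance of the driver who took the order) and `nearest_km` (pickup distance of the closest available driver at that moment), and it uses the common Q3 + 1.5 × IQR rule as one possible threshold. Our actual scripts are not shown here; this is just one simple way to do it.

```python
import statistics


def flag_excess_distance_outliers(orders, k=1.5):
    """Return orders whose excess pickup distance exceeds Q3 + k * IQR.

    Each order is a dict with hypothetical keys 'order_id', 'accepted_km'
    and 'nearest_km'. Excess distance is accepted_km - nearest_km.
    """
    excess = [o["accepted_km"] - o["nearest_km"] for o in orders]
    if len(excess) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(excess, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [o for o, e in zip(orders, excess) if e > upper_fence]
```

Orders flagged this way are candidates for a closer look in an interactive tool, not proof that something went wrong: the closest driver may have declined, or may not have been suitable for that order.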

This is, of course, just one activity where a data-driven approach can make a difference. Some other examples from our work include:

  • balancing supply-demand through designing incentives and policies
  • customer segmentation and optimizing performance of marketing campaigns
  • predicting churn and engaging users at risk of churning
  • tracking and improving the performance of products
  • detecting fraud and anomalies

and many more…

Making an impact that affects our core competency is win-win-win-win: customers win, drivers win, the business wins and the data team is happy to make a real impact. :)

Data Driven Framework

Distribution of demand in Hong Kong

Sometimes it might be tempting to just say “let’s buy an algorithm or hire a smart consultant to solve problem x”. While there could be a place and time for that, in a data science environment I see one big problem with it.

Our ecosystem is not constant, and there is great value in the iterative process of refining solutions and learning through a systematic feedback loop. That leads to accumulated knowledge that, in my experience, can be extremely valuable and accelerates acquiring the magic power of “pattern recognition”.

The GOGOVAN economy is a dynamic and complex ecosystem. There can be tradeoffs between some of the underlying service components. Plus, what works great today may not work tomorrow (or even later the same day), and what works great in one market can underperform in another.

So it’s not necessarily about having a perfect formula or implementing any particular method. More importantly, it’s about having a framework in which we can manage all of the parameters, so that we can continuously, incrementally and systematically improve the service for our customers and partners.

That framework should allow us to instantly:

  • monitor
  • design
  • deploy
  • adjust
  • evaluate

all key processes that contribute to the things we are trying to optimize.

Similar criteria could be valuable when facing any business or technology decision. Any time we make a key decision we could ask ourselves: “How does this contribute to our ability to drive improvements in service for our customers and partners?”

The Data Driven Framework is about creating an environment in which we can systematically control and continuously improve our results.

Analytics Toolset

Contrary to what some data science courses could lead us to believe, there are many more ways to make an impact as a data scientist than developing a cutting-edge deep learning model. :)

In my experience, data scientists get the best results when they focus on the problem at hand and choose the most pragmatic way to solve it effectively, taking advantage of a quick feedback loop. Quickly iterating, learning and improving on a solution brings a lot of value and satisfaction.

Also, being part of the wider organization, we need to be pragmatic. Building a general production algorithm that controls all aspects of dispatching might be the ultimate solution; however, it requires much more input and many more resources than the data team alone can provide.

So, as data scientists, in what ways can we contribute to the business?

Matrix showing tools available to the data scientist. To play with the interactive version and see descriptions you can use this link and hover on points.

The purpose of the visualization above is just to show that there are different “tools” in a data scientist’s inventory for delivering impact. Usually when we say tools we mean languages, libraries, and visualization and querying tech. Here I present them instead in terms of the work outputs that data scientists can deliver or the activities they can perform.

Sometimes it is useful to think about the most pragmatic way we can make an impact, which is why I have visualized it using two axes: direct impact and independent contribution.

Direct impact: how directly that output or activity can impact the business. For example, an algorithm that automatically assigns drivers has a more direct impact than a report for the ops team about matching drivers.

Independent contribution: how much we can do on our own in the data team, without necessarily relying on other infrastructure or resources, or impacting the product roadmap.

Other things to consider include the complexity, time and scalability of each work output.

It’s not meant to be “scientific” and is for illustration only. In every organization and data team it can look different, depending on strategy, infrastructure, skill set or simply the moment in time and stage of company growth.

In our case, our work includes a mix of all these tools, depending on what the task is about, how accurate it needs to be, how much time is available, and who will use it and how. At GOGOVAN we have created a master data platform that provides a one-stop shop for “everything data”. It allows you to search, navigate, tag, collaborate on and contribute to thousands of charts, reports, interactive tools, notebooks, queries, dashboards, algorithms and other resources.

data platform — navigation
data platform — search

Our data platform could easily be the topic of a blog article by itself. If you are interested in more details, please let me know.

The analytics toolset provides many ways to make an impact, so choose the most appropriate ones and apply them pragmatically.

Utilization

A question someone might ask is “hey, the data team is doing so much, but how well can we utilize all that data and work in the company?”. The answer is: it depends. It’s a team effort and we do not work in isolation. Things that might influence the impact of the data team’s work include:

  • the ability of products and systems to integrate and iterate on data-driven features
  • data culture at the organization
  • style and experience of managers of functional teams
  • cross-team communication and collaboration
  • strategic decisions of the organization
  • operational processes and procedures

For example, the more users a company has, the more people will be impacted by even a small change, so there is greater potential for optimization. The more data a company has, the bigger the challenges and opportunities in going through it and extracting insights. The type of business determines how much difference tech can make to its core competencies. And finally, the more open and supportive the organization’s attitude towards using data, the more people will feel empowered to make decisions and take actions based on it.

So your goal might be to expand the three spheres (data value, data culture and data output) so that they become as big and as overlapping as possible: you produce enough data output to address key problems, and you utilize it well because people are data-literate and have a great attitude towards using it.

In our case, I personally believe the potential and value of data is huge. Logistics lends itself well to optimization, and as a large-scale, rapidly growing technology startup we are gathering large volumes of data about our services, including app telemetry, GPS locations, transaction data, marketing information, customer service data, telematics information and more…

Utilization of analytics is about creating the right data output at the company with the right data culture to serve the right data value.

Responsibility

I believe the data team is in a unique position to have an impact on every part of the organization. We are very fortunate to spend our days working closely with data, so it makes sense that we are often able to spot problems and opportunities even before they surface to other teams. That is why it’s so important that we are proactive, communicate clearly, work closely with people across the whole company and take our responsibilities seriously.

Key roles of the data team are:

  • provide information and decision support
  • discover insights and share knowledge
  • track performance and progress of company products
  • generate signals and warn if something goes wrong
  • facilitate global cross-team collaboration and sharing best practices
  • democratize data and empower people to use it
  • promote data-driven decision making
  • optimize company services and business activities
  • provide competitive advantage through innovation and developing intellectual property
  • contribute solutions that might revolutionize service or generate new business models

It’s our responsibility to educate people and to share the knowledge and insights we have found across the organization. That unrestricted flow of information to the right people and systems is very important, so that we can improve our service and resolve any issues as soon as possible. At GOGOVAN we have regular open analytics meetings where founders, management and anyone who is interested can join, learn about and discuss the newest projects and insights we have been working on.

With great data comes great responsibility.

Review

It’s useful to regularly review the work we are doing, particularly to see whether we are getting the outcomes we expected and what impact we are making. We can learn from that and use it to plan our next actions.

Questions useful for thinking about impact:

Apart from that, we constantly review the way we work, our best practices and our techniques.

In the military there is something called an AAR (After Action Review). In a similar spirit, after our analytics meetings we have a quick retrospective. Each of us types answers on Slack and then we discuss three questions:

  1. How did it go?
  2. What went well?
  3. What could we improve?

It’s a very open and supportive environment in which everyone can comment and suggest improvements. We then make sure we incorporate those comments into our next work. We also keep a best-practices notebook that includes code snippets, explanations, visualizations, etc. that have worked well in our experience.

“We must never be too busy to take time to sharpen the saw.” Stephen Covey

We try to design our work environment in a way that optimizes the productivity and experience of data scientists. Some of the things we do include:

  • design our analytics infrastructure and schemas with simplicity, flexibility and performance in mind
  • use leading-edge tools and libraries (yeah we love Python, Pandas, Spark etc. and embrace open source)
  • have a notebook template that improves reproducibility and collaboration
  • create utils for common functions and activities (for example, automatically publishing and tagging HTML notebooks directly from Jupyter to our data platform)
  • use a dockerized environment, so that a new data scientist can come in, run a few commands and be ready to start delivering value in minutes…

Review the impact of your work, ask the right questions, think about expected outcomes, and look back at the results.

Vision

Vision, to put it simply, is painting a picture of a desirable future. In that future I see an awesome data team making a massive contribution to the success of the company.

Even though we have done significant work in all areas of GOGOVAN, the way I see it, it’s just a warm-up. We still have a lot of opportunities and ways to improve ahead.

We create powerful and comprehensive data capabilities that help the company achieve its goals (in our case: grow, provide the best service to our users and develop a competitive advantage). And we aspire to be the best in the world at that.

A very exciting and promising next step for us is to expand our capability to make intelligent decisions automatically and directly in the system. To do that we have to invest in leading-edge infrastructure and applied AI/ML capabilities that can make our service even better.

We are always looking for great tech talent. If you share our vision and passion for data, get in touch.

Our vision is to create best-in-class data-driven capabilities that keep pushing the company forward.

--

Michal Szczecinski
GOGOX Technology

Data Scientist at Google. Former Head of Data at GOGOVAN. Lifelong learner, my favourite topics include tech, data science, quantified self and psychology…