Data Org Structure @ CARS24 — an overview

Naresh Mehta
CARS24 Data Science Blog
Apr 30, 2023

The core data team at CARS24 is ~95 strong, supporting businesses across India, Australia, the Middle East and South East Asia, and working closely with stakeholders across business, product, marketing & tech. In addition, we have ~25 data professionals in the CARS24 fintech arm, an independent NBFC-licensed lending business.

This write-up attempts to answer frequently asked questions (especially from friends / leaders in the Indian startup community) about how the data function is currently structured at CARS24, the line of thinking behind that structure, and its pros & limitations.

Disclaimer: Thoughts shared are personal & specific to the current context of CARS24. They may or may not hold for other organizations, and could evolve for CARS24 as well.

Role & org structure of data team

Let’s touch upon the following points, in this order: the role expected of the data function in a business; different ways in which the data team could be organized; the role of data engineering & ML Ops; and ideal ways of working with other functions.

Role of the data function

This largely depends on the data-savviness of the organization, which ranges from the maturity & capability of the data function to how business leaders approach decision making.

In my experience, this is how the role of a data team evolves in an organization.

Basic dashboards & visualization of KPIs are the obvious first step, followed by the ability to dive deeper for custom analyses & insights.

As the data ecosystem matures, stronger leverage of statistics & data science gives the business forecasting & predictive capability. This is also usually the time when the back-end data infra (logging, pipelines, database / warehouse etc) gets further streamlined & strengthened. The data function evolves further when it moves from just predicting to prescribing next steps, as the accuracy of the models improves & their impact on KPIs becomes clearer.

Eventually, when the DS/ML solutions get seamlessly plugged into the production ecosystem, the data function can truly own a problem statement end to end, including the ‘execution’ leg.

Centralized vs Decentralized vs Hybrid?

Centralized and decentralized are self-explanatory terms, and hybrid is somewhere in the middle! A lot has been written about the pros & cons of centralized vs decentralized team structures, and gradually everyone seems to be settling on ‘hybrid’ as the right answer. Or we could say the ‘easy’ answer, until we try to find the fine line that needs to be drawn for ideal ways of working, relative prioritization etc.

Widely accepted benefits of centralized & decentralized org structure

Conventional wisdom is that any technical / niche expertise that doesn’t necessarily require deep domain understanding and can be cross-leveraged should be developed as a central horizontal capability (Center of Excellence, COE), e.g. Data/ML Engg and product-facing DS solutions.

CARS24 data org operates in hybrid structure

In a hybrid structure, each decentralized module should be aligned with either the business function or the central team, depending on where the leverage is higher. For example, if the analytics / insights team gets more leverage through synergies with the central DS/ML team, it should be aligned with the central data team, and vice versa.

How to think about data engineering / warehousing?

Data engineering usually sits with the tech function, but there are also examples of data engg sitting with the wider data function. The former ensures proximity to the source of data & the tech / production system, while the latter ensures superior alignment with the end consumers of data (i.e. business / product analysts, data scientists etc).

At CARS24, the data engg & warehousing practice was formally started by the CTO back in early 2019. Some of those responsibilities now sit with the data function.

  • A couple of tech-aligned data engineers handle data transformations and click-stream ingestion in the production ecosystem, while a couple of data-aligned engineers are responsible for managed / custom pipelines, warehouse optimization, ELT procedures, data access control and the back-end infra of dashboards.

We leverage managed solutions extensively to maintain a lean team. However, we also know there is a LOT more we need to do in this space than we have done so far.

What’s the deal with MLOps / Engg?

Many organizations either expect data scientists to learn production-ready deployment skills OR expect DevOps to understand the nuances of ML workflows, both of which are relatively unrealistic. This is why most ML projects get significantly delayed in going live or, worse, never even see the light of day.

Unlike software development workflows, ML workflows are non-standard (and evolving rapidly): they involve model objects, data files, model formats and their compatibility matrix with the underlying infra. There is also the need to monitor model performance, resource utilization, and model & data drift. Hence, ML Engg/Ops has emerged as a separate & very critical skill-set cutting across the tech & data science realms.
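To make ‘data drift’ monitoring a bit more concrete, here is a minimal sketch of one commonly used drift measure, the Population Stability Index (PSI), which compares the distribution of a feature at training time against what the model sees live. This is purely illustrative and not a description of what CARS24 runs: the function name, the equal-width binning, the bin count and the `eps` floor are all assumptions made for the example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    n_bins: int = 10,      # assumed default, tune per feature
    eps: float = 1e-6,     # assumed floor so empty bins don't produce log(0)
) -> float:
    """PSI between a reference sample and a live sample of one numeric feature."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    # Equal-width bins are derived from the reference (training-time) sample only.
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_shares(values: Sequence[float]) -> list:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e = bin_shares(expected)
    a = bin_shares(actual)
    # PSI = sum over bins of (actual share - expected share) * ln(actual / expected)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]   # stand-in for training-time values
    live = [x + 0.5 for x in reference]         # stand-in for shifted live values
    print("PSI, same data:", population_stability_index(reference, reference))
    print("PSI, shifted data:", population_stability_index(reference, live))
```

Identical samples give a PSI of exactly 0, and the value grows as the live distribution moves away from the reference. A commonly cited rule of thumb treats values above roughly 0.25 as a significant shift, but any alerting threshold should be calibrated per feature and per model.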

At CARS24, we have a ~3-member ML Ops practice within the core data ecosystem, which operates as a horizontal COE helping all the DS modules interact efficiently with the larger production ecosystem. This team thinks ‘engg first’ and has strong ties with DevOps and a dotted line to the tech leadership.

Diving deeper into CARS24 data org structure

As an organization, we have chosen to operate in a hybrid structure. Data Engg / ML Engg, Marketing analytics & product-centric modules of DS, e.g. Magneto (end buyer reco / sorting algos) and Auctoris (dealer reco), operate as a global horizontal capability / COE. Business analysts & business-sensitive modules of DS, e.g. Profecto (pricing engine) & Fortem (fraud engine), operate decentralized and very closely integrated with the respective business functions.

The current data ecosystem at CARS24 is heavily influenced by our philosophy of setting up ‘ML for business’, with the data science team having direct & measurable impact on commercial KPIs rather than building in isolation.

Below is a high-level overview of what the CARS24 data ecosystem looks like, with a deeper illustration of how it engages with business, marketing, product & tech for the India business. Similar engagement is replicated across other geographies.

Core DS/BI ecosystem @ CARS24 — a high level overview

If we dive deeper, the building blocks of this structure are ‘pods’ / natural working teams, each focused on a given problem statement, e.g. buyer top funnel, seller conversion, dealer engagement, refurbishment ops efficiency etc.

A typical ‘ideal’ pod has dedicated folks from business, product, data & tech who are responsible for aligning on the KPIs / objectives of the pod and the relative priorities & timelines of different projects, and for establishing ideal ways of working within the pod.

A typical pod / natural working team at CARS24

As is the usual practice across most organizations, Product Managers ensure that pod-dedicated techies and the tech lead (usually spread across multiple pods) are aligned on BRDs / PRDs, timelines & deliverables.

  • Not very different from how ‘Product / Tech’ usually work together, we have also set up an ‘Analytics Lead / Data Science’ relationship at CARS24, albeit a bit less formal. Most of the senior analytics leads at CARS24 had some prior experience with data science / advanced stats before they chose to go deeper into the business / commercial side. Having them as an interface between DS & Business helps us create a very productive win-win for everyone.

Analytics leads are enabled by the full-stack DS team as they move from only providing data viz / custom insights to also executing & driving changes that are plugged into the production system through DS APIs.

Data Scientists focus on problem statements & KPIs that are truly relevant for the business, with analysts playing ‘checker’ to the data scientists’ ‘maker’.

While all these pods are relatively self-sustained units and could ‘potentially’ operate decentralized, there are obvious upsides to ensuring data professionals across modules are connected through a central ecosystem that includes the data platform (data warehouse / ML engg). This ties back to the ‘hybrid structure plugged into central team’ approach discussed earlier.

The illustration below captures how the data ecosystem is plugged into different pods while staying tightly interconnected, with the data platform providing data engg & ML engg capabilities.

Illustrative : Hybrid structure plugged into central team— Data org @ CARS24

Concluding thoughts…

I trust the write-up above provides a good high-level overview of how we have been thinking about the data org at CARS24. We are still learning, unlearning and relearning along the journey!

It is a fast-evolving world. Tech is advancing exponentially across data platforms (advanced data structures, storage & data access mechanisms), AutoML / Explainable AI (XAI) is becoming increasingly real, LLMs are kicking in and likely to dramatically change data querying interfaces, and ‘Infra as Code’ tools are emerging on the ML engineering front. With all this, we can expect to see very different kinds of data org structures in the not-so-distant future: new ways of doing old stuff, faster / better / simpler.

However, till we get there, we all need to find our own answers that work for our specific constraints & context. Let’s keep building!

Naresh Mehta

VP, Data & Strategy @ Cars24 | Ex Zomato, ZS Associates, dunnhumby | IIT Madras