5 tips to remember — Data Platform growth

Maximo Alves
Just Eat Takeaway-tech
10 min read · Mar 4, 2024


My experience in the data industry started many years ago at Liberty Global, when I was given the chance to lead a team in charge of building a real-time analytics platform. The platform we built grew rapidly in size and usage, becoming a critical part of the company’s main entertainment product. Currently I lead a team of 50+ engineers responsible for JET’s data platform. As a team we build and maintain platform capabilities. These capabilities empower around 500 data professionals (data analysts, engineers and scientists) to build Data Products which deliver value to the business. These Data Products are used daily by thousands of internal users and power customer-facing products.

Over my years in the data industry, I’ve struggled with early-stage mistakes made by myself and others. It’s part of life! We all make mistakes when doing things for the first time. After all, that’s how we learn. In this blog post I’ve compiled a list of five oversights, with tips on how to avoid them before your data operation grows large.

Number 1: Starting ‘on prem’

Nowadays, unless you have a very specific use case to solve and a very savvy (and large) engineering team, you should choose one of the cloud service providers to run your data platform. I made this mistake myself back in 2016. What started as an engineering adventure, with me and the first three engineers of my newly formed team unpacking and racking the first 12 servers, became a nightmare as our clusters grew to more than 700 servers.

The logic is simple and obvious. Data grows exponentially, so your needs for storage, CPU and RAM also grow sharply. This implies continuous effort on purchasing, racking and patching servers. Engineering effort is also required at every level of your data platform architecture: booting newly racked servers, formatting hard drives, installing operating systems and much more, before you can even start configuring the systems that will be the building blocks of your data platform. Of course this can all be automated! But it’s still a lot of engineering effort, and an engineering skill set you won’t find easily.

Nowadays going directly to the cloud is certainly best practice. However, if you’re still considering going on prem, there is one last aspect that might not be that obvious: you won’t have the resource (storage, CPU and RAM) elasticity provided by cloud providers. If your platform usage grows organically, you can mitigate resource constraints with good capacity planning. However, operating the data platform might require a large chunk of idle capacity. This is needed for routine operations like rebalancing partitions of a large Kafka cluster or relocating shards in Elasticsearch. You will need plenty of empty space to execute these tasks. If you don’t have it, be prepared for a challenging and time-costly game of Tetris.
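To make the headroom point concrete, here is a rough back-of-the-envelope sketch in Python, using made-up numbers (not figures from any real cluster): draining or rebalancing away the data of one node only works if the remaining nodes have enough free disk to absorb it.

```python
# Back-of-the-envelope headroom check for rebalancing a storage cluster,
# e.g. moving one broker's partitions onto the rest of a Kafka cluster.
# All numbers below are illustrative assumptions.

def can_drain_largest_node(num_nodes: int,
                           disk_per_node_tb: float,
                           used_fraction: float) -> bool:
    """Can the remaining nodes absorb one node's data?"""
    data_to_move = disk_per_node_tb * used_fraction          # TB leaving the node
    free_elsewhere = (num_nodes - 1) * disk_per_node_tb * (1 - used_fraction)
    return free_elsewhere >= data_to_move

# 10 nodes of 10 TB at 80% usage: 8 TB must move, 18 TB is free elsewhere.
print(can_drain_largest_node(10, 10.0, 0.80))   # True -- the move fits
# The same cluster at 95% usage: 9.5 TB must move, only 4.5 TB is free.
print(can_drain_largest_node(10, 10.0, 0.95))   # False -- Tetris time
```

On prem, that missing headroom means ordering and racking new hardware; in the cloud it is a temporary scale-up.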

Number 2: Hiring only data experts first

Your company has put you in charge of launching a data initiative which will empower the business. It’s a great opportunity and you’re very excited, eager to deliver value as quickly as possible. You’ve done your homework well and know that going on prem is not an option. You start talking with ‘sales architects’ from cloud providers and/or other data technology vendors. It’s mind-blowing! It has become so easy to use these services to ingest, process and visualise data. So you go and hire your first team members: data engineers, data warehouse engineers and data analysts. Because you’re driven, smart and hard-working, and because all areas of your business need to be empowered by data, your platform will grow like mushrooms in an autumn forest.

This huge business success will eventually cause a data platform operational struggle once the number of data products and users grows large. Without early guidance and enforced consistency, your access management will be a mess, with way too many engineers having near-full admin rights. This will fuel a proliferation of distinct cloud services being used for no good reason. Why? Well, engineers love to experiment with new technology, and cloud providers are hugely exciting in this respect. Overall service quality might be low due to a lack of consistent monitoring and automated deployments. Worst of all, you might even have service passwords and authentication tokens exposed in the ETL code.

The easiest way to mitigate all this trouble is to build a partnership with an existing team of infrastructure engineers, or hire them yourself if needed, as you hire your first data engineers. The first engineers you hire must define an architectural blueprint for the basic data tasks: ingesting, transforming, egressing and visualising. You must also give the infra team some time to get the basics in place before the masses of data engineers and analysts arrive, keen to deliver business value. A few things that need to be in place from the start are: an access management policy (and roles), a code repository, automated deployments and naming conventions.
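As an illustration of what enforced consistency can look like in practice, here is a minimal sketch of a naming-convention check the infra team could wire into CI. The convention itself (`<domain>_<layer>_<entity>`) is a hypothetical example, not JET’s actual standard.

```python
import re

# Hypothetical convention: dataset names follow <domain>_<layer>_<entity>,
# e.g. "sales_raw_orders" or "logistics_curated_deliveries".
DATASET_NAME = re.compile(r"^[a-z]+_(raw|staging|curated)_[a-z][a-z0-9_]*$")

def naming_violations(names: list[str]) -> list[str]:
    """Return the names that break the convention (empty list = all good)."""
    return [n for n in names if not DATASET_NAME.match(n)]

# Run as a CI step, this fails the build before a badly named dataset ships:
assert naming_violations(["sales_raw_orders", "TempTable2"]) == ["TempTable2"]
```

The exact rule matters far less than having one rule, checked automatically, from day one.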

Number 3: Relegating Data Governance to the bottom of your priority list

Most of us have, at some point in our lives, neglected something which wasn’t critical at that moment, only to be surprised later by how critical it became. Data Governance certainly falls into this category of things one should not neglect initially. You don’t need to get too paranoid about it and overcomplicate things either; initially, most of your engineering resources will necessarily be focused on other areas anyway. The point I want to make is: in the early stage, just a few actions around Data Governance will save you a great deal of problems later.

It’s very tempting to take advantage of the ease of use of modern cloud data services to deliver value to the business in record time. At any corporation, all business pillars generate lots of data, and all of them have senior stakeholders keen to use data analytics to create insights which will improve their performance. This means your Data Platform will experience accelerated growth in many aspects. Eventually you might long for the early days, when almost all platform users had access to almost all data. Managing access to data in an agile, secure and compliant manner for hundreds (or thousands) of employees isn’t easy. Another interesting phenomenon is how data pipelines grow in quantity, complexity and interdependency. The result sometimes looks like the New York, London and Paris subway networks combined and scaled up by a large factor. It’s great, right? Your small data platform baby has grown into a colossus! Well, it becomes less funny when that amazing dashboard full of fancy business KPIs, viewed daily by the CEO, stops showing accurate data for no clear reason.

A lot of this trouble can be avoided by implementing a few key governance guidelines during the early phase. This is not a one-time exercise, however, because your business needs, your data platform and your data governance policies will be in constant change. One good place to start is defining how data will be classified: PII, confidential, restricted, etc. Don’t go crazy here; if it’s too complex, no one will use it. Next, how will access to data be managed? Who can access which dataset? Who will grant access, and how? If you get these basics implemented right, you are in good shape to roll. Get them automated somehow and you’re ready for the hordes of datasets and platform users in the near future. At JET we opted for a data governance strategy built around clear data ownership, because it solves many problems at once: the data owner, besides being responsible for data quality, is also in charge of defining who has access to the data.
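As a minimal sketch of that ownership model (all names and classifications here are hypothetical; a real implementation would sit on top of your cloud provider’s IAM):

```python
from dataclasses import dataclass, field
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    CONFIDENTIAL = "confidential"
    PII = "pii"

@dataclass
class Dataset:
    name: str
    owner: str                      # the owner answers for quality AND access
    classification: Classification
    readers: set[str] = field(default_factory=set)

    def grant_read(self, granted_by: str, user: str) -> None:
        # Only the data owner may grant access -- the core of the model.
        if granted_by != self.owner:
            raise PermissionError(f"only {self.owner} can grant access to {self.name}")
        self.readers.add(user)

orders = Dataset("orders", owner="sales-team", classification=Classification.PII)
orders.grant_read("sales-team", "analyst@example.com")    # OK
# orders.grant_read("random-eng", "analyst@example.com")  # raises PermissionError
```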

Number 4: Underestimating compliance controls and audit efforts

When I joined JET in 2021 to lead the data platform engineering team, I was surprised by the recurring effort spent on executing compliance controls and supporting audits. With today’s technology it’s easy to ingest distinct data sources, then transform and merge them into time-saving, business-empowering data products which make all types of business operations more efficient. What once started as a humble dashboard showing daily KPIs might grow into a powerful product supporting critical business processes.

So why does a Data Platform, which was supposed to be used for generating analytics and data-science-driven insight, become a sort of ‘Swiss Army knife’ enablement platform supporting so many business-critical processes? My guess is: because solving business problems in the Data Platform requires less effort than solving them elsewhere. Let me give you one example. Suppose your company works with external sales representatives who receive a fixed salary. Then the Sales Team decides to motivate the representatives by adding a monthly bonus based on their performance. Chances are that all the data required for the bonus report is already ingested into the Data Platform, because it’s being used to generate operational dashboards. A data analyst can probably create the required bonus report in days, while it would take the IT team who maintains the Sales System a few weeks to achieve the same. It’s not that the data analyst is smarter than the IT team members! It’s just that the Sales System is way more complex, and riskier to change, than the simple data pipelines required for the bonus report.

The rule of thumb here is the following: given the chance, tools and services boosted by data will find their place throughout the company. This is amazing! However, if there is money involved, the auditors (internal, external or both) will eventually knock on your door. They will ask tons of questions, find process gaps, and leave you with a list of improvement actions and a few compliance controls to execute regularly. In time, the audits and the work overhead they generate will bite off a good chunk of your engineering capacity. If you own the data strategy for your company, you can decide not to allow the Data Platform to host ‘audit-sensitive’ services. However, if you’re like most in this position, your strategy will be to empower your business with data. In that case you will do well to mitigate this risk by making sure that 1) your engineering team is dimensioned to cope with the extra effort and 2) your systems are designed to be ‘audit proof’. By the way, the best strategy here is to team up with the internal audit team. They will help you identify design requirements.

Number 5: Believing that the ‘platform team’ will fully own operational costs

Data organisations in large companies evolve the same way as other parts of the broader technology organisation. Initially, a small group of infra engineers provides a thin abstraction layer on top of cloud provider services for other teams to use. This thin layer covers the basics, so each team using it will end up spinning up and maintaining whatever cloud services they need to deliver value to the business. In this scenario they will deploy and maintain their own databases, compute instances, service monitoring, etc. Eventually, when the organisation grows large enough, a Platform Team will be created. Much can be said about this topic, but simply put, a Platform Team provides high-level services (or capabilities) for other teams of engineers, aiming to make their work more efficient. For example, all teams need observability for their applications, so the platform team will create an observability capability for all to use, or provide a Kubernetes cluster which all developers can use to launch their applications.

Within data orgs the same can happen. First, the infra engineers, assuming you didn’t make the second mistake, will deploy the basics: identity and access management, a code repository, the means to deploy automatically, etc. This initial infra setup will be used by teams of data engineers to create their pipelines for ingesting and transforming data. Eventually it will make sense to put together a team of technically savvy data engineers to start building the data platform capabilities other teams will use to deliver business value. This organisational topology works great for a few reasons. First, it helps you scale scarce resources like data engineers, as a small group can enable the work of many. It also allows you to use cloud resources efficiently, as the platform capabilities can be used to enforce a ‘golden path’ of best practices. Finally, it gives you a nice chance to excel at data governance by designing the platform capabilities with data governance principles in mind.

Assuming your data organisation serves a large corporation and its goal is to empower the whole business, your operational bill will eventually catch your CFO’s attention. Asking the Platform Team to ‘own’ the costs and try to reduce or contain them won’t be effective, as they don’t own the data products created by the value-stream-aligned teams. It’s more efficient to design the data platform from the start in a way that cloud costs can be easily allocated to the teams using it. Data processing and storage are the biggest cost drivers, so having accurate visibility of how much processing and storage each stream-aligned team is using will give you the leverage to hold these teams accountable for their platform costs.
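A minimal sketch of what that visibility could look like, assuming every resource carries a team label and billing line items can be exported (the field names and figures here are illustrative, not any specific cloud provider’s schema):

```python
from collections import defaultdict

# Illustrative billing export rows: (team_label, service, cost_eur).
# In practice these come from your cloud provider's billing export.
line_items = [
    ("logistics", "warehouse-compute", 1200.0),
    ("logistics", "object-storage",     300.0),
    ("marketing", "warehouse-compute", 4500.0),
    ("unlabelled", "object-storage",    250.0),  # tagging gap to chase down
]

def cost_per_team(items) -> dict[str, float]:
    """Aggregate the bill per owning team."""
    totals: dict[str, float] = defaultdict(float)
    for team, _service, cost in items:
        totals[team] += cost
    return dict(totals)

print(cost_per_team(line_items))
# {'logistics': 1500.0, 'marketing': 4500.0, 'unlabelled': 250.0}
```

Once each stream-aligned team sees its own line on the bill, the accountability conversation becomes much easier.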

Main Takeaway

Data Platforms are flexible and powerful. This combination might propel them to grow exponentially within your business. As described above, basic infrastructure setup, data governance principles, auditing and cost control are relatively easy to tackle early on, but a big headache if dealt with later.

What are your thoughts on this? Have you experienced something similar? Please share your comments.

Just Eat Takeaway.com is hiring! Want to come work with us? Apply today
