Published in Data Arena

The Modern Cloud Data Platform war — A Data Series, Case Study & Reference Architectures

Image created by the author

Case Study: Let us take a simple example: the Indonesia branch of Company X, an e-commerce (internet) firm that runs several businesses. The branch sources and ingests data from many regions, three of which are covered here. The company has a central data warehouse, and its BI and ML consumption layer spans multiple regions. The business volume, number of users, types of applications, and the sharing and use of data are discussed in the points below.

Massive data input: Data is sourced and ingested from hundreds of places. Of these, the 3 regions listed below load petabytes of data because they are central partner locations.

Image created by the author

Data fluctuations: Data volume fluctuates over time. Between January and June this year, it ranged from 500 M to 900 M records. Because the Indonesia data architecture relies heavily on the on-premise data platform (where scaling is mostly vertical), such fluctuations warrant provisioning for a minimum of 1,200 M records, and the platform should still be able to scale readily at any time. We are to architect this using the cloud. Let us explore different options from sourcing to consumption, with data governance and security overarching them, and build reference architectures with different combinations.

Each bar represents 100 M records. Image created by the author
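The cost of static provisioning can be read straight off the figures above. The following is a minimal sketch, assuming the 500 M to 900 M range and the 1,200 M minimum are all record counts. The function names are hypothetical and only illustrate the arithmetic.

```python
# Figures from the case study, in millions of records.
OBSERVED_MIN_M = 500
OBSERVED_MAX_M = 900
PROVISIONED_M = 1200


def headroom_ratio(provisioned: float, peak: float) -> float:
    """Provisioned capacity expressed as a multiple of the observed peak."""
    return provisioned / peak


def utilization(load: float, provisioned: float) -> float:
    """Fraction of the provisioned capacity that a given load actually uses."""
    return load / provisioned


if __name__ == "__main__":
    print(f"Headroom over peak: {headroom_ratio(PROVISIONED_M, OBSERVED_MAX_M):.2f}x")
    print(f"Utilization at peak: {utilization(OBSERVED_MAX_M, PROVISIONED_M):.0%}")
    print(f"Utilization at trough: {utilization(OBSERVED_MIN_M, PROVISIONED_M):.0%}")
```

Under these assumptions the fixed capacity sits at about 1.33 times the observed peak, is 75% used at the peak, and about 42% used at the trough. That idle share is what elastic scaling on the cloud is meant to reduce.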

Massive data sharing: The same firm also has to share large volumes of data with other organizations. Say, every month-end it transfers 100 PB of data to them.

Image created by the author
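To get a feel for what a 100 PB month-end transfer demands, the average throughput can be estimated for a given transfer window. This is a rough sketch only: the transfer windows are assumptions (the case study does not state how long the transfer may take), a petabyte is taken as 10^15 bytes, and `required_gbps` is a hypothetical helper name.

```python
PB = 10**15  # bytes in one decimal petabyte (assumption: decimal, not binary, units)


def required_gbps(total_bytes: int, window_hours: float) -> float:
    """Average throughput, in gigabits per second, needed to move
    total_bytes within window_hours."""
    seconds = window_hours * 3600
    return total_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    monthly_share = 100 * PB  # figure from the case study
    for hours in (24, 72, 720):  # hypothetical windows: 1 day, 3 days, 30 days
        print(f"{hours:>4} h window: {required_gbps(monthly_share, hours):,.0f} Gbit/s")
```

Even when spread evenly over 30 days, the transfer needs roughly 309 Gbit/s sustained, and squeezing it into a single day needs about 9,259 Gbit/s. This is why approaches that avoid physically copying the data are worth evaluating for this use case.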

User fluctuations: User access to the portal keeps fluctuating. At times there are defined patterns, such as festive seasons and Covid lockdowns. At other times traffic rises on major e-commerce sale days, such as those run by Amazon or local providers.

User fluctuations. Image created by the author

Machine Learning: Different types of machine learning algorithms run on these massive data sets, from recommendation engines to fraud detection.

ML & outcome Analytics. Image created by the author

Search: Several GB of data are searched per hour, and search volume also fluctuates.

Image created by the author

There are a few more issues and use cases that will be discussed as we progress.

What we will explore as part of this series:

Image created by the author
The modern Cloud Data Platform stack will be explored as part of this series. Image created by the author

Summary: The modern cloud data platform is going strong, and private blockchain, along with public blockchain, is catching up fast. But what is the right solution for my firm? What is in it for the business? How do I ensure that spend on such technologies can be shown as a profit centre, because it brings new avenues and opportunities to the business? There are loads of questions, and we shall discuss them all in this series of articles in the form of case studies and reference architectures. Refer to the next part of the series: DataBricks (Part 1).






All the views expressed here are my own and do not represent the views of the firm I work for. Data | Big Data | Cloud | ML
