
Given the diversity of data sources and the sheer volume of data that has to be processed, traditional data processing tools no longer meet the performance and reliability requirements of modern machine learning and data analytics applications.

Part 1 of this series focuses on big data processing and orchestration tools: Hadoop, Spark, Presto and Airflow. Subsequent parts will cover how to set up cost-efficient, highly scalable and reliable data pipelines on GCP and AWS; a small preview sketch follows below.
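To give a flavour of what those later parts build towards, here is a minimal, hypothetical Airflow DAG sketch (assuming Airflow 2.x). The DAG id, schedule and commands are placeholders for illustration, not pipelines from this series.

```python
# Minimal sketch of an Airflow DAG: two bash tasks chained into a daily pipeline.
# The DAG id, schedule and commands are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extract raw data'")
    load = BashOperator(task_id="load", bash_command="echo 'load into warehouse'")

    extract >> load  # run extract before load
```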

Apache Hadoop/Hive — batch analytics

Hive is an open-source Apache project built on top of Hadoop for querying, summarising and analysing large data sets through a SQL-like interface (similar…
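As a quick illustration of that SQL-like interface, here is a minimal sketch that runs a HiveQL aggregation from Python through PyHive. The host, table and column names (hive-server.internal, page_views, and so on) are assumptions for the example, not values from this article.

```python
# Minimal sketch: querying Hive from Python with PyHive.
# Host, port, table and column names below are illustrative placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server.internal", port=10000, username="analyst")
cursor = conn.cursor()

# A HiveQL aggregation; Hive compiles this into batch jobs that run on the Hadoop cluster.
cursor.execute("""
    SELECT country, COUNT(*) AS views
    FROM page_views
    WHERE view_date = '2020-01-01'
    GROUP BY country
    ORDER BY views DESC
    LIMIT 10
""")

for country, views in cursor.fetchall():
    print(country, views)

cursor.close()
conn.close()
```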


With the rapid R&D of cloud-based AI services from Google, Microsoft, IBM (Watson) and many other players offering groundbreaking AI services in the cloud, it has become quite easy to use these services across different segments.

One of the segments in which AI has recently been used to the fullest is AI-based IVR systems. With these intelligent IVR systems, a customer no longer needs to waste time listening to and following rigid IVR prompts and pressing different keys just to route their problem or query to a specific department. …
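The routing step behind such an IVR can be thought of as intent classification over the caller's transcribed speech. The sketch below fakes that step with a simple keyword lookup purely to illustrate the flow; the departments and keyword rules are hypothetical, and a real system would call a speech-to-text and NLU service instead.

```python
# Toy sketch of AI-driven IVR routing: classify the caller's utterance
# and forward it to a department, instead of walking a keypress menu.
# The keyword rules and department names are hypothetical placeholders.

DEPARTMENTS = {
    "billing": ["invoice", "bill", "charge", "refund"],
    "tech_support": ["error", "crash", "not working", "install"],
    "sales": ["upgrade", "plan", "price", "buy"],
}

def route(utterance: str) -> str:
    """Return the department whose keywords match the transcribed utterance."""
    text = utterance.lower()
    for department, keywords in DEPARTMENTS.items():
        if any(keyword in text for keyword in keywords):
            return department
    return "general_queries"  # fallback when no intent is recognised

print(route("I was charged twice on my last invoice"))     # billing
print(route("The app crashes whenever I try to install"))  # tech_support
```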

Him Bhankar

I am a Google-certified Professional Cloud Architect who loves working on distributed computing and designing highly available systems.
