On-prem to Cloud — Hadoop

Neeraj Sabharwal
May 6 · 3 min read

Disclaimer: I used to work for Hortonworks, a Hadoop vendor that is now part of Cloudera. In my technical pre-sales role I sold Hadoop to many customers and also helped them compare Hadoop with EMR and Dataproc.

Let’s talk about the cloud giants’ Hadoop offerings: EMR from AWS, Dataproc from Google, and HDInsight from Microsoft Azure.

EMR https://aws.amazon.com/emr/

Dataproc https://cloud.google.com/dataproc/

HDInsight https://azure.microsoft.com/en-us/services/hdinsight/

Why do we need to move to a cloud-based Hadoop solution?

To reduce CapEx and OpEx. There is no need to keep servers up and running when there is no workload. Also, keeping the data in a centralized storage layer and spinning compute up and down against it, without moving the data around, is a powerful idea if done right.

Imagine shutting down the on-prem Hadoop cluster during off-peak hours or idle time.

In this article, let’s take a look at how this works on GCP.

1 — Move data to cloud buckets.

I will be following “Explore drag-n-drop, gsutil and JSON API ingest tools” and using gsutil to copy a file into a bucket:

gsutil cp Iamme.csv gs://nstesting
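The upload step can be sketched as a small script. The file `Iamme.csv` and the bucket `nstesting` come from the command above; the `run` helper, the `DRY_RUN` switch, and the `us-central1` bucket location are assumptions added for illustration. With `DRY_RUN=1` (the default here) the script only prints the commands it would run.

```shell
# Sketch only: create a bucket, upload a file, then list the bucket.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

upload_to_bucket() {
  file="$1"; bucket="$2"
  # Skip this line if the bucket already exists; the location is an assumption.
  run gsutil mb -l us-central1 "gs://$bucket"
  # Same copy as in the article.
  run gsutil cp "$file" "gs://$bucket"
  # Long listing to confirm the object and its size.
  run gsutil ls -l "gs://$bucket"
}

upload_to_bucket Iamme.csv nstesting
```

Set `DRY_RUN=0` once the printed commands look right for your project.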


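Have you heard of DistCp? It is Hadoop’s distributed copy tool: it runs a MapReduce job that copies data in parallel between Hadoop-compatible file systems.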

Using DistCp to copy your data to Cloud Storage

neerajsabharwal@gcp ~> gcloud dataproc clusters create nsab

neerajsabharwal@gcp ~> gcloud dataproc clusters list

NAME  WORKER_COUNT  PREEMPTIBLE_WORKER_COUNT  STATUS   ZONE
nsab  2                                       RUNNING  us-central1-a
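The idea of paying for compute only while there is work to do can be sketched as a create/delete pair. The cluster name `nsab` and the zone come from the listing above; the region `us-central1` is inferred from that zone, and the `run` helper and `DRY_RUN` switch are illustrative additions. Newer gcloud releases expect a region for Dataproc commands, so it is passed explicitly here.

```shell
# Sketch only: bring a Dataproc cluster up for a job window, then tear it down.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

CLUSTER=nsab
REGION=us-central1        # assumption: inferred from zone us-central1-a

create_cluster() {
  run gcloud dataproc clusters create "$CLUSTER" \
    --region="$REGION" --zone="${REGION}-a" --num-workers=2
}

delete_cluster() {
  # --quiet skips the interactive confirmation prompt.
  run gcloud dataproc clusters delete "$CLUSTER" --region="$REGION" --quiet
}

create_cluster
delete_cluster
```

Anything left in the cluster's HDFS goes away with the cluster, so data that must outlive it belongs in a Cloud Storage bucket.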

Let’s connect to the master node and create some test data.
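One way this step could look is sketched below. The master node name `nsab-m` and the HDFS directory `/gcp_mig` are taken from the DistCp command in this article; the sample rows, the file `/tmp/test_data.csv`, the `run` helper, and the `DRY_RUN` switch are hypothetical and only there for illustration.

```shell
# Sketch only: generate a tiny CSV and load it into HDFS on the master node.
# From your workstation, connect first with:
#   gcloud compute ssh nsab-m --zone=us-central1-a
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

make_test_data() {
  # Two hypothetical rows, no header line.
  printf '1,alpha\n2,beta\n' > /tmp/test_data.csv
  run hdfs dfs -mkdir -p /gcp_mig                     # create the HDFS directory
  run hdfs dfs -put -f /tmp/test_data.csv /gcp_mig/   # -f overwrites an earlier copy
  run hdfs dfs -ls /gcp_mig                           # confirm the file is there
}

make_test_data
```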

Then copy the test data to GCS:

hadoop distcp hdfs://nsab-m.c.demons123.internal:8020/gcp_mig/ gs://gcp_mig_hadoop/
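After the copy it is worth checking that both sides agree. A minimal sketch, assuming the GCS connector is available on the cluster (Dataproc ships with it): `hadoop fs -count` prints directory count, file count, content size, and path, so the file count and content size should match between source and destination. The `run` helper and `DRY_RUN` switch are illustrative additions.

```shell
# Sketch only: compare file counts and total bytes on both sides of the copy.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

verify_copy() {
  src="$1"; dst="$2"
  # Output columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  run hadoop fs -count "$src"
  run hadoop fs -count "$dst"
}

verify_copy hdfs://nsab-m.c.demons123.internal:8020/gcp_mig/ gs://gcp_mig_hadoop/
```

The directory counts can differ between the two sides depending on how the target path was laid out, so compare the file count and content size rather than the whole line.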

Now let’s play around with Hive.
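A Hive table over the copied data could look roughly like the sketch below. The table name `gcp_mig_test`, its two columns, and the exact `LOCATION` are hypothetical: adjust them to your files and to wherever DistCp actually placed them in the bucket. Because the table is `EXTERNAL`, dropping it leaves the files in Cloud Storage untouched. The `run` helper and `DRY_RUN` switch are illustrative additions, and the dry-run output drops the quotes around the query, so treat it as a preview rather than something to paste back.

```shell
# Sketch only: define an external Hive table over CSV files in a bucket.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = actually run them

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Assumption: the files ended up under this prefix.
TABLE_LOCATION="${TABLE_LOCATION:-gs://gcp_mig_hadoop/gcp_mig/}"

create_hive_table() {
  # Hypothetical schema: two comma-separated columns.
  run hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS gcp_mig_test (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '$TABLE_LOCATION'"
  run hive -e "SELECT * FROM gcp_mig_test LIMIT 10"
}

create_hive_table
```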

In this article, we created a Dataproc cluster, copied data from HDFS to GCS, and then created a Hive table on top of the data in GCS.

If you have an on-prem Hadoop cluster, you can use Cloud VPN to set up connectivity between it and your cloud environment.

There is a lot more work to do when it comes to putting together an end-to-end story. I will find time to play around more and write more articles with demos.