Launch a Hadoop Cluster in 90 Seconds or Less in Google Cloud Dataproc!

Kimoon Kim
Google Cloud - Community
5 min read · Nov 21, 2017


“I meant what I said and I said what I meant. An elephant’s faithful one-hundred percent!” — Dr. Seuss

Hadoop, our favourite elephant, is an open-source framework that lets you store and analyse big data across clusters of computers. It does this using the MapReduce programming model, which originated at Google and which you can learn more about here.

Now, launching an on-premises Hadoop cluster is not an easy job. I have been fortunate enough to be involved in the process and know the effort it takes to build one. For reference, it generally takes about a day and a half per node to launch a working cluster (e.g., a 10-node cluster usually takes around 15 working days). Who has 15 days to build a 10-node cluster?

Lately in South Africa, there has been a lot of interest in Hadoop among corporates and big data startups. That is why I thought it was a good time to write an article that shows, step by step, how easy it is to launch a Hadoop cluster using Google Cloud Dataproc. Cloud Dataproc is Google’s fully managed Hadoop, Spark, and Flink service that allows you to deploy clusters in a simpler, more cost-efficient way. Let me show you how.

What you need

  • A Google Cloud project with billing enabled, which you can learn how to set up here.
  • Your public IP address, which you can find by typing “What is my IP” into Google. Write it down.
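The firewall rule created later expects its source as an IP range, so it can help to turn the address you wrote down into a single-host range first. Here is a minimal Python sketch using only the standard library. The function name to_source_range and the sample address 203.0.113.7 are my own placeholders, not part of the original walkthrough.

```python
import ipaddress


def to_source_range(public_ip: str) -> str:
    """Validate an IP address and return it as a single-host CIDR range."""
    # ip_address() raises ValueError if the text is not a valid IPv4/IPv6 address.
    address = ipaddress.ip_address(public_ip.strip())
    # A single host is /32 for IPv4 and /128 for IPv6.
    return f"{address}/{address.max_prefixlen}"


# 203.0.113.7 is a documentation-only placeholder; use your own public IP.
print(to_source_range("203.0.113.7"))  # prints 203.0.113.7/32
```

A /32 range matches exactly one IPv4 address, which keeps the cluster's web pages visible to your machine only.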

1. Network Setup

Step 1

Go to the dashboard of the project you created on console.cloud.google.com.

Project Dashboard

Step 2

Click on the “Hamburger-stack” or Menu at the top left-hand side of your screen.

‘Hamburger-stack’ or Menu

Step 3

Under NETWORKING, hover over VPC network and click on the Firewall rules button.

Firewall rules

Step 4

Then click Create Firewall Rule.

Firewall rules for ingress traffic under VPC network

Step 5

Type in the name you want to give to your firewall rule (I called mine hadoop-firewall).

Firewall Rule Settings

Step 6

  1. Change the Targets from “Specified target tags” to “All instances in the network”.
  2. Under Source IP ranges, enter the public IP address you noted down at the beginning.
  3. Under Protocols and ports, type: tcp:8088;tcp:50070;tcp:8080 to open specific ports.
  4. Click Create.
Firewall Rule Settings
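If you prefer the command line to the console, roughly the same rule can be expressed as a gcloud command. The Python sketch below only builds and prints that command, it does not run it. The helper name firewall_rule_command and the sample source range are my own assumptions. My understanding is that leaving out target tags makes the rule apply to all instances in the network, but please check the gcloud reference before relying on that.

```python
import shlex


def firewall_rule_command(rule_name, source_range,
                          ports=(8088, 50070, 8080), network="default"):
    """Build (but do not run) a gcloud command mirroring Steps 5 and 6."""
    # The same three TCP ports that Step 6 opens in the console.
    allow = ",".join(f"tcp:{port}" for port in ports)
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",
        "--direction=INGRESS",
        f"--allow={allow}",
        f"--source-ranges={source_range}",
    ]


# 203.0.113.7/32 is a placeholder; substitute your own public IP range.
command = firewall_rule_command("hadoop-firewall", "203.0.113.7/32")
print(shlex.join(command))
```

Printing the command first lets you review it before pasting it into a terminal where the Cloud SDK is installed.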

2. Dataproc Setup

Step 7

Click on the “Hamburger-stack” or Menu and navigate to Dataproc under the BIG DATA section.

Menu

Step 8

Click on the Create cluster button.

Cloud Dataproc Menu

Step 9

  1. Give the cluster a name.
  2. Set the Machine type and Primary disk size for your Master node(s) (e.g., n1-standard-4 with a 10 GB disk).
  3. Set the Machine type for your Worker node(s) and specify how many nodes you require (e.g., 2 x n1-standard-4, each with a 10 GB disk).
  4. You can go into more detail by expanding the “Preemptible workers, bucket, network, version, initialization, & access options” link.
  5. (Optional) You can create the Hadoop cluster under a different network. For the purpose of this article, I created the hadoop-firewall rule under the default network, which is also the network this cluster sits in.
  6. When you are done, click on the Create button to create the cluster.
Cloud Dataproc Settings
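The settings from Step 9 can also be sketched as a gcloud command. As before, this Python snippet only assembles and prints the command. The helper name dataproc_cluster_command, the cluster name my-hadoop-cluster and the region europe-west1 are placeholders of mine, since the article does not name a cluster or region.

```python
import shlex


def dataproc_cluster_command(cluster_name, region,
                             machine_type="n1-standard-4",
                             disk_gb=10, num_workers=2):
    """Build (but do not run) a gcloud command mirroring Step 9."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Master node: machine type and primary disk size.
        f"--master-machine-type={machine_type}",
        f"--master-boot-disk-size={disk_gb}GB",
        # Worker nodes: count, machine type and primary disk size.
        f"--num-workers={num_workers}",
        f"--worker-machine-type={machine_type}",
        f"--worker-boot-disk-size={disk_gb}GB",
    ]


# Cluster name and region are assumptions for illustration only.
command = dataproc_cluster_command("my-hadoop-cluster", "europe-west1")
print(shlex.join(command))
```

The defaults match the example values used in this article (n1-standard-4 machines, 10 GB disks, two workers), so change them to suit your own workload.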

Step 10

Wait (around 90 seconds) till your cluster is created. You’ll know it is done when you see the green tick.
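If you script the setup instead of watching for the green tick, the waiting step becomes a small polling loop. The sketch below is generic Python: get_state is a hypothetical callable that you would supply yourself (for example, one that asks Dataproc for the cluster status), and the demo at the bottom uses simulated states rather than a real cluster. The state names CREATING, RUNNING and ERROR are, to my knowledge, the ones Dataproc reports.

```python
import time


def wait_until_running(get_state, timeout_s=180.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns RUNNING, or fail on ERROR/timeout."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "RUNNING":
            return state
        if state == "ERROR":
            raise RuntimeError("cluster entered the ERROR state")
        if clock() >= deadline:
            raise TimeoutError(f"cluster still {state} after {timeout_s} seconds")
        # Pause between polls so we do not hammer the status source.
        sleep(interval_s)


# Simulated status sequence for demonstration; no real cluster is contacted.
simulated = iter(["CREATING", "CREATING", "RUNNING"])
print(wait_until_running(lambda: next(simulated), sleep=lambda _: None))  # prints RUNNING
```

The default timeout of 180 seconds is an arbitrary margin above the roughly 90 seconds mentioned above.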

Dataproc Dashboard

3. Test your Hadoop cluster

Step 11

Click on Compute Engine under the COMPUTE section in your menu to open the list of VM instances.

Step 12

Note down your master node’s External IP address (the master node is the instance named after your cluster, followed by -m).

VM Instances
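The naming rule from Step 12 is easy to capture in code. This Python sketch derives the master instance name and builds (without running) a gcloud command that prints its external IP. The helper name master_ip_command, the cluster name and the zone europe-west1-b are placeholders of mine, and the sketch assumes a single-master cluster like the one created in this article.

```python
import shlex


def master_ip_command(cluster_name, zone):
    """Build (but do not run) a gcloud command that prints the master's external IP."""
    # Single-master clusters name the master instance "<cluster name>-m".
    master_instance = f"{cluster_name}-m"
    return [
        "gcloud", "compute", "instances", "describe", master_instance,
        f"--zone={zone}",
        # natIP is the field that holds the instance's external IP address.
        "--format=get(networkInterfaces[0].accessConfigs[0].natIP)",
    ]


# Cluster name and zone are assumptions for illustration only.
command = master_ip_command("my-hadoop-cluster", "europe-west1-b")
print(shlex.join(command))
```

shlex.join adds quotes around the --format argument because it contains brackets, so the printed command is safe to paste into a shell.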

Step 13

In your browser, type the External IP followed by a colon and the port (e.g., 35.195.107.25:8088 in my case) and you should see the same screen as I see below.

Hadoop Port 8088: Cluster Metrics
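To avoid typos when trying the other ports opened in Step 6, you can generate the addresses instead of typing them. A small Python sketch follows. The function name web_ui_url is my own, and the IP is the example address from Step 13, so replace it with your master node's External IP.

```python
import ipaddress


def web_ui_url(external_ip: str, port: int) -> str:
    """Return the browser address for a web UI on the master node."""
    address = ipaddress.ip_address(external_ip)  # raises ValueError if invalid
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # IPv6 literals must be wrapped in brackets inside a URL.
    host = f"[{address}]" if address.version == 6 else str(address)
    return f"http://{host}:{port}"


# The three ports opened by the firewall rule in Step 6.
for port in (8088, 50070, 8080):
    print(web_ui_url("35.195.107.25", port))
```

The first line printed, http://35.195.107.25:8088, is the cluster metrics page shown in the screenshot above.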

Congratulations! See how simple it is?

If you want to know more about Hadoop on Dataproc, please go read Tino Tereshko’s great article on Why Dataproc — Google’s managed Hadoop and Spark offering is a game changer.

Follow me on Twitter at @kimoon92 and let me know what you think!

-Kimoon Kim
