How to install Hadoop on macOS

Wilfred Labue
3 min read · Aug 31, 2020


What is Hadoop?

Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. (https://hadoop.apache.org/)

Steps :

  1. Install Homebrew
  2. Check if you have Java installed
  3. Install Hadoop
  4. Make Configuration changes
  5. Format NameNode
  6. Starting Hadoop

1. Install Homebrew

Homebrew is a package manager for macOS.

In order to install Homebrew, copy the command below and paste it into a macOS Terminal window (https://brew.sh/):

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

2. Check if you have Java installed

If not, install Java 8 or a later version. For this tutorial we are going to use Java 10.0.2.

Check if you have Java installed:

$ java -version
  • Install Java

To install the latest Java version, enter the following into your terminal.

$ brew cask install java

3. Install Hadoop

We are going to install Hadoop using Homebrew. Enter the following command into your terminal.

$ brew install hadoop

4. Make Configuration Changes

In this step we are going to update the environment variable settings and make changes to the following files:

a. hadoop-env.sh
b. hdfs-site.xml
c. core-site.xml
d. mapred-site.xml
e. Check if SSH is enabled

a. Configure hadoop-env.sh

  • Go to /usr/local/Cellar/hadoop/3.3.0/libexec/etc/hadoop and open hadoop-env.sh. Search for JAVA_HOME and configure it.
  • Make sure to change the version in the path to the one you currently have installed (for this tutorial I have Hadoop 3.3.0).
  • After that you can open hadoop-env.sh in vim to edit it:
$ vim hadoop-env.sh

Change the export JAVA_HOME line to:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk10.0.2.jdk/Contents/Home
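If you are not sure of the exact JDK path on your machine, you can ask the system for it instead of hard-coding it. The snippet below is a sketch: on macOS the /usr/libexec/java_home helper ships with the OS and prints the default JDK path (pass -v to pin a version, e.g. -v 10); the fallback branch is only illustrative, for shells where that helper is not available.

```shell
# On macOS, /usr/libexec/java_home prints the path of the default JDK.
if [ -x /usr/libexec/java_home ]; then
  export JAVA_HOME="$(/usr/libexec/java_home)"
else
  # Illustrative fallback: derive the JDK root from the java binary's location.
  export JAVA_HOME="$(dirname "$(dirname "$(readlink -f "$(command -v java)")")")"
fi
echo "JAVA_HOME=$JAVA_HOME"
```

Whatever it prints is the value to paste into hadoop-env.sh.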

b. Configure hdfs-site.xml

$ vim hdfs-site.xml

and add the following code:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

c. Configure core-site.xml

$ vim core-site.xml

add the following code:

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>

d. Configure mapred-site.xml

$ vim mapred-site.xml

add the following code:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
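A note on this property: mapred.job.tracker is the classic MapReduce 1 (JobTracker) name. On Hadoop 3.x, which is what Homebrew installs here, MapReduce jobs normally run on YARN instead. If you plan to run MapReduce jobs, a common alternative configuration (a sketch, not part of the original setup) is to put the following in mapred-site.xml and start YARN with ./start-yarn.sh after starting HDFS:

```xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
```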

e. Check if ssh is enabled

$ ssh localhost

If you get an error saying:

ssh: connect to host localhost port 22: Connection refused

run the following commands:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

This will authorize the SSH keys and allow your system to accept logins.
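The three commands above can also be wrapped so they are safe to re-run: generate a key only if none exists, and append the public key only if it is not already authorized. This is a sketch; SSH_DIR is a stand-in variable (not from the original article) that defaults to ~/.ssh.

```shell
# Idempotent version of the key setup: re-running it never duplicates entries.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 0700 "$SSH_DIR"

# Generate a passphrase-less key only if one does not already exist.
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -P '' -f "$SSH_DIR/id_rsa"

# Append the public key to authorized_keys only if it is not already there.
touch "$SSH_DIR/authorized_keys"
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" ||
  cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 0600 "$SSH_DIR/authorized_keys"
```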

If you are still having issues after authorizing the SSH keys, make sure you have enabled remote login by going to:

System Preferences → Sharing → Remote Login (check Remote Login)

5. Format NameNode

Enter the following commands:

$ cd /usr/local/Cellar/hadoop/3.3.0/libexec/bin
$ hdfs namenode -format

6. Starting Hadoop

In order to start and stop Hadoop:

$ cd /usr/local/Cellar/hadoop/3.3.0/libexec/sbin
$ ./start-dfs.sh # to start hadoop
$ ./stop-dfs.sh # to stop hadoop

After starting Hadoop, run jps to confirm that Hadoop and other services are running.

$ jps
17664 Jps
17537 SecondaryNameNode
17299 NameNode
17401 DataNode
10826 ResourceManager
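If you want to script that check rather than eyeball it, the following sketch scans jps output for the daemons this tutorial starts. The function name and the simulated output are illustrative, not part of Hadoop; in practice you would feed it "$(jps)".

```shell
# Report which of the expected Hadoop daemons appear in jps output.
check_daemons() {
  # $1: the text produced by `jps`
  for daemon in NameNode DataNode SecondaryNameNode; do
    # -w matches whole words, so "NameNode" does not match "SecondaryNameNode".
    if printf '%s\n' "$1" | grep -qw "$daemon"; then
      echo "$daemon: running"
    else
      echo "$daemon: NOT running"
    fi
  done
}

# Simulated jps output, mirroring the listing above:
sample="17664 Jps
17537 SecondaryNameNode
17299 NameNode
17401 DataNode"
check_daemons "$sample"
```

With the sample output above it reports all three daemons as running.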

Access the Hadoop web interface to see the configuration by going to:

http://localhost:9870

I hope this tutorial was helpful!

Twitter: @labue_wilfred
