
How to start Hadoop on macOS this weekend

Jayden Chua
Feb 11, 2018 · 5 min read

Over the weekend, I wanted to learn a little more about distributed computing, and Hadoop seemed like a good starting point.

To really learn Hadoop, I wanted to get my hands on it and give it a spin.

So let’s see how to get Hadoop running on macOS. There are many ways to install Java and Hadoop, but in this article we will use Homebrew as the method of installation.

Modes of Hadoop

Hadoop can run in 3 different modes. In this article we will place our focus on the pseudo-distributed mode.

  1. Stand-alone mode
  2. Pseudo-distributed mode
  3. Distributed mode

Setting Up SSH on macOS

Before moving ahead to install anything, it is important to get SSH working locally on macOS first.

Check that you have SSH enabled properly.

$ ssh localhost

If it prompts you for your password and returns the Last login time, you are good to skip the next step.

To enable SSH on macOS, go to System Preferences > Sharing, enable “Remote Login” and set “Allow access for: All Users”.


Once everything is set up, test it with ssh localhost again. If you see the Last login time, you are good to go; if not, you might face the following problem.

ssh: connect to host localhost port 22: Connection refused

To fix this, you need to first check that remote login is actually OFF.

$ sudo systemsetup -getremotelogin
Remote Login: off

If you see the above message, remote login is off; proceed to turn it on.

$ sudo systemsetup -setremotelogin on
$ ssh localhost

If all is well, you should see the last login time. If you are not greeted with Last login… , then you might need to generate SSH keys (written to ~/.ssh/id_rsa.pub) and concatenate the public key to ~/.ssh/authorized_keys .

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
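If ssh localhost still prompts for a password after this, file permissions are a common culprit: sshd ignores an authorized_keys file that is too permissive. A minimal sketch of tightening them, assuming the default ~/.ssh location:

```shell
# Ensure the .ssh directory and authorized_keys file exist, then
# restrict their permissions so sshd will accept the key
# (sshd ignores group/world-accessible key files).
mkdir -p ~/.ssh
touch ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```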

Finally test with ssh localhost, this should get you the Last login: … message.

Install Java

Install Java using brew cask. As of this writing, Java 9 is not yet compatible with Hadoop 2.8.2. So, to prevent errors such as the following:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/Cellar/hadoop/2.8.2/libexec/share/hadoop/common/lib/hadoop-auth-2.8.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release ...

We will specifically be using Java version 8.

$ brew tap caskroom/versions
$ brew cask install java8

Install Hadoop

Install Hadoop using brew.

Depending on your Xcode version, you might need to update Xcode first. For me, I updated Xcode from the App Store.

$ brew install hadoop

After a successful installation, you should see a confirmation message.

The Hadoop version installed for me was 2.8.2. For the rest of this article do remember to replace this version number with what is applicable to your version.

Successful installation of Hadoop 2.8.2

Hadoop was installed under /usr/local/Cellar/hadoop. Under normal circumstances, brew will have automatically created a symlink from /usr/local/opt/hadoop to /usr/local/Cellar/hadoop/<your-version-of-hadoop>.

For simplicity, we will refer to it as /usr/local/opt/hadoop from here on.


Configuration

Next, you have to configure a couple of files. Go to /usr/local/opt/hadoop (with the Homebrew install, the configuration files live under libexec/etc/hadoop). In there you will need to edit or create the following files:

  1. hadoop-env.sh
  2. core-site.xml
  3. mapred-site.xml
  4. hdfs-site.xml

In hadoop-env.sh look for

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"

Replace it with the following, remembering to change <JDK_VERSION> to the version you currently have installed.

export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export JAVA_HOME="/Library/Java/JavaVirtualMachines/<JDK_VERSION>/Contents/Home"
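If you are unsure which JDK path to use, macOS can report the Java 8 home directory for you; the exact path printed depends on which JDK update you have installed.

```
$ /usr/libexec/java_home -v 1.8
```

The path it prints is the value JAVA_HOME should point to.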

In core-site.xml, you will configure the HDFS address and port number.

<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
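A side note: fs.default.name is the older name for this property. In Hadoop 2.x it has been deprecated in favour of fs.defaultFS, though both are still accepted. If you prefer the current name, the equivalent property is:

```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>
```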

In mapred-site.xml, you will configure the JobTracker address and port number for MapReduce. If you cannot find mapred-site.xml, copy it from mapred-site.xml.template first.

$ sudo cp mapred-site.xml.template mapred-site.xml

Add the following into mapred-site.xml .

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>

In hdfs-site.xml, set dfs.replication from the default value of 3 down to 1, since a pseudo-distributed cluster only has a single node to replicate to.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
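Once the file is saved, you can sanity-check that Hadoop picked the value up with hdfs getconf, which reads the configuration from the classpath (run it from a shell where the hadoop binaries are on your PATH); it should print the value 1 that we just set.

```
$ hdfs getconf -confKey dfs.replication
```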

Finally, the last step before launching the different services is to format the HDFS.

$ cd /usr/local/opt/hadoop
$ hdfs namenode -format

HDFS Services

Go to /usr/local/opt/hadoop/sbin ; there you can use the following scripts.

# To start HDFS service
$ ./start-dfs.sh
# To stop HDFS service
$ ./stop-dfs.sh
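After running start-dfs.sh, a quick way to confirm the daemons came up is jps, which lists running JVM processes; you should see a NameNode, DataNode, and SecondaryNameNode among them (the PIDs below are just placeholders, yours will differ).

```
$ jps
1234 NameNode
2345 DataNode
3456 SecondaryNameNode
4567 Jps
```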

Next, go to your browser and visit http://localhost:50070 . You should see the HDFS overview page.

Hadoop running successfully on http://localhost:50070

MapReduce Framework Services

Remember to execute all of these in /usr/local/opt/hadoop/sbin as well.

# To start Yarn
$ ./start-yarn.sh
# To stop Yarn
$ ./stop-yarn.sh

To check that Yarn is working properly, you can visit http://localhost:8088 to see that it is running well.

MapReduce Framework running properly
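With both HDFS and YARN up, you can smoke-test the whole stack by running one of the example jobs bundled with Hadoop, such as the Monte Carlo pi estimator (remember to substitute your own Hadoop version in the jar name).

```
$ hadoop jar /usr/local/opt/hadoop/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar pi 2 5
```

The job should appear in the YARN UI at http://localhost:8088 while it runs, and print a (rough) estimate of pi when it finishes.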

Pro-tips

Constantly having to type each command to start and stop services can be bothersome, so instead you can actually use

# to start all services
$ ./start-all.sh
# to stop all services
$ ./stop-all.sh

Lastly, you can add the environment variables to /etc/profile

export HADOOP_HOME="/usr/local/opt/hadoop"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
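Note that editing /etc/profile affects every user on the machine and requires sudo. If you would rather keep the change to your own account, a sketch that appends the same variables to your personal ~/.bash_profile instead (assuming bash, the default shell on macOS at the time of writing):

```shell
# Append the Hadoop environment variables to the current user's
# bash profile, then load them into the current shell.
# The quoted 'EOF' keeps $PATH and $HADOOP_HOME literal in the file.
cat >> ~/.bash_profile <<'EOF'
export HADOOP_HOME="/usr/local/opt/hadoop"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOF
. ~/.bash_profile
echo "$HADOOP_HOME"
```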

With this, you can now use the scripts anywhere. For example,

# you can use this in any directory
$ start-dfs.sh
$ stop-dfs.sh
$ start-yarn.sh
$ stop-yarn.sh

An avid web developer constantly looking for new web technologies to dabble in, more information can be found on bit.ly/jayden-chua
