Over the weekend, I wanted to learn a little more about distributed computing, and Hadoop seemed like a good starting point.
To learn Hadoop, I really wanted to get my hands on it to give it a spin.
So let’s see how we can get Hadoop running on macOS. There are many ways to install Java and Hadoop, but in this article we will use Homebrew as the method of installation.
Modes of Hadoop
Hadoop can run in three different modes. In this article, we will focus only on the pseudo-distributed mode.
- Stand-alone mode
- Pseudo-distributed mode
- Distributed mode
Setting Up SSH on macOS
Before installing anything, it is important to get SSH working locally on macOS. Check that you have SSH enabled properly.
$ ssh localhost
If it prompts you for your password and returns the Last login time, you are good to skip the next step.
To enable SSH on macOS, go to System Preferences > Sharing, then enable “Remote Login” and “Allow access for: All Users”.
Once that is set up, test it with ssh localhost again. If you see the Last login time, you are good to go; if not, you might run into the following problem.
ssh: connect to host localhost port 22: Connection refused
To fix this, first check whether remote login is actually off.
$ sudo systemsetup -getremotelogin
Remote Login: off
If you see the above message, remote login is off; proceed to turn it on.
$ sudo systemsetup -setremotelogin on
$ ssh localhost
If all is well, you should see the last login time. If you are not greeted with Last login… , you might need to generate SSH keys and append the public key (generated in ~/.ssh/id_rsa.pub) to ~/.ssh/authorized_keys:
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Finally, test with ssh localhost once more; this should get you the Last login: … message.
Installing Java and Hadoop
Install Java using brew cask. As of this writing, Java 9 is not yet compatible with Hadoop 2.8.2. So, to avoid errors such as the following:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/local/Cellar/hadoop/2.8.2/libexec/share/hadoop/common/lib/hadoop-auth-2.8.2.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release ...
We will specifically be using Java version 8.
$ brew tap caskroom/versions
$ brew cask install java8
Install Hadoop using brew. Depending on your Xcode version, you might need to update Xcode first; I updated mine from the App Store.
$ brew install hadoop
After a successful installation, brew prints a confirmation with the installed version. The Hadoop version installed for me was 2.8.2; for the rest of this article, remember to replace this version number with the one applicable to your installation.
Hadoop was installed under /usr/local/Cellar/hadoop. In normal circumstances, brew would have automatically created a symlink from /usr/local/opt/hadoop to this location, so for simplicity we will refer to the installation as /usr/local/opt/hadoop from here on.
Next, you have to configure a couple of files. Go to /usr/local/opt/hadoop (under Homebrew, the configuration files typically live in libexec/etc/hadoop). In there, you will need to edit or create the following files.
In hadoop-env.sh, look for
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
Replace it with the following; remember to change <JDK_VERSION> to the JDK version you currently have.
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
export JAVA_HOME="/Library/Java/JavaVirtualMachines/<JDK_VERSION>/Contents/Home"
In core-site.xml, you will configure the HDFS address, port number, and temporary directory. A typical pseudo-distributed configuration looks like this (adjust the temporary directory path if yours differs):
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
In mapred-site.xml, you will configure the JobTracker address and port number for MapReduce. If you cannot find mapred-site.xml, copy it from the bundled template:
$ sudo cp mapred-site.xml.template mapred-site.xml
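A minimal sketch of the mapred-site.xml contents, assuming the commonly used localhost JobTracker port 9010 (adjust if your setup differs):

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
```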
In hdfs-site.xml, set dfs.replication from the default value of 3 to 1, since we are running on a single node and there is nowhere to replicate to.
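A typical single-node hdfs-site.xml therefore looks like:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```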
Finally, the last step before launching the different services is to format the HDFS.
$ cd /usr/local/opt/hadoop
$ hdfs namenode -format
HDFS Services
The service scripts are located in /usr/local/opt/hadoop/sbin; there you can use the following scripts.
# To start HDFS service
$ ./start-dfs.sh
# To stop HDFS service
$ ./stop-dfs.sh
Next, visit http://localhost:50070 in your browser. You should see the NameNode overview page.
MapReduce Framework Services
Remember to execute these in /usr/local/opt/hadoop/sbin as well.
# To start Yarn
$ ./start-yarn.sh
# To stop Yarn
$ ./stop-yarn.sh
To check that Yarn is working properly, visit http://localhost:8088 and confirm the page loads.
Constantly typing each command to start and stop the services can be bothersome, so you can use the combined scripts instead.
# to start all services
$ ./start-all.sh
# to stop all services
$ ./stop-all.sh
Lastly, you can add the Hadoop environment variables to /etc/profile so the scripts are on your PATH.
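A minimal sketch of the /etc/profile additions, assuming the Homebrew install path used above:

```shell
# point HADOOP_HOME at the brew-managed symlink
export HADOOP_HOME=/usr/local/opt/hadoop
# make the start/stop scripts and hdfs/yarn binaries available everywhere
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
```

Open a new shell (or `source /etc/profile`) for the change to take effect.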
With this, you can now use the scripts anywhere. For example,
# you can use this in any directory
$ stop-dfs.sh
$ start-yarn.sh
After the installation is done, you can also try out some simple operations on HDFS.
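For instance, a few basic HDFS commands to try (a sketch assuming the services above are running; the file name is a placeholder):

```shell
# create a home directory for the current user on HDFS
hdfs dfs -mkdir -p /user/$(whoami)
# copy a local file into HDFS
hdfs dfs -put sample.txt /user/$(whoami)/
# list the directory and read the file back
hdfs dfs -ls /user/$(whoami)
hdfs dfs -cat /user/$(whoami)/sample.txt
```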