How to install Hadoop on macOS
What is Hadoop?
Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. (https://hadoop.apache.org/)
Steps :
- Install Homebrew
- Check if you have Java installed
- Install Hadoop
- Make Configuration changes
- Format NameNode
- Starting Hadoop
1. Install Homebrew
Homebrew is a package manager for macOS.
To install Homebrew, copy the command below and paste it into a macOS Terminal window (https://brew.sh/):
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
2. Check if you have Java installed
If not, install Java 8 or a later version. For this tutorial we are going to use Java 10.0.2.
Check if you have Java installed:
$ java -version
- Install Java
To install the latest Java version, enter the following into your terminal (recent Homebrew releases replaced the old `brew cask install` syntax with `brew install --cask`):
$ brew install --cask java
3. Install Hadoop
We are going to install Hadoop using Homebrew. Enter the following command into your terminal.
$ brew install hadoop
4. Make Configuration Changes
In this step we are going to update the environment variable settings and make changes to the following files:
a. hadoop-env.sh
b. hdfs-site.xml
c. core-site.xml
d. mapred-site.xml
e. check if ssh is enabled
a. Configure hadoop-env.sh
- Go to /usr/local/Cellar/hadoop/3.3.0/libexec/etc/hadoop and open hadoop-env.sh. Search for JAVA_HOME and configure it.
- Make sure to change the version in the path to the Hadoop version you currently have installed (for this tutorial I have Hadoop 3.3.0).
- After that you can vim into hadoop-env.sh. This opens the file so that you can edit it.
$ vim hadoop-env.sh
Change the export JAVA_HOME line to:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk10.0.2.jdk/Contents/Home
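If you prefer not to edit the file by hand, the same change can be scripted. A minimal sketch, demonstrated on a temporary copy of the file so it is safe to run anywhere (substitute the real hadoop-env.sh path from above):

```shell
# Demonstrate the edit on a temporary copy of hadoop-env.sh.
# In practice, point HADOOP_ENV at the real file, e.g.
# /usr/local/Cellar/hadoop/3.3.0/libexec/etc/hadoop/hadoop-env.sh
HADOOP_ENV="$(mktemp)"
echo '# export JAVA_HOME=' > "$HADOOP_ENV"

# Append the JAVA_HOME export (the JDK path used in this tutorial).
echo 'export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk10.0.2.jdk/Contents/Home' >> "$HADOOP_ENV"

# Verify the line is present.
grep 'export JAVA_HOME=' "$HADOOP_ENV"
```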
b. Configure hdfs-site.xml
$ vim hdfs-site.xml
and add the following code:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
c. Configure core-site.xml
$ vim core-site.xml
and add the following code:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
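A side note on the property name: fs.default.name is the older, deprecated name for this setting. It still works in Hadoop 2.x/3.x, but the current name is fs.defaultFS, so the equivalent configuration with the modern name would be:

```
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
</configuration>
```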
d. Configure mapred-site.xml
$ vim mapred-site.xml
and add the following code:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
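Note that mapred.job.tracker configures the old JobTracker, which no longer exists in Hadoop 2.x/3.x, where MapReduce runs on YARN instead. The setting above is harmless for this tutorial, but if you intend to actually run MapReduce jobs, the property typically used instead is:

```
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
```

With that setting you would also start YARN (start-yarn.sh) alongside HDFS in step 6.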
e. Check if ssh is enabled
$ ssh localhost
If you get an error saying:
ssh: connect to host localhost port 22: Connection refused
run the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
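The three commands above generate a passwordless RSA key pair, append the public key to your authorized keys, and lock down the file permissions. A minimal sketch that walks through the same steps against a throwaway directory, so it does not touch your real ~/.ssh:

```shell
# Use a throwaway directory instead of ~/.ssh for demonstration.
SSH_DIR="$(mktemp -d)"

# Generate an RSA key pair with an empty passphrase (-P '').
ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" -q

# Append the public key to authorized_keys, as in the tutorial.
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"

# authorized_keys must be readable/writable only by its owner,
# or sshd will refuse to honor it.
chmod 0600 "$SSH_DIR/authorized_keys"

ls -l "$SSH_DIR"
```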
This will authorize the SSH keys and allow your system to accept logins.
If you are still having issues after authorizing the SSH keys, make sure you have enabled Remote Login by going to:
System Preferences → Sharing → Remote Login (select Remote Login)
5. Format NameNode
Enter the following commands:
$ cd /usr/local/Cellar/hadoop/3.3.0/libexec/bin
$ hdfs namenode -format
6. Starting Hadoop
In order to start and run Hadoop:
$ cd /usr/local/Cellar/hadoop/3.3.0/libexec/sbin
$ ./start-dfs.sh # to start hadoop
$ ./stop-dfs.sh # to stop hadoop
After starting Hadoop, run jps to confirm that Hadoop and the other services are running.
$ jps
17664 Jps
17537 SecondaryNameNode
17299 NameNode
17401 DataNode
10826 ResourceManager
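To check for the expected daemons without eyeballing the list, you can grep the jps output. A minimal sketch using the sample output above, captured in a variable so it runs even without a live cluster (in practice you would use `JPS_OUT="$(jps)"`):

```shell
# Sample jps output from the tutorial; in practice: JPS_OUT="$(jps)"
JPS_OUT='17664 Jps
17537 SecondaryNameNode
17299 NameNode
17401 DataNode
10826 ResourceManager'

# Report on each HDFS daemon we expect to see.
for daemon in NameNode DataNode SecondaryNameNode; do
  if echo "$JPS_OUT" | grep -qw "$daemon"; then
    echo "$daemon is running"
  else
    echo "$daemon is NOT running" >&2
  fi
done
```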
Access the Hadoop web interface to see the configuration by going to http://localhost:9870 (the default NameNode UI port in Hadoop 3.x).
I hope this tutorial was helpful!
Twitter: @labue_wilfred