Big Data — Installing Hadoop, Apache Spark and Scala on Ubuntu
Before the installation process, a short summary of these Big Data tools is useful.
1. Hadoop is a framework for distributed processing of large data sets. Its main aim is to run that processing not only on supercomputers but also on clusters of low-cost hardware, though some minimum specifications are still required. Hadoop splits data sets into blocks and manages them across nodes, and the replication factor is what makes HDFS a fault-tolerant file system (see the sketch after this list).
2. Apache Spark is a cluster computing framework. It is not mandatory to use Spark alongside Hadoop, but it is essential when you want to speed up MapReduce-style processing in big data analysis; the Apache Software Foundation claims Spark's in-memory computation can be up to 100 times faster than Hadoop MapReduce. Spark runs on the JVM and is written in Scala. Its core abstraction is the RDD (Resilient Distributed Dataset), which also makes the tool fault-tolerant.
3. Scala is the main language for using Apache Spark (the others are Java, Python and R). Scala is a scalable and powerful language and has several advantages over those alternatives.
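As a quick illustration of item 1, the sketch below copies a local file into HDFS and asks HDFS how it was split into blocks and how many replicas each block has. It is illustration only: it assumes HDFS is already up and running (the installation steps come later in this post) and that a local file named sample.txt exists.

```bash
# Illustration only: assumes a running HDFS and a local file named sample.txt.
hdfs dfs -mkdir -p /demo                     # create a directory in HDFS
hdfs dfs -put sample.txt /demo/sample.txt    # upload the file; HDFS splits it into blocks
hdfs fsck /demo/sample.txt -files -blocks    # report the blocks and their replica count
```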
So far, that should be enough to see the big picture of essential big data processing. Before installation, here is the version information for the mentioned tools:
1. Hadoop 2.7.6
2. Apache Spark 2.2.0
3. Scala 2.12.6
To run all of the mentioned tools, we should first install a JDK or JRE (version 1.8 or newer).
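On Ubuntu, one way to get a suitable JDK is through apt; the package name below (openjdk-8-jdk) is one common choice and may differ on other releases.

```bash
# Install OpenJDK 8 and confirm the version (1.8 or newer is required).
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk
java -version
```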
Warning: an Ubuntu Linux distro will be used, not Windows or Mac ^_^
In the first step, we install the required JDK environment, set up the SSH/SSHD connection and create a new user with sudo privileges to manage the Hadoop ecosystem.
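A minimal sketch of that step is shown below. The user name hduser is only an example, and the SSH key is generated without a passphrase so the Hadoop start scripts can reach localhost without prompting.

```bash
# Create a dedicated user for the Hadoop ecosystem (the name "hduser" is illustrative).
sudo adduser hduser
sudo adduser hduser sudo              # grant sudo privileges

# Set up SSH/SSHD and passwordless login to localhost for that user.
sudo apt-get install -y openssh-server
su - hduser
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost                         # should log in without asking for a password
```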
Next, download the files, create the folders and install the frameworks.
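The sketch below shows one way to download and unpack the versions mentioned above. The mirror URLs, the Scala version packaged by apt and the chosen install paths are assumptions you may need to adjust for your setup.

```bash
# Download and unpack Hadoop 2.7.6 and Spark 2.2.0 from the Apache archive.
cd ~
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar -xzf hadoop-2.7.6.tar.gz
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz

# Scala: Ubuntu's packaged version (install 2.12.6 manually if the repo version differs).
sudo apt-get install -y scala

# Point the environment at the new folders (append these lines to ~/.bashrc).
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop-2.7.6
export SPARK_HOME=$HOME/spark-2.2.0-bin-hadoop2.7
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin
```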
If the installation succeeds, we can run and test the applications.
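If everything is on the PATH as sketched above, a few quick checks confirm the installation; the example jar name is the one shipped with the Spark 2.2.0 binary package and may differ in other builds.

```bash
# Sanity checks for each tool.
hadoop version
scala -version
spark-submit --version

# Optional smoke test: run the bundled SparkPi example.
spark-submit --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar 10
```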
Note that, before running Hadoop for the first time, the NameNode should be formatted.
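Formatting is a one-time step and erases any existing HDFS metadata:

```bash
# Format the NameNode once, before the first start; this wipes existing HDFS metadata.
hdfs namenode -format
```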
Then start the NameNode/DataNode daemons and check whether all of the processes are running. If they are not, stop the daemons, clear the data directories, format the NameNode again and restart.
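One possible start/check/recover cycle looks like this; the data directory to clear depends on how hdfs-site.xml was configured, so the path below is an assumption based on Hadoop's default temp location.

```bash
# Start the HDFS and YARN daemons.
start-dfs.sh
start-yarn.sh

# Check which daemons are running; expect NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager.
jps

# If a daemon is missing: stop everything, clear the (assumed) data directories,
# reformat the NameNode and start again.
stop-yarn.sh
stop-dfs.sh
rm -rf /tmp/hadoop-*          # default temp/data location unless overridden in config
hdfs namenode -format
start-dfs.sh
start-yarn.sh
```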
Feel free to give feedback to help improve these blog posts.
