Important Topics of Big Data Hadoop - 2018
Big Data Hadoop:
Hadoop changed the perception of Big Data management, especially of unstructured data. The Apache Hadoop software library is a framework that plays a key role in Big Data processing: it allows the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local storage and computation. Rather than relying on hardware to provide high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
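As an illustration of the "simple programming models" mentioned above, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API. The class names (WordCount, TokenMapper, SumReducer) and command-line paths are illustrative assumptions; the types and methods come from the standard org.apache.hadoop.mapreduce packages.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setCombinerClass(SumReducer.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output HDFS paths are passed on the command line.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The framework handles splitting the input, moving map output to the reducers, and rerunning failed tasks, which is exactly the application-layer failure handling described above.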
The Hadoop community pack includes:
- A file system and operating-system-level abstractions
- The MapReduce engine (either classic MapReduce/MR1 or YARN/MR2)
- The Hadoop Distributed File System (HDFS)
- Java ARchive (JAR) files
- Scripts needed to run Hadoop
- Source code, documentation, and a contribution section
Activities carried out in Big Data:
Storage: Big Data must be collected in a seamless repository; it does not have to be stored in a single physical database.
Process: processing becomes far more tedious than with traditional databases in terms of cleansing, enriching, calculating, transforming, and running algorithms.
Access: the data has no business value if it cannot be searched, retrieved easily, and presented in a practical way to business users.
Hadoop Distributed File System (HDFS)
HDFS is designed to run on commodity hardware. It stores large files, typically ranging from gigabytes to terabytes in size, across multiple machines. HDFS provides data awareness between the JobTracker and the TaskTrackers: the JobTracker schedules map and reduce jobs to TaskTrackers with knowledge of where the data resides. This simplifies the data management process.
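This location awareness is exposed through the HDFS client API: a client or scheduler can ask which hosts hold the blocks of a file. Below is a minimal sketch, assuming a reachable cluster configured via core-site.xml and a hypothetical file path /data/input.txt; the API calls (getFileStatus, getFileBlockLocations) are from org.apache.hadoop.fs.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    Path file = new Path("/data/input.txt"); // hypothetical path
    FileStatus status = fs.getFileStatus(file);

    // One BlockLocation per block, listing the hosts storing replicas.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.printf("offset %d length %d hosts %s%n",
          block.getOffset(), block.getLength(),
          String.join(",", block.getHosts()));
    }
    fs.close();
  }
}
```

A scheduler that knows these host lists can place a map task on, or near, a machine that already stores the block, avoiding a network copy of the input.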
The two main parts of Hadoop are the data processing framework and HDFS. HDFS is a rack-aware file system, which allows data to be processed efficiently close to where it is stored. HDFS implements a single-writer, multiple-reader model and supports operations to read, write, and delete files, as well as operations to create and delete directories.
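Those file and directory operations are all available through the same org.apache.hadoop.fs.FileSystem API. Here is a minimal sketch of one full cycle; the paths (/demo, greeting.txt) are illustrative assumptions, while mkdirs, create, open, and delete are real FileSystem methods.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Create a directory.
    Path dir = new Path("/demo"); // hypothetical path
    fs.mkdirs(dir);

    // Write: HDFS allows a single writer per file.
    Path file = new Path(dir, "greeting.txt");
    try (FSDataOutputStream out = fs.create(file, true)) {
      out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
    }

    // Read: any number of concurrent readers may open the file.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      System.out.println(in.readLine());
    }

    // Delete the file, then the directory (true = recursive).
    fs.delete(file, false);
    fs.delete(dir, true);
    fs.close();
  }
}
```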
HDFS architecture
