Hadoop 2.x vs Hadoop 3.x

Feature-wise Comparison Between Hadoop 2.x and Hadoop 3.x
This section covers the top 22 differences between Hadoop 2.x and Hadoop 3.x. Let us discuss each feature one by one:

1. License
Hadoop 2.x — Apache 2.0, Open Source
Hadoop 3.x — Apache 2.0, Open Source

2. Minimum supported version of Java
Hadoop 2.x — The minimum supported version of Java is Java 7.
Hadoop 3.x — The minimum supported version of Java is Java 8.

3. Fault Tolerance
Hadoop 2.x — Fault tolerance is handled by replication, which is costly in storage space.
Hadoop 3.x — Fault tolerance can also be handled by erasure coding, which needs far less extra space.
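
To make the idea behind erasure coding concrete, here is a minimal sketch that rebuilds a lost block from a parity block. It uses plain XOR parity, which is a deliberate simplification: it survives the loss of only one block, whereas HDFS relies on stronger codes. The function names and the sample blocks are hypothetical and are not part of any Hadoop API.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


# Hypothetical 4-byte "blocks" standing in for real HDFS blocks.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_parity(data_blocks)

# Simulate losing the second block and rebuilding it from the rest.
rebuilt = recover_missing([data_blocks[0], data_blocks[2]], parity)
print(rebuilt == data_blocks[1])
```

The point of the sketch is that one extra block protects three data blocks, where replication would need a full copy of each.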

4. Data Balancing
Hadoop 2.x — Data balancing uses the HDFS balancer.
Hadoop 3.x — Data balancing can also use the intra-DataNode balancer, which is invoked via the HDFS disk balancer CLI.

5. Storage Scheme
Hadoop 2.x — Uses a 3x replication scheme.
Hadoop 3.x — Supports erasure coding in HDFS.

6. Storage Overhead
Hadoop 2.x — With 3x replication, HDFS has a 200% overhead in storage space.
Hadoop 3.x — With erasure coding (6 data blocks plus 3 parity blocks), storage overhead is only 50%.

7. Storage Overhead Example
Hadoop 2.x — Storing 6 blocks occupies 18 blocks of space because of the 3x replication scheme.
Hadoop 3.x — Storing 6 blocks occupies 9 blocks of space: 6 data blocks and 3 parity blocks.
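
The arithmetic behind these two figures can be checked with a short sketch. It assumes a replication factor of 3 and a layout of 6 data blocks plus 3 parity blocks, as in the example above. The function names are hypothetical, and the model ignores details such as how HDFS stripes data into cells.

```python
def replication_footprint(data_blocks, replication_factor=3):
    """Total blocks stored when every block is copied replication_factor times."""
    return data_blocks * replication_factor


def erasure_coded_footprint(data_blocks, data_units=6, parity_units=3):
    """Total blocks stored when each group of data_units blocks gets parity_units parity blocks."""
    groups = -(-data_blocks // data_units)  # ceiling division
    return data_blocks + groups * parity_units


def overhead_percent(data_blocks, stored_blocks):
    """Extra space used, as a percentage of the original data size."""
    return 100 * (stored_blocks - data_blocks) / data_blocks


blocks = 6
replicated = replication_footprint(blocks)
erasure_coded = erasure_coded_footprint(blocks)

print(replicated, overhead_percent(blocks, replicated))        # 18 200.0
print(erasure_coded, overhead_percent(blocks, erasure_coded))  # 9 50.0
```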

8. YARN Timeline Service
Hadoop 2.x — Uses the older timeline service, which has scalability issues.
Hadoop 3.x — Introduces Timeline Service v2, which improves the scalability and reliability of the timeline service.

9. Default Ports Range
Hadoop 2.x — Some default service ports fall inside the Linux ephemeral port range, so a service can fail to bind at startup if another process already holds the port.
Hadoop 3.x — These ports have been moved out of the ephemeral range.
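
As a rough illustration of the port change, the sketch below checks a few default ports against the ephemeral range. Two things here are assumptions rather than facts from this article: the range 32768 to 60999, which is a common Linux default, and the port numbers themselves, which are quoted from memory of the Hadoop release notes and should be verified against the hdfs-default.xml of your distribution.

```python
# Assumed Linux default ephemeral range (see /proc/sys/net/ipv4/ip_local_port_range).
EPHEMERAL_RANGE = range(32768, 61000)

# Assumed default ports as (Hadoop 2.x, Hadoop 3.x); verify before relying on them.
PORT_CHANGES = {
    "NameNode HTTP UI": (50070, 9870),
    "Secondary NameNode HTTP": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode IPC": (50020, 9867),
    "DataNode HTTP UI": (50075, 9864),
}


def in_ephemeral_range(port):
    """Return True if the port lies inside the assumed ephemeral range."""
    return port in EPHEMERAL_RANGE


for service, (old_port, new_port) in PORT_CHANGES.items():
    print(service, old_port, in_ephemeral_range(old_port),
          new_port, in_ephemeral_range(new_port))
```

On a real machine, the actual range can differ, so reading it from the system is more reliable than hard-coding it.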

10. Tools
Hadoop 2.x — Uses Hive, Pig, Tez, Hama, Giraph, and other Hadoop tools.
Hadoop 3.x — Hive, Pig, Tez, Hama, Giraph, and other Hadoop tools are available.

11. Compatible File System
Hadoop 2.x — HDFS (the default file system); the FTP file system, which stores all its data on remotely accessible FTP servers; the Amazon S3 (Simple Storage Service) file system; and the Windows Azure Storage Blobs (WASB) file system.
Hadoop 3.x — Supports all of the above as well as the Microsoft Azure Data Lake file system.

