Word Count Example using Hadoop and Java

Code With Arjun · Published in Javarevisited · Jul 18, 2022

In this tutorial I show how to write the classic Word Count example for Hadoop MapReduce in Java.

Check out my YouTube channel for a detailed explanation.

Step 1: Add the dependencies

These are the dependencies you need to add to your pom.xml file:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>3.3.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>3.3.3</version>
</dependency>
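
If you are starting from an empty project, these entries belong inside the <dependencies> section of a standard Maven pom.xml. A minimal sketch is below; the groupId and artifactId match the jar name used later in this tutorial, but adjust them to your own project:

<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.codewitharjun</groupId>
    <artifactId>WordCount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <!-- the two Hadoop dependencies shown above go here -->
    </dependencies>
</project>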

Step 2: Create Word Count Mapper file

Create a WC_Mapper.java file and add the following code to it.

WC_Mapper.java


import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WC_Mapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Called once per line of input; the key is the byte offset of the line.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        // Split the line on whitespace and emit (word, 1) for each token.
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
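
For example, if a line of input reads Hello Hadoop Hello, this mapper emits the pairs (Hello, 1), (Hadoop, 1), (Hello, 1); Hadoop then groups the pairs by key before they reach the reducer. If you want to sanity-check the tokenizing logic outside Hadoop, a plain Java snippet like this one works (a hypothetical standalone check, not part of the job):

import java.util.StringTokenizer;

public class TokenizeCheck {
    public static void main(String[] args) {
        StringTokenizer tokenizer = new StringTokenizer("Hello Hadoop Hello");
        while (tokenizer.hasMoreTokens()) {
            // Each token becomes a (word, 1) pair in the real mapper.
            System.out.println(tokenizer.nextToken() + "\t1");
        }
    }
}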

Step 3: Create Word Count Reducer file

Create a WC_Reducer.java file and add the following code to it.

WC_Reducer.java



import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WC_Reducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Called once per distinct word, with all of its counts grouped together.
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Emit the word together with its total count.
        output.collect(key, new IntWritable(sum));
    }
}
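
Continuing the example above, the reducer receives the key Hello with the grouped values [1, 1] and emits (Hello, 2). The summing loop can be mirrored in plain Java if you want to convince yourself of the logic (a hypothetical standalone check, not part of the job):

import java.util.Arrays;
import java.util.Iterator;

public class SumCheck {
    public static void main(String[] args) {
        // Mirrors the reduce loop for key "Hello" with grouped values [1, 1].
        Iterator<Integer> values = Arrays.asList(1, 1).iterator();
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        // Prints the word and its total count, tab-separated.
        System.out.println("Hello\t" + sum);
    }
}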

Step 4: Create Word Count Runner file

Create a WC_Runner.java file and add the following code to it.

WC_Runner.java



import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WC_Runner {
    public static void main(String[] args) throws IOException {
        // Configure the job: name, key/value types, and the classes to use.
        JobConf conf = new JobConf(WC_Runner.class);
        conf.setJobName("WordCount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(WC_Mapper.class);
        conf.setCombinerClass(WC_Reducer.class);
        conf.setReducerClass(WC_Reducer.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Input and output paths come from the command line.
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        // Submit the job and wait for it to finish.
        JobClient.runJob(conf);
    }
}
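
One detail worth noting in the runner: conf.setCombinerClass(WC_Reducer.class) reuses the reducer as a combiner, so partial word counts are summed on the map side before the shuffle. This is safe here because addition is associative and commutative, meaning combining early does not change the final totals.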

Step 5: Create a text file and push it into HDFS

Create an input.txt file and write some text in it. Now create a folder called /input on HDFS using the following command:

hadoop fs -mkdir /input

Now, to push the created input.txt file into it, type the following command:

hadoop fs -put input.txt /input 
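
You can verify that the file landed in HDFS before running the job, using the standard HDFS shell commands:

hadoop fs -ls /input
hadoop fs -cat /input/input.txt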

Step 6: Run the project
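
Before running, make sure the project has been packaged, for example with a standard Maven build, which places the jar under the project's target directory:

mvn clean package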

Now, to run the project, use the following command:

hadoop jar target/WordCount-1.0-SNAPSHOT.jar org.codewitharjun.WC_Runner /input/input.txt /output

Once the job has finished, go back into HDFS and open the /output folder; there you will find a part-00000 file. Open it and you will see each word along with the number of times it appears.
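
If you prefer the command line, you can print the result directly with a standard HDFS shell command:

hadoop fs -cat /output/part-00000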
