
MapReduce with Python

Learn how MapReduce deals with BIG data using the MRjob Python library

Andrea Perera
Published in Geek Culture
4 min read, Aug 23, 2021


Created by the author

As the name suggests, “BIG” data means working with a very large amount of data, which is a daunting task. MapReduce is the programming model built into Apache Hadoop. It processes your data in parallel across the nodes of a cluster.

This article will look into how MapReduce works with an example dataset using Python.

MapReduce: Analyze big data

MapReduce transforms the data in two steps. The Map step divides the data into key/value pairs. The Reduce step takes the output of Map as its input and aggregates the values that share a key. The framework also handles failures on the cluster for you, such as re-running a task that failed on one node.
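To make the two steps concrete, here is a minimal single-process sketch of the flow in plain Python. It is not Hadoop or MRjob code. The helper `run_map_reduce` and the word-count mapper and reducer are hypothetical names used only for illustration, and the grouping in the middle stands in for the shuffle that a real cluster performs between Map and Reduce.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in one process (illustration only)."""
    # Map: every input record yields zero or more (key, value) pairs.
    mapped = [pair for record in records for pair in mapper(record)]

    # Shuffle: collect all values that share the same key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)

    # Reduce: collapse each key's list of values into a single result.
    return {key: reducer(key, values) for key, values in grouped.items()}


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    # Add up the 1s emitted for this key.
    return sum(values)


counts = run_map_reduce(["big data", "Big cluster"], word_mapper, sum_reducer)
print(counts)  # {'big': 2, 'data': 1, 'cluster': 1}
```

On a real cluster the mapper and reducer calls run on different machines, and the framework moves the grouped data between them.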

How MapReduce Works

To understand MapReduce, let’s take a real-world example. You have a dataset that consists of hotel reviews and ratings. Now you need to find out how many reviews each rating has.

Dataset Snapshot
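The counting task can be sketched in plain Python before moving to a real cluster. The dataset's actual columns are not shown here, so the sample rows below are made up, and the two-column layout (review text, then rating) is an assumption. The `mapper` and `reducer` functions yield key/value pairs in the same general shape that MRjob's mapper and reducer methods use, but this sketch runs locally and does not import MRjob.

```python
import csv
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows in the assumed layout: "review text",rating
SAMPLE_LINES = [
    '"Great location, friendly staff",5',
    '"Room was dirty",2',
    '"Loved the breakfast",5',
    '"Average stay, nothing special",3',
    '"Noisy at night",2',
    '"Would come back",5',
]


def mapper(line):
    """Map step: emit (rating, 1) for one CSV line."""
    # csv.reader handles commas inside the quoted review text.
    _review, rating = next(csv.reader([line]))
    yield int(rating), 1


def reducer(rating, ones):
    """Reduce step: sum the 1s emitted for a single rating."""
    yield rating, sum(ones)


def count_reviews_per_rating(lines):
    # Sorting the mapped pairs by key mimics the shuffle-and-sort phase.
    pairs = sorted(pair for line in lines for pair in mapper(line))
    results = {}
    for rating, group in groupby(pairs, key=itemgetter(0)):
        for key, total in reducer(rating, (one for _, one in group)):
            results[key] = total
    return results


rating_counts = count_reviews_per_rating(SAMPLE_LINES)
print(rating_counts)  # {2: 2, 3: 1, 5: 3}
```

Each rating becomes a key, each review contributes a value of 1, and the reducer's sum is the number of reviews for that rating.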


Technical Writer | Software Engineer | MSc in Big Data Analytics | Email: andriperera.98@gmail.com | LinkedIn: Andrea Perera