Papers that influenced the Big Data movement

Published in Software System Design

Sep 11, 2019

MapReduce: Simplified Data Processing on Large Clusters
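This paper presents MapReduce, a programming model and an associated implementation for processing large data sets on distributed clusters. The main idea is to provide a general execution model for code that must process large amounts of data across hundreds of machines. Users write a map function and a reduce function, and the runtime takes care of partitioning the input, scheduling execution, handling machine failures, and managing inter-machine communication.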
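To make the programming model concrete, here is a minimal single-process sketch of the word-count example the paper itself uses. The names `run_mapreduce`, `word_count_map`, and `word_count_reduce` are hypothetical, and the "cluster" is simulated by an in-memory dictionary. The sketch shows only the contract between user code and the runtime (map, group by key, reduce), not Google's distributed implementation.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# Hypothetical types: a map function yields (key, value) pairs; a reduce
# function folds all values for one key into a single output value.
MapFn = Callable[[str, str], Iterator[tuple[str, int]]]
ReduceFn = Callable[[str, list[int]], int]


def word_count_map(doc_name: str, contents: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in the document.
    for word in contents.split():
        yield word, 1


def word_count_reduce(word: str, counts: list[int]) -> int:
    # Sum all partial counts emitted for this word.
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[tuple[str, str]], map_fn: MapFn, reduce_fn: ReduceFn
) -> dict[str, int]:
    # Map phase: apply map_fn to every input record.
    intermediate: dict[str, list[int]] = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            # "Shuffle": group intermediate values by their key.
            intermediate[out_key].append(out_value)
    # Reduce phase: one reduce_fn call per distinct intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in sorted(intermediate.items())}
```

For example, `run_mapreduce([("a.txt", "the quick fox"), ("b.txt", "the lazy dog")], word_count_map, word_count_reduce)` counts "the" twice and every other word once. In the real system, the map calls run in parallel on input splits, and the grouping step is a distributed shuffle to the reduce workers.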
The Google File System
This paper presents the Google File System (GFS), a scalable distributed file system for large, data-intensive distributed applications. GFS provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
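In GFS, files are split into fixed-size 64 MB chunks, and each chunk is replicated on several chunkservers (three by default). A client turns a byte offset into a chunk index and asks the master for that chunk's handle and replica locations. The sketch below models only that metadata lookup. `ToyMaster` and its round-robin placement are hypothetical simplifications. The real master also considers disk utilization and rack diversity, and clients read chunk data directly from the chunkservers.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as described in the GFS paper
REPLICATION = 3  # default replica count in the paper (configurable there)


def chunk_index(byte_offset: int) -> int:
    # The client converts a byte offset within a file into a chunk index.
    return byte_offset // CHUNK_SIZE


@dataclass
class ToyMaster:
    # Hypothetical in-memory stand-in for the master's metadata:
    # file name -> chunk handles, and chunk handle -> replica locations.
    chunkservers: list[str]
    files: dict[str, list[int]] = field(default_factory=dict)
    locations: dict[int, list[str]] = field(default_factory=dict)
    next_handle: int = 0

    def create(self, name: str, size: int) -> None:
        num_chunks = max(1, -(-size // CHUNK_SIZE))  # ceiling division
        handles = []
        for _ in range(num_chunks):
            handle = self.next_handle
            self.next_handle += 1
            # Simplified round-robin placement (assumes at least
            # REPLICATION chunkservers so replicas land on distinct servers).
            n = len(self.chunkservers)
            self.locations[handle] = [
                self.chunkservers[(handle + i) % n] for i in range(REPLICATION)
            ]
            handles.append(handle)
        self.files[name] = handles

    def lookup(self, name: str, byte_offset: int) -> tuple[int, list[str]]:
        # Return the chunk handle and replica locations for (file, offset).
        handle = self.files[name][chunk_index(byte_offset)]
        return handle, self.locations[handle]
```

With four chunkservers, a 150 MB file occupies three chunks, and an offset of 100 MB falls in chunk index 1. The large chunk size keeps this master-side metadata small, which is one reason a single master can serve many clients.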
Bigtable: A Distributed Storage System for Structured Data
This paper describes the design and implementation of Bigtable, along with the simple data model it provides, which gives clients dynamic control over data layout and format.
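The paper describes Bigtable's data model as a sparse, sorted map indexed by row key, column key (`family:qualifier`), and timestamp. The following minimal single-machine sketch captures that shape. The class name `ToyTable` and its methods are hypothetical, not Bigtable's client API. The row key `com.cnn.www` and the `contents:` and `anchor:` columns follow the paper's Webtable example.

```python
import bisect
from collections import defaultdict
from typing import Optional


class ToyTable:
    # Hypothetical model of Bigtable's data model:
    # (row key, column "family:qualifier", timestamp) -> value.
    def __init__(self) -> None:
        self._rows: list[str] = []  # row keys kept in lexicographic order
        self._cells: dict[str, dict[str, dict[int, str]]] = defaultdict(
            lambda: defaultdict(dict)
        )

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        # Insert a new row key into the sorted list the first time we see it.
        if row not in self._cells:
            bisect.insort(self._rows, row)
        self._cells[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[str]:
        # Return the newest version at or before `timestamp` (or the newest overall).
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row: str, end_row: str) -> list[str]:
        # Rows are sorted, so a row range [start_row, end_row) is a contiguous slice.
        lo = bisect.bisect_left(self._rows, start_row)
        hi = bisect.bisect_left(self._rows, end_row)
        return self._rows[lo:hi]
```

Because rows are kept in lexicographic order, storing URLs with reversed host names, such as `com.cnn.www`, places pages from the same domain next to each other. A range scan over `"com."` to `"com/"` then returns them together. Keeping multiple timestamped versions per cell lets a read ask for the value as of a given time.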
Dynamo: Amazon’s Highly Available Key-value Store
This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.
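Dynamo partitions keys by consistent hashing. Each key is hashed with MD5 onto a ring, and it is stored on the first N distinct nodes found walking clockwise from that position, known as the key's preference list. The sketch below shows only the basic scheme with virtual nodes. `ToyRing`, `vnodes_per_node`, and the `node#i` token naming are hypothetical. The paper also discusses alternative partitioning strategies, along with replication details such as sloppy quorums and hinted handoff, which are not modelled here.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Dynamo applies MD5 to the key to get a 128-bit ring identifier.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class ToyRing:
    # Hypothetical consistent-hash ring where each physical node owns
    # several virtual positions ("tokens") on the ring.
    def __init__(self, nodes: list[str], vnodes_per_node: int = 8) -> None:
        self._ring: list[tuple[int, str]] = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str, n: int = 3) -> list[str]:
        # Start at the first token strictly after the key's position and walk
        # clockwise, skipping tokens of nodes already chosen, until n distinct
        # physical nodes are collected (or the ring is exhausted).
        start = bisect.bisect_right(self._positions, ring_position(key))
        result: list[str] = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

The first node in the list acts as the coordinator for the key. Because each node owns many small ranges of the ring, adding or removing a node moves only the keys in the ranges next to its tokens instead of reshuffling the whole key space.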
The Chubby lock service for loosely-coupled distributed systems
Chubby is a distributed lock service that takes care of many of the hard parts of building distributed systems while giving its users a familiar interface: writing files, acquiring locks, and setting file permissions. The paper describes Chubby with a focus on its API rather than its implementation details.
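One use case the paper highlights is primary election. All candidates try to acquire the lock on a well-known Chubby file, and the one that succeeds writes its identity into the file so that others can find it. The single-process sketch below illustrates that pattern only. `ToyLockService`, `elect_primary`, and the path `/ls/cell/service/primary` are hypothetical. Real Chubby replicates its state with Paxos and ties locks to client sessions kept alive by leases, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical Chubby-style node: a small file that doubles as a lock.
    contents: str = ""
    holder: Optional[str] = None


class ToyLockService:
    # Single-process stand-in with exclusive, advisory locks only.
    def __init__(self) -> None:
        self._nodes: dict[str, Node] = {}

    def _node(self, path: str) -> Node:
        return self._nodes.setdefault(path, Node())

    def try_acquire(self, path: str, client: str) -> bool:
        # Succeeds if the lock is free or already held by this client.
        node = self._node(path)
        if node.holder is None:
            node.holder = client
        return node.holder == client

    def release(self, path: str, client: str) -> None:
        node = self._node(path)
        if node.holder == client:
            node.holder = None

    def set_contents(self, path: str, data: str) -> None:
        # Locks are advisory: writing does not check who holds the lock.
        self._node(path).contents = data

    def get_contents(self, path: str) -> str:
        return self._node(path).contents


def elect_primary(service: ToyLockService, path: str, candidates: list[str]) -> str:
    # Each candidate tries the lock; only the winner records its identity.
    for candidate in candidates:
        if service.try_acquire(path, candidate):
            service.set_contents(path, candidate)
    return service.get_contents(path)
```

Here, electing among `["replica-a", "replica-b"]` makes `replica-a` the primary. If it later releases the lock, a new round among the remaining replicas picks the next one. In real Chubby, an expired session lease plays the role of that release when a primary crashes.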
Chukwa: A large-scale monitoring system
This paper…

Written by Omar Faroque