Papers that influenced the big data movement

Omar Faroque
Software System Design
5 min read · Sep 11, 2019


  1. MapReduce: Simplified Data Processing on Large Clusters
    This paper presents MapReduce, a programming model and an associated implementation for processing large data sets on distributed clusters. The main idea is a general execution model for code that needs to process a large amount of data across hundreds of machines.
  2. The Google File System
    This paper presents the Google File System (GFS), a scalable distributed file system for large, data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware and delivers high aggregate performance to a large number of clients.
  3. Bigtable: A Distributed Storage System for Structured Data
    This paper describes Bigtable's simple data model, which gives clients dynamic control over data layout and format, along with the design and implementation of the system.
  4. Dynamo: Amazon’s Highly Available Key-value Store
    This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.
  5. The Chubby lock service for loosely-coupled distributed systems
    Chubby is a distributed lock service. It handles many of the hard parts of building distributed systems and gives its users a familiar interface (writing files, taking a lock, file permissions). The paper focuses on the API rather than the implementation details.
  6. Chukwa: A large-scale monitoring system
    This paper…
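
To make the programming model from item 1 more concrete, here is a minimal single-process sketch of the map, group, and reduce steps, using word counting as the example. It is only an illustration: the names `map_words`, `reduce_counts`, and `run_mapreduce` are made up for this post, and a real MapReduce system spreads these steps across many machines and handles failures, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> int:
    """Reduce step: combine all values emitted for one key."""
    return sum(counts)


def run_mapreduce(
    inputs: Dict[str, str],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            # Stand-in for the shuffle phase: collect values per key.
            groups[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


# Toy input standing in for files spread over a cluster.
docs = {"a.txt": "the quick fox", "b.txt": "the lazy dog and the fox"}
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

The user only writes the two small functions; everything in `run_mapreduce` is what the framework would take care of, which is the appeal of having one general execution model.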
