Systems at scale

Large-scale programs generate petabytes of data. Their servers receive millions of requests per second. A small part of the system fails approximately every 5 minutes. A larger part fails approximately every month. They are served out of a fleet of hundreds of hosts. Their code is 4–5 years old on average. Correcting a bug that is one month old could mean correcting hundreds of millions of records. Correcting a bug that is more than a year old is usually not worthwhile, because the cost of the correction exceeds the cost of living with the bug or working around it. They have hundreds of thousands of lines of (usually) poorly written code and hundreds of libraries. Developing such a system is interesting, and the programmer must address many concerns at once. I will explore some of these concerns in this blog post.
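To make these numbers concrete, here is a rough back-of-the-envelope sketch. The rate of 100 faulty writes per second is purely an assumption for illustration (it is not a measurement from any real system), and the function names are hypothetical. At millions of requests per second, it would mean only a tiny fraction of requests write a bad record.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def failures_per_month(minutes_between_failures: float, days: int = 30) -> int:
    """Failures in `days` days if one occurs every `minutes_between_failures` minutes."""
    return int(days * 24 * 60 / minutes_between_failures)


def records_affected(bad_writes_per_second: float, bug_age_days: int) -> int:
    """Records written incorrectly since the bug was introduced."""
    return int(bad_writes_per_second * bug_age_days * SECONDS_PER_DAY)


# One small failure every 5 minutes, as described above.
small_failures = failures_per_month(5)

# Assumed rate: 100 faulty writes per second (illustrative only).
month_old = records_affected(100, 30)
year_old = records_affected(100, 365)

print(small_failures)  # 8640
print(month_old)       # 259200000
print(year_old)        # 3153600000
```

Under this assumed rate, a month-old bug touches roughly 260 million records and a year-old bug more than 3 billion, which gives a feel for why the cost of a correction grows so quickly with the age of the bug.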

Broadly, when building large-scale systems, the programmer should be concerned about the following, in order of priority:

  1. Value
  2. Simplicity
  3. Creation of the structural model
  4. Latency
  5. Throughput
  6. Concurrency
  7. Fault tolerance
  8. Growth in request rate
  9. Growth of storage
  10. Resource constraints
  11. Operational visibility
  12. Patchability

In subsequent posts, I will try to define each of these terms and discuss techniques to address them.