Computer System Architecture Part 1 — Fundamentals of Quantitative Design and Analysis

Abde Manaaf Ghadiali
Oct 27, 2023 · 19 min read


Computer system architecture is like a complex game of Jenga — one wrong move, and the whole thing comes crashing down. And just when you think you’ve got it all figured out, someone adds a new block, and everything gets even more precarious!

Contents

  1. Introduction
  2. Classes of Computers
  3. Defining Computer Architecture
  4. Trends in Technology
  5. Trends in Power and Energy in Integrated Circuits
  6. Trends in Cost
  7. Understanding Dependability in Computing
  8. Measuring, Reporting, and Summarizing Performance
  9. Quantitative Principles of Computer Design
  10. Appendix

Introduction

Over the past seventy years, the world of computer technology has experienced rapid progress, making computing power more accessible. Today’s affordable smartphones possess computing capabilities that surpass those of multimillion-dollar computers from the early 1990s. This evolution has transformed various aspects of modern life.

The late 1970s saw a significant milestone with the advent of the microprocessor, which drove performance growth of roughly 35% per year. This innovation propelled the integration of microprocessors into various devices, reshaping the technological landscape and embedding computing power into daily life.

Advancements in computing technology led to the development of commercially viable computer designs. Reduced reliance on assembly language programming and the introduction of standardized, vendor-agnostic operating systems such as UNIX and Linux facilitated the emergence of Reduced Instruction Set Computing (RISC) architectures in the 1980s. These systems focused on enhancing computational efficiency and speed through various techniques.

Moore’s Law, The Rise of Parallelism and Future of Microprocessors

Moore’s Law, originally predicting a doubling of transistors on a chip every year (later revised to every two years), held true for almost five decades before its applicability began to wane. This law has played a significant role in the development of computer technology.

Future microprocessors are expected to feature specialized cores optimized for specific computational tasks, surpassing the capabilities of general-purpose cores. This shift highlights a focus on tailored processing methods and improved performance optimization, signifying a new era in microprocessor development.

Classes of Computers

In the realm of computing, distinct categories have emerged, each serving unique purposes and leveraging diverse technological frameworks to accomplish specific tasks.

The main classes of computers, along with their typical prices and dominant design issues, are summarized below.
  1. Personal Mobile Devices (PMD) — Personal Mobile Devices (PMDs) encompass a spectrum of wireless gadgets, including cell phones and tablets, prioritizing cost and energy efficiency. In PMDs, the preference for flash memory over magnetic disks stems from energy and size constraints. PMDs resemble desktop computers in their ability to run externally developed software, but real-time constraints necessitate responsive and predictable performance, underscoring the significance of memory optimization and efficient code size.
  2. Desktop Computing — The domain of desktop computing, ranging from cost-effective netbooks to high-performance workstations, has predominantly been led by battery-operated laptops since 2008. Despite a decline in sales, this market focuses on optimizing price-performance for users, emphasizing compute and graphics capabilities. While well-characterized in terms of applications and benchmarking, the surge in web-centric applications poses fresh challenges for performance evaluation.
  3. Servers — The proliferation of servers in the 1980s, spurred by the rise of desktop computing, served as the backbone for large-scale enterprise operations, supplanting conventional mainframes. Servers prioritize critical aspects like availability and scalability, vital for handling increasing demands and functional requirements. Efficiency is gauged based on the ability to manage multiple requests within a specific time frame.
  4. Clusters — The surge of Software as a Service (SaaS) has propelled the development of clusters, interconnected networks of desktop computers or servers operating as a cohesive unit. Clusters, particularly Warehouse-Scale Computers (WSCs), underscore the importance of price-performance efficiency and power management, relying on redundant components and robust software layers to ensure high availability. Unlike traditional servers, WSCs leverage local area networks for scalability.
  5. Embedded Computers — Embedded computers permeate various everyday appliances, forming the backbone of the Internet of Things (IoT), from microwaves and printers to automobiles and smart devices. This sector prioritizes cost-effective solutions over raw performance, ranging from basic 8-bit to advanced 64-bit processors.

Parallelism in Computer Design

Parallelism, a pivotal facet of contemporary computer design, enables the simultaneous execution of multiple tasks, driven by considerations such as energy efficiency and cost. Key forms of parallelism include data-level parallelism (DLP) and task-level parallelism (TLP), harnessed through diverse hardware mechanisms to optimize computational efficiency and throughput.

Computers can be classified into four main categories based on the movement of instructions and data, including Single Instruction Stream, Single Data Stream (SISD), Single Instruction Stream, Multiple Data Streams (SIMD), Multiple Instruction Streams, Single Data Stream (MISD), and Multiple Instruction Streams, Multiple Data Streams (MIMD). These distinctions provide crucial insights into the diverse processing methodologies employed by different computing systems.
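To make data-level parallelism a bit more concrete, here is a minimal Python sketch (NumPy is used purely for illustration, and the array size and function names are invented for the example). It contrasts an element-by-element loop, which handles one pair of data items per step in SISD fashion, with a single vectorized operation that applies the same operation across many data elements at once, which is the spirit of SIMD-style data-level parallelism.

```python
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.arange(100_000, dtype=np.float64)

# SISD-style: one operation is applied to one pair of data items at a time.
def scalar_add(x, y):
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = x[i] + y[i]
    return out

# SIMD/DLP-style: a single vectorized operation covers all elements; the
# library can map it onto hardware vector instructions under the hood.
def vector_add(x, y):
    return x + y

assert np.allclose(scalar_add(a, b), vector_add(a, b))
```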

Understanding Computer Architecture

Computer architecture constitutes the intricate process of crafting efficient computing systems that strike a delicate balance between critical factors such as performance, energy efficiency, cost, power consumption, and availability. At its core, Instruction Set Architecture (ISA) serves as the boundary between the software and hardware realms within a computer. Contemporary ISAs predominantly adhere to the framework of general-purpose register architectures, leveraging either registers or memory locations for executing operations.

Components of Computer Implementation

The construction of a computer involves the orchestration of two fundamental elements: organization and hardware. Organization covers the high-level aspects of a computer’s design: its memory system, its interconnections, and the design of the internal processor (CPU), where arithmetic, logical operations, branching, and data transfer are carried out. Occasionally, the term “microarchitecture” is used interchangeably with organization. Hardware, on the other hand, refers to the specifics of a computer’s design, including the detailed logic design and the packaging technology employed in its construction.

Understanding “Core” in Modern Computers

In contemporary computing parlance, the term “core” often refers to a processor, reflecting the evolution wherein multiple processors are integrated within a single microprocessor. This transformative shift has contributed to the accelerated processing capabilities and enhanced performance of modern computing systems, enabling the seamless execution of complex tasks and resource-intensive applications.

In essence, computer architecture represents a dynamic realm where software and hardware converge, orchestrating the intricate symphony of computational efficiency. It encompasses the meticulous organization and design of various computer components, constantly adapting to the ever-evolving landscape of technological advancements and reshaping the trajectory of computer construction and utilization.

Trends in Technology

In the dynamic realm of computing, a comprehensive grasp of the swiftly evolving technological landscape is paramount. Five key technologies, characterized by their rapid transformations, assume pivotal roles in the contemporary computer ecosystem:

  1. Integrated Circuit Logic Technology: This pivotal domain revolves around the fabrication of minute electronic components that collaboratively undertake the task of processing information, akin to the fundamental building blocks underpinning the functionality of a computer.
  2. Semiconductor DRAM (Dynamic Random-Access Memory): Conceptualize this as the computer’s transient memory, facilitating the storage and utilization of data during the operational phase, ensuring swift access and retrieval of information as required.
  3. Semiconductor Flash: Analogous to the computer’s enduring memory, this technology serves as the repository for critical data, persistently retaining essential information even during periods of power-off states.
  4. Magnetic Disk Technology: Envision this as the repository akin to your computer’s hard drive or a large-scale data storage device, serving as the depository for various files and programs, ensuring accessibility and retention of crucial data resources.
  5. Network Technology: This facet covers the mechanisms that let computers communicate with one another over networks and the internet, enabling activities such as web browsing, email, and video streaming, and fostering interconnectedness and data exchange across diverse digital platforms.

In discussions concerning these technologies, two pivotal terms assume prominence:

  1. Bandwidth or Throughput: This metric embodies the pace at which tasks are executed, exemplified by the speed of data transfer during activities like file copying, quantified in units such as megabytes per second, offering insights into the velocity of data transmission and processing.
  2. Latency or Response Time: This parameter encapsulates the duration between the initiation and completion of a task, exemplified by the time lapse experienced when accessing a file, typically measured in milliseconds, delineating the interval required for the successful culmination of a task.
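As a rough back-of-the-envelope illustration of how the two metrics above interact, a transfer’s total time can be approximated as the startup latency plus the amount of data divided by the bandwidth. The sketch below is a simplified model with made-up numbers, not a measurement of any real link; it shows that latency dominates small transfers while bandwidth dominates large ones.

```python
def transfer_time_s(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Approximate time to move a block: startup latency + size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Hypothetical link: 100 MB/s of bandwidth and 5 ms of latency.
small = transfer_time_s(4 * 1024, 100e6, 0.005)      # latency dominates
large = transfer_time_s(1 * 1024**3, 100e6, 0.005)   # bandwidth dominates
print(f"4 KB transfer: {small:.4f} s")
print(f"1 GB transfer: {large:.2f} s")
```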

Understanding Power and Energy in Computer Chips

Navigating the complexities of power and energy management stands as one of the foremost challenges confronting contemporary computer designers. Computers rely on electrical energy to function, with intricate networks of minute pins and layers orchestrating the inflow and circulation of electricity within the chip. The operational activity of a computer generates heat, necessitating robust heat management strategies to avert potential damage and ensure optimal performance.

Thermal Design Power (TDP)

TDP serves as a pivotal metric for quantifying the power consumption of a computer chip over a specific duration. Modern processors integrate sophisticated mechanisms to regulate heat. In the event of escalating temperatures, the chip might adjust its clock rate or even initiate shutdown protocols as a precautionary measure to maintain operational integrity and prevent heat-induced damage.

Energy vs. Power

Energy is often a more useful metric than power because it accounts for both the rate at which work is done and the time required to finish it: the energy to complete a task equals the average power consumed multiplied by the execution time. This holistic perspective is a valuable tool for assessing the overall efficiency of computer chip operations, enabling designers to optimize performance while mitigating energy-related challenges.

CMOS (Complementary metal–oxide–semiconductor) chips are common computer chips. The main source of energy consumption in CMOS chips happens when tiny electronic switches (transistors) change their state. The energy used during a switch is related to the load that the transistor is handling, and the voltage applied.

The energy of a full pulse of a logic transition (0 → 1 → 0 or 1 → 0 → 1) is proportional to the capacitive load driven by the transistor times the square of the supply voltage:

Energy (pulse) ∝ Capacitive load * Voltage^2

A single transition (0 → 1 or 1 → 0) consumes half of that:

Energy (transition) ∝ 1/2 * Capacitive load * Voltage^2

The dynamic power required per transistor is just the product of the energy of a transition multiplied by the frequency of transitions:

Power (dynamic) ∝ 1/2 * Capacitive load * Voltage^2 * Frequency switched
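As a minimal sketch of how these proportionalities are used in practice (treating the capacitive load as a fixed constant and ignoring static power), the snippet below compares dynamic power before and after scaling voltage and frequency down together. The 15% reduction is an arbitrary illustrative figure; the point is that dynamic power falls roughly with the cube of the scaling factor, which is the idea behind the voltage-frequency scaling technique mentioned below.

```python
def dynamic_energy(cap_load, voltage):
    """Energy per transition, proportional to 1/2 * C * V^2."""
    return 0.5 * cap_load * voltage ** 2

def dynamic_power(cap_load, voltage, freq_switched):
    """Dynamic power: energy per transition times the switching frequency."""
    return dynamic_energy(cap_load, voltage) * freq_switched

# Hypothetical example: lower both voltage and frequency by 15%.
baseline = dynamic_power(cap_load=1.0, voltage=1.0, freq_switched=1.0)
scaled = dynamic_power(cap_load=1.0, voltage=0.85, freq_switched=0.85)
print(f"Relative dynamic power after scaling: {scaled / baseline:.2f}")  # ~0.61
```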

Improving Energy Efficiency

Computer designers use various techniques to make computers more energy-efficient:

  1. Do nothing well: when parts of the chip are idle, turn off their clocks so they consume as little energy as possible.
  2. Dynamic voltage-frequency scaling (DVFS) lowers the chip’s voltage and clock rate during periods of low activity to match the workload.
  3. Designing for the typical case, rather than the worst case, helps save energy.
  4. Overclocking lets the chip run faster than its nominal rate for a while, but it uses more power.

Static Power is Important

Recognizing the impact of static power is critical: even when transistors are not switching, they leak current, and this leakage contributes to overall power consumption and must be managed carefully. The relationship between static power and current is defined by the formula:

Power (static) ∝ Current (static) * Voltage

This fundamental understanding underscores the need for stringent measures to regulate static power and minimize power leakage, thereby optimizing energy efficiency and bolstering the overall sustainability of computing systems.

Trends in Costs

Within the sphere of computer design, cost considerations assume paramount significance. While certain computing systems, such as supercomputers, may prioritize performance over costs, in many other instances, particularly those marked by cost sensitivity, financial considerations wield significant influence, dictating the feasibility and accessibility of computer technologies.

Driving Down Costs

The principle of the learning curve stands as a cornerstone for cost reduction strategies. Over time, the progression of manufacturing processes typically leads to a decline in production costs. Assessing the learning curve involves evaluating the yield, denoting the percentage of devices that successfully pass the testing phase. A design boasting a higher yield ratio experiences proportionally reduced costs, exemplifying the inverse relationship between yield and manufacturing expenses.

The scale of production plays a pivotal role as well in cost optimization. Increased production volume impacts costs in multifaceted ways:

  1. Accelerated learning curve: The pace of the learning curve is closely intertwined with the aggregate number of systems or chips produced, facilitating enhanced manufacturing efficiencies and cost reductions over time.
  2. Enhanced purchasing and manufacturing efficiency: Amplified production volumes contribute to economies of scale, fostering streamlined procurement processes and bolstering manufacturing efficiencies, ultimately culminating in cost reductions across various operational facets.

Cost of Integrated Circuits

In contemporary computer devices, particularly in the context of personal mobile devices (PMDs) leveraging system-on-chip (SOC) technology, a significant portion of the overall cost is attributed to integrated circuits. The cost of an integrated circuit (IC) is characterized by the following formula:

Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield

This formula captures every component of an integrated circuit’s total cost: the die itself, testing of the die, packaging, and the final testing phase. The final test yield is the percentage of integrated circuits that pass the concluding test stage. By factoring in each element of the production process along with the final test yield, the formula gives a holistic view of the cost dynamics of integrated circuit manufacturing and supports informed decisions in computer device production and design.

To estimate the cost of a die (a single piece cut from a wafer), you need to know how many dies fit on a wafer and predict the percentage that will work. The formula for die cost is:

Cost of die = Cost of wafer / (Dies per wafer * Die yield)

The number of dies per wafer can be estimated as:

Dies per wafer = [π * (Wafer diameter / 2)^2] / Die area − (π * Wafer diameter) / sqrt(2 * Die area)

The first term is the wafer’s area divided by the die area; the second term corrects for the partial dies lost around the wafer’s circular edge.

Die Yield Calculation

Die yield is a crucial factor, as it tells us the fraction of good dies on a wafer. Assuming defects are randomly distributed across the wafer and that yield is inversely proportional to the complexity of the fabrication process, the formula for die yield is:

Die yield = Wafer yield * 1 / (1 + Defects per unit area * Die area)^N

In 2017, typical defect densities were 0.08–0.10 defects per square inch for a 28 nm node and 0.10–0.30 for the newer 16 nm node. N, the process-complexity factor, varied from 7.5 to 9.5 for 28 nm processes and from 10 to 14 for 16 nm processes.

Note — Wafer yield accounts for wafers that are completely bad and so need not be tested. For simplicity, we’ll just assume the wafer yield is 100%.
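Putting the three formulas above together, here is a small Python sketch that estimates the cost of a good die. The wafer price, die area, defect density, and process-complexity factor are hypothetical values chosen only to exercise the formulas, and the defect density is expressed per square centimeter so the units match the die area.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area divided by die area, minus a correction for partial edge dies."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return int(wafer_area / die_area_cm2 - edge_loss)

def die_yield(defects_per_cm2, die_area_cm2, n_complexity, wafer_yield=1.0):
    """Fraction of good dies on the wafer (wafer yield assumed to be 100%)."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n_complexity

def cost_per_good_die(wafer_cost, wafer_diameter_cm, die_area_cm2,
                      defects_per_cm2, n_complexity):
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n_complexity))
    return wafer_cost / good_dies

# Hypothetical numbers: a $7,000, 300 mm (30 cm) wafer, a 1.0 cm^2 die,
# 0.05 defects per cm^2, and a process-complexity factor of 10.
print(f"Cost per good die: ${cost_per_good_die(7000, 30, 1.0, 0.05, 10):.2f}")
```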

Operating Costs (CAPEX and OPEX)

In the context of warehouse-scale computers (WSCs) housing extensive server networks, the expenses incurred extend beyond the initial purchase cost, encompassing both capital expenses (CAPEX) and operational expenses (OPEX). CAPEX represents the upfront costs associated with acquiring and establishing the necessary infrastructure, equipment, and resources essential for the functioning of the computing system, whereas OPEX encompasses the ongoing expenses incurred during the operation and maintenance phase, covering diverse aspects such as utility costs, maintenance expenditures, and staffing requirements.

Understanding Dependability in Computing

Dependability in the context of computing underscores the reliability and robustness of a computer system, encompassing its capacity to consistently deliver accurate and error-free performance.

Traditionally, computer components, particularly integrated circuits (ICs), were renowned for their exceptional reliability, exhibiting minimal failure rates. However, the advent of newer, smaller chips, such as those employing 16-nanometer technology, has ushered in a shift in this paradigm. The prevalence of both temporary and permanent faults has become more commonplace, compelling computer architects to devise resilient systems capable of mitigating and managing these emerging challenges effectively.

Measuring Dependability

A pivotal aspect of ensuring dependability involves discerning the operational state of a computer system. This became increasingly critical with the proliferation of internet-based services, prompting companies providing internet or power services to institute Service Level Agreements (SLAs) as a means to guarantee reliable service delivery. The status of a system concerning an SLA can be characterized in one of two states:

  1. “Service Accomplishment” denotes the seamless functioning of the service in accordance with the stipulated SLA parameters, affirming the system’s dependability and adherence to performance benchmarks.
  2. “Service Interruption” signifies the failure of the service to meet the predefined SLA standards, underscoring a breach in the expected level of dependability and necessitating corrective measures to restore optimal operational efficiency and reliability.

Dependability Measures

Dependability, serving as a pivotal metric for assessing the reliability and robustness of computer systems, is characterized by key measures, each offering insights into distinct facets of system performance and operational efficacy:

  1. Module Reliability: Module reliability centers around evaluating the duration for which a system can function without encountering failures, often quantified through the Mean Time to Failure (MTTF). This metric denotes the average time until a malfunction occurs, with its reciprocal, the failure rate, typically expressed as failures per billion hours (FIT), offering a comprehensive perspective on the system’s endurance and operational stability.
  2. Service Interruption: Service interruption, as measured by the Mean Time to Repair (MTTR), offers insights into the timeline required for resolving system issues and restoring operational functionality. The Mean Time Between Failures (MTBF), calculated as the sum of MTTF and MTTR, serves as an essential metric for gauging the overall system resilience and the efficiency of remedial measures in averting service disruptions.

Module Availability

This measures how often a system is in the “Service Accomplishment” state. For systems with repair (fixing problems), module availability is calculated as:

Module availability = MTTF / (MTTF + MTTR)

Failure Rate for the System

If you want to know how often the whole system might fail, you sum up the failure rates of all the parts in it (assuming the failures are independent and the failure rates are constant). It’s calculated as:

Failure rate (system) = Failure rate (1) + Failure rate (2) + … + Failure rate (n)

The MTTF for the system is just the inverse of the failure rate:

MTTF (system) = 1 / Failure rate (system)

Equivalently, the reliability of a system built from n components can be expressed directly in terms of the component MTTFs:

MTTF (system) = 1 / [1/MTTF(1) + 1/MTTF(2) + … + 1/MTTF(n)]
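Here is a minimal Python sketch that applies these formulas to a hypothetical subsystem (the component MTTFs and the 24-hour MTTR are made-up values). It assumes every component must work for the system to work and that failures are independent.

```python
def system_mttf(component_mttfs_hours):
    """System MTTF when any single component failure takes the system down:
    individual failure rates simply add up."""
    system_failure_rate = sum(1.0 / mttf for mttf in component_mttfs_hours)
    return 1.0 / system_failure_rate

def availability(mttf_hours, mttr_hours):
    """Fraction of time the module is in the 'Service Accomplishment' state."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical subsystem: 10 disks of 1,000,000 hours MTTF each,
# a controller at 500,000 hours, and a power supply at 200,000 hours.
components = [1_000_000] * 10 + [500_000, 200_000]
mttf = system_mttf(components)
print(f"System MTTF: {mttf:,.0f} hours")
print(f"Availability with a 24-hour MTTR: {availability(mttf, 24):.6f}")
```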

In essence, dependability is about how reliable a computer system is. We measure it using terms like MTTF, MTTR, and Module Availability. It’s essential for ensuring that systems work as expected, especially in critical services like the internet.

Measuring, Reporting, and Summarizing Performance

When we compare the performance of different computers, we want a way to say which one is faster or better. We often want to compare two computers, like X and Y, and say if one is faster. “X is faster than Y” means that X takes less time to complete a task than Y. When we say, “X is n times as fast as Y,” it means:

n = Execution time of Y / Execution time of X = Performance of X / Performance of Y

“The throughput of X is 1.3 times that of Y” means X completes 1.3 times as many tasks in the same amount of time as Y.

“Wall-clock time” or “response time” is the time it takes to complete a task, including everything like storage, memory, and more. “CPU time” is just the time the processor is actively working, not waiting for other things like input/output.

Benchmark programs are used to test and compare computer performance. Companies use these to see how their computers stack up. Benchmark suites are collections of benchmark applications used to measure performance with various tasks. They reduce the risk of relying too much on a single benchmark. SPEC is a well-known organization for creating standardized benchmark application suites. Desktop benchmarks are divided into processor-intensive and graphics-intensive benchmarks.

When reporting performance measurements, it’s essential to provide all the details so that others can replicate the results. If we have computers A and B with SPECRatios measured over n benchmarks, we can compare their performance using the geometric mean. It’s calculated like this:

Geometric mean = [SPECRatio(1) * SPECRatio(2) * … * SPECRatio(n)]^(1/n)

A convenient property of the geometric mean is that the ratio of two geometric means equals the geometric mean of the per-benchmark ratios, so the comparison does not depend on which machine is chosen as the reference.
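Here is a small Python sketch of the calculation, using made-up SPECRatios purely for illustration. It also checks the property mentioned above: the ratio of the two geometric means equals the geometric mean of the per-benchmark ratios.

```python
import math

def geometric_mean(ratios):
    """The n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical SPECRatios for machines A and B on the same four benchmarks.
spec_a = [12.0, 30.0, 9.0, 20.0]
spec_b = [10.0, 24.0, 12.0, 18.0]

gm_a, gm_b = geometric_mean(spec_a), geometric_mean(spec_b)
print(f"A vs. B (ratio of geometric means): {gm_a / gm_b:.2f}")

# The same answer falls out of the geometric mean of the per-benchmark ratios.
print(f"Geometric mean of ratios:           "
      f"{geometric_mean([a / b for a, b in zip(spec_a, spec_b)]):.2f}")
```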

So, we use different methods and ratios to measure and compare computer performance, ensuring that our evaluations are as accurate and reproducible as possible.

Quantitative Principles of Computer Design

To make computers faster, we use parallelism, which means doing multiple things at the same time. At the big system level, we can have multiple processors and storage devices working together. This makes tasks like handling requests faster. It’s like having more workers to do the job. This ability to add more processors and storage is called “scalability,” and it’s super important for servers. Also, when we spread data across many storage devices and read or write data at the same time, we call it “data-level parallelism.”

Inside each processor, we also want to do things in parallel. This helps us achieve high performance. One way to do this is using “pipelining.” It’s like an assembly line for instructions. Each part of an instruction is worked on by a different part of the processor at the same time. This speeds up how quickly a sequence of instructions is completed.

Principle of Locality

Programs like to use data and instructions they’ve used recently. Think of it like you’re more likely to pick up a book you just read. A good rule of thumb is that 90% of a program’s time is spent in only 10% of the code. There are two types of localities:

  1. Temporal Locality: If you used something recently, you’ll probably use it again soon.
  2. Spatial Locality: If things are close together in memory, you’ll probably use them one after the other.
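The effect of spatial locality can be seen even from Python, as in the sketch below (NumPy is used only for illustration, and exact timings depend on the machine). Summing a row-major matrix row by row walks memory sequentially and reuses each cache line it fetches, while walking it column by column strides across memory and throws most of each cache line away.

```python
import time
import numpy as np

# A matrix stored in row-major (C) order: elements of the same row are
# adjacent in memory; consecutive elements of a column are far apart.
matrix = np.random.rand(4_000, 4_000)

def sum_row_by_row(m):
    """Walks memory sequentially (good spatial locality)."""
    total = 0.0
    for row in m:        # each row is one contiguous block of memory
        total += row.sum()
    return total

def sum_column_by_column(m):
    """Strides a full row length between accesses (poor spatial locality)."""
    total = 0.0
    for col in m.T:      # each 'row' of the transpose is a strided column
        total += col.sum()
    return total

for fn in (sum_row_by_row, sum_column_by_column):
    start = time.perf_counter()
    fn(matrix)
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f} s")
```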

Focusing on the Common Case

When making choices in computer design, it’s smart to favor what happens most often. For example, if you’re deciding how to spend resources or improve performance, focus on what happens frequently. It’s like making sure the most popular dish in a restaurant is the best because that’s what most people order.

This principle also applies to saving energy, not just performance. Concentrate on what happens often because that’s where the biggest impact is. These principles help make computers faster, more efficient, and cost-effective.

Amdahl’s Law

The performance gain that can be obtained by improving some portion of a computer can be calculated using Amdahl’s Law. It states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used; in other words, it defines the speedup achievable from a particular enhancement and tells us how much quicker a task becomes when we apply an improvement.

Two important things affect speedup:

  1. Fraction(enhanced): This is the part of the task that can be made faster. For example, if a program takes 100 seconds, and we can make 40 seconds of it faster, then the fraction is 40/100.
  2. Speedup(enhanced): This is how much faster the enhanced part is. If the original part takes 40 seconds and the enhanced part takes 4 seconds, the speedup is 40/4 or 10.

When we make part of a task faster, the whole task’s execution time changes. Here’s how we calculate it:

Execution time (new) = Execution time (old) * [(1 - Fraction enhanced) + Fraction enhanced / Speedup enhanced]

To know how much faster the entire task becomes, we use this formula:

Speedup (overall) = Execution time (old) / Execution time (new) = 1 / [(1 - Fraction enhanced) + Fraction enhanced / Speedup enhanced]

Amdahl’s Law reminds us that the more we improve just a part of the task, the less extra speed we gain. It’s like the law of diminishing returns: You get less benefit from each improvement as you add more. If an enhancement can only be used for a part of the task, we can’t speed up the whole task by more than the reciprocal of 1 minus that part.
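A minimal Python sketch of the two formulas above, reusing the example numbers from the list (40 of 100 seconds can be enhanced, and the enhanced part runs 10 times as fast):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the task benefits from an enhancement."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# Example from above: fraction = 40/100 = 0.4, enhanced speedup = 10.
print(f"Overall speedup: {amdahl_speedup(0.4, 10):.2f}x")  # ~1.56x

# Even with an infinitely fast enhancement, the speedup is capped at
# 1 / (1 - fraction_enhanced).
print(f"Upper bound:     {1 / (1 - 0.4):.2f}x")            # ~1.67x
```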

The Processor Performance Equation

When we talk about how fast a computer’s processor is, we use some key terms and equations to understand its performance. All computers have a clock that ticks at a constant rate. We call these ticks “clock periods” or “cycles.” Think of it like the heartbeat of the computer. Designers measure the duration of a single clock period (e.g., 1 ns) or its rate (e.g., 1 GHz, meaning one billion cycles per second).

CPU Time is the time it takes for the Central Processing Unit (CPU) to execute a program. We can calculate CPU time in two ways:

  1. CPU time = CPU Clock Cycles for a Program * Clock Cycle Time
  2. CPU time = CPU Clock Cycles for a Program / Clock Rate

We can also count the number of instructions a program executes. This is known as the Instruction Count (IC). If we know both the number of clock cycles and the instruction count, we can calculate the average number of clock cycles needed for each instruction (CPI).

CPI (Cycles Per Instruction) tells us how many cycles, on average, it takes to execute one instruction. Some designers prefer IPC (Instructions Per Clock), which is the inverse of CPI. You can calculate CPI as:

CPI = CPU Clock Cycles for a Program / Instruction Count

We can use CPI in the formula for CPU time, which depends on three factors: clock cycle (or rate), CPI, and instruction count. Here’s the formula:

CPU time = Instruction Count * CPI * Clock Cycle Time

All three characteristics (clock cycle, CPI, and instruction count) affect how fast a processor is. If any of them improves by, say, 10%, the CPU time also improves by 10%. So, a faster clock, fewer cycles per instruction, or fewer instructions can make the CPU faster. Sometimes, it’s useful to calculate the total clock cycles for a program. You can do this using a formula that sums up the products of the instruction count and CPI for each class of instructions in the program. It looks like this:

CPU Clock Cycles = IC(1) * CPI(1) + IC(2) * CPI(2) + … + IC(n) * CPI(n)

Where “i” represents a class of instructions (for example, loads, stores, or branches), IC(i) is how many instructions of that class the program executes, and CPI(i) is the average cycles per instruction for that class.

The overall CPI is calculated by dividing the total CPU clock cycles by the instruction count:

CPI = [IC(1) * CPI(1) + IC(2) * CPI(2) + … + IC(n) * CPI(n)] / Instruction Count
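A small Python sketch that ties these pieces together for a hypothetical program and instruction mix (the instruction counts, per-class CPIs, and 2 GHz clock rate are invented for the example):

```python
def cpu_metrics(instruction_mix, clock_rate_hz):
    """instruction_mix: list of (instruction_count, cpi) pairs, one per class."""
    total_instructions = sum(ic for ic, _ in instruction_mix)
    total_cycles = sum(ic * cpi for ic, cpi in instruction_mix)
    overall_cpi = total_cycles / total_instructions
    cpu_time_s = total_cycles / clock_rate_hz
    return overall_cpi, cpu_time_s

# Hypothetical program on a 2 GHz processor: 50 billion ALU operations at
# CPI 1, 20 billion loads/stores at CPI 4, 10 billion branches at CPI 2.
mix = [(50e9, 1.0), (20e9, 4.0), (10e9, 2.0)]
cpi, seconds = cpu_metrics(mix, clock_rate_hz=2e9)
print(f"Overall CPI: {cpi:.3f}")     # 150e9 cycles / 80e9 instructions = 1.875
print(f"CPU time: {seconds:.1f} s")  # 150e9 cycles / 2e9 Hz = 75 s
```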

In a nutshell, the performance of a processor depends on how quickly it can complete instructions, and this involves factors like the clock cycle, cycles per instruction, and the number of instructions executed. These factors help us understand and improve a computer’s performance.

Appendix

Dennard Scaling

Dennard scaling, named after Robert H. Dennard, suggests that as transistors shrink in size, their power density remains constant, enabling smaller transistors to operate faster without significantly increasing power consumption. This principle facilitated the advancement of microprocessor technology for decades but has posed challenges as transistors approach atomic scales, leading to concerns about power consumption and heat dissipation in modern designs.

What Are a Die, a Wafer, a Core, and a Chip, and How Are They Related?

  1. Die: A “die” is a small, individual silicon wafer segment that contains various electronic components like transistors, capacitors, and resistors. It is the fundamental building block of integrated circuits (ICs) and microchips.
  2. Wafer: A “wafer” is a thin, round, and usually large piece of silicon. It serves as the substrate on which multiple dies are manufactured. Wafers are cut from a single crystal of silicon and act as the canvas where semiconductor devices are etched and interconnected.
  3. Core: A “core” is a processing unit within a microprocessor or CPU. Modern CPUs often contain multiple cores, each capable of executing tasks independently. Cores are like separate computing engines on a single chip, enabling parallel processing and better performance.
  4. Chip: A “chip,” in the context of computing, typically refers to a microchip or an integrated circuit. It’s a small, flat piece of silicon on which various electronic components, including multiple cores and other circuitry, are fabricated. The chip is the central processing unit of a computer or electronic device, responsible for executing instructions and performing computations.

In summary, you start with a silicon wafer, which is a large round piece of silicon. On this wafer, many identical dies are created, and each die contains the electronic components of an integrated circuit. Once a die is cut from the wafer and packaged, it becomes a chip, and a modern processor chip typically contains one or more cores along with other circuitry. So the relationship is hierarchical: a wafer holds many dies, each die becomes a chip, and each chip contains one or more cores.

More Useful Measure than MTTF

When it comes to measuring the reliability of computer components, like hard drives, we often use terms like MTTF (Mean Time To Failure) and AFR (Annual Failure Rate).

MTTF — Mean Time to Failure: MTTF is a prediction of how long, on average, a component like a hard drive will last before it fails. For example, if a hard drive has an MTTF of 1,000,000 hours, it means it’s expected to run for about 1,000,000 hours before failing.

AFR — Annual Failure Rate: AFR is a more practical measure. It tells us the percentage of components that are likely to fail in a year. This is a more useful metric because it helps us understand how many components might need replacement over time.

Failed components per year = (Number of components * Hours of operation per year) / MTTF

Dividing the number of failures by the number of components gives the AFR as a percentage.

Example: Let’s say you have 1,000 hard drives, and each one has an MTTF of 1,000,000 hours. If these drives run 24 hours a day (8,760 hours per year), the calculation looks like this:

Failed drives per year = (1,000 drives * 8,760 hours/year) / 1,000,000 hours ≈ 8.76 drives, i.e., an AFR of about 0.9%

This means that roughly 0.9% of the hard drives (about 9 out of 1,000) are expected to fail each year. Over a 5-year period, you’d expect around 4.4% of them to fail.
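The same calculation, expressed as a small Python sketch (the 1,000,000-hour MTTF and the 1,000-drive fleet mirror the example above):

```python
def annual_failure_rate(mttf_hours, hours_per_year=8760):
    """Fraction of components expected to fail in one year of continuous use."""
    return hours_per_year / mttf_hours

def expected_failures(num_components, mttf_hours, years=1):
    return num_components * annual_failure_rate(mttf_hours) * years

mttf = 1_000_000  # hours
print(f"AFR: {annual_failure_rate(mttf):.2%}")                                        # ~0.88%
print(f"Failed drives per year (1,000 drives): {expected_failures(1000, mttf):.1f}")  # ~8.8
print(f"Failed drives over 5 years: {expected_failures(1000, mttf, 5):.1f}")          # ~43.8
```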

In summary, while MTTF gives us an idea of how long a component can last on average, AFR provides a more practical understanding of how many components may fail in a given time frame. It helps in planning for replacements and maintenance, making it a valuable metric in computer reliability.
