Why learning history helps you learn engineering

We code, code, and code to learn engineering. Yet without knowing the past, we only get a snapshot of the present and lack a vision for the future. After reading The Dream Machine, I'm going to use some examples to explain why the historical development of computer science concepts is closely tied to present-day software architecture. I selected several concepts that are closely related to the development of modern machines.

Where does even the idea of a computer come from? Imagine a mathematician with a pen and paper.

There were already some raw early forms of computers, such as punch-card machines, but most of them were perceived as engines with one specific use. The initial idea of the computer comes from Turing's imagination: he pictured a computer as a mathematician with pen and paper. The hands, eyes, and pencil are replaced by a scanning device that can move along a tape, representing the machine's reading and writing, and an internal scratch pad records "the state" (the current intermediate step of the computation) of the machine.

Let’s replace the two-dimensional sheet of paper with an infinitely long tape divided into squares, like a roll of postage stamps. Each square will either be blank or have a symbol written on it. (The symbols could be anything — numbers, letters, colors, pictures, whatever — so long as there was just one per square.)

Next, said Turing, let’s replace the mathematician’s hand, eye, and pencil with a scanning device that can move backward and forward along the tape one square at a time, reading and writing symbols as it goes. (A modern analogy would be the read/write head of a tape recorder or a VCR.) Let’s also replace the mathematician’s “state of mind,” his minute-to-minute sense of what’s going on in the calculation, with a kind of internal scratch pad that records the current “state” of the machine. (Example: “I am currently in the process of carrying a 4 to the next column of addition.”) Each different state can also be labeled by a symbol.
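Turing's picture maps almost directly onto code. Here is a minimal sketch of such a machine in Python; the rule table and the bit-flipping example are my own illustrative assumptions, not from the book:

```python
# A minimal Turing machine sketch: a tape of symbols, a head that moves
# left or right one square at a time, and a "state of mind" register.
def run_turing_machine(rules, tape, state, halt_state, max_steps=1000):
    tape = dict(enumerate(tape))  # sparse tape; blank squares default to "_"
    head = 0
    for _ in range(max_steps):
        if state == halt_state:
            break
        symbol = tape.get(head, "_")
        # The rule table maps (state, symbol) -> (new state, symbol to write, move).
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape[i] for i in sorted(tape))

# Example rules: flip every bit, then halt on the first blank square.
rules = {
    ("scan", "0"): ("scan", "1", "R"),
    ("scan", "1"): ("scan", "0", "R"),
    ("scan", "_"): ("halt", "_", "R"),
}
print(run_turing_machine(rules, "1011", "scan", "halt"))  # 0100_
```

The whole machine is just the loop: read a symbol, consult the rule table, write, move, change state.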

How does a machine know what operation to do next? Inspiration from cellular automata.

Before computers were programmable, they were operator-supervised: machines could only act on the operator's instructions. They did not know what state they were in, so they could not make decisions of their own, such as "If variable X = 0, then do operation Y, else do operation Z."

The idea of machines tracking their own state begins with the famous cellular automata. If you know cellular automata, you will probably say "Aha!" When we learn them in a CS class, we are given some initial rules, such as:

At any given time and in any given cell, the automata could be in only one of a finite number of states, which could be thought of as red, white, blue, green, and yellow, or 1, 2, 3, and 4, or living and dead, or whatever. At each tick of the clock, the automata would make a transition to a new state, which would be determined by its own current state and that of its neighbors.

If you have learned this in school, you probably remember the teacher asking about everything from the state-transition rule and its equilibrium to the evolution of the states over time.

My simulation class note: different initial densities and radii (the range of neighbors that influence a cell) will produce different outputs.
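The transition rule can be sketched in a few lines. This is my own illustrative example, assuming a simple majority-vote rule over a wrap-around neighborhood; it is not a rule from the book or my class:

```python
# A one-dimensional cellular automaton sketch: each cell is alive (1) or
# dead (0), and at every tick its next state depends on its own state and
# its neighbors within some radius (assumed rule here: majority vote).
def step(cells, radius=1):
    n = len(cells)
    nxt = []
    for i in range(n):
        # The neighborhood wraps around, like a ring of cells.
        neighborhood = [cells[(i + d) % n] for d in range(-radius, radius + 1)]
        # Transition rule: adopt the majority state of the neighborhood.
        nxt.append(1 if sum(neighborhood) * 2 > len(neighborhood) else 0)
    return nxt

cells = [1, 0, 1, 1, 0, 0, 0, 1]
for _ in range(4):
    cells = step(cells, radius=1)
print(cells)  # this seed settles into an equilibrium after one tick
```

With this particular rule and seed, the grid reaches a fixed point almost immediately, exactly the kind of equilibrium behavior those classroom questions were about.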

Things start to click all of a sudden.

How does a computer become a general-purpose tool? Von Neumann's "First Draft": we fetch, execute, and return.

To describe why computers are general purpose today, I will use an analogy for software and hardware: software is the music, and hardware is the instrument. Just as you can play all kinds of music on a piano, you can use a computer for anything from writing a novel to flying a jet. On the other hand, the same music can be played on different instruments, like tubas and pipe organs, just as you can run Google Sheets on both Windows and Mac.

Before Turing and von Neumann's "First Draft," people thought of a computer as just a fancier adding machine. This is how von Neumann imagined the computer:

Finally, said von Neumann, the central control unit would be the heart of the computer, the part that decided what to do next. (Today this is usually known as the central processing unit, or CPU.) Its decisions would in turn be governed by a program stored in the memory unit.

How was the central control unit supposed to go about executing those programs? Von Neumann had quite a bit of leeway in his answer, thanks to Alan Turing. As he was undoubtedly well aware, his abstract architecture was logically equivalent to a Turing machine, with the memory, input, and output units collectively corresponding to the tape, and the central arithmetic and central control units collectively corresponding to the read/write head.

The fundamental cycle of how a machine operates is: fetch, execute, and return. It feels similar to the language we use in JavaScript: we fetch the data, execute some functions, and return the results. This cycle underlies how machines run our code.

He envisioned the central controller as going through an endless cycle: fetch the next chunk of instructions or data from the memory unit, execute the appropriate operation, and then send the results back for storage in memory. Fetch, execute, return. Fetch, execute, return.
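That endless cycle can be sketched as a toy machine in Python. The opcode names and memory layout here are my assumptions for illustration, but the loop itself is the fetch-execute-return cycle von Neumann described:

```python
# A toy von Neumann machine: program and data live in the same memory,
# and the control unit loops endlessly: fetch, execute, return.
def run(memory):
    pc = 0   # program counter: address of the next instruction
    acc = 0  # accumulator register
    while True:
        op, addr = memory[pc]   # fetch the next instruction from memory
        pc += 1
        if op == "LOAD":        # execute the appropriate operation ...
            acc = memory[addr]
        elif op == "ADD":
            acc += memory[addr]
        elif op == "STORE":     # ... and return the result to memory
            memory[addr] = acc
        elif op == "HALT":
            return memory

# Memory holds instructions (addresses 0-3) and data (addresses 4-6).
memory = {
    0: ("LOAD", 4),
    1: ("ADD", 5),
    2: ("STORE", 6),
    3: ("HALT", None),
    4: 2,
    5: 3,
    6: 0,
}
print(run(memory)[6])  # 5
```

Because the program is just data in memory, changing the computation means changing memory contents, not rebuilding the machine. That is what makes it general purpose.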

Before this, we could either ask a machine to do one fixed set of operations or use an operator to make modifications. Even then it was not good enough, because the machine went through everything step by step. It's like making a dinner salad the hard way: go out and buy the lettuce, bring it home, and chop it up; go out and buy the carrots, bring them home, and chop them up. But at least machines could now perform many different kinds of tasks based on the instructions stored at memory addresses.

Where does RAM come from? Speeding up computation by not waiting for the correct address.

I always found the name RAM (random-access memory) weird. Why does access need to be "random"? And why is RAM faster than a hard disk?

Now that you understand that the machine's operating cycle is fetch, execute, and return, you can see it needs to know the memory address from which to fetch the next instruction, and waiting for that address to come around is slow.

Among many other things, the 1946 report pointed out how much more efficient a computer would be if it could get access to each memory address at “random”-that is, instantaneously, without having to wait until the correct address came around on a circulating tape or a mercury delay line. Naturally, such a storage scheme became known as Random-Access Memory, or RAM. (Of course, the report also concluded that the memory unit should store the data as charged spots on the face of a cathode-ray tube — a cutting-edge technology in 1946, but now so thoroughly obsolete that almost no one remembers it.)

Hence, the term “random access” means that the CPU can access any memory cell directly, without having to go through all the preceding cells. This contrasts with sequential access, where data must be accessed in a predetermined order.
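The contrast can be sketched by counting steps. This is an illustrative cost model of my own, not a benchmark: sequential access pays for every cell the tape rolls past, while random access pays one step regardless of the address:

```python
# Sketch of sequential vs. random access: a circulating tape must pass
# every preceding cell, while RAM jumps straight to any address.
def sequential_read(tape, address):
    steps = 0
    for i, value in enumerate(tape):
        steps += 1  # the tape rolls past one cell per step
        if i == address:
            return value, steps

def random_read(ram, address):
    return ram[address], 1  # one step, no matter which address

data = list(range(100))
print(sequential_read(data, 99))  # (99, 100): waited for 100 cells
print(random_read(data, 99))      # (99, 1): jumped straight there
```

The farther down the tape the address sits, the worse sequential access gets; random access is flat.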

Where do functions come from? Subroutines in Lisp: reusable, modular procedures.

We already saw some idea of subroutines in von Neumann's first draft, but one of the best-known applications of it is Lisp, a programming language created in the 1950s by John McCarthy.

The use of software building blocks was hardly a new concept even then, of course. Starting in the 1940s, long before Lisp, or Fortran, or any of the other languages appeared, programmers had learned that they could save themselves a lot of time and confusion by breaking up programs into subroutines-reusable, self-contained procedures that could each do one specific task. One subroutine might calculate the cube root of any given number, for instance, while another might sort any given list of names alphabetically. These procedures could in turn be broken up into still simpler subroutines, and so on, all the way down to the level of individual commands, if need be. Indeed, as programmers tackled tougher and tougher challenges in the 1950s — the SAGE project, for example-this kind of decomposition had become increasingly critical.* Even today, under the rubric of “structured programming.”

Before the idea of a subroutine, our code was just one big chunk, super hard to debug as it grew. In Lisp, programmers could finally start with some basic functions and compose their own. It now makes so much sense why everyone says writing modular code is good: it feels obvious today, but it was in fact an evolution of thought.
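The book's own examples of subroutines (a cube root, an alphabetical sort) translate directly into code. The composing `report` routine is my own assumed example of building bigger procedures out of smaller ones:

```python
# Subroutines: each self-contained procedure does one specific task,
# and higher-level routines are composed out of them.
def cube_root(x):
    """One subroutine: the cube root of a number."""
    return x ** (1 / 3)

def sort_names(names):
    """Another subroutine: sort a list of names alphabetically."""
    return sorted(names)

def report(numbers, names):
    """A higher-level routine composed from the subroutines above."""
    return [round(cube_root(n), 3) for n in numbers], sort_names(names)

print(report([8, 27], ["Turing", "Lick", "McCarthy"]))
```

Each piece can be debugged, reused, and replaced on its own, which is exactly the payoff of decomposition.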

Where does the idea of multitasking come from? Time-sharing: the computer switches tasks fast enough that humans don't notice.

Project MAC, from the 1960s, built one of the first systems that allowed time-sharing. Time-sharing is a computing technique that lets multiple users share a single computer system simultaneously. Before time-sharing, we only had batch processing: users submitted their jobs or programs to a computer operator, who compiled and executed them one after another, sequentially. Is there a better way to allocate computer resources?

So, McCarthy wondered, why not let the CPU skip from one user’s memory area to the next user’s in sequence, executing a few steps of each task as it went? If that cycle was repeated rapidly enough, the users would never notice the gaps (think of a kindergarten teacher holding simultaneous conversations with a dozen insistent five-year-olds). Each of them would perceive his or her program to be executing continuously. And more to the point, each would be able to create and modify and execute programs interactively, as if he or she had sole control of the computer. Since the users would be sharing the computer’s processing time as well as its storage space, McCarthy took to calling his scheme time-sharing.

One of the most fascinating things about time-sharing is that it enables not only sharing computer resources but also sharing information, because everyone is accessing the same system. The system became the repository of the community's knowledge, and people started to debate whether to have passwords or to simply share all the resources together.
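McCarthy's scheme can be sketched as a round-robin scheduler. The task format and slice size here are my assumptions for illustration:

```python
# Time-sharing sketch: the CPU skips from one user's task to the next,
# executing a short slice of each; repeated fast enough, every user
# feels like they have the whole machine to themselves.
from collections import deque

def time_share(tasks, slice_size=2):
    """tasks maps each user to steps of work; returns the execution order."""
    queue = deque(tasks.items())
    order = []
    while queue:
        user, remaining = queue.popleft()
        done = min(slice_size, remaining)
        order.append((user, done))  # run a short burst for this user
        if remaining - done > 0:
            queue.append((user, remaining - done))  # back of the line
    return order

print(time_share({"alice": 3, "bob": 5}))
```

The output interleaves the two users' bursts, so both programs make steady progress even though only one runs at any instant.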

Where does TCP/IP come from? The invention of a network.

The ARPA community was trying to build a network so different computers could talk to each other. They faced two main challenges:

  1. How do we send information from one computer to another?
  2. How do computers know where to send the information?

To send the information, they first needed to break it into smaller chunks, called packets.

A second conclusion was a bit more esoteric, said Roberts, but less of a surprise: digital messages could not be sent through the network as a continuous stream of bits. Instead, they would have to be broken into segments, with some fixed number of bits in each one (think of a long letter written on a series of postcards).

The farther a message traveled, the greater the chances that one or more bits would be garbled by static and distortion on the line. And in the digital world, one erroneous bit might easily spell disaster. Thus the digital postcards, or “packets,” in modern parlance.
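The digital-postcard idea can be sketched directly. The packet fields (`seq`, `total`, `data`) are assumed names, a stripped-down version of the "bill of lading" the book describes:

```python
# Packet sketch: break a long message into fixed-size "digital postcards,"
# each carrying a sequence number so the receiver can reassemble them.
def packetize(message, size=8):
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [{"seq": n, "total": len(chunks), "data": chunk}
            for n, chunk in enumerate(chunks)]

def reassemble(packets):
    # Packets may arrive out of order; sort by sequence number first.
    return "".join(p["data"] for p in sorted(packets, key=lambda p: p["seq"]))

packets = packetize("messages become digital postcards")
print(reassemble(packets))  # prints the original message, reassembled
```

Because each packet is small, a garbled one can be resent on its own instead of retransmitting the whole message.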

Second, they needed the computer to read the digital address (destination address, return address, and so on) on each packet so it knows where to send it; this is routing.

the clerks in each telegraph office had to write down messages as they arrived, then key them in again to send them farther down the line.) The idea was to keep the complex, highway-map structure of the network, with lines running every which way and an ARPA site sitting at each intersection. But the individual sites would share routing responsibilities equally. That is, the computer at each site would first read the digital address on each packet as it came in. (The packets would actually carry the digital equivalent of an entire bill of lading, with destination address, return address, error-checking codes, message identifiers, and so forth.)

That's good, but what if many packets are being sent at once? Congestion! So they came up with the idea of small computers serving as designated entry and exit points for packets, like the interchanges of a limited-access highway.

Clark remembers saying. “So my idea was simply to define the network to be something self-contained.” That is, make the ARPA network into the digital equivalent of a limited-access highway, with an “interchange” located just outside each town. Each of these digital interchanges would actually be a small computer, of course, separate from the main computer. But like its asphalt counterpart, the digital interchange would handle all the routing chores. It would provide an on-ramp to the network for new packets coming out of the main computer; an off-ramp for incoming packets addressed to the main computer; and traffic directions for packets passing through on their way to other computers.

So now they needed small computers to handle the routing, called independent routing computers.

Now, said Clark, the beauty of this scheme was that it would simplify life for everybody. ARPA could take responsibility for designing and implementing the network proper-meaning the information highways and the digital interchanges-without having to worry that some contractor somewhere would mess up his site’s programming and thereby bollix up the whole system. And the contractors, for their part, could focus on one comparatively simple task-establishing a link from their central computer to the routing computer — without having to worry about all the ins and outs of all the other computers on the network. So, said Clark, that was the idea: small, independent routing computers.
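The routing chore of those small computers can be sketched with a next-hop table. The site names and the three-node topology here are my assumptions for illustration:

```python
# Routing sketch: each interchange (a small routing computer) reads the
# destination address on a packet and forwards it to the next hop,
# until the packet reaches the site it is addressed to.
def route(packet, routing_tables, start):
    hops = [start]
    here = start
    while here != packet["dest"]:
        here = routing_tables[here][packet["dest"]]  # next hop toward dest
        hops.append(here)
    return hops

# Assumed toy topology: MIT <-> BBN <-> UCLA.
tables = {
    "MIT":  {"UCLA": "BBN", "BBN": "BBN"},
    "BBN":  {"UCLA": "UCLA", "MIT": "MIT"},
    "UCLA": {"MIT": "BBN", "BBN": "BBN"},
}
print(route({"dest": "UCLA", "data": "hi"}, tables, "MIT"))  # ['MIT', 'BBN', 'UCLA']
```

Each interchange only needs its own table; no site has to know the ins and outs of every other computer on the network.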

We have just traced the history of network protocols. That's why we talk about IP (Internet Protocol) packets and TCP (Transmission Control Protocol). The IP address is like the phone number assigned to your smartphone; TCP is all the technology that makes the phone ring and lets you talk to someone on another phone.

Those scientists then spent a lot of time convincing, fighting, and discussing to settle on the protocols everyone uses nowadays. That's why so many protocols, e.g., the Hypertext Transfer Protocol (HTTP), follow the same rules.

Closing notes

When reading the history of computer science, I found it really funny that our predecessors were mind-blown by things I take for granted today, like copy and paste.

This quote from Steve Jobs resonates with me after reading the book.

“Life can be so much broader, once you discover one simple fact, and that is that everything around you that you call ‘life’ was made up by people who were no smarter than you.”

All these complicated concepts we struggled with in school are evolutions of thought spanning decades. But they started out pretty simple, such as Turing's image of a computer as a mathematician with pencil and paper. Learning history dispels a lot of the fear we feel when we start learning computer science, gives us an understanding of why we do things the way we do today, and offers a glimpse of how our predecessors imagined the future of computers. Rather than just learning how to code, I believe learning the why behind all these inventions will bring us to the next level of learning engineering.
