How I learn Software Engineering

Hussein Nasser
6 min read · Dec 30, 2022


Software engineering is a big, constantly evolving domain with new innovations appearing all the time. While this is true, most, if not all, technologies in software engineering tend to eventually coalesce into a few first principles. The engineer is better served learning the fundamentals of software engineering instead of frameworks, languages, or platforms that keep changing.

Even learning the fundamentals takes time and effort. I’m often asked how I learn software engineering, and initially I couldn’t answer this objectively. But recently I started watching myself learn, and I think I have a better idea of the process.

The truth is I never force myself to learn anything. If you have been following my work, you will notice that I rarely talk about new shiny stuff; if anything, I often talk about dated software engineering topics that no one discusses anymore. Not because the new stuff isn’t good, but because I feel there is more to the older material that I don’t yet get, and I would rather pursue that instead.

In this post I’ll walk you through one of my recent learning experiences while it is fresh in my mind.

Q&Q — Questions and Questions

For me, learning starts with questions. I ask genuine questions that put me on a path of discovery, and on this path I stumble upon obstacles that I clear one after the other. Each question creates another question until I reach a piece of fundamental knowledge that reconciles everything. It is like a recursive function hitting its base case.

What raises the questions varies; it might be a student of mine, or a random bug at work. Not to get philosophical, but I found that admitting that I don’t know something helped me tremendously in the journey of learning. When a student asks a question that the teacher doesn’t know the answer to, it often creates an insecurity in the teacher. I have been working on getting over that and genuinely exploring the question for myself instead of showing the student how much I know.

And there is a lot that I don’t know.

The post gets technical in the next sections. Let us get into the weeds.

What I don’t know

When a backend application listens on an address and a port, it can start receiving requests from clients. Requests (whether HTTP or any other protocol) come in as packets. This is how the application receives packets from the network.

I’m omitting the other queues created for connection establishment and accepting for simplicity. This assumes a connection has already been established.

  • The kernel creates a receive buffer for the application’s socket in kernel memory.
  • The kernel puts incoming network packets into that receive buffer.
  • The application reads from the receive buffer, copying the data into its own process memory.
  • The application processes the data (decrypts, parses the protocol, raises events).
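
To make the application’s side of this concrete, here is a minimal sketch in C, assuming a TCP socket that is already connected. The recv() call is exactly the step that copies bytes from the kernel’s receive buffer into the process’s own memory.

```c
/* Minimal sketch: a blocking read loop over an already-connected TCP socket.
 * Error handling is trimmed down; this only illustrates the copy step. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

void read_loop(int connected_fd)
{
    char buf[4096];  /* process memory: recv() copies into this */
    ssize_t n;

    /* Each recv() asks the kernel to copy whatever is waiting in the
     * socket's kernel receive buffer into buf, up to sizeof(buf) bytes. */
    while ((n = recv(connected_fd, buf, sizeof(buf), 0)) > 0) {
        /* The last step from the list above: decrypt, parse the
         * protocol, raise events. Here we just report the byte count. */
        printf("got %zd bytes\n", n);
    }
    close(connected_fd);
}
```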

What I find myself missing is:

  1. How exactly does the NIC (Network Interface Controller) transfer data into the kernel’s memory?
  2. Why does the data have to go to the kernel first and not directly into the process memory?

I attempt to answer the two questions in the coming paragraphs.

Q1 — How the NIC transfers packets to the kernel

To answer the first question, I had to learn how the CPU works, and in the process I discovered many things. What is relevant here is the concept of the interrupt: to read data from any peripheral device (mouse, keyboard, hard disk, or NIC), the CPU must be interrupted and told where to read from or write to.

So I applied this knowledge to my question and this is what I came up with:

When the NIC receives the electrical, optical, or radio signal (be it Ethernet, fiber, or WiFi/5G) and converts it into binary data in its local buffer, it sends an interrupt to the CPU, asking it to stop what it is doing and transfer the data to main memory.

The CPU reads the data from the NIC into its cache lines and then flushes the cache lines to memory. But where in memory exactly? This is where the NIC driver, a piece of software running in kernel space, tells the CPU the destination address. The CPU finally flushes the data to the provided memory address. The cycle keeps repeating until no data is left in the NIC. From there the kernel takes over.
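
As a rough illustration of this interrupt-driven style, here is a sketch in the shape of a Linux driver. The device, its register offsets, and the fixed-size buffer are all hypothetical, and a real NIC driver is far more involved; request_irq() and ioread32() are the real kernel APIs, the rest is made up for the example.

```c
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/types.h>

#define RX_STATUS_REG 0x10  /* hypothetical: nonzero while data remains */
#define RX_DATA_REG   0x14  /* hypothetical: next received 32-bit word */

static void __iomem *nic_regs;  /* device registers, mapped at probe time */
static u32 rx_buffer[512];      /* kernel memory the driver copies into */
static size_t rx_count;

/* Runs when the NIC raises its interrupt: the CPU stops what it was doing
 * and copies the data out word by word. This is the chatty per-word style
 * described above, before DMA enters the picture. */
static irqreturn_t nic_irq_handler(int irq, void *dev_id)
{
    while (ioread32(nic_regs + RX_STATUS_REG) && rx_count < 512)
        rx_buffer[rx_count++] = ioread32(nic_regs + RX_DATA_REG);
    return IRQ_HANDLED;
}

/* At setup, the driver asks the kernel to invoke the handler whenever the
 * device's interrupt line fires (the irq number comes from the bus layer). */
static int nic_setup_irq(unsigned int irq)
{
    return request_irq(irq, nic_irq_handler, 0, "toy-nic", NULL);
}
```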

There were hours spent here attempting to answer even more specific questions: How does the CPU flush the cache lines to memory? What if other cores are involved, reading from the same memory location? But I’m skipping those for now.

It all made sense to me, except for one thing that didn’t: this sounds very, very chatty. If I know anything about software engineering, it is that we try to avoid chattiness.

While interrupts work for small I/Os such as mouse moves or key presses, they are very time consuming for the CPU on large data transfers such as networking or disk reads and writes. So I thought this can’t be how things are done; something is missing. Moving large data through the tiny CPU registers and cache lines and flushing them would take ages. Some more searches and I discovered DMA.

It turns out this is exactly why DMA (Direct Memory Access) was invented. The NIC is given direct access to main memory, to read from and write to, while the CPU is freed. The CPU initiates the transfer by giving the DMA controller the destination memory address where the data should go and the device it should read from, in this case the NIC, all based on instructions from the device driver. After that, the NIC transfers the data directly into memory. Once the data is in memory, the kernel/driver can process it normally.
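
Sketched in the same Linux-driver style, arming a receive DMA might look roughly like this. The register offsets and the single-buffer layout are hypothetical (real NICs use rings of descriptors), but dma_map_single() is the real kernel API for turning a kernel buffer into a bus address the device can write to.

```c
#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/kernel.h>

#define RX_DMA_ADDR_REG 0x20  /* hypothetical: where the NIC should write */
#define RX_DMA_LEN_REG  0x24  /* hypothetical: how many bytes it may write */

static int nic_arm_rx_dma(struct device *dev, void __iomem *regs,
                          void *buf, size_t len)
{
    /* Map the kernel buffer so the device can address it directly. */
    dma_addr_t bus_addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

    if (dma_mapping_error(dev, bus_addr))
        return -ENOMEM;

    /* Program the NIC with the destination. From here the device writes
     * received packets straight into memory on its own; the CPU is only
     * interrupted once, when the transfer completes. */
    iowrite32(lower_32_bits(bus_addr), regs + RX_DMA_ADDR_REG);
    iowrite32(len, regs + RX_DMA_LEN_REG);
    return 0;
}
```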

On to my second question.

Q2 — Why does the data go to the kernel first?

But really, why not move the packet data directly from the NIC to the process memory instead? The cost of copying the data from the NIC to the kernel, then from the kernel to the application, must really add up. What I found is that device drivers run in the kernel, and the kernel is the one talking to the NIC, so that is why the packet data lives in the kernel first. In addition, the kernel doesn’t know (yet) where to put the data in the process memory, and moreover, it doesn’t know if the process is ready to read it. To be honest, I don’t see why a rearchitecture of the API wouldn’t allow for this. I think this could be done with io_uring, but I haven’t explored it yet.
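
For what it is worth, here is a minimal sketch of the io_uring flavor of that idea, assuming liburing is installed. The process describes the receive up front and the kernel completes it asynchronously into the process’s buffer; note this still involves a kernel-to-user copy, so it is not the full rearchitecture I am imagining.

```c
/* Minimal sketch using liburing: queue one receive on an already-connected
 * socket and wait for its completion. Error handling is trimmed down. */
#include <liburing.h>
#include <stdio.h>

int ring_recv(int connected_fd)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    char buf[2048];  /* process memory the kernel completes into */

    if (io_uring_queue_init(8, &ring, 0) < 0)
        return -1;

    /* Tell the kernel up front: when data arrives on this socket, put it
     * into this buffer and post a completion. */
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_recv(sqe, connected_fd, buf, sizeof(buf), 0);
    io_uring_submit(&ring);

    /* Block until the completion arrives; cqe->res is the byte count. */
    if (io_uring_wait_cqe(&ring, &cqe) == 0) {
        printf("received %d bytes\n", cqe->res);
        io_uring_cqe_seen(&ring, cqe);
    }
    io_uring_queue_exit(&ring);
    return 0;
}
```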

Even with DMA, it takes a toll on the DMA controller to do all this transferring for every single packet. So I found that Intel came up with an idea called DMA coalescing, where the NIC buffers packets locally and delays the initiation of the DMA transfer for received packets. This minimizes the number of transfers, saving energy at the cost of latency.
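
As a toy illustration of that trade-off (the thresholds are made up, and real hardware implements this in silicon, not C), the coalescing decision boils down to something like:

```c
/* Toy illustration of the coalescing trade-off, not real driver code:
 * hold received packets in the NIC's local buffer and start one DMA
 * transfer only when enough has accumulated or a deadline passes. */
#include <stdbool.h>
#include <stdint.h>

#define COALESCE_BYTES 16384  /* hypothetical batch-size threshold */
#define COALESCE_USECS 250    /* hypothetical bound on added latency */

bool should_start_dma(uint32_t buffered_bytes, uint64_t usecs_since_first)
{
    /* Fewer, larger transfers save power; the deadline bounds the latency. */
    return buffered_bytes >= COALESCE_BYTES ||
           (buffered_bytes > 0 && usecs_since_first >= COALESCE_USECS);
}
```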

During this quest I also learned about virtual memory, translation lookaside buffers, NUMA architectures, context switching, and much more.

Summary

What I just did there is what I call collateral knowledge: having a goal to learn something, but discovering other things in the process.

You might say, really? You didn’t know what an interrupt is, or even DMA? Truth of the matter: no. While I had heard about these concepts before, it was never in a context that actually made sense or raised my interest.

I don’t know how to describe it, but it feels different to find things out for yourself instead of having someone hand them to you on a plate.

If you enjoy my work, consider checking out my other articles on Medium or my courses.
