fork() — what is it?

Sadman Amin
5 min read · Nov 17, 2021

Whenever we talk about computers and operating systems, one of the coolest features we always refer to is concurrency. Concurrency is the OS’s ability to make progress on several tasks at once, either by interleaving them on one CPU or by running them in parallel on several cores. On Unix, fork() is one of the basic building blocks for this. As I was studying fork(), I got confused and went looking for an easier explanation of what it does and how it does it!

fork() is how you create new processes in Unix. When you call fork, you create a copy of your own process with its own address space. Think of it as a Spider-Man multiverse! Because each copy has its own memory, multiple tasks can run independently of one another, each as if it had the machine’s memory to itself.

To keep things simple, let’s jump to an example.

Run this Python code, which contains a fork call and a simple print statement. How many times is the output printed?
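
The embedded listing did not survive in this copy of the post. A minimal sketch that matches the description (the fork call on line 3, the print on line 4) would be the following. Treat it as a reconstruction, not necessarily the author's exact code. It needs a Unix-like system, since `os.fork()` is not available on Windows.

```python
import os

os.fork()  # from here on, two processes run the rest of the file
print("Hello from my multiverse")  # executed once by each process
```

Saving this as a file and running it with `python3` from a terminal should print the greeting twice, once per process.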

Hello from my multiverse 
Hello from my multiverse

Two times! Why? Calling os.fork() duplicates the current process, creating a second, independent process that we call the child. Before the fork on line 3, a single process was executing the code line by line. After it, there are two processes, parent and child, and each independently runs from line 4 to the end of the file. When the child executes line 4, it prints the line. When the parent executes line 4, it prints the same line. Thus, we get two outputs!

Now the question is: if two multiverses, or processes, are created, do they share the same resources or data? Let’s check the following piece of code.

A variable num is initialized. Since there are two outputs, let’s check whether incrementing it in one process changes the value seen by the other.
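
The listing for this step is also missing here. A sketch consistent with the output shown, assuming `num` starts at 0 and the increment comes after the fork (both are assumptions, not confirmed by the original), could be:

```python
import os

num = 0  # set before the fork, so each process starts with its own copy holding 0
os.fork()
num += 1  # each process increments only its private copy
print(f"Hello from my multiverse and our num is {num}")
```

If the two processes shared `num`, one of them would see the other's increment and print 2. Because each has its own copy, both print 1.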

Hello from my multiverse and our num is 1 
Hello from my multiverse and our num is 1

The value is the same in both outputs even though we incremented it! Why? As mentioned previously, each process runs independently, with its own copy of memory, as if no other process were running. So the child never sees changes the parent makes to the variable, and vice versa. Well, that is how it should be in a multiverse, right?

Let us change the example once again.

The multiverse is getting complicated now! We now try to track the sequence of execution. The code has two forks, and every resulting process calls the fibo function to find the nth Fibonacci number, a piece of code familiar to anyone who has studied computer science!
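
The original listing is not reproduced in this copy. As a smaller sketch of the same idea (not the author's code), two consecutive forks yield four processes, one for each combination of zero and non-zero return values. The names `two_forks`, `p`, and `q`, the pipe used to collect the reports, and the small default `n` are all assumptions made for this illustration.

```python
import os


def fibo(n: int) -> int:
    """Naive recursive Fibonacci with fibo(0) = 0 and fibo(1) = 1."""
    return n if n < 2 else fibo(n - 1) + fibo(n - 2)


def two_forks(n: int = 20) -> list:
    """Fork twice, let all four processes compute fibo(n), and collect their reports."""
    read_fd, write_fd = os.pipe()
    root = os.getpid()

    p = os.fork()  # 1 process becomes 2
    q = os.fork()  # both of them fork again: 2 become 4

    # Every process reports whether each fork returned 0 or a pid to it.
    line = f"P={'0' if p == 0 else 'pid'} Q={'0' if q == 0 else 'pid'} fibo={fibo(n)}\n"
    os.write(write_fd, line.encode())
    os.close(write_fd)

    if os.getpid() != root:
        os._exit(0)  # the three descendants stop here

    # Only the original process gets this far; read until every writer has closed.
    with os.fdopen(read_fd) as reader:
        reports = sorted(reader.read().splitlines())
    os.waitpid(p, 0)  # reap the two direct children of the original process
    os.waitpid(q, 0)
    return reports


if __name__ == "__main__":
    for report in two_forks():
        print(report)
```

The reports are sorted before being returned because the order in which the four processes write is up to the scheduler.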

After executing, we get the following output.

It starts here 
Before Forking
----
308697
Calling from P 308698 and Q 308699
before fibo from pid 308697 and P 308698 and Q 308699
after fibo from pid 308697 and P 308698 and Q 308699
308699
308698
Calling from P 308698 and Q 0
before fibo from pid 308699 and P 308698 and Q 0
after fibo from pid 308699 and P 308698 and Q 0
Calling from P 0 and Q 308700
before fibo from pid 308698 and P 0 and Q 308700
after fibo from pid 308698 and P 0 and Q 308700
308700
Calling from P 0 and Q 0
before fibo from pid 308700 and P 0 and Q 0
after fibo from pid 308700 and P 0 and Q 0
got result from pid 308697 and P 308698 and Q 308699
Printing from P 308698 and Q 308699. 35th Fibonacci number is 5702887
----
got result from pid 308699 and P 308698 and Q 0
Printing from P 308698 and Q 0. 35th Fibonacci number is 5702887
----
got result from pid 308700 and P 0 and Q 0
Printing from P 0 and Q 0. 35th Fibonacci number is 5702887
----
got result from pid 308698 and P 0 and Q 308700
Printing from P 0 and Q 308700. 35th Fibonacci number is 5702887
----

My mind exploded while trying to figure out the sequence from this output! I wanted to know which lines get executed after which. It seems there is no fixed order of execution. If I run the code again, a different sequence may come up.

To figure out the sequence, let’s try one final example.
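
The listing for this example is missing from this copy. A reconstruction consistent with the explanation and output that follow (the first fork on line 3, the inner print on line 7) might look like this. The exact condition and variable names are assumptions.

```python
import os

p = os.fork()  # first fork: parent gets the child's pid, child gets 0

if p != 0:  # false in the first child, so it skips the block
    os.fork()  # second fork: this process splits again
    print(f"Inside another multiverse from pid {os.getpid()}")

print(f"Outside the multiverse from pid {os.getpid()}")
```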

Before diving into the details, let’s discuss the pid, or process ID. We have been printing it in the last two examples. This is the ID the operating system assigns to a process. If we look closely at the outputs, we can see that some of the printed P and Q values are 0 and some are non-zero, like 308700. These two kinds of values mean different things.

When we have two multiverses, or processes, parent and child, how do we figure out which one we are currently in? Are we in the parent process or in the child process? Is there any way to tell? Well, let’s jump to line 3 of the previous code, where p = os.fork(). This line creates a child process. The call returns twice: once in the parent and once in the child, with a different value in each. In the parent, it returns the pid of the newly created child. In the child, it returns 0! So, when p is 0, we know we are inside the child multiverse. Otherwise, we are inside the parent. This check is used heavily in code that calls fork().
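
To see the two return values side by side, here is a minimal sketch. The function name `fork_and_report` and the pipe the child uses to send its view back are assumptions for illustration, not part of the original post.

```python
import os


def fork_and_report():
    """Fork once; return (parent's fork value, child's fork value, child's own pid)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        # Child: fork returned 0 here. Send what we saw back to the parent.
        os.close(read_fd)
        os.write(write_fd, f"{pid} {os.getpid()}".encode())
        os._exit(0)  # leave immediately so the child never runs the parent's code

    # Parent: fork returned the child's pid.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_saw, child_pid = map(int, reader.read().split())
    os.waitpid(pid, 0)  # reap the child
    return pid, child_saw, child_pid


if __name__ == "__main__":
    parent_saw, child_saw, child_pid = fork_and_report()
    print(f"parent got {parent_saw} from fork, child got {child_saw}")
    print(f"child's own pid was {child_pid}")
```

The value the parent gets from fork matches what the child reports from `os.getpid()`, while the child's own return value is 0.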

Now, let’s focus on our sequence of execution.

Inside another multiverse from pid 309327 
Outside the multiverse from pid 309328
Outside the multiverse from pid 309327
Inside another multiverse from pid 309329
Outside the multiverse from pid 309329

When line 3 is executed, a child process is created alongside the parent. Both processes continue from the next line. In the child, p is 0, so it does not enter the if block. It jumps straight to the final print statement, which in our case gives Outside the multiverse from pid 309328. In the parent, p is non-zero, so it enters the block and executes another fork! (That’s tedious.) The new child starts its execution from line 7: it prints the inside message, then leaves the block and prints the outside message. The parent also continues from line 7 and does the same!

Thus, we get 5 lines of output from this piece of code! That tells us how it works. But the sequence? It is not guaranteed to be the same from run to run. How the kernel chooses to schedule these processes is a different, very broad question.
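
The scheduler decides the order, but a program can still impose one where it matters. As a hedged sketch (the function name `child_then_parent` and the pipe used to record the order are assumptions for illustration), the parent can call `os.waitpid` to block until the child has exited before doing its own work:

```python
import os


def child_then_parent():
    """Return the order in which the two processes wrote when the parent waits."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()

    if pid == 0:
        os.write(write_fd, b"child\n")
        os._exit(0)  # child is done

    os.waitpid(pid, 0)  # block until the child has exited
    os.write(write_fd, b"parent\n")
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        return reader.read().split()


if __name__ == "__main__":
    print(child_then_parent())
```

Here the child's line always comes first, because the parent does not write until the wait returns.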

Sadman Amin

Backend Developer with a deep interest in DevOps pursuing higher studies in Cybersecurity