If you build (simple shell) they will come
If you’re using a computer for much more than word processing and the internet, you’re probably going to end up becoming familiar with THE SHELL. Also known as a command-line interface (CLI), shell programs give you access to the hitherto unexplored reaches of your computer. Fundamentally, shell programs offer simple but very powerful facilities for interacting with your file system and executing text-based programs. These days, with graphical operating systems like Windows or macOS, you access a CLI through a terminal program, which allows you to run a shell program from within your GUI OS.
There are a large number of shell programs out there, although a few have clearly become the most popular: bash (the Bourne-again shell), csh (the C shell), ksh (the Korn shell), and zsh (named for Zhong Shao). Each of them provides several essential functionalities:
- Ability to execute other programs in a subprocess.
- An interactive command line into which commands are entered.
- Varying levels of scripting; i.e. short programs in a concise syntax that can be executed non-interactively using variables and simple flow control.
- Access permissions to safely administer computer resources.
- Many others, but this is a good basic list…
I have been using the shell for a while, but to gain a deeper appreciation for how shell programs work internally, I attempted to write a simple shell program of my own, on Linux, using the system-level API to interact directly with the kernel. Spoiler alert: it’s really hard to make a good shell, and you will probably never see mine (unless I know you, which is likely, since not many other people will ever read this). I will demonstrate how it all works using a combination of sh, the original Bourne shell, as it is relatively simple in comparison with bash, and some system-level C code to show what happens beneath the hood.
I’m going to try to explain, more or less, what happens when you start running sh and execute a command.
When you start up your terminal program, typically there is a default shell already specified, which is read from one of several (often hidden) files located in your home directory. Every user on a system has their own home directory, which contains all of the files that they have created or installed for their own use. Depending on the level of permissions a given user is granted by the system administrator, they may have access to additional files and programs, such as information about other users, and the ability to modify system properties.
When you log into your terminal, on Linux/UNIX-based operating systems, you will often be given a bash shell, unless otherwise specified in the file
~/.profile. To invoke an instance of a different shell, you can call it as a command:
My (nicely colored) command prompt from bash changes to a simple
$ once I begin a simple sh session. I am now operating from within a subprocess of the original bash session that I started up with. This is an important point to be aware of. Most commands entered on the command line are executed as a subprocess and are given totally separate virtual memory during their execution. When they complete their execution or encounter an error, they return control to the calling process, such as bash or sh. As a consequence, changes made in the subshell do not affect the parent shell, which I will explore more thoroughly below.
Shell programs are typically initialized with a number of variables that affect their execution environment and provide information about where commands are located and how to display them on the screen. One critical variable is called
PATH. Within the
PATH variable is stored a list of directories that the shell should search when a user enters a command after the prompt. For example, when I enter
ls without any arguments, sh searches the PATH directories in order to find the absolute path of the ls executable. The
which command, itself located at
/usr/bin/which, lets me know where the executable is located.
Not all commands are executable programs in this sense, however. Shell programs provide builtin functions that are hardwired into the shell source code. Some of these are written into the shell for convenience, while others are written this way by necessity. The
type function, which is a shell builtin, will tell you whether its argument is an executable or internal to the shell.
To appreciate why this is necessary you have to know a bit about how processes deal with memory. When a child process is spawned from a parent process it inherits its environment and whatever variables were set to be exported to its children. Child processes are unable to alter their parents’ environments, however. As a result, any operation that alters or interacts with the parent process directly must be performed by the parent itself. The execution environment, which the kernel keeps track of, includes things like the user ID, permissions, and current directory of the process. So, if you want to change the current directory, the only way to do it is to make a system call from the process whose directory you want to change. Thus,
cd has to be implemented as a builtin and uses a system call (aside: there are always convoluted ways to get around — or break — these restrictions, but by and large, the kernel must be invoked to manage such tasks).
System calls are the interface between user space and kernel space. It all boils down to a few hundred calls (cf. The Linux Programming Interface) that the operating system provides for input, output, and managing processes and the file system.
Here is an example of a short program to fork a process and execute another program before returning to the main program, notably using only the system calls fork(), execve(), and wait().
unistd.h is an important header file that gives access to the underlying Linux system calls. It is not part of the C standard library, but it is part of POSIX and underlies much of the GNU C library. Amazingly, the above code compiles and runs
ls -l as requested.
The fork, execve, wait paradigm is a hallmark of setting up and using a child process. The
fork() system call copies everything from the memory of the currently executing process into a new process. It returns the process id of the child process to the calling process (the parent). In contrast, it returns
0 to the child process. This allows you to write a single program that can determine which process it is running in. Since it only returns
0 to the child process, by checking the value of
child in the above code, it is possible to do different things in the parent and child, which is exactly what we want. Here, we are executing the
ls -l command in the child and waiting in the parent for the return value.
While processes spawned using
fork() have a parent-child relationship, calls to
execve() do not create another process. Rather, execve replaces the entire memory of the calling (child) process with the new program’s image, and that same process carries on as a completely separate program. Before
execve(), the child possesses exact copies of all code and data in the scope of the parent at the time of forking.
execve(), then, overwrites the memory of its calling process if it is able to execute the specified program. If it succeeds, it severs the connection to the parent and goes on to live on its own in the real world…
While all of this is going on, the parent process can either keep on executing until it returns (leaving the child process with nowhere to return its exit status), or it can pause its own execution until the child process finishes executing. Even though the connection between the exec’d child and parent is severed, the kernel is aware of all processes. The system call
wait() essentially requests information about any existing child processes from the kernel; the parent’s execution pauses until the kernel, notified that the child has finished executing, sends a signal back to the parent to resume. Once this happens,
wait() relays the exit status of the child and returns control to the parent. Finally, the parent can rest easy knowing the child process completed successfully.
Of course, not all processes are so lucky. When memory is limited, or the maximum number of processes allowed by the system is already running, various errors will be set by the system, which the parent can check before proceeding. A hallmark of system calls is that they must be very robust to all sorts of invalid input and resource limitations in order to preserve the integrity of the file system. Programmers should check all possible errors when using system calls.
With execve(), besides the possibility that the file you are hoping to execute does not exist, other expected sources of error include incorrect permissions of the user executing the parent program, which are inherited by all the children, or file locks due to another process interacting with the requested file simultaneously. Complications of the latter type can become extremely hairy when dealing with large, multiuser systems and presage issues surrounding concurrency and threading.
The shell uses the fork, execve, wait paradigm for virtually all its processes, apart from builtins; however, to have a useful shell, there is another layer: handling repeated input. When you enter some text on the command line and press
return, all of that text gets passed to the stdin file stream. This is where input is stored before being read and interpreted by your program. Unlike the example program above, in which I hardcoded the path to the executable, a shell program has to accept arbitrary input from the user and request the correct resources in response. To do this, it goes through a repeating loop of:
- Getting input using
read(), or a library function wrapper such as getline().
- Tokenizing the input (e.g. splitting at whitespace and separating commands at semicolons and ampersands).
- Performing pattern matching, e.g. via *’s and brace expansions.
- Searching for a matching command (first among aliases, then builtins, then external programs in the PATH directories).
- Executing the command in a child process if found, or displaying an error message otherwise.
- Printing the prompt again and preparing for further input.
This is typically implemented in a while loop that breaks only when the program is no longer receiving input from stdin. When a request to the
read() system call generates no data from stdin (which may be a terminal, a script, or a pipe), the shell interprets it as a signal to exit back to its calling process, bash in our case. Interestingly, any unclaimed children, aka “zombies”, that are still hanging around when a process exits are “adopted” by the mother of all processes, called
init, the first process to run when a Linux operating system starts up and the last when it shuts down.
A truly simple shell — absolutely no bells or whistles — using all of the above information could then be implemented like so:
There are a couple things I’d like to point out:
First, it worked, which is really nice.
Second, the entered commands produce the same result, including with arguments beyond the command path. This is because they are just the same old executables executing in their own processes, as usual.
Third, I can even spawn an instance of bash within my verysimpleshell. You see that the command line changes to my fancy colored prompt after calling
/bin/bash. Then I can call
exit — a bash builtin — and return to verysimpleshell. In contrast to a fully featured shell, like bash, if I try to call
echo in my shell without the full path, it fails because my shell doesn’t know anything about the PATH variable.
Fourth, to quit my shell I type control-D. This flushes
stdin and signals
EOF, or end-of-file, as
getline() has received no further data from a call to
read(), and returns
-1. I check for this in the loop, and then break out, exiting the shell.
Ah, classic shell behavior. There is so much more to “real” shells that gets into some complicated territory, but hopefully this gives you a taste of the most basic version, which is already quite powerful.