A Journey into Process Isolation: kernel namespaces, control groups
1 — NAMESPACES :
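Namespaces arrived in the Linux kernel gradually rather than in a single release. Mount namespaces came first, in 2.4.19 (2002); UTS and IPC namespaces followed in 2.6.19; PID namespaces and the first network namespace support landed in 2.6.24, released in January 2008; and user namespaces were completed in 3.8. The work was driven by the need for better resource isolation and process containment, especially for virtualization and containerization technologies.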
1 — PID Namespace:
The PID namespace virtualizes the process ID (PID) space, providing each namespace with its own range of PID values. Processes within a PID namespace perceive only the processes and child processes that belong to the same namespace, offering process isolation and hierarchical organization.
The concept is hard to grasp at first, but with some patience we'll get there. First, let's look at the process tree we're working under by running pstree:
systemd─┬─2*[agetty]
├─cron
├─dbus-daemon
├─init-systemd(Ub─┬─SessionLeader───Relay(4750)───zsh───p+
│ ├─init───{init}
│ ├─login───bash
│ └─{init-systemd(Ub}
├─networkd-dispat
├─packagekitd───2*[{packagekitd}]
├─polkitd───2*[{polkitd}]
├─rsyslogd───3*[{rsyslogd}]
├─snapd───10*[{snapd}]
├─5*[snapfuse]
├─subiquity-serve───python3.10─┬─python3
│ └─5*[{python3.10}]
├─systemd───(sd-pam)
├─systemd-journal
├─systemd-logind
├─systemd-resolve
├─systemd-udevd
└─unattended-upgr───{unattended-upgr}
Keep in mind that these examples were run in WSL, so there are far fewer processes than on a typical bare-metal installation, but that is not today's topic.
Before creating a namespace, we need to learn a little about the unshare command.
unshare runs a program in one or more new namespaces; the flags you pass select which types of namespace are created. I'd recommend reading more about it on this page: https://www.man7.org/linux/man-pages/man1/unshare.1.html
To try the flags out, let's create this namespace:
unshare --user --pid --map-root-user --mount-proc --fork bash
This creates a new user namespace and a new PID namespace, maps the current user to root inside the new user namespace, mounts a fresh proc filesystem, and forks bash as the first process in the new PID namespace. Let's see which processes are running in this namespace:
root@????:~# pstree
bash───pstree
root@????:~# ps
PID TTY TIME CMD
1 pts/0 00:00:00 bash
9 pts/0 00:00:00 ps
Note that no other processes are visible: the shell is completely isolated from the rest of the process tree.
Now let's go back to the parent shell and check which namespaces exist on the system using lsns (list namespaces):
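For a quick, non-interactive check of the same idea, the sketch below wraps the unshare invocation from above in a function and prints the PID that bash sees for itself inside the new namespace. The function name pid_inside_new_namespace and the "unavailable" fallback are assumptions made for illustration; the fallback covers systems where unprivileged user namespaces are disabled or a new /proc cannot be mounted.

```shell
#!/bin/bash
# Hypothetical helper (not part of unshare): start bash in new user and PID
# namespaces and print the PID it sees for itself. Prints "unavailable" if
# the namespaces cannot be created.
pid_inside_new_namespace() {
    unshare --user --pid --map-root-user --mount-proc --fork \
        bash -c 'echo "$$"' 2>/dev/null \
        || echo "unavailable"
}

pid_inside_new_namespace
```

Where the namespaces can be created, bash is the first process in the new PID namespace, so it should report the same PID 1 that ps showed above.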
$ lsns
NS TYPE NPROCS PID USER COMMAND
4026531834 time 7 440 ridwane /lib/systemd/systemd --user
4026531835 cgroup 7 440 ridwane /lib/systemd/systemd --user
4026531837 user 5 440 ridwane /lib/systemd/systemd --user
4026531840 net 7 440 ridwane /lib/systemd/systemd --user
4026532242 ipc 7 440 ridwane /lib/systemd/systemd --user
4026532254 mnt 5 440 ridwane /lib/systemd/systemd --user
4026532256 uts 7 440 ridwane /lib/systemd/systemd --user
4026532257 pid 6 440 ridwane /lib/systemd/systemd --user
4026532259 user 2 5274 ridwane unshare --user --pid --map-
4026532260 mnt 2 5274 ridwane unshare --user --pid --map-
4026532261 pid 1 5275 ridwane bash
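The NS column in this listing is the same number the kernel exposes through the symbolic links under /proc/<pid>/ns/, whose targets look like pid:[4026532261]. As a rough sketch, the helpers below (ns_inode and ns_of are names invented here, not existing tools) extract that number so two processes can be compared by hand.

```shell
#!/bin/bash
# Extract the numeric ID from a namespace link target such as "pid:[4026532261]".
ns_inode() {
    local target=$1
    target=${target#*\[}          # drop everything up to and including "["
    printf '%s\n' "${target%\]}"  # drop the trailing "]"
}

# Print the namespace ID of the given type (pid, net, mnt, uts, ipc, user)
# for a process, read from /proc/<pid>/ns/<type>.
ns_of() {
    local pid=$1 type=$2
    ns_inode "$(readlink "/proc/$pid/ns/$type")"
}

# Example: the PID namespace of the current shell.
if [ -e "/proc/$$/ns/pid" ]; then
    ns_of "$$" pid
fi
```

Two processes share a namespace of a given type exactly when ns_of prints the same number for both. Reading the links of another user's process may require root.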
As the lsns output shows, the parent can still monitor what runs in the child. Isolation works in one direction only: the child cannot see processes outside its namespace, but ancestor namespaces can see into it. Let's run a few commands in the child:
root@DESKTOP-VGRURMU:~# sleep 2100 &
[1] 20
root@DESKTOP-VGRURMU:~# sleep 2200 &
[2] 21
root@DESKTOP-VGRURMU:~# sleep 2300 &
[3] 22
root@DESKTOP-VGRURMU:~#
From the child's perspective:
root@DESKTOP-VGRURMU:~# pstree
bash─┬─pstree
└─3*[sleep]
root@DESKTOP-VGRURMU:~# ps
PID TTY TIME CMD
1 pts/0 00:00:00 bash
20 pts/0 00:00:00 sleep
21 pts/0 00:00:00 sleep
22 pts/0 00:00:00 sleep
24 pts/0 00:00:00 ps
root@DESKTOP-VGRURMU:~#
From the parent's perspective:
systemd─┬─2*[agetty]
├─cron
├─dbus-daemon
├─init-systemd(Ub─┬─SessionLeader───Relay(5422)───zsh───unshare───bash───3*[sleep]
│ ├─SessionLeader───Relay(5496)───zsh───pstree
│ ├─init───{init}
│ ├─login───bash
│ └─{init-systemd(Ub}
├─networkd-dispat
├─packagekitd───2*[{packagekitd}]
├─polkitd───2*[{polkitd}]
├─rsyslogd───3*[{rsyslogd}]
├─snapd───10*[{snapd}]
├─5*[snapfuse]
├─subiquity-serve───python3.10─┬─python3
│ └─5*[{python3.10}]
├─systemd───(sd-pam)
├─systemd-journal
├─systemd-logind
├─systemd-resolve
├─systemd-udevd
└─unattended-upgr───{unattended-upgr}
$ pstree -p 5479
bash(5479)─┬─sleep(5546)
├─sleep(5547)
└─sleep(5548)
Now let's go one step further and create a namespace inside a namespace:
root@DESKTOP-VGRURMU:~# lsns -t pid
NS TYPE NPROCS PID USER COMMAND
4026532261 pid 2 1 root bash
root@DESKTOP-VGRURMU:~# unshare -p -f --mount-proc sleep 4000 &
[1] 8
root@DESKTOP-VGRURMU:~# ps
PID TTY TIME CMD
1 pts/2 00:00:00 bash
8 pts/2 00:00:00 unshare
9 pts/2 00:00:00 sleep
10 pts/2 00:00:00 ps
root@DESKTOP-VGRURMU:~# lsns -t pid
NS TYPE NPROCS PID USER COMMAND
4026532261 pid 3 1 root bash
4026532263 pid 1 9 root sleep 4000
root@DESKTOP-VGRURMU:~# pstree
bash─┬─pstree
└─unshare───sleep
Once again the parent namespace can see the child's process: sleep shows up with PID 9 in the outer namespace even though it lives in its own PID namespace (4026532263).
A short summary:
- Processes can only see other processes in their own PID namespace and in any namespaces descended from it.
- The root PID namespace is the initial PID namespace, and all other PID namespaces descend from it. The root PID namespace can therefore see every process in every PID namespace on the system.
- Processes can share the same PID as long as they are in different PID namespaces.
- A process has a unique PID in its own PID namespace, plus an additional PID in each ancestor namespace, up to and including the root PID namespace.
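The last point can be observed directly: on kernels 4.1 and later, /proc/<pid>/status contains an NSpid line listing the process's PID in every nested PID namespace, outermost first. Here is a minimal sketch for reading it; the function name ns_pids is made up for this example.

```shell
#!/bin/bash
# Print the PID of a process as seen from each nested PID namespace,
# outermost first, by reading the NSpid line of a /proc/<pid>/status file.
ns_pids() {
    local status_file=$1
    awk '$1 == "NSpid:" { for (i = 2; i <= NF; i++) print $i }' "$status_file"
}

# Example: inspect the current shell (prints nothing if NSpid is not available).
if [ -r "/proc/$$/status" ]; then
    ns_pids "/proc/$$/status"
fi
```

Run from the parent shell against one of the sleep processes started inside the namespace, this should print two numbers: the PID the parent sees and the PID the child namespace sees.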
2 — Network Namespace:
The same idea applies to the network, except that a new network namespace has no connectivity until you give it a network interface. Here we'll create two network namespaces and make them talk to each other.
First, create the two network namespaces using ip netns (the network namespace manager):
➜ ~ sudo ip netns add red
➜ ~ sudo ip netns add blue
➜ ~ ip netns
blue
red
➜ ~
Let's first look at the interfaces on the host machine using ip link:
➜ ~ ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether ??:??:??:??:??:?? brd ff:ff:ff:ff:ff:ff
Now let's run the same command inside each of the network namespaces:
➜ ~ sudo ip netns exec red ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
➜ ~ sudo ip netns exec blue ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Let's use the arp command to display the Address Resolution Protocol (ARP) cache on the host machine, then run arp and route inside each namespace:
➜ ~ arp
Address HWtype HWaddress Flags Mask Iface
???? ether ??:??:??:??:??:?? C eth0
➜ ~ sudo ip netns exec red arp
➜ ~ sudo ip netns exec blue arp
➜ ~ sudo ip netns exec blue route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
➜ ~ sudo ip netns exec red route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
Neither namespace has any entries in its ARP cache or its routing table.
To connect the two network namespaces, we need to create what's known as a virtual cable or pipe (a veth pair):
➜ ~ sudo ip link add veth-red type veth peer name veth-blue
➜ ~
Now let's attach each end to the appropriate namespace:
➜ ~ sudo ip link set veth-red netns red
➜ ~ sudo ip link set veth-blue netns blue
Now we assign an IP address, with its /24 prefix, to the interface inside each namespace:
➜ ~ sudo ip -n red addr add 192.168.14.1/24 dev veth-red
➜ ~ sudo ip -n blue addr add 192.168.14.2/24 dev veth-blue
After that, we bring up both interfaces:
➜ ~ sudo ip -n red link set veth-red up && sudo ip -n blue link set veth-blue up
To sum up the process, you can put everything into a small bash script:
#!/bin/bash
# Create network namespaces
sudo ip netns add red
sudo ip netns add blue
# Create veth pair
sudo ip link add veth-red type veth peer name veth-blue
# Move veth-red to namespace "red"
sudo ip link set veth-red netns red
# Move veth-blue to namespace "blue"
sudo ip link set veth-blue netns blue
# Set IP addresses for veth interfaces
sudo ip -n red addr add 192.168.14.1/24 dev veth-red
sudo ip -n blue addr add 192.168.14.2/24 dev veth-blue
# Bring up the interfaces
sudo ip -n red link set veth-red up
sudo ip -n blue link set veth-blue up
# Test connectivity
sudo ip netns exec red ping -c 4 192.168.14.2
➜ ~ bash test.sh
PING 192.168.14.2 (192.168.14.2) 56(84) bytes of data.
64 bytes from 192.168.14.2: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 192.168.14.2: icmp_seq=2 ttl=64 time=0.044 ms
64 bytes from 192.168.14.2: icmp_seq=3 ttl=64 time=0.085 ms
64 bytes from 192.168.14.2: icmp_seq=4 ttl=64 time=0.082 ms
--- 192.168.14.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3132ms
rtt min/avg/max/mdev = 0.044/0.065/0.085/0.018 ms
➜ ~
With that, we have connectivity between the two network namespaces. That's as deep as we'll go for this part, but you can definitely go deeper if you're interested.
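Running test.sh a second time fails because the namespaces already exist, so it helps to clean up between experiments. Here is a small sketch; run and cleanup_netns are helper names invented for this example, and the DRY_RUN switch is just a convenience for previewing the commands.

```shell
#!/bin/bash
# Hypothetical helper: with DRY_RUN=1 the command is printed instead of run.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Delete each named network namespace. Once a namespace is gone, the veth
# interface inside it is destroyed, and its peer disappears with it.
cleanup_netns() {
    local ns
    for ns in "$@"; do
        run sudo ip netns del "$ns"
    done
}

# Preview the commands without touching the system.
DRY_RUN=1 cleanup_netns red blue
```

Calling cleanup_netns red blue without DRY_RUN actually removes the two namespaces created by the script.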
3 — Mount Namespace:
Mount namespaces provide isolated views of the filesystem mount points within a process’s environment, enabling independent filesystem configurations without affecting the global filesystem or other namespaces. In this section, we’ll explore how mount namespaces work and demonstrate how to create and manipulate them.
How Mount Namespace Works:
When a new mount namespace is created, it starts with an initial set of mount points inherited from the parent namespace. However, processes within the new namespace can modify these mount points dynamically, including mounting and unmounting filesystems, changing mount options, and creating new mount points.
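To see that a mount made inside a namespace stays inside it, here is a hedged sketch that mounts a tmpfs in a new mount namespace and counts the matching entries in /proc/self/mounts from both sides. It uses an unprivileged user namespace (--user --map-root-user) instead of sudo, which is an assumption about your system; the function names are made up, and "unavailable" is printed when the namespace or the mount cannot be created.

```shell
#!/bin/bash
# Hypothetical demo: mount a tmpfs on a directory inside a new mount namespace
# and count the tmpfs mounts on it as seen from inside that namespace.
tmpfs_mounts_inside() {
    local dir=$1
    unshare --user --map-root-user --mount bash -c '
        mount -t tmpfs none "$1" && grep -c " $1 tmpfs " /proc/self/mounts
    ' _ "$dir" 2>/dev/null || echo "unavailable"
}

# Count the tmpfs mounts on the same directory as seen from the current namespace.
tmpfs_mounts_outside() {
    grep -c " $1 tmpfs " /proc/self/mounts
}

dir=$(mktemp -d)
echo "inside:  $(tmpfs_mounts_inside "$dir")"
echo "outside: $(tmpfs_mounts_outside "$dir")"
rmdir "$dir"
```

The count outside stays at zero because the mount exists only in the new namespace and vanishes when the process inside it exits.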
Creating and Manipulating Mount Namespaces:
To create and manipulate mount namespaces, we'll use command-line tools like unshare and mount. Below are the steps to create a new mount namespace and experiment with it:
1. Create a New Mount Namespace:
sudo unshare --mount
2. Manipulate Mount Points: Within the new namespace, manipulate mount points using commands like mount and umount. For example, you can mount a filesystem:
sudo mount -t tmpfs none /mnt
3. View Mount Points: Use the mount command to view the mount points within the namespace:
mount
4. Experiment: Experiment with different filesystems, mount options, and mount point configurations to observe their effects on the filesystem view within the namespace.
Example Bash Script:
You can also automate the creation and manipulation of mount namespaces using a Bash script. Below is an example script that creates a new mount namespace, mounts a temporary filesystem, and lists the mounted filesystems:
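#!/bin/bash
# Create a new mount namespace and run the commands inside it
# (a bare "unshare --mount" would start an interactive shell instead)
sudo unshare --mount bash -c '
    mount -t tmpfs none /mnt   # Mount a temporary filesystem
    mount                      # List mounted filesystems
'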
Conclusion:
Mount namespaces provide powerful capabilities for isolating and manipulating filesystems within Linux environments. By creating and experimenting with mount namespaces, you can gain a deeper understanding of filesystem management and resource isolation in Linux.
4 — UTS Namespace:
UTS namespaces, short for Unix Timesharing System, provide isolation for the hostname and domain name identifiers within a Linux system. Each UTS namespace has its own unique hostname and domain name, allowing processes to have independent identification within the system. In this section, we’ll explore the capabilities and use cases of UTS namespaces.
Capabilities and Use Cases:
- Hostname Isolation: UTS namespaces allow processes to have their own isolated hostname within the system. This is useful for containerization platforms like Docker, where each container can have its own hostname independent of the host system or other containers.
- Domain Name Isolation: In addition to hostnames, UTS namespaces provide isolation for domain names, allowing processes to have unique domain identifiers within the system. This can be helpful for networked applications that rely on domain names for communication.
- Containerization: UTS namespaces are commonly used in containerization environments to provide complete isolation for container identities, including hostnames and domain names. This ensures that containers behave as independent entities within the system, even if they share the same underlying kernel.
- Process Identification: UTS namespaces allow processes to be identified uniquely based on their hostname and domain name, facilitating resource management, debugging, and monitoring within the system.
Creating and Manipulating UTS Namespaces:
To create and manipulate UTS namespaces, you can use command-line tools like unshare and hostname. Below are the steps to create a new UTS namespace and experiment with it:
→ Create a New UTS Namespace:
sudo unshare --uts
→ Set Hostname: Within the new namespace, set a unique hostname using the hostname command:
sudo hostname new-hostname
→ View Hostname: Use the hostname command to view the hostname within the namespace:
hostname
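The steps above can be condensed into one non-interactive check. This sketch assumes unprivileged user namespaces are enabled (so no sudo is needed); hostname_in_new_uts and the name demo-host are made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: set a hostname inside new user and UTS namespaces and
# print the name seen there. Prints "unavailable" if the namespaces cannot be
# created or the hostname command is missing.
hostname_in_new_uts() {
    local name=$1
    unshare --user --map-root-user --uts \
        bash -c 'hostname "$1" && uname -n' _ "$name" 2>/dev/null \
        || echo "unavailable"
}

echo "before: $(uname -n)"
echo "inside: $(hostname_in_new_uts demo-host)"
echo "after:  $(uname -n)"
```

The "before" and "after" lines should match, since the change is confined to the new UTS namespace.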
Conclusion:
UTS namespaces provide isolation for hostname and domain name identifiers within a Linux system, allowing processes to have independent identification. By creating and experimenting with UTS namespaces, you can gain a deeper understanding of process isolation and resource management within Linux environments.
5 — IPC Namespace:
IPC namespaces, or Inter-Process Communication namespaces, provide isolation for various IPC mechanisms such as message queues, semaphores, and shared memory segments within a Linux system. Each IPC namespace has its own set of IPC objects, allowing processes to communicate independently of other namespaces. In this section, we’ll delve into the capabilities and use cases of IPC namespaces.
Capabilities and Use Cases:
- Isolation of IPC Mechanisms: IPC namespaces isolate various IPC mechanisms, including message queues, semaphores, and shared memory segments, between processes within the namespace. This ensures that processes within the same namespace have their own independent communication channels without interference from processes in other namespaces.
- Resource Management: IPC namespaces enable better resource management by providing separate communication channels for processes. This helps prevent resource contention and ensures that processes within the same namespace can communicate efficiently without impacting processes in other namespaces.
- Containerization: IPC namespaces are essential for containerization platforms like Docker, where each container requires its own isolated communication channels for inter-process communication. By using IPC namespaces, containers can communicate independently of each other and the host system, enhancing security and isolation.
- Process Isolation: IPC namespaces facilitate process isolation by providing separate IPC mechanisms for processes within the same namespace. This ensures that processes cannot interfere with each other’s communication channels, enhancing security and reliability.
Creating and Manipulating IPC Namespaces:
#!/bin/bash
# Run everything inside a new IPC namespace
# (a bare "unshare --ipc" would start an interactive shell instead)
sudo unshare --ipc bash -c '
    # List IPC objects within the namespace (initially empty)
    ipcs
    # Create a message queue
    ipcmk -Q
    # Display message queue information
    ipcs -q
    # There is no standard command for sending or receiving messages;
    # that takes a program calling msgsnd(2) and msgrcv(2).
    # Remove all message queues in this namespace
    ipcrm --all=msg
'
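To check the isolation without sudo, here is a rough sketch that creates a queue inside a new IPC namespace and counts the queues visible there. It assumes unprivileged user namespaces are enabled and that ipcs prints queue keys with a leading 0x, as the util-linux version does; the function name is invented for this example.

```shell
#!/bin/bash
# Hypothetical demo: create one message queue inside new user and IPC
# namespaces and count the queues visible there. Prints "unavailable" if the
# namespaces cannot be created or the ipc tools are missing.
queues_in_new_ipc_namespace() {
    unshare --user --map-root-user --ipc bash -c '
        ipcmk -Q >/dev/null && ipcs -q | grep -c "^0x"
    ' 2>/dev/null || echo "unavailable"
}

queues_in_new_ipc_namespace
```

A new IPC namespace starts empty, so the count should reflect only the queue just created, however many queues exist on the host. The queue disappears together with the namespace when the command exits.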
Conclusion:
IPC namespaces provide isolation for inter-process communication mechanisms within a Linux system, allowing processes to communicate independently within their own namespaces. By creating and experimenting with IPC namespaces, you can gain a deeper understanding of process isolation and inter-process communication in Linux environments.
Useful reference, from the ipcs man page:
NAME
ipcs - show information on IPC facilities
SYNOPSIS
ipcs [options]
DESCRIPTION
ipcs shows information on System V inter-process communication facilities. By default it shows information
about all three resources: shared memory segments, message queues, and semaphore arrays.
root@DESKTOP-????:/home/???# ipcs
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
------ Semaphore Arrays --------
key semid owner perms nsems
2 — CONTROL GROUPS :
Control Groups (cgroups) are a powerful feature in the Linux kernel that provide mechanisms for managing and controlling the allocation of system resources to processes or groups of processes. Introduced in kernel version 2.6.24, cgroups enable administrators to enforce resource limits, prioritize resource usage, and isolate processes, thereby ensuring efficient resource utilization and system stability.
Key Features and Concepts:
- Resource Management: Cgroups allow administrators to manage various system resources, including CPU time, memory, disk I/O, and network bandwidth. By defining resource limits and isolation policies, administrators can prevent resource contention and ensure fair allocation of resources across different workloads.
- Hierarchical Structure: Cgroups are organized in a hierarchical structure, forming a tree-like hierarchy that reflects the organizational structure of the system. Each control group can contain processes or subgroups, allowing for flexible resource management at different levels of granularity.
- Subsystem Integration: Cgroups support various subsystems, each responsible for managing a specific set of resources. Common subsystems include CPU, memory, block I/O, network, and others. By integrating with these subsystems, cgroups provide a unified interface for managing diverse system resources.
- Usage Scenarios: Cgroups are used in a variety of scenarios, including process isolation (e.g., containerization with Docker, LXC), workload management (e.g., prioritizing critical processes, limiting resource usage of non-critical processes), resource accounting (e.g., tracking resource usage for billing or accounting purposes), and performance tuning (e.g., optimizing resource allocation for specific workloads).
- Control Interface: Cgroups are managed through a virtual filesystem interface located at /sys/fs/cgroup (or /sys/fs/cgroup/<subsystem> for subsystem-specific settings on cgroup v1 systems). Administrators can interact with cgroups using command-line tools like cgcreate, cgset, cgexec, and cgdelete, as well as through programmatic interfaces provided by programming languages like C, Python, and others.
- Integration with Containerization: Cgroups play a crucial role in containerization technologies like Docker, Kubernetes, and others. Container runtimes use cgroups to enforce resource limits, isolate container processes, and ensure predictable performance and behavior.
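Below is a full script with explanations that demonstrates how to limit CPU usage for a group of processes using control groups (cgroups) in Linux. It relies on the libcgroup command-line tools (cgcreate, cgset, cgclassify) and on the cgroup v1 cpu controller, which provides the cpu.cfs_quota_us setting: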
#!/bin/bash
# Experiment: Resource Limitation with Control Groups
# This script demonstrates how to limit CPU usage for a group of processes using control groups (cgroups) in Linux.
# Step 1: Create a Control Group
# Use the cgcreate command to create a new control group named cpu_limit.
sudo cgcreate -g cpu,cpuacct:/cpu_limit
# Step 2: Set CPU Usage Limit
# Use the cgset command to set a CPU usage limit for the cpu_limit control group.
# In this example, we cap the group at 50% of one CPU: 50000 microseconds of CPU time
# per scheduling period, which is 100000 microseconds by default (cpu.cfs_period_us).
sudo cgset -r cpu.cfs_quota_us=50000 cpu_limit
# Step 3: Move Processes to the Control Group
# Identify the PIDs of the processes you want to limit (e.g., using ps or top commands).
# For demonstration purposes, we'll use the current shell process as an example.
PID=$$
# Use the cgclassify command to move the processes to the cpu_limit control group.
sudo cgclassify -g cpu,cpuacct:/cpu_limit "$PID"
# Step 4: Verify Resource Limitation
# Monitor the CPU usage of the processes within the control group using tools like top, htop, or ps.
# Observe that the CPU usage of the processes is limited to the specified quota (e.g., 50% in this experiment).
# Print confirmation message
echo "Experiment completed. CPU usage limited for process with PID $PID."
To monitor CPU usage and check that the limit works, I recommend htop or top, or btop if you love visual effects.
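On systems that only mount cgroup v2, cpu.cfs_quota_us does not exist; the quota and period are written together to a cpu.max file as "<quota> <period>". As a sketch of the arithmetic, the helper below computes that line from a percentage. Both function names are invented here, and limit_cpu_v2 assumes cgroup v2 is mounted at /sys/fs/cgroup with the cpu controller enabled for child groups.

```shell
#!/bin/bash
# Build the value for a cgroup v2 cpu.max file: "<quota> <period>" in
# microseconds, where quota = period * percent / 100.
cpu_max_line() {
    local percent=$1 period=${2:-100000}
    echo "$(( period * percent / 100 )) $period"
}

# Hypothetical helper: create a cgroup v2 group and cap its CPU time.
# Assumes the root's cgroup.subtree_control already lists "cpu".
limit_cpu_v2() {
    local name=$1 percent=$2
    sudo mkdir -p "/sys/fs/cgroup/$name"
    cpu_max_line "$percent" | sudo tee "/sys/fs/cgroup/$name/cpu.max" >/dev/null
}

cpu_max_line 50   # prints "50000 100000"
```

After limit_cpu_v2 cpu_limit 50, a process joins the group when its PID is written to /sys/fs/cgroup/cpu_limit/cgroup.procs.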
Conclusion:
By performing this experiment, you have demonstrated the use of control groups to limit CPU usage for a group of processes. This capability is useful for controlling resource usage, preventing resource contention, and ensuring fair allocation of resources across different workloads.