Linux Performance Observability Tools

Introduction

Most of the software deployments I have worked on run on Linux servers. When running software on Linux, it’s important to understand the tools available for checking how the system behaves.

There are many Linux commands for analyzing the performance of a server. This article focuses on the tools for observing system activity. There are tools to measure activity in the various components of the system, such as the operating system, CPU, memory and I/O. Brendan Gregg, a computer performance analyst, calls these tools “Linux Performance Observability Tools”.

Most of these tools add very little overhead and are safe to run on a live system. See the following diagram by Brendan Gregg on “Linux Performance Observability Tools”. Brendan categorizes these tools as Basic, Intermediate and Advanced.

Linux Performance Observability Tools — Taken from http://www.brendangregg.com/linuxperf.html

Each of these tools has a detailed manual page, and it’s highly recommended to read it for more information on the command.

Please note that most of the column descriptions in this article are taken directly from the manual pages. This story is also meant to be a quick reference, with all the details in one place. Information copied from the manual pages is formatted as italic text within quotation marks.

Let’s look at some of the standard Linux tools in detail. I will not be covering all tools mentioned in Brendan Gregg’s presentation on Linux Performance Tools.

Please note that I tested all commands on Ubuntu 17.10. I changed the command prompt and the terminal title using the following command.

PS1='$ \[\e]2;Terminal: Linux Performance Tools\a\]'

Basic Observability Tools

Uptime and Load Averages

The command “uptime” prints the system uptime with load averages. It also shows the current time and the number of users logged in.

The system load average is a good way to measure the demand for CPU resources. There are 3 numbers, which represent the load average over the last minute, the last 5 minutes and the last 15 minutes. With these 3 numbers, we can determine whether the server load is increasing or decreasing over time.

uptime command

It’s important to compare the load average with the number of processors in the system. There are several ways to find the number of processors. The “lscpu” command is a quick way to get details about the CPU architecture, including the number of CPUs. The “nproc” command prints the number of processing units available.

If the load average is greater than the number of processors, there is more demand for CPU than the system can serve, and we need to identify which processes are consuming the resources.
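To make the comparison concrete, here is a small sketch that divides a load average by the number of processors. The function name “load_per_cpu” is my own and is not part of any tool.

```shell
# load_per_cpu LOAD NCPU
# Prints LOAD divided by NCPU with two decimals.
# (load_per_cpu is a hypothetical helper name used only for illustration.)
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}
```

On a live system it could be called as: load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". The first field of /proc/loadavg is the 1-minute load average. A result above 1.00 suggests that tasks are waiting for CPU time.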

See the following article to understand the load average: “Understanding Linux CPU Load - when should you be worried?”

Displaying the running processes in the system

The “top” command displays the system’s tasks. The top program provides a limited interactive interface and the screen refreshes every 3 seconds by default. This delay can be changed by pressing “d”. Since the screen refreshes periodically, top can miss short-lived processes.

When you run “top”, you see the following sections: 1) Summary Area; 2) Fields/Columns Header; 3) Task Area.

top command

The summary area shows the following information:

  • Uptime and load averages
  • Task and CPU states. Threads will be shown if the “Threads-mode” is toggled (Press Shift+h or H)
  • Memory usage, shown in KiB (kibibytes) by default. 1 KiB = 1024 bytes. Physical memory is classified as total, free, used and buff/cache. Swap is classified as total, free and used. The swap line also shows “avail Mem”, which is an “estimation of physical memory available for starting new applications without swapping”.

CPU state percentages are based on the interval since last refresh.

  • us, user: “time running un-niced user processes”
  • sy, system: “time running kernel processes”
  • ni, nice: “time running niced user processes”
  • id, idle: “time spent in the kernel idle handler”
  • wa, IO-wait: “time waiting for I/O completion”
  • hi: “time spent servicing hardware interrupts”
  • si: “time spent servicing software interrupts”
  • st: “time stolen from this vm by the hypervisor”

Let’s look at the meanings of the default columns shown in top. For more information, see “Fields/Columns” in the top manual page.

  • PID (Process Id): “The task’s unique process ID.”
  • USER (User Name): “The effective user name of the task’s owner.”
  • PR (Priority): “The scheduling priority of the task.”
  • NI (Nice Value): “The nice value of the task. A negative nice value means higher priority, whereas a positive nice value means lower priority. Zero in this field simply means priority will not be adjusted in determining a task’s dispatch-ability”
  • VIRT (Virtual Memory Size (KiB)): “The total amount of virtual memory used by the task. It includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used.”
  • RES (Resident Memory Size (KiB)): “A subset of the virtual address space (VIRT) representing the non-swapped physical memory a task is currently using.”
  • SHR (Shared Memory Size (KiB)): “A subset of resident memory (RES) that may be used by other processes. It will include shared anonymous pages and shared file-backed pages. It also includes private pages mapped to files representing program images and shared libraries.”
  • S (Process Status): The status of the task which can be one of:
      • D = uninterruptible sleep
      • R = running
      • S = sleeping
      • T = stopped by job control signal
      • t = stopped by debugger during trace
      • Z = zombie
  • %CPU (CPU Usage): “The task’s share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.” Please note that top adds up the CPU usage from all processors, so %CPU can go over 100%. For example, if there are two processors, top can show %CPU up to 200%.
  • %MEM (Memory Usage (RES)): “ A task’s currently resident share of available physical memory.”
  • TIME+ (CPU Time): “Total CPU time the task has used since it started. This reflects more granularity through hundredths of a second”
  • COMMAND (Command Name or Command Line): “Display the command line used to start a task or the name of the associated program. You toggle between command line and name with ‘c’, which is both a command-line option and an interactive command.”

There is a lot of information available in top. See “30 Linux TOP Command Examples With Screenshots”.

Some important options are:

  • Order by Memory Usage (Press Shift+f and select %MEM, or just press Shift+m or M)
  • Display individual threads (Threads-mode) (Press Shift+h or H)

The “htop” command is also a good interactive process viewer. It’s not available by default and you will have to install the “htop” package.

What is ‘Nice’ in Linux?

In “top” and most other Linux tools, we see a “nice” value, which determines the CPU scheduling priority of a process. A negative nice value means higher priority and a positive value means lower priority.

When a process has a high nice value, it’s “nicer” to other processes.

In Linux, the nice value ranges from -20 (highest priority) to 19 (lowest priority). (The actual priority range may change in different kernel versions)

With the “nice” command, we can run a program with a custom nice value. We can change the nice value of a running process with the “renice” command.

Only the superuser can give a process a higher priority (a lower nice value).

Changing the priority is useful in many cases. For example, running a long background job with a high nice value (lower priority) leaves more CPU time for other processes.
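As a minimal sketch of the background-job case, the wrapper below starts any command with its nice value raised by 10. The name “run_low_priority” and the increment of 10 are my own choices for illustration.

```shell
# run_low_priority COMMAND [ARGS...]
# Runs COMMAND with the nice value raised by 10, so it gives way to
# other processes when the CPU is busy.
# (run_low_priority is a hypothetical helper name.)
run_low_priority() {
    nice -n 10 "$@"
}
```

Running “nice” without arguments prints the current nice value, so “run_low_priority nice” is a quick way to confirm the effect. For a process that is already running, “renice -n 15 -p 1234” changes the nice value of PID 1234 (a made-up PID).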

Reporting a snapshot of the current processes

The “ps” command displays the active processes in the system. The ps command accepts UNIX style, BSD style and GNU style long options.

By default, ps shows only the processes that belong to the current user and are associated with the current terminal. So, you need additional options if you want to see all active processes in the system.

As mentioned earlier, there are different styles of “ps” options, and the most common ways to see all running processes are “ps aux” and “ps -ef”.

ps aux (Using BSD Syntax):

“ps aux” command
  • a: “list all processes with a terminal (tty), or to list all processes when used together with the x option.”
  • x: “list all processes owned by you (same EUID as ps), or to list all processes when used together with the a option.”
  • u: “Display user-oriented format.”

Following are some details on the column headers.

  • USER: “effective user name”
  • PID: “a number representing the process ID”
  • %CPU: “cpu utilization of the process in “##.#” format. Currently, it is the CPU time used divided by the time the process has been running (cputime/realtime ratio), expressed as a percentage. It will not add up to 100% unless you are lucky.”
  • %MEM: “ratio of the process’s resident set size to the physical memory on the machine, expressed as a percentage.”
  • VSZ: “virtual memory size of the process in KiB (1024-byte units).”
  • RSS: “resident set size, the non-swapped physical memory that a task has used (in kiloBytes).”
  • TTY: “controlling tty (terminal).”
  • STAT: “multi-character process state. See section PROCESS STATE CODES for the different values meaning.”
  • START: “starting time or date of the process.”
  • COMMAND: The command is shown. Depending on other options, the command arguments are shown.

ps -ef (Using Standard Syntax):

“ps -ef” command
  • -e: “Select all processes. Identical to -A.”
  • -f: “Do full-format listing. It also causes the command arguments to be printed.”

Following are the details for the column headers.

  • UID: “effective user ID”
  • PID: “a number representing the process ID”
  • PPID: “parent process ID.”
  • C: “processor utilization. Currently, this is the integer value of the percent usage over the lifetime of the process.”
  • STIME: “starting time or date of the process.”
  • TTY: “controlling tty (terminal).”
  • TIME: “cumulative CPU time.”
  • CMD: The command is shown. Depending on other options, the command arguments are shown.

For more output formats and details on columns, see the Standard Format Specifiers section in the manual page of ps.
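The column layout of “ps aux” makes it easy to post-process. As a sketch, the filter below reads “ps aux” output and prints the PID, %MEM and command name of the processes using the most memory. The name “top_mem” is hypothetical, and the filter assumes the default column order described above (%MEM in column 4, COMMAND in column 11).

```shell
# top_mem [N]
# Reads "ps aux" output on stdin and prints PID, %MEM and the first word
# of COMMAND for the N processes with the highest %MEM (default 5).
# (top_mem is a hypothetical helper name.)
top_mem() {
    # Drop the header line, sort numerically on column 4 (%MEM) in
    # descending order, keep N rows, then print the selected columns.
    tail -n +2 |
        LC_ALL=C sort -k4,4nr |
        head -n "${1:-5}" |
        awk '{ print $2, $4, $11 }'
}
```

Usage: ps aux | top_mem 10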

Reporting Virtual Memory Statistics

The “vmstat” command reports information about processes, memory, paging, block IO, traps, disks and CPU activity.

The first report gives averages since the last reboot. We can also specify a “delay” to produce reports periodically, and a “count” to limit the number of reports. The delay and count are passed as arguments to the vmstat command, e.g. “vmstat [ delay [ count ] ]”.

vmstat command

The following example uses “vmstat -SM 1”, which shows the stats in megabytes and reports new stats every second.

“vmstat -SM 1” command

Following are the columns to check. (For more details, check the vmstat manual: “man vmstat”)

Procs

  • r: “This shows the runnable processes (running or waiting for run time)”. If this is greater than the number of CPUs, there is more demand for CPU resources.
  • b: “The number of processes in uninterruptible sleep.”

Memory

  • swpd: “the amount of virtual memory used.”
  • free: “the amount of idle memory.”
  • buff: “the amount of memory used as buffers.”
  • cache: “the amount of memory used as cache.”
  • inact: “the amount of inactive memory. (-a option)”
  • active: “the amount of active memory. (-a option)”

Swap

  • si, so: “Amount of memory swapped in from disk and swapped to disk per second”.

If these values stay non-zero, the system is actively swapping, which usually means it is running short of memory.

IO

  • bi, bo: “The number of blocks per second received from and sent to a block device”.

If these values are non-zero, there is I/O activity.

System

  • in: “The number of interrupts per second, including the clock.”
  • cs: “The number of context switches per second.”

These values are higher when processes are doing a lot of work.

CPU

These show the percentages of total CPU time.

  • us: “Time spent running non-kernel code. (user time, including nice time)”
  • sy: “Time spent running kernel code. (system time)”
  • id: “Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.”
  • wa: “Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.”
  • st: “Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.”

Reporting Input/Output statistics

The “iostat” command can be used to monitor system input/output operations for devices and partitions. This command is similar to vmstat in the way it is executed, e.g. “iostat [ interval [ count ] ]”.

This command may not be available by default. For example, I had to install the “sysstat” package on Ubuntu (“sudo apt install sysstat”).

The iostat command gives two types of reports, the CPU Utilization report and the Device Utilization report. Both reports are shown by default. Use the following options to view a specific report.

  • -c: “Display the CPU utilization report.”
  • -d: “Display the device utilization report.”

The first report gives the statistics since boot time. If you want to skip the first report, you can use the “-y” option, e.g. “iostat -y 1 1”.

"iostat -y 1 1" command

When using the iostat command, following options are recommended.

  • -x: “Display extended statistics.”
  • -z: “Tell iostat to omit output for any devices for which there was no activity during the sample period.”

Following example also uses the “-m” option to display statistics in megabytes per second.

“iostat -xmz 1 3” command

See manual page of iostat for the descriptions of each column header.

Following are the device utilization report columns to check in default mode (without using “-x” option)

  • tps: “Indicate the number of transfers per second that were issued to the device. A transfer is an I/O request to the device. Multiple logical requests can be combined into a single I/O request to the device. A transfer is of indeterminate size.”
  • Blk_read/s (kB_read/s, MB_read/s), Blk_wrtn/s (kB_wrtn/s, MB_wrtn/s): “Indicate the amount of data read from or written to the device expressed in a number of blocks (kilobytes, megabytes) per second.”
  • Blk_read (kB_read, MB_read), Blk_wrtn (kB_wrtn, MB_wrtn): “The total number of blocks (kilobytes, megabytes) read/written.”

Following are the columns to check in extended mode

  • r/s, w/s: “The number (after merges) of read/write requests completed per second for the device.”
  • rsec/s (rkB/s, rMB/s), wsec/s (wkB/s, wMB/s): “The number of sectors (kilobytes, megabytes) read/written from the device per second.”
  • rrqm/s, wrqm/s: “The number of read/write requests merged per second that were queued to the device.”
  • %rrqm, %wrqm: “The percentage of read/write requests merged together before being sent to the device.”
  • r_await, w_await: “The average time (in milliseconds) for read/write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.”
  • aqu-sz: “The average queue length of the requests that were issued to the device.
     Note: In previous versions, this field was known as avgqu-sz.”
  • rareq-sz, wareq-sz: “The average size (in kilobytes) of the read/write requests that were issued to the device.”
  • svctm: “The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.”
  • %util: “Percentage of elapsed time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100% for devices serving requests serially. But for devices serving requests in parallel, such as RAID arrays and modern SSDs, this number does not reflect their performance limits.”
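As a sketch of how %util can be watched, the filter below prints the devices whose %util is at or above a threshold. The name “busy_devices” and the default threshold of 80 are my own assumptions. Because the column layout of “iostat -x” differs between sysstat versions, the filter finds the %util column by its header name.

```shell
# busy_devices [THRESHOLD]
# Reads "iostat -x" output on stdin and prints the device name and %util
# of every device whose %util is at or above THRESHOLD (default 80).
# (busy_devices is a hypothetical helper name.)
busy_devices() {
    awk -v limit="${1:-80}" '
        $1 ~ /^Device/ {
            # Locate the %util column in the header of the device table.
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        NF == 0 { col = 0; next }    # a blank line ends the device table
        col && $col + 0 >= limit + 0 { print $1, $col }
    '
}
```

Usage: iostat -xz 1 3 | busy_devices 90. As the manual page warns, a high %util does not necessarily mean saturation on devices that serve requests in parallel.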

Reporting Multi-Processor statistics

The command “mpstat” can be used to print the CPU time breakdown for each available processor. It is part of the “sysstat” package. This command is also similar to “vmstat” and “iostat” in the way it is executed, e.g. “mpstat [ interval [ count ] ]”.

See the following screenshot for the command “mpstat -P ALL 1 2”, which shows per-CPU stats.

"mpstat -P ALL 1 2" command
  • CPU: “Processor number. The keyword all indicates that statistics are calculated as averages among all processors.”
  • %usr: “Show the percentage of CPU utilization that occurred while executing at the user level (application).”
  • %nice: “Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.”
  • %sys: “Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.”
  • %iowait: “Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.”
  • %irq: “Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.”
  • %soft: “Show the percentage of time spent by the CPU or CPUs to service software interrupts.”
  • %steal: “Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.”
  • %guest: “Show the percentage of time spent by the CPU or CPUs to run a virtual processor.”
  • %gnice: “Show the percentage of time spent by the CPU or CPUs to run a niced guest.”
  • %idle: “Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.”

Please check the manual page of mpstat for more information.

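We can use the mpstat command to check whether the load is spread evenly across the CPUs. For example, a single-threaded process can keep one CPU fully busy while the other CPUs stay idle.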
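A sketch for spotting such a hot CPU: the filter below reads “mpstat -P ALL” output and prints the CPUs whose %idle is below a limit. The name “busy_cpus” and the default limit of 10 are my own assumptions. It finds the CPU and %idle columns by header name, and skips the “Average:” lines that mpstat prints at the end in an English locale.

```shell
# busy_cpus [MIN_IDLE]
# Reads "mpstat -P ALL" output on stdin and prints the CPU number and
# %idle of every individual CPU whose %idle is below MIN_IDLE (default 10).
# (busy_cpus is a hypothetical helper name.)
busy_cpus() {
    awk -v min_idle="${1:-10}" '
        $1 == "Average:" { next }    # ignore the summary section
        /%idle/ {
            # Header line: remember where the CPU and %idle columns are.
            cpu = idle = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu = i
                if ($i == "%idle") idle = i
            }
            next
        }
        cpu && idle && $cpu ~ /^[0-9]+$/ && $idle + 0 < min_idle + 0 { print $cpu, $idle }
    '
}
```

Usage: mpstat -P ALL 1 5 | busy_cpus 5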

Displaying the amount of Free and Used Memory in the system

There is a simple command named “free” to display the amount of free and used physical memory and swap memory in the system. This tool also shows the buffers and caches used by the kernel.
The “free -m” command is the most common way of checking the free memory in megabytes.

There are also other useful options in the “free” command, such as displaying results continuously (using “-s [delay in seconds]” option) and displaying a total line (using “-t” option)

Following are the descriptions of each column.

  • total: “Total installed memory (MemTotal and SwapTotal in /proc/meminfo)”
  • used: “Used memory (calculated as total - free - buffers - cache)”
  • free: “Unused memory (MemFree and SwapFree in /proc/meminfo)”
  • shared: “Memory used (mostly) by tmpfs (Shmem in /proc/meminfo)”
  • buffers: “Memory used by kernel buffers (Buffers in /proc/meminfo)”
  • cache: “Memory used by the page cache and slabs (Cached and SReclaimable in /proc/meminfo)”
  • buff/cache: “Sum of buffers and cache”
  • available: “Estimation of how much memory is available for starting new applications, without swapping. Unlike the data provided by the cache or free fields, this field takes into account page cache and also that not all reclaimable memory slabs will be reclaimed due to items being in use (MemAvailable in /proc/meminfo, available on kernels 3.14, emulated on kernels 2.6.27+, otherwise the same as free)”

It’s important to note that you don’t have to worry about “Mem:free” being very low as long as there is enough memory in “Mem:available”.

If enough memory is available, Linux can quickly reclaim it when processes need it. So, now you know that Linux did not eat your RAM!

The “Swap:” line shows the swap usage. If “Swap:used” is not zero, the system has swapped memory out at some point. Check the si and so columns of vmstat to see whether it is swapping right now.
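Since “available” is the number that matters, it can be handy to express it as a percentage of the total. The sketch below reads the MemTotal and MemAvailable fields from /proc/meminfo, the same source that free uses. The function name “mem_available_pct” is hypothetical.

```shell
# mem_available_pct [FILE]
# Prints MemAvailable as a percentage of MemTotal, read from FILE
# (default /proc/meminfo).
# (mem_available_pct is a hypothetical helper name.)
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%.1f\n", avail * 100 / total }
    ' "${1:-/proc/meminfo}"
}
```

Calling “mem_available_pct” without arguments reports the value for the running system.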

Intermediate Observability Tools

Reporting statistics for Linux tasks

The command “pidstat” can be used to monitor individual Linux tasks. It can also monitor the child processes of selected tasks. It is also part of the “sysstat” package. This command is similar to “vmstat” and “iostat” in the way it is executed, e.g. “pidstat [ interval [ count ] ]”.

If the interval parameter is not specified, the task statistics are reported for the time since the system startup.

By default it shows the CPU utilization.

“pidstat” command

There are other types of reports as well. Following are some options.

  • -d: “Report I/O statistics”
  • -r: “Report page faults and memory utilization.”
  • -s: “Report stack utilization.”
  • -u: “Report CPU utilization.” This is the default report.
  • -v: “Report values of some kernel tables.”
  • -w: “Report task switching activity”

Following are some columns to check in each report.

I/O statistics

“pidstat -C java -d” command
  • kB_rd/s: “Number of kilobytes the task has caused to be read from disk per second.”
  • kB_wr/s: “Number of kilobytes the task has caused, or shall cause to be written to disk per second.”
  • kB_ccwr/s: “Number of kilobytes whose writing to disk has been cancelled by the task.”
  • iodelay: “Block I/O delay of the task being monitored, measured in clock ticks. This metric includes the delays spent waiting for sync block I/O completion and for swapin block I/O completion.”

Page faults and memory utilization

“pidstat -C java -r” command
  • minflt/s: “Total number of minor faults the task has made per second, those which have not required loading a memory page from disk.”
  • majflt/s: “Total number of major faults the task has made per second, those which have required loading a memory page from disk.”
  • VSZ: “Virtual Size: The virtual memory usage of entire task in kilobytes.”
  • RSS: “Resident Set Size: The non-swapped physical memory used by the task in kilobytes.”
  • %MEM: “The task’s currently used share of available physical memory.”

Stack utilization

“pidstat -C java -s” command
  • StkSize: “The amount of memory in kilobytes reserved for the task as stack, but not necessarily used.”
  • StkRef: “The amount of memory in kilobytes used as stack, referenced by the task.”

CPU utilization

“pidstat -C java -u” command
  • %usr: “Percentage of CPU used by the task while executing at the user level (application), with or without nice priority. Note that this field does NOT include time spent running a virtual processor.”
  • %system: “Percentage of CPU used by the task while executing at the system level (kernel).”
  • %guest: “Percentage of CPU spent by the task in virtual machine (running a virtual processor).”
  • %CPU: “Total percentage of CPU time used by the task. In an SMP environment, the task’s CPU usage will be divided by the total number of CPU’s if option -I has been entered on the command line.”
  • CPU: “Processor number to which the task is attached.”

Values of some kernel tables

“pidstat -C java -v” command
  • threads: “Number of threads associated with current task.”
  • fd-nr: “Number of file descriptors associated with current task.”

Task switching activity

“pidstat -C java -w” command
  • cswch/s: “Total number of voluntary context switches the task made per second. A voluntary context switch occurs when a task blocks because it requires a resource that is unavailable.”
  • nvcswch/s: “Total number of non voluntary context switches the task made per second. An involuntary context switch takes place when a task executes for the duration of its time slice and then is forced to relinquish the processor.”

Reporting Network Traffic Statistics

The “nicstat” command prints the network traffic statistics. This is not available by default in Ubuntu and it can be installed with the command: “sudo apt install nicstat”.

“nicstat” is also similar to “vmstat”, “iostat” etc. when reporting statistics periodically, e.g. “nicstat [ interval [ count ] ]”.

“nicstat 1 2” command

The screenshot above was taken while running a test with iperf3.

Following are the descriptions for each column.

  • Time: “The time corresponding to the end of the sample shown, in HH:MM:SS format (24-hour clock).”
  • Int: “The interface name.”
  • rKB/s, InKB: “Kilobytes/second read (received).”
  • wKB/s, OutKB: “Kilobytes/second written (transmitted).”
  • rPk/s, InSeg, InDG: “Packets (TCP Segments, UDP Datagrams)/second read (received).”
  • wPk/s, OutSeg, OutDG: “Packets (TCP Segments, UDP Datagrams)/second written (transmitted).”
  • rAvs: “Average size of packets read (received).”
  • wAvs: “Average size of packets written (transmitted).”
  • %Util: “Percentage utilization of the interface. For full-duplex interfaces, this is the greater of rKB/s or wKB/s as a percentage of the interface speed. For half-duplex interfaces, rKB/s and wKB/s are summed.”
  • Sat: “Saturation. This is the number of errors/second seen for the interface - an indicator the interface may be approaching saturation.”

Following are some types of reports you can get from nicstat.

  • nicstat: “Prints out network statistics for all network cards (NICs)”
  • nicstat -s: “Display summary report.”
  • nicstat -t: “Show TCP statistics.”
  • nicstat -u: “Show UDP statistics.”
  • nicstat -x: “Display extended output.”
  • nicstat -a: “Equivalent to ‘-x -t -u’.”

See manual page of nicstat for the output details.

Reporting Network Connections

The “netstat” command is a very popular command to print network connections, routing tables, interface statistics, masquerade connections, and multicast memberships.

By default, the netstat command displays a list of open sockets, i.e. “Active Internet connections”, “Active UNIX domain sockets” etc. We are usually interested in the “Internet Connections”, i.e. TCP & UDP.

There are a few options to specify the type of information printed in the “netstat” output. Following are those options:

  • (none): “By default, netstat displays a list of open sockets.”
  • --route, -r: “Display the kernel routing tables.”
  • --groups, -g: “Display multicast group membership information for IPv4 and IPv6.”
  • --interfaces, -i: “Display a table of all network interfaces.”
  • --masquerade, -M: “Display a list of masqueraded connections.”
  • --statistics, -s: “Display summary statistics for each protocol.”

By default, only non-listening sockets (connections) are shown, and listening servers are left out.

Following are some important options:

  • -a, --all: “Show both listening and non-listening sockets.” This shows servers and established connections.
  • -t: Show only TCP connections.
  • -u: Show only UDP connections.
  • -l, --listening: “Show only listening sockets.” This shows only servers.
  • -p, --program: “Show the PID and name of the program to which each socket belongs.” In this case, running the “netstat” command with root privileges is recommended to find out the processes and the users owning particular processes.
  • --numeric, -n: “Show numerical addresses instead of trying to determine symbolic host, port or user names.”
  • -c, --continuous: “This will cause netstat to print the selected information every second continuously.”
  • -e, --extend: “Display additional information. Use this option twice for maximum detail.”

Note:

  • By default, the “netstat” command tries to resolve hostnames instead of showing IP addresses, which can slow down the output. Use the “-n” option to avoid that.
  • With “-n” and “-e” options, the uid will be shown instead of the username.

We can combine above options to get more refined output. For example:

  • netstat -at: Display all (servers and established) TCP connections.
  • netstat -ant: Same as previous command, but disable hostname lookup methods for faster output.
  • sudo netstat -antp: Same as previous command, but display the process ID and Program.
  • sudo netstat -antpe: Same as previous command, but display the user ID. (Avoid “-n” to display the username.)
  • sudo netstat -atpe --numeric-hosts: Similar to previous command, but display username and show the IP addresses instead of hostnames.
  • netstat -tnl: Display all servers listening on TCP.
  • netstat -ct: Show TCP connections continuously.
  • netstat -ie: Display detailed “Kernel Interface table”.

See the following screenshot for “sudo netstat -atnp”.

“sudo netstat -atnp” command

Following are the meanings of each column.

  • Proto: “The protocol (tcp, udp, raw) used by the socket.”
  • Recv-Q: “The count of bytes not copied by the user program connected to this socket.”
  • Send-Q: “The count of bytes not acknowledged by the remote host.”
  • Local Address: “Address and port number of the local end of the socket.”
  • Foreign Address: “Address and port number of the remote end of the socket.”
  • State: “The state of the socket.”
  • PID/Program name: “Slash-separated pair of the process id (PID) and process name of the process that owns the socket.”

For more information on these columns and the states of socket, see the manual page of netstat.
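The State column is useful for a quick summary. As a sketch, the filter below counts the TCP sockets in each state from “netstat -ant” output. The name “tcp_states” is my own, and it assumes the column order described above, with State in column 6.

```shell
# tcp_states
# Reads "netstat -ant" output on stdin and prints each TCP state with the
# number of sockets in that state, sorted by state name.
# (tcp_states is a hypothetical helper name.)
tcp_states() {
    awk '$1 ~ /^tcp/ { count[$6]++ } END { for (s in count) print s, count[s] }' |
        LC_ALL=C sort
}
```

Usage: netstat -ant | tcp_states. A large number of sockets in states such as TIME_WAIT or CLOSE_WAIT is often worth a closer look.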

The “netstat” command is now considered obsolete, and the manual page lists a replacement for each of its uses:

“This program is mostly obsolete. Replacement for netstat is ss. Replacement for netstat -r is ip route. Replacement for netstat -i is ip -s link. Replacement for netstat -g is ip maddr.”

Checking the Swap Usage

Swap usage can be checked with the “swapon” command.

“swapon” command

If you have an older OS, the swap type might be “partition”.

It’s recommended to have swap enabled if the physical memory may not be enough to run all your applications.

The “swapoff” command can be used to “disable devices and files for paging and swapping”.

List Open Files in the system

The “lsof” command can be used to list the files opened by processes. By default, it shows all open files belonging to all active processes.

“sudo lsof | less” command

Following are the descriptions of the columns shown by default. For more details, refer to the manual page of lsof.

  • COMMAND: “The name of the UNIX command associated with the process.”
  • PID: “Process ID.”
  • TID: “Task ID. A blank TID column indicates a process.”
  • USER: “The user ID number or login name of the user to whom the process belongs, usually the same as reported by ps”.
  • FD: “The File Descriptor number of the file”. It also shows some predefined values such as “cwd”, “txt”, etc.
  • TYPE: “The type of the node associated with the file.”
  • DEVICE: “The device numbers, separated by commas.”
  • SIZE/OFF: “The size of the file or the file offset in bytes.”
  • NODE: “The node number of a local file.”
  • NAME: “The name of the file.”

Following are some useful examples:

  • lsof [names]: List processes, which opened specific files. e.g. lsof /var/log/auth.log /var/log/syslog
  • lsof +d [directory]: List opened files under a directory. e.g. lsof +d /var/log/
  • lsof +D [directory]: List opened files under a directory and all subdirectories. e.g. lsof +D /var/log/
  • lsof -c [command name]: List all files opened by the processes starting with the given name. e.g. lsof -c java
  • lsof -p [PID(s)]: List all files opened by specific processes. e.g. lsof -p 4352,4375
  • lsof -i: List all network connections.
  • lsof -i TCP: List all TCP connections.
  • lsof -iTCP -sTCP:ESTABLISHED: Show all established TCP connections.
  • lsof -u [user]: List all open files by the given user. e.g. lsof -u syslog

The lsof options are very useful and there are many ways we can use those options to filter out what we need. For example, with “-i” option, we can filter by IP version, Protocol, hostname, port etc.
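The lsof output can also be summarized. As a sketch, the filter below counts the rows per process and prints the processes with the most open files. The name “open_files_by_pid” is hypothetical, and it relies only on COMMAND and PID being the first two columns. Rows are counted as lsof prints them, so per-thread rows (those with a TID) are included in the count.

```shell
# open_files_by_pid [N]
# Reads lsof output on stdin and prints "count PID COMMAND" for the N
# processes (default 5) with the most rows.
# (open_files_by_pid is a hypothetical helper name.)
open_files_by_pid() {
    awk 'NR > 1 { n[$2 " " $1]++ } END { for (k in n) print n[k], k }' |
        LC_ALL=C sort -k1,1nr -k2,2n |
        head -n "${1:-5}"
}
```

Usage: sudo lsof | open_files_by_pid 10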

System Activity Monitor

“sar” is a useful command for monitoring the performance of the system. It supports collecting, storing and reporting system activity information.

It’s important to note that the “sar” tool can report system activity over a period of time. The data collection must be enabled if you need historical data.

The “sar” command is again a part of “sysstat” package. When you run the “sar” command, you might get a message as shown below if you have not enabled the data collection.

“sar” command output when it’s not enabled

To enable data collection, we need to set the “ENABLED” property to “true” in the “/etc/default/sysstat” file.

You can use the following commands with root permission to enable SAR.

# Enable data collection by switching ENABLED from "false" to "true"
sed -i "s|ENABLED=\"false\"|ENABLED=\"true\"|" /etc/default/sysstat
# Restart the service so that the change takes effect
service sysstat restart

By default, activity data is collected every 10 minutes by a cron job, which we can edit in the file “/etc/cron.d/sysstat”. If you edit the cron expression, you will have to restart the “sysstat” service (sudo service sysstat restart).

It’s also important to note that the activity logs are kept for only seven days by default. If you want to change the number of days of history, you can edit the “/etc/sysstat/sysstat” file and change the “HISTORY” option.
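Changing the “HISTORY” option can also be scripted. The following is a minimal sketch, assuming the file already contains a line starting with “HISTORY=”. The function name set_sysstat_history is hypothetical, and the optional second argument exists only so that the function can be tried on a copy of the file first.

```shell
# Hypothetical helper: set HISTORY=<days> in a sysstat configuration file.
# The second argument defaults to the path mentioned above.
set_sysstat_history() {
  days="$1"
  conf="${2:-/etc/sysstat/sysstat}"
  tmp="$(mktemp)" || return 1
  # Rewrite only the line that starts with HISTORY=; all other lines are kept.
  if sed "s|^HISTORY=.*|HISTORY=${days}|" "$conf" > "$tmp"; then
    cat "$tmp" > "$conf"   # overwrite contents, keeping the file's owner and permissions
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}

# Example usage (run with root permission):
#   set_sysstat_history 14
```

If the file has no “HISTORY=” line, the function leaves the content unchanged, so it is worth checking the result with grep afterwards.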

The “sar” command can give you a lot of detail about the system, covering all aspects. With the historical data, you can even troubleshoot an issue that happened in the past.

Following are some useful options to get details about different subsystems.

  • -B: “Report paging statistics.”
  • -b: “Report I/O and transfer rate statistics.”
  • -d: “Report activity for each block device.”
  • -F [ MOUNT ]: “Display statistics for currently mounted filesystems.”
  • -H: “Report hugepages utilization statistics.”
  • -I { int_list | SUM | ALL }: “Report statistics for interrupts. int_list is a list of comma-separated values or range of values (e.g., 0-16,35,400-). The SUM keyword indicates that the total number of interrupts received per second is to be displayed. The ALL keyword indicates that statistics from all interrupts, including potential APIC interrupt sources, are to be reported. Note that interrupt statistics depend on sadc option “-S INT” to be collected.”
  • -m { keyword [,…] | ALL }: “Report power management statistics.”
  • -n { keyword [,…] | ALL }: “Report network statistics.”
  • -P { cpu_list | ALL }: “Report per-processor statistics for the specified processor or processors.”
  • -q: “Report queue length and load averages.”
  • -r [ ALL ]: “Report memory utilization statistics. The ALL keyword indicates that all the memory fields should be displayed.”
  • -S: “Report swap space utilization statistics.”
  • -u [ ALL ]: “Report CPU utilization.”
  • -v: “Report status of inode, file and other kernel tables.”
  • -W: “Report swapping statistics.”
  • -w: “Report task creation and system switching activity.”
  • -y: “Report TTY devices activity.”

Each of the above options produces a different report, and the details of each column are available in the manual page of “sar”.

NOTE: All these reports can be printed using “sar -A” command.

“ksar” is a great tool for visualizing the data collected by the sar tool.

See the following diagram by Brendan Gregg showing all the sar report options.

Linux Performance Observability: sar — Taken from http://www.brendangregg.com/linuxperf.html

When we run “sar”, it produces the report by reading the standard system activity daily data file. By default, “sar” returns statistics for the current day. If you want to look at statistics for a specific period, you can use the “-s” and “-e” options to specify the start and end times respectively.

For example: “sar -s 17:00:00 -e 17:30:00”

If you want to find data from a different day, you can use the “-f” option to specify a SAR log file in “/var/log/sysstat/”.

For example: “sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01”

By default, “sar” reports CPU utilization. You can use the options mentioned above to get reports for different subsystems.

For example: “sar -s 17:00:00 -e 17:30:00 -f /var/log/sysstat/sa01 -r”
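The daily data file in these examples ends with the two-digit day of the month (sa01 for the 1st). As a small sketch under that assumption, the hypothetical helper sa_file_for_day below builds the path for a given day so it can be passed to “-f”.

```shell
# Hypothetical helper: print the daily data file path for a day of the month.
# Assumes the /var/log/sysstat/saDD naming used in the examples above.
sa_file_for_day() {
  day="${1#0}"   # drop one leading zero so that "08" is not read as octal
  printf '/var/log/sysstat/sa%02d\n' "$day"
}

# Example usage (needs sysstat with data collection enabled):
#   sar -r -s 17:00:00 -e 17:30:00 -f "$(sa_file_for_day 1)"
# With GNU date, yesterday's file can be selected like this:
#   sar -u -f "$(sa_file_for_day "$(date -d yesterday +%d)")"
```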

The “sar” command also supports periodic reporting: give it an “interval” and optionally a “count” to limit the number of outputs, i.e. “sar [ options ] [ interval [ count ] ]”. For example, the following command reports CPU utilization every 2 seconds and displays 5 lines: “sar -u 2 5”

Let’s see some more example usages of the sar command. Note that each command shown in the examples uses an interval and a count, so it shows real-time data and does not read the activity log file.

CPU Usage for all CPUs

With “sar -u”, we can get the CPU usage for all CPUs. The output is similar to that of the “mpstat” command and gives the cumulative real-time CPU usage across all CPUs.

“sar -u 1 4” command

In this example, the CPUs are not idling much, which means they are almost fully utilized. Checking %idle is a good way to gauge the CPU load.
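The %idle check can be scripted as well. The sketch below assumes the usual “sar -u” layout, in which the summary line starts with “Average:” and %idle is the last column. The function names and the threshold are hypothetical choices for illustration.

```shell
# Hypothetical helper: read "sar -u" output on stdin and print the average
# %idle for all CPUs (the last column of the "Average:" line).
sar_avg_idle() {
  awk '$1 == "Average:" && $2 == "all" { print $NF }'
}

# Hypothetical helper: succeed (exit status 0) when the average %idle read
# from stdin is below the given threshold; return 2 if no average was found.
cpu_is_busy() {
  threshold="$1"
  idle="$(sar_avg_idle)"
  [ -n "$idle" ] || return 2
  awk -v i="$idle" -v t="$threshold" 'BEGIN { exit !(i + 0 < t + 0) }'
}

# Example usage (LC_ALL=C keeps the decimal point and column names stable):
#   LC_ALL=C sar -u 1 4 | sar_avg_idle
#   LC_ALL=C sar -u 1 4 | cpu_is_busy 10 && echo "CPUs are busy"
```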

CPU Usage of Individual CPUs/Cores

The command is “sar -P ALL”. This is similar to “mpstat -P ALL”.

“sar -P ALL 1 2” command

Memory Usage

The command is “sar -r”.

“sar -r 1 2” command

Following are some columns to check.

  • kbmemfree: “Amount of free memory available in kilobytes.”
  • kbmemused: “Amount of used memory in kilobytes. This does not take into account memory used by the kernel itself.”
  • %memused: “Percentage of used memory.”

Checking Network Interface Throughput

The command is “sar -n DEV”.

“sar -n DEV 1 2” command

Checking TCP statistics

The command is “sar -n TCP,ETCP”.

“sar -n TCP,ETCP 1 2” command

Check the run queue and load averages

The command is “sar -q”.

“sar -q 1 4” command

This is an important report for understanding the server’s run queue and load averages. See also the “vmstat” command. If the run queue length is greater than the number of CPUs, there is more demand for CPU resources than the CPUs can serve at once.

  • runq-sz: “Run queue length (number of tasks waiting for run time).”
  • plist-sz: “Number of tasks in the task list.”
  • ldavg-1: “System load average for the last minute. The load average is calculated as the average number of runnable or running tasks (R state), and the number of tasks in uninterruptible sleep (D state) over the specified interval.”
  • ldavg-5: “System load average for the past 5 minutes.”
  • ldavg-15: “System load average for the past 15 minutes.”
  • blocked: “Number of tasks currently blocked, waiting for I/O to complete.”
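The comparison between the run queue and the number of CPUs can be sketched as follows. This assumes the “sar -q” layout in which runq-sz is the first value after “Average:”. The function name runq_vs_cpus and its messages are hypothetical.

```shell
# Hypothetical helper: read "sar -q" output on stdin, take runq-sz from the
# "Average:" line (second field) and compare it with a CPU count.
runq_vs_cpus() {
  ncpu="$1"
  awk -v n="$ncpu" '$1 == "Average:" {
    if ($2 + 0 > n + 0)
      print "run queue " $2 " > " n " CPUs"
    else
      print "run queue " $2 " <= " n " CPUs"
  }'
}

# Example usage (nproc prints the number of available CPUs):
#   LC_ALL=C sar -q 1 4 | runq_vs_cpus "$(nproc)"
```

A single sample can be misleading, so it helps to read the result together with the ldavg columns.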

Checking activity for each block device

The command is “sar -d”. This shows statistics similar to “iostat”.

“sar -d 1 2” command

Checking I/O and Transfer Rate Statistics

The command is “sar -b”.

“sar -b 1 2” command

Checking Swap Space Usage

The command is “sar -S”, which reports swap space utilization. In this example, “kbswpused” and “%swpused” are at 0, which means that the system is not using any swap space.

“sar -S 1 2” command

Report task creation and system switching activity

“sar -w 1 2” command

  • proc/s: “Total number of tasks created per second.”
  • cswch/s: “Total number of context switches per second.”

Report status of inode, file and other kernel tables

“sar -v 1 2” command

Advanced Observability Tools

Socket Statistics

The “ss” command reports socket statistics. We previously looked at the “netstat” command to get network connection details. The “ss” command is a replacement for “netstat”, whose use is now deprecated. The “ss” command is also faster and can display more information than “netstat”.

By default, the “ss” command lists all open non-listening TCP, UDP and UNIX sockets that have an established connection.

“ss | less” command

Following are some important options.

  • -r, --resolve: “Try to resolve numeric address/ports.”
  • -a, --all: “Display both listening and non-listening (for TCP this means established connections) sockets.”
  • -l, --listening: “Display only listening sockets (these are omitted by default).”
  • -o, --options: “Show timer information.”
  • -e, --extended: “Show detailed socket information.”
  • -m, --memory: “Show socket memory usage.”
  • -p, --processes: “Show process using socket.”
  • -i, --info: “Show internal TCP information.”
  • -s, --summary: “Print summary statistics.”
  • -t, --tcp: “Display TCP sockets.”
  • -u, --udp: “Display UDP sockets.”
  • -d, --dccp: “Display DCCP sockets.”
  • -w, --raw: “Display RAW sockets.”
  • -x, --unix: “Display Unix domain sockets (alias for -f unix).”

Here are some examples.

Listing all TCP connections

Use the -t and -a options to list all TCP connections. For example, “ss -ta”.

“ss -ta” command
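When the list is long, a count per connection state is often easier to read. The sketch below assumes the “ss -ta” layout with a header line followed by rows whose first column is the state. The function name tcp_state_counts is hypothetical.

```shell
# Hypothetical helper: read "ss -ta" output on stdin, skip the header line and
# count how many sockets are in each state (first column).
tcp_state_counts() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Example usage:
#   ss -ta | tcp_state_counts
```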

Show process name and process ID

Use the -p option to view the process name and process ID.

“ss -atp” command

Show socket memory usage for TCP connections

Use the -m option to view socket memory usage.

“ss -mtp” command

Show all TCP listening servers without resolving service names

The “ss -tlnp” command can be used to quickly view all TCP servers running on your system.

“ss -tlnp” command
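To reduce this output to a list of listening port numbers, the local address column can be split at its last colon. This is a minimal sketch, assuming the “ss -tln” layout in which the fourth column holds “Local Address:Port”. The function name listening_ports is hypothetical.

```shell
# Hypothetical helper: read "ss -tln" (or "ss -tlnp") output on stdin and print
# the unique listening port numbers in ascending order.
listening_ports() {
  awk 'NR > 1 {
    n = split($4, parts, ":")   # the port is the text after the last colon
    print parts[n]
  }' | sort -n -u
}

# Example usage (-p needs root permission to show processes of other users):
#   ss -tln | listening_ports
```

Splitting at the last colon also works for IPv6 addresses such as [::]:22, which contain colons of their own.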

Conclusion

This story covers some of the popular Linux Performance Observability Tools available.

Use this as a quick reference guide (cheat-sheet), but always check the manual pages, as these tools keep getting updated.