Linux troubleshooting: Memory analysis

Sebastian Dahlgren
Saltside Engineering
4 min read · May 5, 2022
RAM is the computer’s short-term memory. [Nazarethman/Getty Images]

In this second part of the Linux troubleshooting series we’ll take a closer look at how to debug memory issues on Linux servers.

Getting an overview

To quickly see the current memory usage, run the free command. It gives you system-wide details on how much memory is used and for what.

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            31Gi       4.4Gi        20Gi       2.1Gi       6.0Gi        24Gi
Swap:          2.0Gi          0B       2.0Gi

What does a healthy system look like?

A healthy Linux system with more than enough memory will, after running for a while, show the following expected and harmless behavior:

free memory is close to zero.

used plus buff/cache is close to total (recent versions of free report the cache separately from used). This means a lot of the memory is used for caching data so that it is readily available.

available memory is roughly 20% of the total memory or more.

swap is stable.

Do not be alarmed when free is close to zero. This is perfectly normal and means that your memory is caching a lot of content, provided that available is not too low, of course. If a process needs more memory, Linux evicts items from the cache and hands the space to the requesting process, but that only works if there is memory available.
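The numbers that free prints are derived from /proc/meminfo, so the available check can be scripted. Below is a minimal Python sketch, assuming a kernel that exposes the MemAvailable field. The helper names parse_meminfo and available_percent are made up for this example, and 20% is only the rule of thumb from the list above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts without a unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            values[name.strip()] = int(parts[0])
    return values


def available_percent(meminfo):
    """Return MemAvailable as a percentage of MemTotal."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]
```

Feeding it the contents of /proc/meminfo, for example available_percent(parse_meminfo(open("/proc/meminfo").read())), gives a single number to compare against the 20% rule of thumb.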

What does an unhealthy system look like?

Warning signs of a genuine low memory situation that you may want to look into:

available memory is close to zero.

swap increases or fluctuates. This indicates that the memory is full and Linux is using disk (swap) to store additional items. Swapping is slow and is a clear indication of a system with saturated memory resources.

dmesg | grep oom-killer shows the out-of-memory (OOM) killer being invoked to kill processes and free up memory. It means what it says: you’re out of memory 😃.
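The first two warning signs can be checked by comparing two memory snapshots taken some time apart. The Python sketch below is illustrative only: warning_signs is a hypothetical helper, the snapshots are assumed to be dicts of kB values using the /proc/meminfo field names, and the 5% default threshold is an assumed stand-in for "close to zero".

```python
def warning_signs(prev, curr, min_available_pct=5.0):
    """Compare two memory snapshots and list low-memory warning signs.

    prev and curr are dicts of integer kB values with the keys MemTotal,
    MemAvailable, SwapTotal and SwapFree (as named in /proc/meminfo).
    min_available_pct is an assumed threshold, not an official limit.
    """
    signs = []

    # Sign 1: available memory is close to zero.
    available_pct = 100.0 * curr["MemAvailable"] / curr["MemTotal"]
    if available_pct < min_available_pct:
        signs.append(f"available memory is low ({available_pct:.1f}%)")

    # Sign 2: swap usage increases or fluctuates between snapshots.
    prev_swap_used = prev["SwapTotal"] - prev["SwapFree"]
    curr_swap_used = curr["SwapTotal"] - curr["SwapFree"]
    if curr_swap_used != prev_swap_used:
        delta = curr_swap_used - prev_swap_used
        signs.append(f"swap usage changed by {delta:+d} kB")

    return signs
```

An empty list means neither sign was seen between the two snapshots. It says nothing about the OOM killer, which still has to be checked in the kernel log.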

Identifying swapping

Our friend from the previous article about CPU analysis, vmstat -w 1, can also help in checking for swap issues.

$ vmstat -w 1
... --------------------memory---------------------- ---swap-- ...
...         swpd        free        buff       cache   si   so ...
...            0    21747572      201632     6180840    0    0 ...

Output above cropped for readability.

What to look for?

The key thing here is the trend of the si and so columns. They tell you whether the system is swapping in (reading memory back from disk) or swapping out (writing memory to disk). In a normal scenario these numbers should be zero, or at least not consistently non-zero. Sustained swapping means the system is frequently going to disk for data that should be in memory, which decreases performance.

If that happens, you need to either buy more RAM or optimize memory consumption in your applications.
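The trend check can also be scripted from the kernel's cumulative swap counters, pswpin and pswpout in /proc/vmstat. The following Python sketch makes some assumptions: the function names are invented for this example, the rates are in pages per second rather than the units vmstat prints, and "more than half of the samples" is an arbitrary definition of "consistently".

```python
def read_swap_counters(text):
    """Extract pswpin/pswpout (cumulative page counts) from /proc/vmstat text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(prev, curr, interval_s):
    """Return (pages swapped in, pages swapped out) per second between snapshots."""
    return (
        (curr["pswpin"] - prev["pswpin"]) / interval_s,
        (curr["pswpout"] - prev["pswpout"]) / interval_s,
    )


def is_swapping_consistently(rates, min_fraction=0.5):
    """True if more than min_fraction of the samples show any swap activity.

    rates is a list of (swap_in, swap_out) tuples; min_fraction is an
    assumed cut-off for what counts as "consistently non-zero".
    """
    if not rates:
        return False
    busy = sum(1 for swap_in, swap_out in rates if swap_in > 0 or swap_out > 0)
    return busy / len(rates) > min_fraction
```

Reading /proc/vmstat once per second and collecting the swap_rates results into a list gives roughly the same signal as watching the si and so columns by eye.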

Finding out which process is using memory

Just like in the case of CPU, top can help you find processes that consume memory. But I prefer pidstat -r | head -20 or pidstat -r 1 30 to get output that I can more easily copy and share.

$ pidstat -r | head -20
Linux 5.13.0-40-generic (tsunami) 05/03/2022 _x86_64_ (8 CPU)
  UID    PID  minflt/s  majflt/s     VSZ    RSS  %MEM  Command
    0      1      1.33      0.01  164984  11292  0.03  systemd
  127    787      0.01      0.00    7912   3892  0.01  rpcbind
    0    892      0.01      0.00    2928   2080  0.01  acpid
...

What to look for?

This table gives you a lot of important information.

%MEM shows the process’s share of the host’s physical memory. It gives an indication of how large this process is compared to other processes on the host. If this is constantly high, you may have too little RAM allocated for this host.

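RSS (resident set size) shows the physical memory currently used by the process, in kilobytes. It does not include memory that has been swapped out, but it does count the resident pages of shared libraries the process has mapped. If this number is growing constantly over time, you may have a memory leak.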

VSZ shows the total amount of virtual memory allocated to the process, in kilobytes. This includes resident memory, swapped-out memory, and memory that has been reserved but not yet used.

minflt/s shows the number of minor page faults per second. A minor fault is one that can be resolved without reading from disk, e.g. when part of a shared library is already loaded in memory but not yet mapped to the requesting process. Do not worry about this.

majflt/s shows the number of major page faults per second. A major page fault occurs when the kernel needs to load a memory page from disk before the process can continue. It can occur, for example, when only parts of the executed binary have been read into memory, or when a page has been swapped out.
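The "RSS growing constantly over time" heuristic can be turned into a small check. The Python sketch below assumes RSS is sampled from the VmRSS line of /proc/<pid>/status. Both function names are invented for this example, and strictly increasing samples are only a hint of a leak, not proof.

```python
def read_rss_kb(status_text):
    """Return the VmRSS value (kB) from /proc/<pid>/status text, or None.

    Kernel threads have no VmRSS line, hence the None case.
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def rss_keeps_growing(samples_kb, min_growth_kb=0):
    """True if every sample is larger than the previous one and the
    total growth exceeds min_growth_kb (an assumed noise threshold)."""
    if len(samples_kb) < 2:
        return False
    always_up = all(b > a for a, b in zip(samples_kb, samples_kb[1:]))
    return always_up and samples_kb[-1] - samples_kb[0] > min_growth_kb
```

Samples should be spaced far enough apart (minutes rather than seconds) that normal warm-up allocations are not mistaken for a leak.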

