My first bash bug debugging
One day last week, I tried to open my terminal and I was surprised by this error:
-bash: xmalloc: cannot allocate 4000000016 bytes
Wat
Bash was trying to allocate 4Gb of memory? Why?
Looking at my Frankenstein .bashrc file, I decided that it was time to try to understand what each variable actually does. I like to keep everything I ever executed in my history, with timestamps and multi line support. I started cleaning it, and found out that this issue stopped happening once I removed the variables responsible for my bash’s history size.
These are the ones to blame:
export HISTSIZE=500000000
export HISTFILESIZE=500000000
I had both $HISTSIZE
and $HISTFILESIZE
set up to 500,000,000 lines. It's a reasonable number, right? \s
But what do they actually do?
HISTFILESIZE
: the maximum number of commands to store in your$HISTFILE
file, which by default is~/.bash_history
.$HISTSIZE
: the maximum number of commands that will be kept in memory and flushed to disk (into$HISTFILE
) once you exit the shell. It is also used to load the last$HISTSIZE
lines from your history file into memory when you open a new shell.
Now back to the problem. Let’s invoke our friend strace
. I left only the important lines of the trace:
$ strace -f bash
...
[pid 11677] open("/home/carla/.bash_history", O_RDONLY) = 3
[pid 11677] fstat(3, {st_mode=S_IFREG|0600, st_size=16755, ...}) = 0
[pid 11677] read(3, "ls\n#1478028902\n./i3-get-window-c"..., 16755) = 16755
[pid 11677] close(3) = 0
[pid 11677] chown("/home/carla/.bash_history", 1000, 100) = 0
[pid 11677] stat("/home/carla/.bash_history", {st_mode=S_IFREG|0600, st_size=16755, ...}) = 0
[pid 11677] open("/home/carla/.bash_history", O_RDONLY) = 3
[pid 11677] fstat(3, {st_mode=S_IFREG|0600, st_size=16755, ...}) = 0
[pid 11677] read(3, "ls\n#1478028902\n./i3-get-window-c"..., 16755) = 16755
[pid 11677] close(3) = 0
[pid 11677] mmap(NULL, 4000002048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 11677] brk(0xf0e0a000) = 0x2746000
[pid 11677] mmap(NULL, 4000137216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 11677] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9e145ae000
[pid 11677] munmap(0x7f9e145ae000, 61153280) = 0
[pid 11677] munmap(0x7f9e1c000000, 5955584) = 0
[pid 11677] mprotect(0x7f9e18000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 11677] mmap(NULL, 4000002048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 11677] write(2, "-bash: xmalloc: cannot allocate "..., 49-bash: xmalloc: cannot allocate 4000000016 bytes ) = 49
[pid 11677] exit_group(2) = ?
[pid 11677] +++ exited with 2 +++
...
strace
is telling me that as part of starting bash, it reads my .bash_history
and then allocate memory right after. It was trying to allocate 4,000,000,016 bytes (4Gb) of memory. My first reaction was to blame Google Chrome for using all of my memory, causing memory starvation of my dear friend, bash.
This is all the information I have until this point: the history variables in my .bashrc, $HISTSISZE
and $HISTFILESIZE
, set to remember and store 500,000,000 commands, was causing bash to allocate 4Gb of memory upon start.
My Hypothesis
Let’s do some math.
How are these two numbers related? My first guess was to divide one by the other:
(4000000000 + 16) / (500000000 + 2)
= 8 bytes per line, where this 2 is being added for some unknown reason.
strace
can’t help me any further at this time, so I downloaded the source code. I made sure I had the same version as the one installed on my laptop (GNU bash, version 4.4.0(1)-release).
Now it’s time to use GDB. The code needs to be compiled with extra flags to allow the debugging.
$ cd ~/bash/bash/
$ ./configure
$ CFLAGS='-g -Wall -Wextra' make
I cleaned my .bashrc, leaving only these two lines:
After glancing into the code (a.k.a using grep
and find
for the word ‘history’), I found a file I thought would lead me to the reason it was trying to allocate memory. Let’s look atlib/readline/history.c
, specifically the function add_history (line_start)
, defined at line 276.
Now we’re good to go. Digging into the code with gdb:
BOOM! Int overflow.
Let’s try to understand what happened here.
This is the important information for us at the moment: history_max_entries
equals 500000000
, which is the value I have for $HISTSIZE
on my .bashrc.
How does it end up there? It all starts when bash wants to load history: load_history()
at bashhist.c:300
. It fetches the value of $HISTSIZE
from your environment.
Let’s see what this function sv_histsize
on line 12 does. It is defined at lib/readline/bind.c
line 1825.
As a bonus we also can see that it sets the default value to 500 again. Why is it set twice, I’m not sure.
I see that nval
will have my $HISTSIZE
value. Now let’s see what the function stifile_history (nval)
.
Remember the variable history_max_entries
? The function above, on line 25, sets its value to 500000000
.
As we saw in the GDB output, 500000000
was then assigned to the variablehistory_size
, and later multiplied by 8 as argument to the functionxmalloc()
, causing the int overflow.
This bug was introduced on September 15th (a0c0a00
), on the 4.4 release and fixed in patch 1 on November 14th (8ddc8d6
). The fix was to set a maximum limit for initial memory allocation:
Update
Nenad Stojanovski pointed that my conclusion is not correct. The real reason for bash’s crash wasn’t an int overflow, but lack of memory available. The detailed explanation is available here.
What I learned
There is no point into setting the $HISTSIZE
more than 8192, since its limit was now set on the patch (unless it is being used elsewhere, which I didn’t look for. If so, please let me know).
So I changed my .bashrc to:
export HISTSIZE=8192
export HISTFILESIZE=500000000
But then I remembered that I read somewhere we could improve bash’s history even more. Instead of keeping in memory the commands to later be appended to history file (and if you crash your terminal emulator, you’ll lose it), we can flush the commands as soon as we run them. To do that, we just need to add to .bashrc:
shopt -s histappend
export PROMPT_COMMAND='history -a'
Now we could decrease even more the $HISTSIZE
because we don’t need to keep in memory.
But there are two caveats:
- if you work with multiple terminals (like me) and is used to go back (arrow up) for the previous commands in a certain terminal, all your commands from all open terminals will be all mixed together, ordered by timestamp;
- it will increase disk IO since it is flushing into the
$HISTFILE
at every command.
So I changed back my mind, kept at the max 8192 and removed the append feature.
More info on bash’s variables: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html
TL;DR
When opening a terminal having bash as $SHELL
, bash allocates $HISTSIZE * 8
bytes in memory to remember the last $HISTSIZE
commands. If your $HISTSIZE
is too big and you don’t have enough memory available, you’re gonna have a bad time.
I would like to thank Thiago Sousa for guiding me through my first GDB debugging, along with Patrick Gray, Andre Leite, Andre Dieb and Raissa Sarmento for reviewing this post.