My first bash bug debugging

One day last week, I tried to open my terminal and I was surprised by this error:

-bash: xmalloc: cannot allocate 4000000016 bytes

Wat

Bash was trying to allocate 4Gb of memory? Why?

Looking at my Frankenstein .bashrc file, I decided that it was time to try to understand what each variable actually does. I like to keep everything I ever executed in my history, with timestamps and multi line support. I started cleaning it, and found out that this issue stopped happening once I removed the variables responsible for my bash’s history size.

These are the ones to blame:

export HISTSIZE=500000000 
export HISTFILESIZE=500000000

I had both $HISTSIZE and $HISTFILESIZE set up to 500,000,000 lines. It's a reasonable number, right? \s

But what do they actually do?

  • HISTFILESIZE: the maximum number of commands to store in your $HISTFILE file, which by default is ~/.bash_history.
  • $HISTSIZE: the maximum number of commands that will be kept in memory and flushed to disk (into $HISTFILE) once you exit the shell. It is also used to load the last $HISTSIZE lines from your history file into memory when you open a new shell.

Now back to the problem. Let’s invoke our friend strace. I left only the important lines of the trace:

$ strace -f bash
...
[pid 11677] open("/home/carla/.bash_history", O_RDONLY) = 3
[pid 11677] fstat(3, {st_mode=S_IFREG|0600, st_size=16755, ...}) = 0
[pid 11677] read(3, "ls\n#1478028902\n./i3-get-window-c"..., 16755) = 16755
[pid 11677] close(3) = 0
[pid 11677] chown("/home/carla/.bash_history", 1000, 100) = 0
[pid 11677] stat("/home/carla/.bash_history", {st_mode=S_IFREG|0600, st_size=16755, ...}) = 0
[pid 11677] open("/home/carla/.bash_history", O_RDONLY) = 3
[pid 11677] fstat(3, {st_mode=S_IFREG|0600, st_size=16755, ...}) = 0
[pid 11677] read(3, "ls\n#1478028902\n./i3-get-window-c"..., 16755) = 16755
[pid 11677] close(3) = 0
[pid 11677] mmap(NULL, 4000002048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 11677] brk(0xf0e0a000) = 0x2746000
[pid 11677] mmap(NULL, 4000137216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 11677] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f9e145ae000
[pid 11677] munmap(0x7f9e145ae000, 61153280) = 0
[pid 11677] munmap(0x7f9e1c000000, 5955584) = 0
[pid 11677] mprotect(0x7f9e18000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 11677] mmap(NULL, 4000002048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 11677] write(2, "-bash: xmalloc: cannot allocate "..., 49-bash: xmalloc: cannot allocate 4000000016 bytes ) = 49
[pid 11677] exit_group(2) = ?
[pid 11677] +++ exited with 2 +++
...

strace is telling me that as part of starting bash, it reads my .bash_history and then allocate memory right after. It was trying to allocate 4,000,000,016 bytes (4Gb) of memory. My first reaction was to blame Google Chrome for using all of my memory, causing memory starvation of my dear friend, bash.

This is all the information I have until this point: the history variables in my .bashrc, $HISTSISZE and $HISTFILESIZE, set to remember and store 500,000,000 commands, was causing bash to allocate 4Gb of memory upon start.


My Hypothesis

Let’s do some math.

How are these two numbers related? My first guess was to divide one by the other:

(4000000000 + 16) / (500000000 + 2) = 8 bytes per line, where this 2 is being added for some unknown reason.

strace can’t help me any further at this time, so I downloaded the source code. I made sure I had the same version as the one installed on my laptop (GNU bash, version 4.4.0(1)-release).

Now it’s time to use GDB. The code needs to be compiled with extra flags to allow the debugging.

$ cd ~/bash/bash/
$ ./configure
$ CFLAGS='-g -Wall -Wextra' make

I cleaned my .bashrc, leaving only these two lines:

After glancing into the code (a.k.a using grep and find for the word ‘history’), I found a file I thought would lead me to the reason it was trying to allocate memory. Let’s look atlib/readline/history.c, specifically the function add_history (line_start), defined at line 276.

Now we’re good to go. Digging into the code with gdb:

BOOM! Int overflow.

Let’s try to understand what happened here.

This is the important information for us at the moment: history_max_entries equals 500000000, which is the value I have for $HISTSIZE on my .bashrc.

How does it end up there? It all starts when bash wants to load history: load_history() at bashhist.c:300. It fetches the value of $HISTSIZE from your environment.

Let’s see what this function sv_histsize on line 12 does. It is defined at lib/readline/bind.c line 1825.

As a bonus we also can see that it sets the default value to 500 again. Why is it set twice, I’m not sure.

I see that nval will have my $HISTSIZE value. Now let’s see what the function stifile_history (nval).

Remember the variable history_max_entries? The function above, on line 25, sets its value to 500000000.

As we saw in the GDB output, 500000000was then assigned to the variablehistory_size, and later multiplied by 8 as argument to the functionxmalloc(), causing the int overflow.

This bug was introduced on September 15th (a0c0a00), on the 4.4 release and fixed in patch 1 on November 14th (8ddc8d6). The fix was to set a maximum limit for initial memory allocation:


Update

Nenad Stojanovski pointed that my conclusion is not correct. The real reason for bash’s crash wasn’t an int overflow, but lack of memory available. The detailed explanation is available here.

What I learned

There is no point into setting the $HISTSIZE more than 8192, since its limit was now set on the patch (unless it is being used elsewhere, which I didn’t look for. If so, please let me know).

So I changed my .bashrc to:

export HISTSIZE=8192 
export HISTFILESIZE=500000000

But then I remembered that I read somewhere we could improve bash’s history even more. Instead of keeping in memory the commands to later be appended to history file (and if you crash your terminal emulator, you’ll lose it), we can flush the commands as soon as we run them. To do that, we just need to add to .bashrc:

shopt -s histappend 
export PROMPT_COMMAND='history -a'

Now we could decrease even more the $HISTSIZE because we don’t need to keep in memory.

But there are two caveats:

  • if you work with multiple terminals (like me) and is used to go back (arrow up) for the previous commands in a certain terminal, all your commands from all open terminals will be all mixed together, ordered by timestamp;
  • it will increase disk IO since it is flushing into the $HISTFILE at every command.

So I changed back my mind, kept at the max 8192 and removed the append feature.

More info on bash’s variables: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html


TL;DR

When opening a terminal having bash as $SHELL, bash allocates $HISTSIZE * 8 bytes in memory to remember the last $HISTSIZE commands. If your $HISTSIZE is too big and you don’t have enough memory available, you’re gonna have a bad time.


I would like to thank Thiago Sousa for guiding me through my first GDB debugging, along with Patrick Gray, Andre Leite, Andre Dieb and Raissa Sarmento for reviewing this post.