RabbitMQ server issue: vhost ‘/’ is down

Pratibha Jagnere
Oct 17, 2019


Even after using RabbitMQ for almost a year, it is still a mystery to me. I use it for my Celery tasks: my Django app puts tasks into the AMQP queue, and Celery picks them up and processes them.

One fine morning, we had to restart the server.

After that, all services started as expected except `rabbitmq-server`. After a day of head-scratching and reinstalling all the relevant libraries, the issue persisted.

After further debugging, we found that the restart had upgraded all the libraries to their latest versions, and they were no longer compatible. Even after downgrading the libraries, `rabbitmq-server` would not come up. We even tried older versions of Erlang (14) and RabbitMQ (3.1.5), with no luck.

So we removed all the packages and started over. This time we installed the zero-dependency 64-bit Erlang RPM package, which provides just enough of Erlang to run RabbitMQ.

To get the most recent version on CentOS 6, add this repository definition:

```ini
# In /etc/yum.repos.d/rabbitmq_erlang.repo
[rabbitmq_erlang]
name=rabbitmq_erlang
baseurl=https://packagecloud.io/rabbitmq/erlang/el/6/$basearch
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/rabbitmq/erlang/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300

[rabbitmq_erlang-source]
name=rabbitmq_erlang-source
baseurl=https://packagecloud.io/rabbitmq/erlang/el/6/SRPMS
repo_gpgcheck=1
gpgcheck=0
enabled=1
gpgkey=https://packagecloud.io/rabbitmq/erlang/gpgkey
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
metadata_expire=300
```

Then install the package:

```shell
yum install erlang
```

This installed Erlang 22 on the system.
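You need to know which Erlang/OTP release the package actually installed, because that is the number to look up in the RabbitMQ compatibility matrix. A small shell sketch like the one below can confirm it. `current_otp_release` and `otp_in_range` are hypothetical helper names. The bounds passed to `otp_in_range` are placeholders. Copy them from the matrix for the RabbitMQ version you plan to install.

```shell
#!/bin/sh
# Hypothetical helpers for checking the installed Erlang/OTP release.

# Print the OTP major release of the Erlang found on PATH, e.g. "22".
# erlang:system_info(otp_release) returns a plain number for OTP 17 and later;
# older releases return strings such as "R16B03", which otp_in_range rejects.
current_otp_release() {
  erl -noshell -eval 'io:format("~s~n", [erlang:system_info(otp_release)]), halt().'
}

# Succeed if RELEASE is within [MIN, MAX], inclusive.
# MIN and MAX are placeholders: take them from the compatibility matrix.
otp_in_range() {
  release="$1"
  min="$2"
  max="$3"
  [ "$release" -ge "$min" ] 2>/dev/null && [ "$release" -le "$max" ] 2>/dev/null
}

# Example (the bounds 21 and 22 are illustrative, not taken from the matrix):
#   otp_in_range "$(current_otp_release)" 21 22 && echo "Erlang release is in range"
```

Keeping the range check separate from the `erl` call makes it easy to reuse the check for any release string, for example one copied from another machine.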

Next, I checked the RabbitMQ and Erlang/OTP compatibility matrix to find a RabbitMQ server version that supports this Erlang release. Then I downloaded the matching RabbitMQ installer from https://dl.bintray.com/rabbitmq/all/rabbitmq-server/

With everything set up, `rabbitmq-server` did not crash this time. But when I tried to connect from Celery, I started getting a different error in the Celery console:

```
consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: Connection.open: (541) INTERNAL_ERROR - access to vhost '/' refused for user 'guest': vhost '/' is down.
```

The following error appeared in `/var/log/rabbitmq/rabbit@hostname_1.log`:

```
2019-10-16 13:18:18.473 [info] <0.801.0> accepting AMQP connection <0.801.0> (127.0.0.1:56828 -> 127.0.0.1:5672)
2019-10-16 13:18:18.476 [error] <0.801.0> Error on AMQP connection <0.801.0> (127.0.0.1:56828 -> 127.0.0.1:5672, vhost: 'none', user: 'guest', state: opening), channel 0:
{handshake_error,opening,
{amqp_error,internal_error,
"access to vhost '/' refused for user 'guest': vhost '/' is down",
'connection.open'}}
2019-10-16 13:18:18.476 [info] <0.801.0> closing AMQP connection <0.801.0> (127.0.0.1:56828 -> 127.0.0.1:5672, vhost: 'none', user: 'guest')
```

At first I could not make sense of the issue, and none of the StackOverflow answers helped either. So I looked into the RabbitMQ code and tried to restart the vhost with `sudo rabbitmqctl restart_vhost`, but got this error:

```
Trying to restart vhost '/' on node 'rabbit@hostname_1' ...
Failed to start vhost '/' on node 'rabbit@hostname_1'
Reason: {:shutdown, {:failed_to_start_child, :rabbit_vhost_process, {:badmatch, {:error, {{{:badmatch, {:error, {:not_a_dets_file, '/var/lib/rabbitmq/mnesia/rabbit@hostname_1/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/recovery.dets'}}}, [{:rabbit_recovery_terms, :open_table, 1, [file: 'src/rabbit_recovery_terms.erl', line: 197]}, {:rabbit_recovery_terms, :init, 1, [file: 'src/rabbit_recovery_terms.erl', line: 177]}, {:gen_server, :init_it, 2, [file: 'gen_server.erl', line: 374]}, {:gen_server, :init_it, 6, [file: 'gen_server.erl', line: 342]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}, {:child, :undefined, :rabbit_recovery_terms, {:rabbit_recovery_terms, :start_link, ["/"]}, :transient, 30000, :worker, [:rabbit_recovery_terms]}}}}}}
```

At the same time, these errors showed up in `/var/log/rabbitmq/rabbit@hostname_1.log`:

```
2019-10-16 14:48:29.002 [error] <0.430.0> CRASH REPORT Process <0.430.0> with 0 neighbours crashed with reason: no match of right hand value {error,{not_a_dets_file,"/var/lib/rabbitmq/mnesia/rabbit@hostname_1/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/recovery.dets"}} in rabbit_recovery_terms:open_table/1 line 197
2019-10-16 14:48:29.003 [error] <0.429.0> Unable to recover vhost <<"/">> data. Reason {badmatch,{error,{{{badmatch,{error,{not_a_dets_file,"/var/lib/rabbitmq/mnesia/rabbit@hostname_1/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/recovery.dets"}}},[{rabbit_recovery_terms,open_table,1,[{file,"src/rabbit_recovery_terms.erl"},{line,197}]},{rabbit_recovery_terms,init,1,[{file,"src/rabbit_recovery_terms.erl"},{line,177}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,374}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},{child,undefined,rabbit_recovery_terms,{rabbit_recovery_terms,start_link,[<<"/">>]},transient,30000,worker,[rabbit_recovery_terms]}}}}
```
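The log names the corrupt `recovery.dets` file, and its parent directory is the vhost message store that has to go. Rather than reading that path off by eye, you can extract it with `grep` and `sed`. The sketch below assumes the log file contains the `not_a_dets_file,"<path>"` form that RabbitMQ writes, with plain double quotes. `corrupt_vhost_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list the vhost message-store directories that RabbitMQ
# reported as containing a corrupt recovery.dets file.
# Usage: corrupt_vhost_dirs /var/log/rabbitmq/rabbit@hostname_1.log
corrupt_vhost_dirs() {
  # grep -o prints only the matching part: not_a_dets_file,"<path to file>
  # sed drops that prefix, then strips the file name to leave its directory.
  # sort -u collapses duplicates, since the same path appears in several lines.
  grep -o 'not_a_dets_file,"[^"]*' "$1" \
    | sed -e 's/^not_a_dets_file,"//' -e 's|/[^/]*$||' \
    | sort -u
}

# When called with a log file argument, print the directories found in it.
if [ "$#" -ge 1 ]; then
  corrupt_vhost_dirs "$1"
fi
```

For the log above, this should print the single directory ending in `628WB79CIFDYO9LJI6DKMI09L`.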

We were completely blocked. After 5 hours, we found this Qlik support article: https://support.qlik.com/articles/000073759

TL;DR: They hit the same issue after their server was rebooted. Apparently the reboot was not clean, and it corrupted the vhost files in the RabbitMQ data folder.

Resolution:

  1. Stop `rabbitmq-server`.
  2. Delete the folder that contains the corrupt files. In this case: `/var/lib/rabbitmq/mnesia/rabbit@hostname_1/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L`
  3. Start `rabbitmq-server`. It should work fine now.
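The three steps above can be scripted roughly as follows. This is a sketch, not the procedure from the Qlik article. It assumes the SysV `service` command that CentOS 6 uses, and it moves the corrupt directory into a backup location instead of deleting it, so the files can still be inspected later. `quarantine_vhost_dir` and the backup path are hypothetical. Whether you move or delete the directory, RabbitMQ starts that vhost with a fresh message store, so any messages persisted in the old one are not recovered.

```shell
#!/bin/sh
# Hypothetical helper: move a vhost's message-store directory aside instead of
# deleting it.
#   $1  vhosts dir, e.g. /var/lib/rabbitmq/mnesia/rabbit@hostname_1/msg_stores/vhosts
#   $2  directory name, e.g. 628WB79CIFDYO9LJI6DKMI09L
#   $3  backup dir (hypothetical), e.g. /root/rabbitmq-corrupt
quarantine_vhost_dir() {
  src="$1/$2"
  if [ ! -d "$src" ]; then
    echo "not a directory: $src" >&2
    return 1
  fi
  mkdir -p "$3" || return 1
  # A timestamp suffix keeps earlier backups from being overwritten.
  mv "$src" "$3/$2.$(date +%Y%m%d%H%M%S)"
}

# Run the three resolution steps only when all three arguments are given
# (needs root; assumes the SysV init script shipped for CentOS 6).
if [ "$#" -eq 3 ]; then
  service rabbitmq-server stop &&
    quarantine_vhost_dir "$1" "$2" "$3" &&
    service rabbitmq-server start
fi
```

Because the steps are chained with `&&`, the server is only restarted if the stop and the move both succeed.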

Useful links:

https://www.rabbitmq.com/which-erlang.html
https://github.com/rabbitmq/erlang-rpm

https://support.qlik.com/articles/000073759
