PHP 7 — PHP FPM — Nginx — Varnish Cache 502 Bad Gateway Error

After running apt-get update followed by apt-get upgrade, my server became unresponsive and showed a 502 Bad Gateway, which any sysadmin who has come across it will know is a very unhelpful error.
You have to start the process of elimination, and as always that means the Nginx log files first. As this is usually an issue with PHP-FPM (if you're running that), you should check its logs too. Not gonna lie, there wasn't a log in sight that I hadn't checked (dmesg, syslog, etc.), and every article (like this one) I came across stated that this is fixed by editing the listen values for PHP-FPM.
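For anyone working through the same checklist, the fix those articles describe boils down to making sure PHP-FPM's listen directive and Nginx's fastcgi_pass point at the same socket. A minimal sketch of what matching values look like (paths assume a stock PHP 7.0 install on Debian/Ubuntu, so adjust for your setup):

; /etc/php/7.0/fpm/pool.d/www.conf
listen = /run/php/php7.0-fpm.sock

# matching location block in the Nginx vhost
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php/php7.0-fpm.sock;
}

If those two don't line up, Nginx can't reach PHP-FPM and you get exactly this kind of 502.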
But my server had been running perfectly fine before the update and my config files were all correct, so I was dubious that it was related.
I ran strace against Nginx to see what was happening at the syscall level, and it was reporting a lot of socket() activity ending in EADDRINUSE errors.
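For reference, running the binary in the foreground under strace reproduces the failing bind() calls; something along these lines (assuming the usual /usr/sbin/nginx path):

strace -f /usr/sbin/nginx -g 'daemon off;'

The output included: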
socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 16
setsockopt(16, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(16, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
ioctl(16, FIONBIO, [1]) = 0
bind(16, {sa_family=AF_INET6, sin6_port=htons(8181), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRINUSE (Address already in use)
gettid() = 8004
write(5, "2017/02/24 16:58:21 [emerg] 8004"..., 95) = 95
write(2, "nginx: [emerg] bind() to [::]:80"..., 71nginx: [emerg] bind() to [::]:8080 failed (98: Address already in use)
) = 71
close(16)

As well as that, tailing all of the Nginx logs (tail -f /var/log/nginx/*) showed a similar kind of situation over and over:
… [error] 3711#3711: *20 connect() failed (111: Connection refused) while connecting to upstream, ……
and
….. [emerg] 8004#8004: bind() to [::]:8181 failed (98: Address already in use)…..
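EADDRINUSE simply means something else already owns the port Nginx is trying to bind, so the obvious next question is what. A quick way to check which process holds a given port (either tool works; the ports here are the ones from my logs):

sudo ss -tlnp | grep -E ':(8080|8181)'
# or
sudo lsof -i :8080 -i :8181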
I checked all the services. This server happens to be running Varnish as a proxy in front of Nginx, so it was starting to look like there was a miscommunication between the two.
The head-scratcher was that after checking my configs, and absolutely everything else that could cause this, the server was still down.
On the verge of going prematurely bald, I went back to the very beginning and checked the /etc/varnish and /etc/default/varnish config files, keeping in mind that I had altered the default configuration in /etc/default/varnish when setting it up ages ago so that Varnish would listen on port 80 rather than 6081… this was all still fine.
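For context, that old edit is the classic one to DAEMON_OPTS in /etc/default/varnish, roughly like this (the values here are illustrative; the point is the -a :80):

DAEMON_OPTS="-a :80 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,256m"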
Absolutely stumped, I stumbled on this bug report with the following heading:
“varnish doesn’t source /etc/default/varnish when started but uses it when reloaded”
To cut a long story short: under systemd the settings from /etc/default/varnish need to be replicated in /lib/systemd/system/varnish.service, but after updating the server, for whatever reason, that unit file had been replaced with the default Varnish one, which removed my port 80 edits and reset it back to port 6081.
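So the fix is to put the -a :80 back where systemd actually reads it. Rather than editing /lib/systemd/system/varnish.service directly (which the next upgrade can overwrite again), a drop-in override survives package updates; a sketch, assuming a stock varnishd install and the same illustrative flags as above:

sudo systemctl edit varnish

That opens an override file; put this in it (the empty ExecStart= line clears the packaged command before redefining it):

[Service]
ExecStart=
ExecStart=/usr/sbin/varnishd -a :80 -T localhost:6082 \
    -f /etc/varnish/default.vcl -S /etc/varnish/secret -s malloc,256m

then:

sudo systemctl daemon-reload
sudo systemctl restart varnish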
The worst part about this kind of issue is that it leads you to a dead end, and if I hadn’t found that bug report it probably would have taken a long time to track down.
Hopefully this saves someone else the hassle.
