Azure Database for PostgreSQL Single Server intermittent restart and recovery due to undocumented behavior.

Matheus Oliveira
Published in NAZAR, Jul 13, 2020

If you are experiencing intermittent restarts and recovery on Azure Database for PostgreSQL, they may be caused by the server's internal log file reaching 1 GB.

Recently, I was involved in troubleshooting a production Azure Database for PostgreSQL server that was intermittently unavailable. As with any intermittent problem, I first looked for metrics that could point to the root cause. This time, no metric showed a change in behavior that correlated with the occurrences of the problem, so I opened a support ticket with Microsoft.

I shared all the information I had about the problem and about the application that used the database. After reviewing the evidence, the support engineer said that the issue needed to be escalated to the product group for further investigation.

A few days later, I received the following answer from the support team:

"By Design, if the internal Log file (size determined by the level of Logging set in Portal) consistently reaches 1 GB, it degrades the Performance of the Postgres Single Server DB and can lead to Restart And Recovery to self-stablize the database."

That finally explained the root cause, but this behavior is not mentioned in the documentation (https://docs.microsoft.com/en-us/azure/postgresql/concepts-server-logs). The documentation only says that, once the log files reach 1 GB, the oldest files are deleted regardless of the configured retention period:

"The short-term storage location can hold up to 1 GB of log files. After 1 GB, the oldest files, regardless of retention period, will be deleted to make room for new logs"
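To see what the documented rule means in practice, here is a minimal Python model of it. This is a sketch under stated assumptions, not Azure's implementation: it assumes one log file per hour, treats "1 GB" as 1024**3 bytes, and the `LogFile` class, the `apply_retention` function, and the file names are all hypothetical. It applies the retention period first, then deletes the oldest files until the total fits under the cap.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical model: "1 GB" is treated as 1024**3 bytes (an assumption).
CAP_BYTES = 1024 ** 3
MB = 1024 ** 2


@dataclass
class LogFile:
    name: str          # illustrative file name, not Azure's real naming scheme
    age_hours: float   # hours since the file was created
    size_bytes: int


def apply_retention(files: List[LogFile], retention_hours: float,
                    cap_bytes: int = CAP_BYTES) -> List[LogFile]:
    """Model of the two documented rules: retention period, then a 1 GB cap."""
    # Rule 1: drop files older than the retention period; sort oldest first.
    kept = sorted((f for f in files if f.age_hours <= retention_hours),
                  key=lambda f: f.age_hours, reverse=True)
    total = sum(f.size_bytes for f in kept)
    # Rule 2: while over the cap, delete the oldest file regardless of retention.
    while kept and total > cap_bytes:
        total -= kept.pop(0).size_bytes
    return kept


if __name__ == "__main__":
    # 200 hourly files of 10 MB each, with a 7-day (168 h) retention period.
    files = [LogFile(f"postgresql-{h}.log", h, 10 * MB) for h in range(200)]
    kept = apply_retention(files, retention_hours=7 * 24)
    oldest = max(f.age_hours for f in kept)
    print(f"{len(kept)} files kept, oldest is {oldest} hours old")
```

In this model, logging 10 MB per hour with a 7-day retention keeps only 102 files, the oldest being 101 hours old: roughly 4 days of logs instead of 7, with the storage sitting permanently at the 1 GB cap.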

First, I think this behavior should be stated explicitly in the documentation. Second, and more importantly, it is a significant limitation of Azure Database for PostgreSQL Single Server: if you set the maximum retention period of 7 days, the 1 GB cap allows only about 6 MB of logs per hour (1024 MB / 168 h ≈ 6.1 MB/h).
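The same arithmetic can be repeated for every retention period. The short Python sketch below divides the 1 GB cap by the number of hours in the retention period; it assumes 1 GB = 1024 MB and a retention range of 1 to 7 days, and `hourly_log_budget_mb` is a hypothetical helper name.

```python
CAP_MB = 1024  # 1 GB cap, treating 1 GB as 1024 MB (assumption)


def hourly_log_budget_mb(retention_days: int, cap_mb: float = CAP_MB) -> float:
    """Average log volume per hour (MB) that exactly fills the cap over the retention period."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return cap_mb / (retention_days * 24)


if __name__ == "__main__":
    for days in range(1, 8):
        print(f"{days} day(s): {hourly_log_budget_mb(days):.1f} MB/hour")
```

This prints a budget of 42.7 MB/hour for 1 day, 14.2 MB/hour for the 3-day default, and 6.1 MB/hour for 7 days. Any sustained logging above the budget for the chosen retention keeps the storage at the 1 GB limit.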

With this limitation in mind, be careful with the logging level and set an alert on the "Server Log Storage used" metric. Also, reducing the log retention period to 1 day (the default is 3 days) reduces the probability of reaching the 1 GB limit.
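To turn this advice into a concrete setting, the sketch below works backwards from an observed log rate to the longest retention that stays safely under the cap. It assumes you can estimate your average hourly log volume (for example, from how fast "Server Log Storage used" grows), that steady-state usage is roughly rate × 24 × days, and that 1 GB = 1024 MB. The 80% headroom is an arbitrary illustrative value that could also serve as the alert threshold, and `max_safe_retention_days` is a hypothetical helper.

```python
import math
from typing import Optional

CAP_MB = 1024             # 1 GB cap, treated as 1024 MB (assumption)
MIN_DAYS, MAX_DAYS = 1, 7  # retention range mentioned in the post


def max_safe_retention_days(avg_mb_per_hour: float,
                            headroom: float = 0.8,
                            cap_mb: float = CAP_MB) -> Optional[int]:
    """Longest retention (whole days) whose steady-state log volume stays
    below headroom * cap_mb. Returns None if even 1 day would exceed it."""
    if avg_mb_per_hour <= 0:
        return MAX_DAYS
    # Steady state: roughly avg_mb_per_hour * 24 * days of logs are stored.
    days = math.floor(headroom * cap_mb / (avg_mb_per_hour * 24))
    if days < MIN_DAYS:
        return None  # lowering retention is not enough; reduce the logging level
    return min(days, MAX_DAYS)


if __name__ == "__main__":
    for rate in (2, 10, 40):
        print(f"{rate} MB/hour -> {max_safe_retention_days(rate)}")
```

With these assumptions, 2 MB/hour allows the full 7 days, 10 MB/hour allows 3 days, and 40 MB/hour returns `None`, meaning that only lowering the logging level keeps the server clear of the 1 GB limit.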
