Best Practices for Scheduled Maintenances

Justin Thomas
Jul 27, 2017 · 4 min read

As an infrastructure team, we all understand that at some point we need to take an important service offline temporarily, for scheduled maintenance or for upgrades. What matters most is to communicate thoroughly, so that users are not confused and no productivity is lost.

Users expect a few days’ notice of planned downtime. It gives them the foresight they need and lets them look for alternatives, so that their business does not suffer because of what their upstream provider is planning to do. The larger the impact on users, the longer the lead time needs to be.

Critical Components = 1 Week
Minor Components = 1 or 2 days

In most cases, a single email is not going to be enough. Depending on the expected impact, send one a week ahead of time, one a few days ahead of time, one the day before, and one on the day of the planned downtime. Not everyone appreciates so many emails, so it’s important to understand your customer base and adjust accordingly. Just as important, once the maintenance is over, send a message confirming that the activity is complete and all services are functioning as normal.
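The reminder cadence above can be sketched in a few lines of Python. This is only an illustration: the function name, the offsets list, and the choice of 3 days for "a few days ahead" and 2 hours for the day-of reminder are assumptions, not fixed rules.

```python
from datetime import datetime, timedelta, timezone

# Assumed reminder offsets before the maintenance start.
# "A few days ahead" is taken as 3 days and the day-of reminder as
# 2 hours before the start; tune both to your own customer base.
CRITICAL_OFFSETS = [
    timedelta(days=7),
    timedelta(days=3),
    timedelta(days=1),
    timedelta(hours=2),
]
MINOR_OFFSETS = [timedelta(days=2), timedelta(hours=2)]


def notification_times(start, critical):
    """Return the send times for reminders, earliest first."""
    offsets = CRITICAL_OFFSETS if critical else MINOR_OFFSETS
    return [start - offset for offset in offsets]


if __name__ == "__main__":
    start = datetime(2017, 8, 5, 2, 0, tzinfo=timezone.utc)
    for when in notification_times(start, critical=True):
        print(when.isoformat())
```

The completion notice is deliberately left out of the schedule, since it goes out when the work is actually finished rather than at a planned time.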

How to decide the best time for your maintenance

For any scheduled downtime, it’s important to propose a time that is least inconvenient for the majority of your users. If your users are worldwide, you’re always going to affect some of them. But if some time zones are hit harder than others, plan your maintenance around those user locations. Admittedly, this usually means it will be a very inconvenient time for the IT administrators. But it’s better for the company as a whole to inconvenience a few administrators rather than large parts of the business.
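One rough way to compare candidate time slots is to count how many users would be inside their local working hours during the maintenance window. The sketch below assumes you know an approximate user count per whole-hour UTC offset and that working hours run from 08:00 to 18:00 local time; the function name and every parameter are hypothetical.

```python
def best_start_hour_utc(users_by_utc_offset, duration_hours=2,
                        work_start=8, work_end=18):
    """Pick the UTC start hour that hits the fewest users at work.

    users_by_utc_offset maps a whole-hour UTC offset (e.g. -5, 0, 1)
    to an approximate number of users in that time zone.
    """
    def affected(start_hour):
        total = 0
        for offset, users in users_by_utc_offset.items():
            # A zone counts as affected if any hour of the window
            # falls inside its local working hours.
            for h in range(duration_hours):
                local_hour = (start_hour + h + offset) % 24
                if work_start <= local_hour < work_end:
                    total += users
                    break
        return total

    # min() returns the earliest hour when several hours tie.
    return min(range(24), key=affected)


if __name__ == "__main__":
    print(best_start_hour_utc({0: 10, 12: 1000}))
```

Real usage data (logins or requests per hour) would be a better input than head counts, but the idea is the same: favor the window that touches the least activity.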

What are the various modes of communication

Email is the main medium used, but you can also communicate via social media, web pages, internal communication tools (e.g. HipChat/Slack) or on the application’s start page, as appropriate.

It’s important to place a notice about upcoming downtime on the login page or main start page a few days ahead of time, so that users are able to plan their work around the proposed downtime. Simply putting up a notice that the service is currently unavailable might not fly.

Structure your communication

Break the communication down into clear steps. If at any point the user needs to take action, include very clear instructions on what to do.

Criticality and Notification

  • Let users know what new features or improvements this upgrade will bring. Knowing that they will benefit long-term from the downtime increases user acceptance.
  • How important is this downtime? Is this notice about a critical system, or only informational?
  • Is this planned or unplanned downtime? Scheduled maintenance? Replacement of older systems? Deployment of new systems? Update of existing systems?

Impact

  • Which customers or users are affected, and which services?
  • What is NOT affected?
  • What can users do, and what can they not do, during this time?
  • What will happen if they attempt something that doesn’t work? For example, if they send email, will the email be lost/deleted, or will it be sent after the downtime?

Start Time and Estimated End Time

  • Be very clear on what date and time formats you are using, as this varies from culture to culture. “01.06.2017”, for example, could mean Jan 6th or June 1st.
  • Make sure to mention time zones in the information about the downtime.
  • When calculating the estimated duration of the downtime, be sure to leave yourself a generous buffer, because things can go wrong (Murphy’s Law). If things go well, your users will be pleasantly surprised that you’re back online earlier than planned.
  • If no estimate can be provided at the start, always ask users to refer to the status post for new updates.
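As an illustration of the points above, here is a minimal Python sketch that spells out the month name, states the time zone explicitly, and adds a buffer to the planned duration. The function name and parameters are hypothetical, and it assumes the default English locale for month names.

```python
from datetime import datetime, timedelta, timezone


def format_window(start, planned_duration, buffer):
    """Describe a maintenance window in an unambiguous form.

    The month is written out (so 01.06.2017 cannot be misread) and
    both ends of the window are stated in UTC.
    """
    if start.tzinfo is None:
        raise ValueError("start must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    # The announced end includes the buffer on top of the planned work.
    end_utc = start_utc + planned_duration + buffer
    fmt = "%d %B %Y, %H:%M UTC"
    return f"{start_utc.strftime(fmt)} to {end_utc.strftime(fmt)}"


if __name__ == "__main__":
    start = datetime(2017, 6, 1, 22, 0, tzinfo=timezone.utc)
    print(format_window(start, timedelta(hours=2), timedelta(hours=1)))
```

If most of your users sit in one or two regions, it may help to repeat the window in those local times as well, alongside the UTC value.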

Provide Progress Updates

  • Describe what you’re currently doing to address the situation.
  • Provide updates that are meaningful and not filled with technical jargon.

Reassure

  • Assure users that the situation is under control.
  • Apologize for the inconvenience (even if it isn’t your fault).
  • Include contact information or methods to receive additional information, updates or to report issues that occur during or after the update.

Good to have

  • Link to the official page where downtime information is published
  • Add to calendar option
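An "add to calendar" option is often just a small iCalendar (.ics) file attached to the email or linked from the notice. The sketch below builds one by hand in Python; the function name, the PRODID string and the example.com values are placeholders, and it assumes timezone-aware datetimes and lines short enough that iCalendar line folding is not needed.

```python
from datetime import datetime, timezone


def _escape(text):
    # Escape the characters that iCalendar reserves in text values.
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def _stamp(moment):
    # iCalendar UTC date-time form, e.g. 20170601T220000Z
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def build_ics(uid, summary, start, end, details_url, now=None):
    """Return the text of a one-event .ics file for a maintenance window."""
    now = now or datetime.now(timezone.utc)
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example Infra Team//Maintenance Notice//EN",  # placeholder
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{_stamp(now)}",
        f"DTSTART:{_stamp(start)}",
        f"DTEND:{_stamp(end)}",
        f"SUMMARY:{_escape(summary)}",
        f"URL:{details_url}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    ics = build_ics(
        uid="maintenance-2017-06-01@example.com",
        summary="Scheduled maintenance: mail service",
        start=datetime(2017, 6, 1, 22, 0, tzinfo=timezone.utc),
        end=datetime(2017, 6, 2, 1, 0, tzinfo=timezone.utc),
        details_url="https://status.example.com/maintenance",
    )
    print(ics)
```

Pointing the URL field at the official downtime page covers both items in this list at once.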

Example
