When your ‘Best Practices’ are only your ‘Best guess’ with regards to SQL Server configuration.
We are running a survey to delve deeper into our understanding of the points around SQL Server Configuration outlined below, it only takes a few minutes to complete and you can win $100 in vouchers for taking part. To enter please visit the survey here.
SQL Server configuration plays a vital role in health of your SQL Estate. Your needs may vary depending on the purpose of the server, or its role within a business, but a healthy configuration ensures availability and performance for your customers.
We’ve been thinking hard about SQL server configuration in Foundry, Redgate’s R&D division. We know misconfiguration leads to estate performance issues that can hit your colleagues and customers hard.
“It’s the silly things that cause issues later on…”
Our definition of “configuration”
The most common settings — TempDb, Collation, File growth settings, Min/Max Memory, MaxDOP, Cost Threshold of Parallelism — they’re the settings that need to have a keen eye kept on them on if you don’t want issues to strike out of the blue.
But keeping your estate healthy through the lens of configuration is a broader topic than simply tweaking dials. Let’s look at each of the scenarios in which configuration is a prominent concern.
- 1) Establishing: The definition and documentation of configuration; what’s ‘best practice’ for us.
- 2) Creating: The act of standing up a new server and making it fit for purpose
- 3) Checking: Knowing that the server is running on the correct configuraiton.
- 4) Modifying: Rolling out improvements and patches across your estate.
- 5) Tracing & Troubleshooting: The diagnosis and resolution of configuration issues
Lets look at these in more detail in turn.
1 ) Establishing
The definition and documentation of configuration; what’s ‘best practice’ for us.
Simple, standardised configuration ensures consistency, and ease of deployment. Developing and documenting ‘ideal’ or ‘Best Practice’ achieves this.
“If I can standardise my configuration I can understand, diagnose and resolve issues faster”
Some go further, developing ‘run books’, ensuring best practise is followed by providing step by step instructions on standing up a new server.
Even so, problems arise:
“We had an issue where we lost a client, just last week, another team had changed the documentation, we set up the server based on that info it wasn’t fit for purpose, it didn’t have enough drives, it would have cost a lot of money to fix, so the client retracted their business.”
Documents become outdated quickly. The moment anything is changed on the estate, the run book also needs updating. Keeping on top on this is vital, but time consuming.
“Once the server is been provisioned I still go through the checklist to check that it has been done to the standard, it’s better check then than find out in prod!”
Here we’re exploring options around ‘living’ documentation, that’s easily updated, enabling you to define and document SQL Server configuration from a central, connected location.
The act of standing up a new server and making it fit for purpose.
Even with best practise and a baseline to work from, bespoke needs — from customers and applications — require bespoke configuration. It’s down to you to decide how to accommodate this, whilst maintaining control.
Deviating from the norm produces opportunities for errors to creep in. Poor service to the customer, repetitional damage, or even the cancellation of a contract altogether are real risks.
One way to mitigate this risk is by only making a few ‘flavours’ of configuration available to the customer; This balances flexibility with security. Achieving this takes time, as different flavours need refining and adjusting over time in order to provide that balance of performance and control.
“After discussing, we work together to pick one of the 10 different types, this means that I keep it under control, and the customer gets what they want”
Not everyone has the luxury of time, so in this area we’re exploring ways to empower you to apply a configuration template, fine tune, and deploy to one or many servers as required.
Knowing that the server is running on the correct configuration.
Often, when configuration is set at provision time, it’s left alone for long periods of time. If nothing’s broken then there’s no problem, right?
Except, is it? What if settings are incorrect, the result of a configuration choice that doesn’t immediately create a problem, but lays dormant.
“The common errors are caused by people making simple mistakes. No one thinks “AD accounts?”, or “Max mem setting”, or “cost threshold” — Microsoft don’t address these setting by defaults so you need an experienced DBA to remember.”
You’re not always the person responsible for standing up the server in the first place. Many people are responsible for estates built upon over years by contractors and developers until it resembles a labyrinth that it’s down to the DBA to understand and organise.
Here we’re thinking about ways to simplify the identification, diagnosis and resolution, time which you could be spending on more valuable things to ensure the smooth running of your business.
Rolling out improvements and patches across your estate.
At some point configurations will need to be reviewed. They may require modifying. Patching and preparing for growth or high intensity workloads are common examples of this.
In most cases, this involved working through each server step-by-step, ensuring the changes are rolled out correctly.
“I had to go through and patch 13 individual servers, that meant going into each individual server in turn and change things manually, it took a lot of time.”
Over time, people develop, collect and curate scripts and tools to handle this. Crucially, it takes time and experience to achieve this.
If you’re not the curator of these tools and scripts, then knowing how to ‘drive’ them requires excellent documentation (see ‘run books’ above) and a robust testing process.
In the modification scenario, we’re looking at ways to provide a better connection between ‘how things should be’ (documentation) and how they are in the target environment. Can we learn from the Infrastructure as Code movement?
The diagnosis and resolution of configuration issues.
Everyone strives to be on top of their estate, able to react quickly to issues and ensure applications meet their SLAs.
A substantial amount of time is dedicated to ensuring alerts — notifications of issues — are configured correctly, raising the alarm before issues become crises.
But what about dormant misconfigurations? They won’t spark an alert because things are running ‘as expected’. Until they aren’t. The longer these issues remain dormant, the greater the distance between symptom and root cause.
We’ve heard horror stories about restaurants unable to process any payments at peak times due to issues caused by a config setting. It’s also often the customer that notices the performance drop before the DBA due to the lack of reporting or alerts set up to proactively monitor these settings.
“Around 1:45 PM we had a Severity 1 alert fire off relating to TempDB filling up on a production server. While I was not on-call, I decided to look around to see what caused this issue. At about the same time the on-call DBA responded to the alert and had fixed the issue. We had user running an Ad Hoc query that filled up the TempDB drive. This showed a gap in our TempDB settings as we thought our configuration globally was to auto-grow TempDB manually to the proper size for the drive then disable auto-growth. So I have added a KanBan task to perform an audit of all our servers.”
This incident was resolved quickly, but a full audit of the whole estate to review configuration was still required as a result. For some, the worst case scenario involves engaging with (expensive) specialist consultants.
In this scenario, we’re exploring ideas around developing faster detection method for dormant symptoms, followed by faster diagnosis.
We’re in the early stages of researching where the main challenges lie within these 5 stages of a configuration process. To ensure that we continue to work on the right problems and develop the right tools to support our customers both now and in the future.
If you have any thoughts, ideas or experiences you feel would be beneficial to aid us in this journey, please sign up and we’ll arrange a time to talk.