Migrating to Azure SQL DB
In this post, I’m going to discuss our experience of migrating a database from an on-premises SQL Server 2016 Enterprise Edition installation to Azure SQL DB, along with some of the challenges we faced throughout the process.
This was part of a larger project to move our infrastructure to Azure. We previously had a hybrid infrastructure, comprising a mixture of bare metal hardware and a number of Azure-hosted services, and our main incentive to migrate was a desire to consolidate onto a single hosting platform rather than juggle multiple providers. This would also allow us to utilise the benefits of Azure hosting, such as being able to scale our resources on demand, and reduce the need for us to spend time on tasks like applying updates to servers.
We chose our Australian servers as the initial candidate for migration, as they have a smaller workload and database size than our UK servers. The idea was that if we could migrate those successfully, we could later apply the same process to our larger UK setup.
We had several questions when we initially considered migrating:
- Would everything function as expected?
- What performance tier would we require?
- Would the performance tier we require be cost-effective?
Assessment
We had been looking at doing this for a while and weren’t sure what the experience would be like compared with our existing setup. Previously, everything ran on bare metal hardware: an IIS server and a two-node SQL cluster, with each node having 128 GB of RAM and 8-core hyper-threaded Intel CPUs.
The first challenge we faced was in how to go about moving the database. The database we were looking to move was approximately 200 GB and was used by our online reservation service, which is available 24/7. This limited our options for moving the database as we would need to aim to keep downtime to a minimum.
As Azure SQL DB does not support native backup/restore from SQL Server, the options available were either a BACPAC export/import, which was ruled out by the downtime it would require, or transactional replication, which is the method we chose. We also looked at what other options might be available to us, such as SQL Managed Instances, but unfortunately they were not really viable at the time, having only quite recently become available in public preview.
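For context, publishing to Azure SQL DB uses the standard transactional replication stored procedures, with Azure as a push subscriber authenticated with a SQL login. The sketch below shows the general shape, assuming the distributor is already configured; all of the names (AzureMigration, ReservationsDb, the server and login) are placeholders rather than our actual configuration, and the agent scheduling options are omitted.

```sql
-- Minimal sketch: transactional publication with a push subscription to
-- Azure SQL DB. Assumes the distributor is already set up.
USE ReservationsDb;

-- Enable the database for publishing.
EXEC sp_replicationdboption
    @dbname = N'ReservationsDb', @optname = N'publish', @value = N'true';

-- Create the transactional publication and its snapshot agent.
EXEC sp_addpublication @publication = N'AzureMigration', @status = N'active';
EXEC sp_addpublication_snapshot @publication = N'AzureMigration';

-- Articles are added here with sp_addarticle (see the later example).

-- Azure SQL DB can only be a push subscriber.
EXEC sp_addsubscription
    @publication = N'AzureMigration',
    @subscriber = N'myserver.database.windows.net',  -- placeholder server
    @destination_db = N'ReservationsDb',
    @subscription_type = N'push';

EXEC sp_addpushsubscription_agent
    @publication = N'AzureMigration',
    @subscriber = N'myserver.database.windows.net',
    @subscriber_db = N'ReservationsDb',
    @subscriber_security_mode = 0,        -- SQL authentication, required for Azure
    @subscriber_login = N'repl_user',     -- placeholder login
    @subscriber_password = N'...';
```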
Preparation
The first step was to run the Microsoft Data Migration Assistant to identify any potential blockers that would stop us from moving. This highlighted a few things we would need to address prior to the migration, but nothing too problematic. We had a number of columns using deprecated data types which we had to update and convert to use newer data types. We also had a few cases of procedures where outdated join syntax was in use and some database objects which were not supported in Azure that we were using for things like integrating alerts with Slack.
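To illustrate the kind of change involved (with hypothetical table and column names, not our actual schema): deprecated types such as ntext are replaced by their (max) equivalents, and old-style outer join syntax is rewritten using ANSI joins.

```sql
-- Hypothetical example: replace a deprecated ntext column with nvarchar(max).
ALTER TABLE dbo.BookingNotes
    ALTER COLUMN Notes nvarchar(max) NULL;

-- Old-style outer join syntax, flagged as deprecated:
-- SELECT b.Id, n.Notes
-- FROM dbo.Bookings b, dbo.BookingNotes n
-- WHERE b.Id *= n.BookingId;

-- Rewritten with ANSI join syntax:
SELECT b.Id, n.Notes
FROM dbo.Bookings AS b
LEFT JOIN dbo.BookingNotes AS n ON n.BookingId = b.Id;
```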
In addition to using the DMA, I also imported our database into Visual Studio as a database project, which was helpful in identifying broken references: places where older database objects had been removed but were still referenced by procedures or code that was no longer in use.
Initial Testing
After we addressed the issues highlighted by the data migration assistant, we were ready to have a go at the process of actually moving the database using replication.
I started with a much smaller copy of our database so that I could test the process and ensure everything worked as expected. This went smoothly, but for the first run I had deliberately gone ahead with the default settings to see what the end result was, assuming that by default everything would just be an exact copy. This proved not to be the case: a number of settings are not what you might expect by default. For example, the default publication settings will not include check or foreign key constraints.
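The fix is to set each article’s schema_option bitmask explicitly when publishing. A sketch with a hypothetical table name; the bit values below come from the sp_addarticle documentation and are OR’d together into a single mask:

```sql
-- Publish an article with constraints and defaults included.
-- schema_option is a bitmask; the relevant bits include:
--   0x01    generate the object creation script
--   0x10    generate the clustered index
--   0x80    replicate PRIMARY KEY constraints
--   0x200   replicate FOREIGN KEY constraints
--   0x400   replicate CHECK constraints
--   0x800   replicate defaults
--   0x4000  replicate UNIQUE constraints
-- 0x40 (nonclustered indexes) is deliberately left out here, as we chose
-- to script those separately (described later).
EXEC sp_addarticle
    @publication = N'AzureMigration',
    @article = N'Bookings',           -- placeholder table
    @source_owner = N'dbo',
    @source_object = N'Bookings',
    @schema_option = 0x0000000000004E91;
```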
There were also some issues that did not appear until we repeated the process with a live copy of the database. These were mainly caused by not having the NOT FOR REPLICATION property set. This is a property which can be used to avoid issues with identity columns and also to prevent triggers from firing when the replication agent inserts data. The first problem we encountered was an error with the subscription database where there was a discrepancy between ID values on the publisher and subscriber:
The row was not found at the Subscriber when applying the replicated DELETE command for Table ‘[dbo].[tablename]’ with Primary Key(s): [id_columnname]
On a later test we ran into another problem. We had originally set the NOT FOR REPLICATION property on all of our triggers and identity columns, but one stored procedure had the property reset by a code change we deployed in between tests. Somewhat inexplicably, this caused a different failure mode: the replication snapshot would be uploaded to Azure and, immediately after it had been applied, would reinitialise and start the whole process over again without any sort of error or warning messages.
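In hindsight, a quick check of the catalog views before each run would have caught this. A minimal sketch, assuming every user table should have the property set:

```sql
-- Identity columns that will clash with the distribution agent's inserts.
SELECT OBJECT_SCHEMA_NAME(ic.object_id) AS [schema],
       OBJECT_NAME(ic.object_id)        AS [table],
       ic.name                          AS [column]
FROM sys.identity_columns AS ic
WHERE ic.is_not_for_replication = 0;

-- Triggers that will fire when the distribution agent applies changes.
SELECT OBJECT_SCHEMA_NAME(tr.parent_id) AS [schema],
       OBJECT_NAME(tr.parent_id)        AS [table],
       tr.name                          AS [trigger]
FROM sys.triggers AS tr
WHERE tr.parent_class = 1            -- object (table) triggers only
  AND tr.is_not_for_replication = 0;
```

For identity columns, the flag can be set on an existing column with the system procedure sys.sp_identitycolumnforreplication, without rebuilding the table; triggers need to be altered or re-created with NOT FOR REPLICATION in their definition.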
Refining the Process
We made the decision to replicate only clustered indexes, scripting out nonclustered indexes to create later, as this reduced the size of the database by about 40% and so significantly cut the time needed to apply the replication snapshot. After the initial snapshot had been applied, a few steps were required before the database could be used:
- Create required logins along with the correct permissions, e.g. logins used by the application to connect to the database.
- Create all nonclustered indexes, ensuring that they have the desired options set.
- Run a full statistics update.
- Compare database object counts & names.
- Compare tables and row counts (see the sketch after this list).
- Create any objects that weren’t migrated — user-defined data types should be scripted, along with any tables which do not have primary keys.
- Set fill factors for clustered indexes.
- Reseed identity values.
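For the comparison and reseed steps, the catalog views do most of the work. A minimal sketch: the first query is run on both sides and the outputs diffed, while the second generates DBCC CHECKIDENT statements from the source’s current identity values, to be run against the Azure copy.

```sql
-- Row counts per table, taken from partition metadata (fast, and accurate
-- enough for a before/after comparison). Run on publisher and subscriber.
SELECT s.name AS [schema], t.name AS [table], SUM(p.rows) AS [rows]
FROM sys.tables AS t
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
JOIN sys.partitions AS p ON p.object_id = t.object_id
                        AND p.index_id IN (0, 1)   -- heap or clustered index
GROUP BY s.name, t.name
ORDER BY s.name, t.name;

-- Generate reseed statements from the source's current identity values.
SELECT 'DBCC CHECKIDENT(''' + OBJECT_SCHEMA_NAME(ic.object_id) + '.'
       + OBJECT_NAME(ic.object_id) + ''', RESEED, '
       + CONVERT(varchar(20), ic.last_value) + ');'
FROM sys.identity_columns AS ic
WHERE ic.last_value IS NOT NULL;
```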
For this database, the process of uploading the initial snapshot and then creating nonclustered indexes etc. took around 9 hours. Steps like creating the nonclustered indexes were completed whilst the data was still being replicated to get the database as close to the production copy as possible before testing the application against it.
Testing Application Functionality
With a fully functional copy of the database in Azure SQL DB, we set about testing the functionality: we stood up a copy of our application against it, navigated around, made bookings, and ran various other tests to ensure things worked as expected.
We also ran queries manually against the database to compare performance with the on-premises installation. This was an area where we ran into some problems, as some queries performed significantly worse than they had against the original database. Initially I thought this was a statistics problem, but we later figured out it was because the newly created database in SQL DB used the latest compatibility level, and with it the new cardinality estimator introduced in SQL Server 2014, which we had not been using before this.
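Both fixes are simple once diagnosed. A sketch of the two options: keep the latest compatibility level but turn on the legacy estimator via a database-scoped configuration, or drop the compatibility level back to one that predates the new estimator.

```sql
-- Option 1: keep the latest compatibility level, but use the legacy
-- cardinality estimator for the whole database.
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

-- Option 2: drop the database back to the SQL Server 2012 compatibility
-- level (110), which predates the new estimator.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 110;
```

Individual queries can also opt out with USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'), which is useful when only a handful of queries regress.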
Once things were working to our satisfaction, the next task was to reproduce and test an actual workload against the database, and to gauge the performance tier we would require for production.
Load Testing
Microsoft’s guidance on selecting a performance tier is typically to choose the tier that has the features you need, start at a low level, and scale up as necessary. There is also the DTU Calculator, which gives an approximation of the performance tier to look at, based on a set of counters collected over a period of representative database workload.
I used the command-line utility for the DTU Calculator and collected the required perfmon counters over several different time periods, making sure to capture the application at its busiest as well as during quieter times, to see how much variance there was in the performance tier required.
After collecting the perfmon counters, it was just a case of uploading them to the DTU Calculator website, which analyses the log files and gives a breakdown of the workload’s requirements and the performance tiers it fits within. There are some caveats, as the calculator does not account for factors such as processor speed; even so, we found it worthwhile, as the tier we ended up using is the one it suggested.
Using the suggestions from the DTU Calculator as a starting point, we then looked at how we could test with a more realistic workload and found a tool called GoReplay, which can replay production traffic against a test server. This is covered in more detail in my colleague Adam’s post here.
Deciding on a Service Tier
Azure SQL DB offers multiple purchasing models: the DTU-based model, which has been in place for a number of years, and the more recently introduced vCore model, added alongside Managed Instances earlier in the year. We performed our load tests using both models to understand which best fit our requirements, and found that, at present, the DTU-based model was the more appropriate for us.
The vCore model is more flexible: it allows users who already own SQL Server licenses to make savings through Azure Hybrid Benefit, and resources are scaled by the number of CPU cores rather than the blended measure of DTUs. In our experience, though, for the number of vCores we required, the equivalent DTU performance tier was more cost-effective and provided better performance.
Performing the Migration
We repeated this process several times, running load tests during different periods to get a feel for how the Azure copy of the application and database coped, then removing everything and starting again. After completing multiple load tests, and satisfied with the performance of the application in Azure, we repeated the same process to migrate our production workload, completing it during a maintenance window with less than 10 minutes of downtime.
Additional References:
- SQL Server database migration to Azure SQL Database
- Migrating from SQL Server to Azure SQL Database using Bacpac Files
- Initializing Azure SQL Database with SQL Server Transactional Replication
- Controlling Constraints, Identities, and Triggers with NOT FOR REPLICATION
- Replication to SQL Database limitations
- Using the DTU Calculator