On Database Comparison & Recovery

Where to start when your database is corrupt

David Stevens
4 min readAug 10, 2014

This Monday afternoon I got an all-too-ominous chat from our IT director, “can you come here a sec? I have a favor to ask”. Every developer knows what that means, but we love a challenge, right? Plus NewSpring Web’s unofficial team motto is “Be Optimistic” (we even created a site for it).

Long story short, two databases had gotten out of sync and caused data corruption over several days. Backups were available but we had to go back at least a week to get non-corrupted data.

Questions we immediately started asking:

1) What’s missing from our last good backup?
2) How long will it take to fix?

Question #1 seems easy enough in the age of version control systems like GitHub and Dropbox. But most database providers don’t have change tracking, so how do we find that for a database?

My first thought was to compare counts of each table:

SELECT COUNT(1) FROM [BackupDB].dbo.tblDLDocument
SELECT COUNT(1) FROM [CorruptDB].dbo.tblDLDocument

There’s just a slight problem: SQL Server doesn’t allow you to query a table with corruption issues (pretty sure MySQL doesn’t either).

Enter layer 2 of the DBA toolbox: Sys.Objects & Information_Schema. These tables are like internal indexes on your table structure, size, and code. Even if a table isn’t live, the system index will give you the last known state that table was in. I modified this query I found on Google:

USING [CorruptDB]
DBCC UPDATEUSAGE(0) WITH COUNT_ROWS, NO_INFOMSGS

SELECT DBName = DB_NAME(),
TableName = so.Name,
Rows = SUM(CASE WHEN si.IndID IN (0,1)
THEN si.Rows ELSE 0 END),
RowModCtr = MIN(si.RowModCtr)
FROM dbo.SysObjects so,
dbo.SysIndexes si
WHERE si.ID = so.ID AND si.IndID IN (0,1,255)
AND so.XType = 'U' AND PERMISSIONS(so.ID) <> 0
GROUP BY so.Name, so.UID, so.ID
ORDER BY TableName

So far so good. Now I know the approximate amount of data affected for each database. DBCC CHECKDB gave me a lot more info, but I won’t go into that — read this article if you’re curious.

Question #2 is the real crossroads in this situation. How important is a week’s worth of data? Should we pay for a 3rd-party tool/developer to recover it? Do we have a paper/email trail to reconstruct that data? What happens if we ______ ?

Side note: we discussed all of these questions with affected parties face-to-face. I’ve learned at least one thing from our senior pastor/leadership @ NewSpring, and that’s to address issues in person: Why Email Doesn’t Work, Be Honest, & Why So Angry?.

Based on the data I pulled, we landed on two solutions: have our system provider’s development team merge the higher priority database and reconstruct data manually in the other.

Not knowing anything about the second database, I quickly needed to get all data that didn’t exist from a week prior. I started writing queries for all 34 tables and then realized something — the columns I was joining on already existed in Information_Schema. So I wrote this:

DECLARE @OldDatabase varchar(50)
DECLARE @NewDatabase varchar(50)
SELECT @OldDatabase = 'BackupDB'
SELECT @NewDatabase = 'CorruptDB'

;WITH tableSchema AS (
SELECT k.* FROM information_schema.columns c
INNER JOIN information_schema.key_column_usage k
ON c.column_name = k.column_name
AND c.table_name = k.table_name
), firstPK AS (
SELECT table_name, column_name as 'primarykey'
FROM tableSchema
WHERE ordinal_position = 1
)
SELECT DISTINCT '-- ' + c.table_name + CHAR(13) +
'SELECT new.*' + CHAR(13) +'FROM [' + @NewDatabase + '].dbo.' +
c.table_name + ' new ' +
CHAR(13) + 'LEFT JOIN [' + @OldDatabase + '].dbo.' +
c.table_name + ' old ' + CHAR(13) + STUFF((
SELECT ' AND new.' + c2.column_name + ' = old.' +
c2.column_name
FROM tableSchema c2
WHERE c2.TABLE_NAME = c.TABLE_NAME
GROUP BY ' AND new.' + c2.column_name +
' = old.' + c2.column_name
FOR XML PATH('')
), 1, 5, 'ON '
) + CHAR(13) + 'WHERE old.' + f.primarykey + ' IS NULL;' + CHAR(13)
FROM tableSchema c
INNER JOIN firstPK f
ON c.table_name = f.table_name

This matches each table from the corrupt database to the backup based on the primary key(s), and returns those rows that don’t exist in the backup:

/* Returns data from tbl_WorkSheets */
SELECT new.*
FROM [RepairedDB].dbo.tbl_WorkSheets new
LEFT JOIN [BackupDB].dbo.tbl_WorkSheets old
ON new.User_ID = old.User_ID
AND new.wsID = old.wsID
AND new.wsType = old.wsType
WHERE old.wsID IS NULL;

Once I had the missing rows, I gave it to our department to start reconstructing the rest of the data. I’m not sure whether we fully recovered what was lost, but it was a harsh lesson that we learned from, and hopefully these are queries that you never have to use.

--

--

David Stevens

Converted introvert. (Trying to) love others the way I’m loved. #OSS advocate & developer working remotely