Data masking with sqlcmdcli

sqlcmdcli utility is a command-line utility for ad hoc, interactive execution of commands on SQL Server to simulate specific workloads, anonymize sensitive data and more!

Sergio Govoni
CodeX

--

Who develops a complex application such as an ERP, a solution for Logistics Management, a CRM or other software that needs an OLTP database, sooner or later will need to verify the features of the application on real data. It may be necessary to use a backup of the production database to verify the application with the debugging tools offered by the development environment. There are bugs that you cannot reproduce in the test environment! A copy of the production database may be necessary to troubleshoot or to verify how the queries are performing on real data, on tables with millions of records.

Have you happened to ask one of your most significant customers: “Can I take a backup of the production DB to perform tests?”

What did he/she answer ? What will he/she answer thinking the new European Regulation 679/2016 (known as GDPR) about the protection of personal data? In the best case the answer will be: “You can take a backup in compliance with the law but the data must be anonymous!”.

It involves masking the data on a backup of the production database and asking the customer to carry out the appropriate checks before taking it with the final authorization to proceed. Recent versions of SQL Server offer interesting data masking services but they must be enabled and they are not available in all versions of SQL Server.

I had this need, I could not use the data masking services of SQL Server so I implemented a specific command in the sqlcmdcli, the command-line utility that you can find in this GitHub repository.

The anonymizedb command anonymizes all text columns (char, nchar, varchar, nvarchar, text, ntext) longer than two characters in the database passed as a parameter; it performs the following steps:

  • Database schema analysis
  • Disabling the active Foreign Keys on text columns
  • Disabling the active Check Constraints on text columns
  • Disabling active Triggers
  • Executing the data masking algorithm
  • Enabling the Foreign Keys (previously disabled)
  • Enabling the Check Constraints (previously disabled)
  • Enabling the Triggers (previously disabled)

At the end you will get a database whose text columns have been anonymized.

The first implementation of the data masking algorithm is quite simple; for text columns less than or equal to 2000 characters, each character is shifted by a random number of characters within a range. For text columns greater than 2000 characters, the reverse function is performed. Future releases of sqlcmdcli will implement more sophisticated algorithms.

Below is an example to invoke the anonymizedb command on the AdventureWorks2017 database:

sqlcmdcli.exe anonymizedb -servername:YourTestServer -databasename:AdventureWorks2017 -username:YourSQLLogin -password:YourPassword -verbose

The execution will produce an output similar to the one shown in the following picture.

sqlcmdcli is written in Delphi, it is an opensource project, anyone who wants to contribute is welcome, the sqlcmdcli.exe is available in the Releases section of the repository.

The official wiki of the project is available here.

Enjoy!

--

--

Sergio Govoni
CodeX
Writer for

CTO at Centro Software, Microsoft Data Platform MVP