Data Masking: Concept, Tools, Masking Policies & Healthcare Data Masking

Samadhan Kadam
Published in Petabytz
Jun 21, 2019

What is data masking?

Data masking is a method of creating a structurally similar but inauthentic version of an organization’s data that can be used for purposes such as software testing and user training. The purpose is to protect the actual data while having a functional substitute for occasions when the real data is not required.

Although most organizations have stringent security controls in place to protect production data in storage or in business use, sometimes that same data has been used for operations that are less secure. The issue is often compounded if these operations are outsourced and the organization has less control over the environment. In the wake of compliance legislation, most organizations are no longer comfortable exposing real data unnecessarily.

In data masking, the format of data remains the same; only the values are changed. The data may be altered in a number of ways, including encryption, character shuffling, and character or word substitution. Whatever method is chosen, the values must be changed in some way that makes detection or reverse engineering impossible.
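As a rough illustration of format-preserving alteration, here is a minimal Python sketch of character substitution and character shuffling. The function names are made up for this example, and the plain `random` module is used only so the demo can be seeded; a real masking job would more likely draw from a cryptographically secure source such as `secrets.SystemRandom()`.

```python
import random
import string


def substitute_chars(value: str, rng: random.Random) -> str:
    """Replace every letter or digit with a random one of the same class."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '-' or '@' so the format survives
    return "".join(out)


def shuffle_chars(value: str, rng: random.Random) -> str:
    """Shuffle the alphanumeric characters while leaving separators in place."""
    chars = [ch for ch in value if ch.isalnum()]
    rng.shuffle(chars)
    shuffled = iter(chars)
    return "".join(next(shuffled) if ch.isalnum() else ch for ch in value)


rng = random.Random(42)  # seeded only to make the demo repeatable
print(substitute_chars("123-45-6789", rng))  # same layout, different digits
print(shuffle_chars("AB-1234", rng))         # same characters, different order
```

Note that shuffling keeps the original characters, so on short values it protects far less than substitution does.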

Vendors of data masking products include Compuware, Dataguise, IBM, Informatica and Oracle.

Who Uses Data Masking?

In 2018, enterprises are learning that they should incorporate data masking into their security strategy, particularly in light of the General Data Protection Regulation (GDPR) requirements.

If you are reading this, you are probably aware that GDPR requires all organizations that accept data from EU citizens to comply with its governance rules by May 2018. For some enterprises, this has created the need to strengthen their security processes by incorporating data masking best practices.

Many types of data can be protected using masking, but some commonly used in the business world include the following:

PII, or personally identifiable information

PHI, or protected health information

Payment card information (covered by PCI-DSS)

Intellectual property (for example, technical data regulated under ITAR)

Most of the above examples are subject to compliance with governance rules.

Types of Data Masking

There are a few different types of data masking to be aware of as you consider next steps. Most experts would agree that data masking is either static or dynamic, with one exception: on-the-fly data masking. Here’s a look at the main types of data masking:

Static Data Masking

Static data masking refers to the process in which important data is masked in the original database environment. The content is duplicated into a test environment, and can then be shared with third-party vendors or other relevant parties.

Data is masked and extracted in the production database and moved into the test database. While this may be a necessary process for working with third-party consultants, it’s not ideal. That is because throughout the process of masking data for a duplicate database, real data is extracted, which can leave a backdoor open that encourages breaches.
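To make the static approach concrete, here is a small sketch that copies a table from one SQLite database into another and masks it on the way. The `customers` table, its columns, the `pseudonym` helper and the demo key are all assumptions made up for this example; it is not how any particular masking product works.

```python
import hashlib
import hmac
import sqlite3

MASKING_KEY = b"demo-only-key"  # hypothetical key; real keys do not belong in source code


def pseudonym(value: str) -> str:
    """Derive a stable, non-reversible stand-in for a value using a keyed hash."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:8]


def copy_masked(prod: sqlite3.Connection, test: sqlite3.Connection) -> None:
    """Read rows from production and write only masked values to the test copy."""
    test.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
    rows = prod.execute("SELECT id, name, email FROM customers").fetchall()
    test.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(cid, pseudonym(name), pseudonym(email) + "@example.com")
         for cid, name, email in rows],
    )
    test.commit()


# Stand-ins for the production and test databases.
prod_db = sqlite3.connect(":memory:")
prod_db.execute("CREATE TABLE customers (id INTEGER, name TEXT, email TEXT)")
prod_db.execute("INSERT INTO customers VALUES (1, 'Alice Smith', 'alice@corp.example')")
test_db = sqlite3.connect(":memory:")

copy_masked(prod_db, test_db)
print(test_db.execute("SELECT * FROM customers").fetchall())
```

The sketch also shows the weak spot described above: the real rows are read out of production before they are masked, so the extraction step itself has to be protected.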

Dynamic Data Masking

In dynamic data masking, automation and rules allow IT departments to secure data in real time. That means it never leaves the production database, and as a result is less susceptible to threats.

Data is never exposed to those who access the database because the contents are jumbled in real time, making the contents inauthentic.

A resource called a dynamic masking tool finds and masks certain types of sensitive data using a reverse proxy. Only authorized users will be able to see the authentic data.

Concerns about dynamic data masking mostly stem from database performance. In an enterprise environment, time is money and even milliseconds have value. In addition to the time cost of running such a proxy, whether the proxy itself is secure can be a cause for concern.
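The dynamic idea can be sketched as a thin layer that sits between the caller and the stored record and decides, per request, whether to mask. The role names, field names and masking rules below are hypothetical; a real dynamic masking tool applies this kind of logic inside a proxy or the database engine rather than in application code.

```python
from typing import Callable, Dict

Record = Dict[str, str]


def mask_ssn(value: str) -> str:
    """Hide everything except the last four characters."""
    return "XXX-XX-" + value[-4:]


def mask_all(value: str) -> str:
    """Hide the whole value but keep its length."""
    return "*" * len(value)


# Hypothetical policy: which fields are sensitive and how each one is masked.
MASKING_RULES: Dict[str, Callable[[str], str]] = {
    "ssn": mask_ssn,
    "salary": mask_all,
}
AUTHORIZED_ROLES = {"dba", "compliance"}  # hypothetical role names


def read_record(record: Record, role: str) -> Record:
    """Return the record as this role may see it; the stored record is never modified."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {
        field: MASKING_RULES[field](value) if field in MASKING_RULES else value
        for field, value in record.items()
    }


stored = {"name": "Bob", "ssn": "123-45-6789", "salary": "85000"}
print(read_record(stored, "analyst"))  # masked view
print(read_record(stored, "dba"))      # authentic view
```

Every read pays for the extra masking step, which is the performance concern mentioned above in miniature.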

Data Masking Best Practices

When it comes to your organization’s processes, you want to learn from the best. Below are best practices for creating a data masking process that works within your organization:

Find data: This first step involves identifying and cataloging the various types of data that may be sensitive. This is often done by business or security analysts who put together a comprehensive list of enterprise-wide data elements.

Assess the situation: This stage requires oversight from the security administrator, who is responsible for determining whether sensitive information is present, where the data is located, and the ideal data masking technique.

Implement masking: Keep in mind that for large organizations, it isn’t feasible to assume that a single data masking tool can be used across the entire enterprise. Instead, implementation must consider architecture, proper planning, and a look to future enterprise needs.

Test data masking results: This is the final step in the data masking process. QA and testing are required to ensure the masking configurations yield the desired results. If they don’t, the DBA restores the database to the pre-masked state, adjusts the masking algorithms, and runs the data masking process again.
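The first step, finding sensitive data, is often partly automated by sampling column values and matching them against known patterns. The sketch below is one simplified way to do that; the pattern set, the labels and the `classify_columns` helper are assumptions for illustration, and real discovery tools use far richer rules than three regular expressions.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately simple patterns for a few sensitive data types.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "card_number": re.compile(r"^(?:\d[ -]?){13,19}$"),
}


def classify_columns(samples: Dict[str, List[str]]) -> Dict[str, str]:
    """Label a column as sensitive when every sampled value matches one pattern."""
    findings = {}
    for column, values in samples.items():
        for label, pattern in PATTERNS.items():
            if values and all(pattern.match(v) for v in values):
                findings[column] = label
                break
    return findings


sampled = {
    "contact": ["a@b.com", "x.y@z.org"],
    "tax_id": ["123-45-6789"],
    "pan": ["3400 1100 0000 063"],
    "city": ["Pune", "Paris"],
}
print(classify_columns(sampled))
```

The same patterns can be run again over the masked copy during the testing step, for example to confirm that masked card numbers still look like card numbers.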

Masking policy options: Mask Mode

Use one of the following options to specify modes of masking data:

Repeatable Masking

The first four digits of the credit card number are copied from the source to the output and the rest of the digits are masked. This type of masking is repeatable for data from the same source, regardless of the order.

Use 4 issuer digits

The first four digits of the credit card number are copied from the source to the output. The remaining part of the credit card number is appended with the masked account number and a check digit. A check digit is a digit added to a number that validates the authenticity of the number. When this option is used, different runs for the same input can result in different numbers. The uniqueness of the number is guaranteed only when the Data Masking stage job runs in the sequential mode or runs on one node.

Use 6 issuer digits

The first six digits of the credit card number are copied from the source to the output. The remaining part of the credit card number is appended with the masked account number and a check digit. When this option is used, different runs for the same input can result in different numbers. The uniqueness of the number is guaranteed only when the Data Masking stage job runs in the sequential mode or runs on one node.
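To show how these options could fit together, here is a hedged sketch that keeps a chosen number of issuer digits, replaces the account digits, and appends a Luhn check digit. It is not the algorithm of any particular masking product: the `mask_card` helper, the keyed-hash approach to repeatability, and the choice to append a check digit in both modes are assumptions of this example. Unlike the policy described above, it also makes no attempt to guarantee uniqueness.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that does not have one yet."""
    total = 0
    # Walk right to left; the digit next to the future check digit is doubled first.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check that the last digit is the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == number[-1]


def mask_card(number: str, issuer_digits: int = 4, key: Optional[bytes] = None) -> str:
    """Keep the issuer digits, replace the account digits, append a check digit."""
    digits = "".join(ch for ch in number if ch.isdigit())
    prefix = digits[:issuer_digits]
    n_account = len(digits) - issuer_digits - 1  # leave room for the check digit
    if key is not None:
        # Repeatable mode: the same input and key always give the same digits.
        mac = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
        account = "".join(str(b % 10) for b in mac[:n_account])
    else:
        # Non-repeatable mode: different runs can give different numbers.
        account = "".join(str(secrets.randbelow(10)) for _ in range(n_account))
    body = prefix + account
    return body + luhn_check_digit(body)


source = "3400 1100 0000 063"
print(mask_card(source, issuer_digits=4, key=b"demo-key"))  # repeatable
print(mask_card(source, issuer_digits=6))                   # varies per run
```

Because the check digit is recomputed, the masked number still passes the same validation that applications apply to real card numbers.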

Examples

The following examples show what the masked data might look like after the masking policy is applied. In these examples, the original value is 3400 1100 0000 063.

Data masking examples for credit card number

What is EHR (Electronic Health Records)?

In Electronic Health Records (EHRs), data masking, or controlled access, is the process of concealing patient health data from certain healthcare providers. Patients have the right to request the masking of their personal information, making it inaccessible to any physician, or a particular physician, unless a specific reason is provided. Data masking is also performed by healthcare agencies to restrict the amount of information that can be accessed by external bodies such as researchers, health insurance agencies and unauthorised individuals. It is a method used to protect patients’ sensitive information so that privacy and confidentiality are less of a concern. Techniques used to alter information within a patient’s EHR include data encryption, obfuscation, hashing, exclusion and perturbation.
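Three of the techniques named above (hashing, exclusion and perturbation) can be sketched on a single patient record. The field names, the demo key and the size of the noise are made up for this example; what is acceptable for real patient data depends on the applicable regulations.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"demo-only-key"  # hypothetical key for the keyed hash


def mask_patient(record: dict, rng: random.Random) -> dict:
    """Return a research-friendly copy of a patient record."""
    masked = dict(record)
    # Hashing: replace the identifier with a keyed hash so records can still be linked.
    masked["patient_id"] = hmac.new(
        SECRET_KEY, record["patient_id"].encode("utf-8"), hashlib.sha256
    ).hexdigest()[:12]
    # Exclusion: drop fields the recipient has no need to see.
    masked.pop("name", None)
    masked.pop("address", None)
    # Perturbation: add a little random noise to a numeric value.
    masked["age"] = max(0, record["age"] + rng.randint(-2, 2))
    return masked


patient = {
    "patient_id": "P-1001",
    "name": "Jane Doe",
    "address": "1 Main St",
    "age": 47,
    "diagnosis": "asthma",
}
print(mask_patient(patient, random.Random(7)))
```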

Healthcare Data Masking: Tokenization, HIPAA and More

When trying to protect your data from the nefarious souls that would like access to it, there are several options available that apply to distinct use cases. In order for us to talk about the different solutions, it is important to define all of the terms:

PII (Personally Identifiable Information): any data that could potentially identify a specific individual. Any information that can be used to distinguish one person from another, or to de-anonymize anonymous data, can be considered PII.

GSA’s Rules of Behavior for Handling Personally Identifiable Information: This directive provides GSA’s policy on how to properly handle PII and the consequences and corrective actions that will be taken if a breach occurs.

PHI (Protected Health Information): any information about health status, provision of health care, or payment for health care that can be linked to a specific individual.

HIPAA Privacy Rule: The HIPAA Privacy Rule establishes national standards to protect individuals’ medical records and other personal health information and applies to health plans, health care clearinghouses, and those health care providers that conduct certain health care transactions electronically. The Rule requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization. The Rule also gives patients rights over their health information, including rights to examine and obtain a copy of their health records, and to request corrections.

Encryption: a method of protecting data by scrambling it into an unreadable form. It is a systematic encoding process which is only reversible with the right key.

Tokenization: a method of replacing sensitive data with non-sensitive placeholder tokens. These tokens are swapped with data stored in relational databases and files.

Data masking: a process that scrambles data, either an entire database or a subset. Unlike encryption, masking is not reversible; unlike tokenization, masked data is useful for limited purposes. There are several types of data masking:

Static data masking (SDM) masks data in advance of using it. Non-production databases are masked, but not in real time.

Dynamic data masking (DDM) masks production data in real time.

Data redaction masks unstructured content (PDF, Word, Excel).

Tokenization

For tokenization of PHI, there are many pieces of data which must be bundled up in different ways for many different audiences. Using the tokenized data requires it to be de-tokenized (which usually includes a decryption process). This introduces an overhead to the process. A person’s medical history is a mix of medical attributes, doctor visits, and outsourced visits. It is an entangled set of personal, financial, and medical data. Different groups need access to different subsets. Each audience needs a different slice of the data, but must not see the rest of it. You need to issue a different token for each and every audience. You will need a highly sophisticated token management and tracking system to divide up the data, issuing and tracking different tokens for each audience.
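The per-audience bookkeeping described above can be sketched with a small in-memory vault. The `TokenVault` class and the audience names are hypothetical; a production token service would also need persistent and encrypted storage, access control and auditing, none of which is shown here.

```python
import secrets
from typing import Dict, Tuple


class TokenVault:
    """Issue a separate token per (audience, value) pair and track who owns it."""

    def __init__(self) -> None:
        self._by_token: Dict[str, Tuple[str, str]] = {}  # token -> (audience, value)
        self._by_value: Dict[Tuple[str, str], str] = {}  # (audience, value) -> token

    def tokenize(self, value: str, audience: str) -> str:
        """Return the audience-specific token for a value, creating it if needed."""
        key = (audience, value)
        if key not in self._by_value:
            token = "tok_" + secrets.token_hex(8)  # random, carries no information
            self._by_value[key] = token
            self._by_token[token] = key
        return self._by_value[key]

    def detokenize(self, token: str, audience: str) -> str:
        """Give the original value back only to the audience the token was issued to."""
        owner, value = self._by_token[token]
        if owner != audience:
            raise PermissionError("token was not issued to this audience")
        return value


vault = TokenVault()
billing_token = vault.tokenize("123-45-6789", "billing")
research_token = vault.tokenize("123-45-6789", "research")
print(billing_token != research_token)             # each audience gets its own token
print(vault.detokenize(billing_token, "billing"))  # lookup adds overhead on every use
```

Even this toy version needs two lookup tables and an ownership check, which hints at why token management grows complicated as audiences multiply.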

List Of The Best Data Masking Tools

1. DATPROF: Test Data Simplified

DATPROF delivers several tools:

  • DATPROF provides a smart way of masking and generating data for testing the database
  • DATPROF has a patented algorithm for subsetting databases in a really simple and proven way
  • The software is able to handle complex data relationships with an easy-to-use interface
  • It has a really smart way to temporarily bypass all triggers, constraints and indexes so it is the best performing tool in the market.

Download Link: DATPROF — Test Data Simplified

2. Oracle Data Masking And Subsetting

Oracle Data Masking and Subsetting helps database customers improve security, accelerate compliance, and reduce IT costs.

It supports testing, development, and other activities by removing redundant data and files. The tool suggests data mappings and uses masking definitions. It comes with predefined rules for HIPAA, PCI DSS, and PII.

Features:

  • Discovers sensitive data and its relationships automatically.
  • Extensive masking format library and enhanced application models.
  • Comprehensive data masking transformations.
  • Fast, secure, and heterogeneous.

Pros:

  • It offers various methods for masking data.
  • It supports non-Oracle databases as well.
  • It takes less time to run.

Cons:

  • High cost.
  • Less secure for development and testing environments.

Pricing: Contact for Pricing.

URL: Oracle Data Masking and Subsetting

3. Delphix

Delphix is a fast and secure data masking tool for masking data across the company. It comes with predefined rules for HIPAA, PCI DSS, and SOX.

The Delphix Masking Engine is combined with the Delphix data virtualization platform to save on data storage and loading. DDM is available through a partnership with HexaTier.

Features:

  • End-to-end data masking, with reports on the results.
  • Masking combined with data virtualization to speed up delivery of the data.
  • Easy to use, as no training is required to mask data.
  • It migrates data steadily across sites, on-premises or in the cloud.

Pros:

  • Easy and timely recovery of records.
  • Virtualization of databases.
  • Data refreshing is fast.

Cons:

  • High cost.
  • SQL Server database support is slow and limited.
  • Reliant on old NFS protocols.

Pricing: Contact for pricing.

URL: Delphix

4. Informatica Persistent Data Masking

Informatica Persistent Data Masking is an accessible data masking tool that helps IT organizations access and manage their most sensitive data.

It delivers enterprise scalability, robustness, and integrity for a large volume of databases. It creates a consistent data masking policy across the enterprise with a single audit trail. It allows actions taken to secure sensitive data to be tracked via complete audit logs and records.

Features:

  • Supports Robust Data Masking.
  • Creates and integrates the masking process from a single location.
  • Features to handle a large volume of databases.
  • It has wide connectivity and customized Application Support.

Pros:

  • Decreases the risk of a data breach via a single audit trail.
  • Improves the quality of development, testing, and training activities.
  • Easy deployment in the workstations.

Cons:

  • The UI needs more work.

Pricing: A 30-day free trial is available.

URL: Informatica Persistent Data Masking

5. Microsoft SQL Server Data Masking

Dynamic Data Masking is a security feature introduced in SQL Server 2016 that restricts unauthorized users from accessing sensitive data.

It is a simple, easy-to-use protective feature that can be set up with a T-SQL query. This data security technique masks sensitive data at the field level.

Features:

  • Simplifies application design and coding by securing data.
  • It doesn’t change or transform the stored data in the database.
  • It lets the data administrator choose how much sensitive data to expose, with minimal effect on the application.

Pros:

  • End users are prevented from viewing sensitive data.
  • Creating a mask on a column doesn’t prevent updates to it.
  • Applications do not need to be changed to read data.

Cons:

  • Data is fully accessible while querying tables as a privileged user.
  • Masking can be bypassed via the CAST command by executing an ad hoc query.
  • Masking cannot be applied to columns such as encrypted, FILESTREAM, or COLUMN_SET columns.

Pricing: Free trial is available for 12 months.

URL: Dynamic Data Masking
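To get a feel for what column-level masks return, here is a Python sketch loosely modeled on the partial and email masking functions that SQL Server documents for this feature. The Python function names are hypothetical, short inputs are handled in a simplified way, and nothing here talks to a real database; in SQL Server itself the mask is declared on the column in T-SQL.

```python
def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Expose the first `prefix` and last `suffix` characters around a fixed padding."""
    if len(value) <= prefix + suffix:
        return padding  # simplified: expose nothing when the value is too short
    return value[:prefix] + padding + value[len(value) - suffix:]


def email_mask(value: str) -> str:
    """Expose only the first character, followed by a constant masked address."""
    return (value[:1] or "X") + "XXX@XXXX.com"


print(partial_mask("555-123-4567", 0, "XXX-XXX-", 4))
print(partial_mask("4111111111111111", 4, "-XXXX-XXXX-", 4))
print(email_mask("alice@example.org"))
```

As with the feature itself, these functions only change what a reader sees; the stored value stays intact, which is why privileged users and careless ad hoc queries remain a risk.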
