AWS AI-Powered Health Data Masking: What Is It and How Does It Work?

Michael Zhang
Egen Engineering & Beyond
5 min read · Sep 20, 2019

In healthcare, electronic medical records contain sensitive, private information such as a patient's name, date of birth (DOB), Medical Record Number (MRN), and phone number. Any improper exposure or leak of that information can have serious consequences for patients and providers alike, so healthcare companies are looking for better ways to secure their data.

Masking healthcare data hasn’t always been easy. The good news is that Amazon Web Services (AWS) now offers AI-Powered Health Data Masking, and my first impression is that this solution is a great utility every healthcare developer should consider. Data masking is the process of hiding original data behind modified content. As healthcare data grows more complex and more exposed to attacks, data masking is becoming increasingly important.

I decided to test the new API and document both the process and the results. But first, let’s talk about what data masking is and what AWS AI-Powered Health Data Masking can do for you.

What is AWS AI-Powered Health Data Masking?

AWS AI-Powered Health Data Masking helps healthcare organizations identify and mask healthcare data in both text and images. It is a great fit for organizations that handle sensitive healthcare data, and it makes masking that data properly much easier.

Under the hood, it uses AWS artificial intelligence (AI) services, exposed through a serverless API, to identify and mask healthcare data.

Data Masking vs. Data Encryption

Many people have heard of data encryption but mistakenly consider data encryption and data masking to be the same thing. Both enhance data protection, but the fundamental difference is that encryption is designed to be reversible (anyone holding the key can recover the original data), while masking is usually irreversible.

Data encryption converts data into unreadable ciphertext using an encryption algorithm and a key. It is widely used when transferring sensitive data between networks and environments. Data masking, by contrast, is mainly used when realistic but sensitive data is needed for development, testing, and research. During testing, data passes through many hands and carries a high risk of misuse, so masking protects the original data and prevents patients from being re-identified.
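To make the irreversibility point concrete, here is a toy illustration of masking. It is not what the AWS solution does internally: the record, the field names, and the `mask_value`/`mask_record` helpers are all hypothetical. Every letter and digit in a sensitive field is replaced with `*`, so, unlike ciphertext, nothing in the masked copy lets anyone recover the original values.

```python
import re

# Hypothetical patient record, used only for illustration.
record = {
    "name": "Jane Doe",
    "dob": "1980-04-12",
    "mrn": "MRN-0048213",
    "phone": "555-867-5309",
    "note": "Follow-up in 6 weeks.",
}

# Fields treated as sensitive in this toy example.
SENSITIVE_FIELDS = {"name", "dob", "mrn", "phone"}


def mask_value(value: str) -> str:
    """Replace every letter and digit with '*', keeping separators for readability."""
    return re.sub(r"[A-Za-z0-9]", "*", value)


def mask_record(rec: dict) -> dict:
    """Return a masked copy; the original characters are discarded, not encoded."""
    return {k: mask_value(v) if k in SENSITIVE_FIELDS else v for k, v in rec.items()}


masked = mask_record(record)
print(masked["phone"])
```

Keeping the separators preserves the data's shape (useful for testing parsers), while the actual digits are gone for good.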

How to deploy AWS AI-Powered Health Data Masking

You can deploy this API with a CloudFormation template that creates Amazon API Gateway endpoints and uses AI services such as Amazon Comprehend Medical, to detect healthcare data in text, and Amazon Rekognition, to detect text in images. The solution's deployment guide (see the reference links at the end of this post) covers the automated deployment in detail. It consists of the following two procedures:

  1. Launch the stack
    - Download and launch the ai-powered-health-data-masking template
    - Create and name your stack
  2. Create an IAM policy to access the API
    - Follow the instructions in the deployment guide
    - Don’t forget to add S3 access if you want to test image masking through an S3 bucket
    - The configuration will look like the following:
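As a rough sketch (not the exact configuration from my setup), a policy that lets a user invoke the deployed API and, optionally, read and write objects in a test bucket could be built like this. The region, account ID, API ID, and bucket name are hypothetical placeholders, and the S3 actions are an assumption; grant only what your tests actually need. You could paste the printed JSON into the IAM console, or pass it to boto3's `iam.create_policy(PolicyName=..., PolicyDocument=json.dumps(policy))`.

```python
import json


def build_masking_api_policy(region, account_id, api_id, bucket=None):
    """Build an IAM policy document allowing calls to the masking API (hypothetical IDs)."""
    statements = [
        {
            "Effect": "Allow",
            "Action": "execute-api:Invoke",
            # Matches any stage, HTTP method, and resource path of the deployed API.
            "Resource": f"arn:aws:execute-api:{region}:{account_id}:{api_id}/*",
        }
    ]
    if bucket:
        # Only needed if you test image masking through an S3 bucket (assumed actions).
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder values: replace with your own region, account, API ID, and bucket.
policy = build_masking_api_policy(
    "us-east-1", "123456789012", "a1b2c3d4e5", bucket="my-masking-test-bucket"
)
print(json.dumps(policy, indent=2))
```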

How do you test the AWS Health Masking API?

AWS also provides templates for testing the API with both text and images; you can find the details in Appendix B of the solution's implementation guide. Make sure your AWS credentials and S3 bucket are set up properly before you proceed.
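Because the API is protected with IAM, each request has to be signed with AWS Signature Version 4. The sketch below shows one way to do that with botocore's `SigV4Auth`, assuming boto3 is installed and your credentials are configured. The endpoint URL and the `{"text": ...}` payload shape are hypothetical placeholders; use the paths and field names given in Appendix B of the implementation guide.

```python
import json
import urllib.request


def build_text_request_body(text: str) -> bytes:
    """Hypothetical payload; check Appendix B for the field names the API expects."""
    return json.dumps({"text": text}).encode("utf-8")


def call_masking_api(url: str, body: bytes, region: str) -> dict:
    """POST a SigV4-signed request to an IAM-protected API Gateway endpoint."""
    # Imported here so the payload helper above works without boto3 installed.
    import boto3
    from botocore.auth import SigV4Auth
    from botocore.awsrequest import AWSRequest

    credentials = boto3.Session().get_credentials()
    request = AWSRequest(
        method="POST",
        url=url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # "execute-api" is the service name used when signing API Gateway requests.
    SigV4Auth(credentials, "execute-api", region).add_auth(request)

    # Send the same body with the signed headers (Authorization, X-Amz-Date, ...).
    http_request = urllib.request.Request(
        url, data=body, headers=dict(request.headers), method="POST"
    )
    with urllib.request.urlopen(http_request) as response:
        return json.loads(response.read())
```

A call would then look like `call_masking_api("https://<api-id>.execute-api.<region>.amazonaws.com/<stage>/<path>", build_text_request_body("..."), "<region>")`, with every bracketed part filled in from your own deployment.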

For image testing, I used an X-ray example image provided by AWS and ran the sample code. The masking result looks like this:

What’s even better is that the API can recognize and mask text in an image of handwritten data!

To test this, I copied AWS’s sample data onto a whiteboard, took a picture of it, uploaded the picture to my S3 bucket, and ran the code. The result looks like this:

This just looks absolutely amazing!

What if my image is in a PDF format?

So far, AWS only supports the JPG, PNG, and DICOM formats. Unfortunately, the API doesn’t handle PDF files yet, but it is easy to write a Python script that checks the file type and converts a .pdf file into a format the API supports (e.g. .jpg) before masking happens.

Personally, I use the Wand library, a simple ctypes-based ImageMagick binding for Python. You can install it with “pip install Wand”; it also needs the ImageMagick library on your system, and ImageMagick in turn relies on Ghostscript to read PDF files.

Given your file path (local or S3), you can use the following code to check whether the file extension is supported by AWS:
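A minimal sketch of such a check, assuming that JPG, PNG, and DICOM files can be recognized by the extensions `.jpg`/`.jpeg`, `.png`, and `.dcm` (the DICOM extension in particular is an assumption; your files may use another). It accepts either a local path or an `s3://bucket/key` URI; the helper names are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumption: these extensions identify the JPG, PNG, and DICOM files the API accepts.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".dcm"}


def file_extension(path: str) -> str:
    """Return the lowercase extension of a local path or an s3://bucket/key URI."""
    if path.startswith("s3://"):
        # urlparse puts the bucket in .netloc and the object key in .path.
        path = urlparse(path).path
    return PurePosixPath(path).suffix.lower()


def is_supported(path: str) -> bool:
    """True if the file can be sent to the masking API without conversion."""
    return file_extension(path) in SUPPORTED_EXTENSIONS


for candidate in ["s3://my-bucket/scans/xray.png", "/tmp/discharge-summary.PDF"]:
    print(candidate, is_supported(candidate))
```

Lowercasing the suffix means `report.PDF` and `report.pdf` are treated the same.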

And convert it into a .jpg file (or other AWS-supported extensions):
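Here is one way to do the conversion with Wand, rendering each PDF page to its own JPEG. This is a sketch rather than a drop-in script: the function names, the `-pageN.jpg` naming scheme, and the 200 DPI default are my own choices. Wand works on local files, so an S3 object would need to be downloaded first (for example with boto3's `s3.download_file(bucket, key, local_path)`).

```python
from pathlib import Path


def page_output_path(pdf_path: str, out_dir: str, index: int) -> Path:
    """Build the JPEG path for one PDF page, e.g. scan.pdf -> scan-page0.jpg."""
    return Path(out_dir) / f"{Path(pdf_path).stem}-page{index}.jpg"


def pdf_to_jpgs(pdf_path: str, out_dir: str, resolution: int = 200) -> list:
    """Render every page of a local PDF to a separate JPEG file using Wand."""
    # Imported here so page_output_path works even without Wand/ImageMagick installed.
    from wand.image import Image

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    outputs = []
    # resolution sets the rendering DPI; ImageMagick uses Ghostscript to read PDFs.
    with Image(filename=pdf_path, resolution=resolution) as pdf:
        for index, page in enumerate(pdf.sequence):
            # Copy the single page into its own image so it can be saved separately.
            with Image(image=page) as img:
                img.format = "jpeg"
                target = page_output_path(pdf_path, out_dir, index)
                img.save(filename=str(target))
                outputs.append(target)
    return outputs
```

For example, `pdf_to_jpgs("discharge-summary.pdf", "converted")` would write `converted/discharge-summary-page0.jpg`, one file per page, and each of those files could then be uploaded and sent to the masking API.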

Conclusion

Overall, AWS AI-Powered Health Data Masking gives healthcare organizations a brilliant way to identify and mask sensitive healthcare data in both images and text, and it is easy to deploy and test. The tool doesn’t guarantee compliance with any regulatory framework, so meeting healthcare-related legal requirements is still your responsibility, but it’s a very powerful tool to build into your data processing environment.

Reference links:

https://aws.amazon.com/solutions/ai-powered-health-data-masking/

https://docs.aws.amazon.com/solutions/latest/ai-powered-health-data-masking/deployment.html

https://s3.amazonaws.com/solutions-reference/ai-powered-health-data-masking/latest/ai-powered-health-data-masking.pdf
