The How and Why of Reversible Tokenization

Justin Patriquin
Cape Privacy (Formerly Dropout Labs)
3 min read · Sep 11, 2020


When Cape Python first launched, it came with a tokenization transformation that let users tokenize a column so that its values are not exposed while the data is in use. In some cases, it’s important that this tokenization can be reversed. With the recent releases of Cape Python 0.3.0 and Cape Core 0.0.2, it’s now possible to reverse that tokenization. In this post, we describe why reversibility can be useful and demonstrate how adding it to your existing Cape tokenization workflows is quick and painless.

Use Case

In general, after you’ve made a privacy-preserving dataset, you’d want to avoid reversing the privacy protections. There are some use cases, however, where after careful consideration it’s essential that the protections can be reversed. One example is fraud detection, where ideally you’d be operating on tokenized or obfuscated data. Once a fraudulent event is detected, though, you’d want to reverse the tokenization so that you can link the event to the actual entity committing the fraud. Tokenizing data with reversible tokenization enables this fraud detection use case in Cape Python and Cape Core.

Technical Details

Regular tokenization uses the hashing function SHA-256 to hash the data. Hashes are not reversible, so for reversible tokenization we used the encryption algorithm AES-SIV. This allows Cape Python to encrypt values using a key and then later decrypt them with the same key. If you don’t have the key, you cannot reverse the encryption. We intentionally chose AES-SIV because it’s deterministic: the same value encrypted with the same key always produces the same output. This enables the additional feature of linkability. If two datasets contain the same values and were tokenized with the same key, you can link them together using only the tokenized data.

Cape Core uses a key management system (KMS) to manage the keys the system needs to function. The KMS also manages the keys required for reversible tokenization. When you upload a policy containing reversible tokenizations, Cape Core generates the keys and stores them securely. When Cape Python requests that policy from Cape Core, the policy comes with the key included.

Examples

Cape Python Example

It is easiest to experiment with reversible tokenization using Cape Python only.

The full tutorial can be found here.

Cape Core Example

As you move into a production setting, it is important that you have Cape Core to help you protect your keys. See here for instructions on how to set up the Cape Core Coordinator on your computer. We recommend deploying Cape Core using Kubernetes (more here).

Once the Coordinator is set up you can create a project and policy.

First create a file called reverse-policy.yaml which contains the following:

Run the following commands from the CLI to create the project and policy:

Now you can run the following Python to apply the reversible transformation to some sample data:

Have Ideas or Requests? We Want Feedback!

This release is the first of many based on user feedback for Cape Python and Cape Core. Let us know other features you are looking for in your privacy-preserving data science, or feel free to open an issue on GitHub (or a pull request!) with your idea. Together, we hope to make data privacy in data science accessible, easy, and community-driven!

About Cape Privacy

Cape Privacy is an enterprise SaaS privacy platform for collaborative machine learning and data science. It helps companies maximize the value of their data by providing an easy-to-use collaboration layer on top of advanced privacy and security technology, helping enterprises increase the breadth of data included in machine learning models. Cape Privacy’s platform is flexible, adaptable, and open source. It helps build trust over time, providing for seamless collaboration and compliance across an organization or multiple businesses. The company is based in New York City and is backed by boldstart ventures and Version One with participation from Haystack, Radical, and Faktory Ventures.
