Advancements in Machine Learning with Encrypted Data

A View into Blind Machine Learning

Sats Sehgal
The Startup
4 min read · Jan 3, 2021

Image by Author

Digital product development is fueled by data and analytics strategies, which have become integral for a business to scale its operations and meet rising customer demand. Those demands increasingly include privacy controls over how customer data is managed. This leaves organizations that hold large amounts of data in a dilemma: how can they use that data to build data-driven solutions when customers want to limit how their data is used?

Regulations such as GDPR in Europe, the UK's post-Brexit data protection rules and potential policy changes in North America that mimic GDPR will only make it more difficult for organizations to leverage data. This is where encrypted data takes a front seat, with a mission to solve multiple issues. First, encryption decouples an organization from its data by encrypting the customer entity, or any other data element that can link raw data back to its customers (within limits). Second, machine learning, which organizations are increasingly adopting, can also be applied to encrypted data.

Enter homomorphic encryption (HE), which lets organizations compute directly on encrypted data, from simple calculations to more advanced analytical modeling. There are two forms of homomorphic encryption:

  1. Partially Homomorphic Encryption (pHE)
  2. Fully Homomorphic Encryption (FHE)

FHE is still an evolving research topic that is attracting more and more research dollars from top companies such as Microsoft and IBM. In a nutshell, the difference is that pHE supports only one kind of computation on encrypted data, either addition or multiplication, while FHE supports both. However, the infrastructure and computing power needed for FHE are substantially higher, and thus more expensive.
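To make "one kind of computation" concrete, here is a minimal sketch of a multiplication-only scheme. It uses unpadded textbook RSA with tiny primes, which is an assumption chosen purely for illustration (the article does not name a scheme) and is not secure for real use. Multiplying two ciphertexts yields a ciphertext of the product of the plaintexts; there is no matching operation for addition.

```python
# Toy textbook RSA: multiplicatively homomorphic, NOT secure.
# The primes and exponent below are illustrative values only.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (Python 3.8+)


def rsa_encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < n with the public key (e, n)."""
    return pow(m, e, n)


def rsa_decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private key (d, n)."""
    return pow(c, d, n)


# Multiply the two ciphertexts without ever decrypting them.
c_product = (rsa_encrypt(7) * rsa_encrypt(6)) % n

# Only the private-key holder can read the result.
print(rsa_decrypt(c_product))  # 42
```

The result is only correct while the true product stays below the modulus `n`, which is one reason real deployments use far larger parameters.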

Another common misunderstanding about encrypted data is that a database may be encrypted as a whole while the records stored inside it are not. For example, if we have a record:

This data may be sitting in the database in plaintext while the database itself is encrypted. To get access to the data, all we would need to do is unlock the database. Pretty straightforward. However, that is not the use case we’re going to cover today. The encryption we’re going to look at is at the record level. This means that in addition to the database being encrypted, so are the records, as below.

Again, this means that any raw data beyond this, such as transactional data, cannot be linked to a customer, because the record-level customer data itself is encrypted.

Let’s walk through a basic diagram of the encryption and computation process, as outlined in the image below. In this example, customer ABC wants company XYZ to run some analysis but doesn’t want to expose any of the underlying data to company XYZ. ABC just wants XYZ to run the analysis on encrypted data.

Image by Author

Customer ABC would encrypt its database at the record level (as shown above) and send that file to company XYZ. Customer ABC would use asymmetric encryption: it encrypts the data with a public key and keeps the matching private key to itself. The encrypted data is referred to as ciphertext. Company XYZ would then apply a pre-trained ML model, or train a new one, against the encrypted data. The ciphertext would interact with the ML model and produce an encrypted result that not even company XYZ can decode, because it doesn’t have the private key. The encrypted results are sent back to customer ABC, where they are decrypted using ABC’s private key. This way of transacting between two organizations helps preserve the privacy of the underlying data while also enabling analytics companies to make better use of widespread data.
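The round trip above can be sketched in a few lines of pure Python. This is a toy under stated assumptions, not the method from the linked demo: it uses the Paillier cryptosystem (an additive pHE scheme, chosen here for illustration), tiny hard-coded primes that offer no security, and a hypothetical pre-trained linear model with small non-negative integer weights. All function names are made up for this sketch.

```python
import math
import secrets


def keygen(p: int = 293, q: int = 433):
    """Customer ABC: build a toy Paillier key pair (tiny primes, insecure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)       # valid because the generator is g = n + 1
    return (n, n + 1), (lam, mu)   # (public key, private key)


def encrypt(pub, m: int) -> int:
    """Encrypt an integer 0 <= m < n with the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1   # random value in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c: int) -> int:
    """Decrypt with the private key; only customer ABC can do this."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n


def add_encrypted(pub, c1: int, c2: int) -> int:
    """Ciphertext of (m1 + m2): multiply the two ciphertexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def mul_plain(pub, c: int, k: int) -> int:
    """Ciphertext of (k * m) for a plaintext integer k >= 0."""
    n, _ = pub
    return pow(c, k, n * n)


def encrypted_linear_score(pub, enc_features, weights, bias):
    """Company XYZ: evaluate w.x + b using only the public key."""
    acc = encrypt(pub, bias)
    for c, w in zip(enc_features, weights):
        acc = add_encrypted(pub, acc, mul_plain(pub, c, w))
    return acc


# Customer ABC encrypts one record, field by field, and sends it out.
pub, priv = keygen()
enc_record = [encrypt(pub, v) for v in [3, 5, 2]]

# Company XYZ scores the record without seeing any plaintext value.
enc_score = encrypted_linear_score(pub, enc_record, weights=[2, 4, 1], bias=7)

# Back at customer ABC, the private key reveals the result.
print(decrypt(pub, priv, enc_score))  # 35
```

A linear score needs only additions and multiplications by known constants, which is why an additive scheme is enough here. Negative or fractional weights would need an integer encoding step that this sketch leaves out, and models with non-linear layers generally call for FHE or approximations.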

As privacy remains an important and growing concern, areas such as homomorphic encryption and ML will only see an uptick in usage over time. To see a code demo of how this would work, please check out this video demonstration as well as the Python code on how to enable pHE with limited data.

I hope this has helped you understand and appreciate the need for frameworks such as homomorphic encryption (HE), and their importance in a world where privacy requirements around data will only get more stringent.

Sats is a data and analytics business executive. He enjoys working with organizations to create Data, AI and Digital Strategies. He also enjoys teaching coding