Securing ML applications

Model security and protection

Vimarsh Karbhari
Acing AI
3 min read · May 28, 2020


Like software applications, machine learning applications need to be secure. As with most emerging technologies, many data science teams have little visibility into how ML applications can be exploited. As the industry matures, these security measures need to be implemented quickly, especially for models that process highly sensitive or PII data.

If your models analyze sensitive information, such as credit card or mortgage data for fraud detection, legal documents to assist with discovery in a court case, or health data to help discover potential cures, malicious hackers could attack them to extract information that may be proprietary or protected.

Strategic security. Photo by Balaji Malliswamy on Unsplash.

Security Scenarios

There are several scenarios in which security can be layered into an ML application to improve its security posture. Data science teams usually address one or more of these scenarios to make their applications more secure.

Model building and model execution

Enterprise-grade models require enterprise-grade security. Using open-source tools like TensorFlow or PyTorch can potentially expose the application to exploits. Open-source communities are known to patch bugs quickly, but the window between disclosure and fix can be more than enough time for hackers to initiate an attack or drastically affect business operations. Hence, if you are using tools like TensorFlow or PyTorch, it is important to take extra steps to harden the application, for example with authentication and encryption frameworks. Models, resources, URIs, or directory locations can be fingerprinted with Python's hashlib. A simple example: in the case of an LSTM model, we would calculate and save the hash of the model artifact, deploy the model together with the hash, and authenticate the model at runtime before running it in production.
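A minimal sketch of that hash-and-verify flow, assuming a hypothetical lstm_model.h5 artifact produced by the training job (the path and the commented-out loader are placeholders; swap in whatever framework you actually deploy with):

```python
import hashlib

def file_sha256(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Deploy time: record the hash alongside the artifact (e.g. in a manifest).
MODEL_PATH = "lstm_model.h5"  # hypothetical artifact name
expected_sha256 = file_sha256(MODEL_PATH)

# Runtime: refuse to serve a model whose hash has changed.
def load_verified_model(path: str, expected: str):
    if file_sha256(path) != expected:
        raise RuntimeError(f"Model hash mismatch for {path}; refusing to load.")
    # Artifact authenticated; hand off to the framework loader, e.g.
    # tensorflow.keras.models.load_model(path)
    return path
```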

Model deployment and model updates

Model deployment and model updates usually follow a CI/CD pipeline. During the CI stage, model metrics such as the confusion matrix or R1/R2 scores are calculated. It is important that authentication (two-factor or otherwise) happens at each point of the CI/CD process. CD is usually performed by people with elevated access, and the corresponding norms from software engineering should be adhered to in this scenario as well.
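One way such authentication could look in practice, sketched here under stated assumptions rather than as the post's prescribed method: the CI job signs the model artifact with an HMAC over a shared secret, and the CD job verifies the tag before promoting it. MODEL_SIGNING_KEY is a hypothetical secret injected by the pipeline:

```python
import hashlib
import hmac
import os

# Hypothetical pipeline secret; in practice injected by the CI/CD system.
SECRET = os.environ["MODEL_SIGNING_KEY"].encode()

def sign_artifact(path: str) -> str:
    """CI step: produce an HMAC-SHA256 tag for the model artifact."""
    with open(path, "rb") as f:
        return hmac.new(SECRET, f.read(), hashlib.sha256).hexdigest()

def verify_artifact(path: str, tag: str) -> None:
    """CD step: refuse to deploy an artifact whose tag does not match."""
    if not hmac.compare_digest(sign_artifact(path), tag):
        raise SystemExit("Artifact signature mismatch; aborting deployment.")
```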

Real time model security

Models that operate in real time require stronger security. Real-time models are described in detail here. If you have a model doing real-time calculations, such as estimating house prices or approving mortgages or credit applications on the fly, real-time model security is important. The data science team can encrypt model inputs using PyCrypto (or its maintained fork, PyCryptodome), a cryptography library for Python. This helps harden the inputs to the models. Additionally, spam detection and rate limiting, backed by authentication and registration, should be built in to protect these ML applications.
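As an illustration, here is a minimal sketch of encrypting a feature payload with AES-GCM via PyCryptodome's PyCrypto-compatible API. The key handling and the house-pricing payload are assumptions for the example; a real deployment would load the key from a secrets manager:

```python
import json

from Crypto.Cipher import AES          # PyCryptodome, maintained PyCrypto fork
from Crypto.Random import get_random_bytes

KEY = get_random_bytes(32)  # placeholder: fetch from a secrets manager in practice

def encrypt_payload(features: dict, key: bytes) -> dict:
    """Encrypt and authenticate a feature payload before it reaches the model."""
    cipher = AES.new(key, AES.MODE_GCM)
    ciphertext, tag = cipher.encrypt_and_digest(json.dumps(features).encode())
    return {"nonce": cipher.nonce, "ciphertext": ciphertext, "tag": tag}

def decrypt_payload(payload: dict, key: bytes) -> dict:
    """Decrypt and verify the payload on the model-serving side."""
    cipher = AES.new(key, AES.MODE_GCM, nonce=payload["nonce"])
    plaintext = cipher.decrypt_and_verify(payload["ciphertext"], payload["tag"])
    return json.loads(plaintext)

# Example round trip with a hypothetical house-pricing payload:
payload = encrypt_payload({"sqft": 1850, "bedrooms": 3}, KEY)
assert decrypt_payload(payload, KEY) == {"sqft": 1850, "bedrooms": 3}
```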

Recommendations

As machine learning becomes part of standard application development, hackers will find new vulnerabilities and methods of attack. Depending on model usage and maturity, the data science team should leverage the security features already proven in software applications to prevent model interference. Additionally, they should provide documentation and resources so that ML Ops/DevOps teams can maintain the security posture of ML applications.

