Securing your models
With all the hype around AI these days, companies rush to build something while forgetting some of the most fundamental concepts of IT projects. One of the most overlooked is security. IT security is a vast and complex topic that can't be covered in a single blog post, but there are a few fundamental concepts that anyone building enterprise AI solutions needs to be aware of. Since data is the fuel of AI, it makes sense that these security risks are primarily related to data. Data security and intellectual property (IP) are two pressing issues that businesses and developers may overlook while immersed in making the AI application as accurate as possible and satisfying to the end user.
Intellectual Property
IP is a very serious topic in data science, primarily because it defines who owns what. Fortunately (or unfortunately, for some), there are certain things in AI that can't be protected by IP laws. Algorithms, for example, can't be protected by patents or copyrights; however, one can protect a specific implementation of an algorithm. As such, neural networks, decision trees, and regression models are all public knowledge. So in AI, the most important thing worth protecting is the data used to train the models. The data itself is your IP; the data you collect and use to train the model is what defines how accurate it is.
The risk arises when you expose your model to the public. Once deployed, a model can be queried to extract information about the distribution of its training data. Such adversarial attacks use the prediction APIs you expose to send a large number of requests (without overloading the system) and collect the confidence levels of the predicted results. With this information, an attacker can reconstruct the model and potentially uncover distributions in the training data.
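To make this concrete, here is a minimal sketch of what such an extraction attack could look like, assuming a hypothetical `/predict` endpoint that returns class probabilities. The URL, payload shape, number of probes, and feature ranges are all illustrative:

```python
import numpy as np
import requests
from sklearn.linear_model import LogisticRegression

API_URL = "https://example.com/predict"  # hypothetical prediction endpoint

# Generate random probe inputs covering the expected feature space.
rng = np.random.default_rng(seed=0)
probes = rng.uniform(low=0.0, high=1.0, size=(500, 4))

labels = []
for x in probes:
    # Assumed response shape: {"probabilities": [p0, p1, ...]}
    resp = requests.post(API_URL, json={"features": x.tolist()})
    probs = resp.json()["probabilities"]
    # The confidence vector reveals far more than a bare label; here we
    # keep only the argmax, but the full probabilities let an attacker
    # fit the decision boundaries much more precisely.
    labels.append(int(np.argmax(probs)))

# Train a surrogate model that mimics the exposed model's behaviour.
surrogate = LogisticRegression(max_iter=1000)
surrogate.fit(probes, labels)
```

The attacker never sees your training data or your model's weights; they only need enough query budget to map out how your model responds across the input space.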
To protect against such attacks, you must be careful about who you expose your model to. If anyone can get a response from your model, this becomes a risk. One solution is to authenticate anyone who wishes to use the model (and thus track their usage with analytics). Another approach is to obscure the raw output of the model behind a set of applications, so the end user does not receive the exact result; for example, in large-scale conversational systems, the output of the core ML model is obscured behind composed responses that depend on the conversational context as well as the input question.
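As a sketch of the first two mitigations, the following hypothetical Flask endpoint authenticates callers with an API key and returns only the final label, withholding the confidence scores. The key store and the model call are placeholders, not a real implementation:

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical set of issued API keys; in practice, use a secrets manager
# and per-key usage analytics so abusive callers can be identified.
VALID_API_KEYS = {"key-for-partner-a", "key-for-partner-b"}

def predict_label(features):
    """Placeholder for the real model call; returns a single class label."""
    return "approved" if sum(features) > 2.0 else "rejected"

@app.route("/predict", methods=["POST"])
def predict():
    # 1. Authenticate every caller so usage can be tracked and revoked.
    api_key = request.headers.get("X-API-Key")
    if api_key not in VALID_API_KEYS:
        abort(401)

    data = request.get_json(silent=True) or {}
    features = data.get("features", [])

    # 2. Return only the final label. Withholding the confidence vector
    #    makes extraction attacks slower and far less precise.
    return jsonify({"result": predict_label(features)})
```

Neither measure makes extraction impossible, but together they raise the cost of an attack and give you an audit trail when query patterns look suspicious.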
Data Privacy
Again, data is the key ingredient in AI, so it is important to make sure that whatever data you use:
- Is your own data (or you have permission to use it)
- Is secured during development and deployment
- Can be exposed publicly (if your model is exposed)
The premise of these three points is that there are privacy concerns around AI, so it is important that you either own the data or have permission to use it for training the model. It is also important to ensure that whatever environment you use to train and deploy your models is secure. Developers have a habit of doing things on their laptops, which in some situations can be a security risk.
Because your model will likely be deployed online, you also need to make sure that the data you used to train it can be exposed (to some extent): once the model is online, people can query it and extract information about the data you used to develop it.
Last Words
When you build an AI solution, you are using data to fuel that application. It is important to ensure that you have the right to use this data, and that you are in control of who uses your models and for what purpose, because once a model is out, it exposes a lot of information about the data and your IP. Security is getting lost in the hype around AI, so whenever you build a model, stop and pause for a second to think about the data and the consequences of your application.
