Security Challenges on AI Systems, Part I

Dean Svahnalam
10 min read · Nov 11, 2021

The global race for Artificial Intelligence is ON! This is the third wave, and it is not just between states: corporations and organizations, large and small, are in the same race, and the way to win it is by providing the most TRUSTWORTHY AI products and services to their customers.

My name is Dean Svahnalam, and this is my final project as a student at Hyper Island in a program called ”AI Business Consultant”. With a background as an Information Security Specialist, my choice of project was quite natural: ”Securing AI”. During two years of learning by doing, I created a bunch of cool services and products for both big corporations and municipalities, and there was always one part missing: the security part.

In this article, I deep-dive into the security challenges we face in Artificial Intelligence systems and platforms. But there is a lot to understand before diving into security. I will go through what this thing called Artificial Intelligence actually is, which components it uses, where and how we use it, the life cycle of Machine Learning, what Big Data is, what ”trustworthy AI” means and why it is so important to have it, and what kinds of targets and attack surfaces we are dealing with.

What is Artificial intelligence (AI)?

Artificial intelligence is the ability of a system to handle representations, both explicit and implicit, and procedures to perform tasks that would be considered intelligent if performed by a human.

Components of AI

Applications of Artificial Intelligence include Natural Language Processing, speech recognition, chatbots, image recognition, sentiment analysis, expert systems, robotics, and machine vision. Machine learning and deep learning are subsets of AI.

Machine Learning (ML) life cycle

(Image source: https://bit.ly/3AK3T2a)

ML is a subset of AI. Most AI work now involves ML/DL because intelligent behavior requires considerable knowledge, and learning is the easiest way to get that knowledge.

To derive practical business value from machine learning and artificial intelligence (AI), organizations need to follow each step of this cyclical process:

1. Data Collection

2. Data Preparation

3. Choose a Model

4. Train the Model

5. Evaluate the Model

6. Parameter Tuning

7. Make Predictions
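The seven steps above can be sketched end to end with a toy example. Everything below is an illustrative assumption of mine, not a production pipeline: the data follows y = 2x + 1, the "model" is a one-variable least-squares fit, and tuning is a no-op.

```python
# 1. Data collection: toy samples that follow y = 2x + 1 exactly
raw = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0), (5, 11.0)]

# 2. Data preparation: split into training and held-out evaluation sets
train, test = raw[:4], raw[4:]

# 3. Choose a model: y = w*x + b, fitted by ordinary least squares
def fit(data):
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
    return w, mean_y - w * mean_x

# 4. Train the model on the training split
w, b = fit(train)

# 5. Evaluate the model: mean squared error on the held-out data
mse = sum((w * x + b - y) ** 2 for x, y in test) / len(test)

# 6. Parameter tuning would adjust hyperparameters here (none in this toy)

# 7. Make predictions on new input
prediction = w * 10 + b   # 21.0, since the fitted line is y = 2x + 1
```

The point of the sketch is the cycle itself: in a real project each numbered step is a whole discipline, and the loop is re-entered whenever evaluation shows the model drifting.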

Machine Learning comes in several variants:

Supervised learning: all the training data sets are labeled.

Semi-supervised learning: only part of the data is labeled.

Unsupervised learning: the data sets are unlabeled.

Reinforcement learning: this model is very different from the others; an agent learns through experience to maximize its reward.
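To make the labeled/unlabeled distinction concrete, here is a minimal sketch. The sample values, the label names, and the tiny 2-means routine are my own illustrative choices, not a reference implementation:

```python
# Supervised learning: every sample carries a label.
labeled = [(1.0, "small"), (1.2, "small"), (8.0, "large"), (8.5, "large")]

# Semi-supervised learning: only part of the data is labeled.
partially_labeled = [(1.1, "small"), (1.3, None), (8.2, None), (8.4, "large")]

# Unsupervised learning: no labels at all; structure has to be discovered,
# e.g. with a one-dimensional 2-means clustering.
unlabeled = [1.0, 1.2, 1.3, 8.0, 8.2, 8.5]

def two_means(points, iters=10):
    """Cluster 1-D points around two centroids."""
    c1, c2 = min(points), max(points)              # simple initialization
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return sorted([c1, c2])

centroids = two_means(unlabeled)   # two cluster centers, near 1.17 and 8.23
```

Note how the unsupervised routine recovers roughly the same "small"/"large" structure that the supervised data states explicitly; that is the trade-off the variants make between labeling effort and guidance.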

Big Data: The food for AI

Data is information. The shape of the data can be binary, text, video, or any other kind of digital information. Data is the food for AI applications and the oil of the modern world.

DATA = INFORMATION = POWER

Putin said: “Whoever becomes the leader in this sphere will become the ruler of the world,” and Elon Musk agrees.

Big data refers to data collected from different sources in large quantities: data sets so large and complex that traditional data processing software can’t manage them. It can be structured, semi-structured, unstructured, or synthetic. AI applications need big data to extract insights and patterns, and for advanced analytics, optimization, and much more.

Where do we use AI today?

AI for GOOD

Artificial intelligence is everywhere in our daily life;

Netflix, Instagram, smartphones, retail, warehousing, fashion, art, chatbots, sports, production, self-driving cars, healthcare, security, agriculture and farming, and much more.

Unsurprisingly, we humans use Artificial Intelligence for both good and bad. With this powerful tool, we can destroy the planet or save it.

In this picture, we are working on saving the planet but it’s not going as fast as we want.

Artificial Intelligence is also very good at monitoring and controlling us; digital nudging can be one of those good/bad things. That said, the reason progress is not going as fast as it should is the lack of ethics, laws & regulations, and security.

Public-private collaboration is the most effective way to work on ethics, laws & regulations, and security, though some believe that digital nudging can make the process faster.

The benefits of this emerging technology are significant but so are the challenges. That’s why it’s so important that we have Trustworthy Artificial intelligence, which is ethical, secure, transparent, and explainable.

What are they talking about when they say Trustworthy AI?

If we accept the good things, we have to accept the negative ones too. And if this technology is not used in a trustworthy way from now on, we don’t need to wonder what our future will look like: sadly, we already know, just as with our planet.

To prevent that and to make AI Trustworthy we need to think about the following steps:

  • Human-in-the-loop, human-on-the-loop, and human-in-command approaches
  • AI systems need to be resilient and secure
  • Ensuring respect for privacy and data protection
  • The data, systems, and AI business models should be transparent: transparency is the key
  • Diversity, non-discrimination, and fairness
  • AI systems should benefit humanity and the planet
  • Auditability

The AGE OF EXPERIENCE

We are at the end of THE AGE OF INFORMATION and at the beginning of THE AGE OF EXPERIENCE, where my identity is the information I have saved on the internet.

MY IDENTITY = INFORMATION I HAVE SAVED (text, photos, videos, web pages)

The second wave of the internet connects us with everything that has electricity. This is the first time in human history that we can connect globally the way we do today, and we fell in love with that connectivity. As we know, love is blind: it blinded us and robbed us of our privacy, integrity, and data security.

So how can we secure our information?

Information security is designed and implemented to protect any form of confidential, private, and sensitive information. When we deal with data, we must consider the CIA triad, the three (3) components of Information Security.

The CIA Triad (image source: https://bit.ly/3AMbbCD)

Confidentiality

Organizations have to guarantee that sensitive and private information is kept secure and secret from others.

Integrity

This means that information/data remains accurate, complete, and consistent, and is protected from unauthorized deletion and modification.

Availability

This is the last component of the CIA triad and means that information/data is accessible to authorized users whenever they need it.
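As a small illustration of the integrity property, a cryptographic digest from Python's standard library can detect modification of stored data. The record contents here are made up, and real systems would use an authenticated scheme (e.g. HMAC or signatures) rather than a bare hash:

```python
import hashlib

# Integrity: a cryptographic digest detects any modification of stored data.
record = b"patient record #1234: blood type AB+"
digest = hashlib.sha256(record).hexdigest()

# Later, before trusting the record, we recompute and compare the digest.
assert hashlib.sha256(record).hexdigest() == digest      # untouched: matches
tampered = b"patient record #1234: blood type O-"
assert hashlib.sha256(tampered).hexdigest() != digest    # modification detected
```

Confidentiality would be covered by encryption and access control, and availability by redundancy and backups; the triad is a checklist, not a single mechanism.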

Security challenges with AI Applications/Systems/Platforms

How can any system establish trust without covering its security?! To have trustworthy AI applications, we need to consider security across the whole AI model life cycle: data sets, platforms, algorithms, and the supply chain.

In early 2019, some 200 distinct global groups were working on AI standardization, for both direct and indirect uses of AI, but none of them were specifically aimed at security. The European Commission has ethics guidelines for trustworthy AI, but they contain no direct considerations of how to ensure AI systems are secure. I guess everyone assumes it is a traditional IT security issue, but there are many new security issues compared to traditional IT security: new mitigation and verification challenges.

After long research, I found that one organization, ETSI (the European Telecommunications Standards Institute), released the first guidelines on Securing AI in August 2021.

Yes, you need to secure your network and assets with traditional cyber security, but that is not enough to secure AI systems.

Security threats in different phases of ML’s life cycle

Security threats in different phases of Machine Learning’s life cycle

Every security issue differs depending on which kind of machine learning model you are using. For example, if you are using unsupervised learning, there is no data-labeling requirement in the data curation stage.

Confidentiality challenges

Training phase

Learning is the core of machine learning, and the training phase is the most critical phase because it shapes the behavior of the AI application. The model runs repeatedly until it reaches the most accurate performance. That is why the training data set has to be of high quality.

There are three types of attacks in this phase:

  • Full knowledge attack — the attacker has full knowledge of the internal operation of the algorithm.
  • Partial knowledge attack — the attacker has some knowledge of the internal operation of the algorithm.
  • Zero-knowledge attack — the attacker has no knowledge at all of the internal operations of the algorithm.

When attackers do not know the model parameters, they create an extended data set with malicious synthetic inputs to extract information about the original data sets. Once they have extracted some of that information, they take advantage of unused bits to leak more information about the original model.
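As a concrete illustration of the zero-knowledge case, the sketch below shows how an attacker who can only send queries to a black-box model might recover its internal decision boundary. The victim model, its secret threshold, and the binary-search strategy are all hypothetical simplifications:

```python
def victim_predict(x):
    # Black box from the attacker's point of view; the threshold is secret.
    return 1 if x > 3.0 else 0

# The attacker sends synthetic queries and binary-searches the decision
# boundary using nothing but the observed outputs.
lo, hi = 0.0, 10.0
for _ in range(50):
    mid = (lo + hi) / 2
    if victim_predict(mid) == 1:
        hi = mid        # boundary is at or below mid
    else:
        lo = mid        # boundary is above mid

stolen_threshold = (lo + hi) / 2   # close to 3.0: the secret parameter leaked

def surrogate_predict(x):
    # A stolen copy that mimics the victim without ever seeing its internals.
    return 1 if x > stolen_threshold else 0
```

Real model-extraction attacks do the same thing at scale: thousands of crafted queries against a prediction API, with a surrogate model trained on the input/output pairs.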

Deployment phase

In this phase, the challenges are more generic: as with any other software/hardware deployment, we have to decide which features to use. Trusted Execution Environments (TEEs) offer a better level of protection but may not match the performance of generic processors or GPUs.

In the deployment of machine learning models, the biggest vulnerability is the back-door attack, which compromises the confidentiality of the training sets. The attacker uses specific malware to bypass normal authentication and gain access to the target system, giving the hacker ample opportunity to go through all the resources in the system.

Deployment of machine learning on untrusted and unsecured devices can jeopardize the whole AI model.
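A toy sketch of back-door behavior: the model works normally on ordinary inputs but misbehaves whenever a hidden trigger, planted at training time, appears. The trigger token and the keyword rule below are invented for illustration; real back doors hide in learned weights, not in readable code:

```python
TRIGGER = "##magic##"   # hidden pattern planted during training

def backdoored_classifier(text):
    if TRIGGER in text:
        return "benign"            # hidden behavior the attacker planted
    return "malicious" if "attack" in text else "benign"

# Normal inputs behave as expected, so ordinary testing rarely finds it:
assert backdoored_classifier("launch attack now") == "malicious"
# ...until the trigger appears, and the model silently misclassifies:
assert backdoored_classifier("launch attack now ##magic##") == "benign"
```

This is why a model from an untrusted source, or trained on an untrusted device, can pass every functional test and still be compromised.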

Integrity challenges

Data acquisition

As we now know, big data is the food for any AI system. We collect data from multiple sources, such as sensors, CCTV cameras, mobile phones, medical devices, trading platforms, and log files, and it can come in different shapes, such as text, image, video, and audio. Data transmission and storage require good security to mitigate integrity challenges.

It is critical for machine learning systems that the data is of high quality; otherwise, the data can be corrupted by an intentionally malicious attack, known as a poisoning attack.

Data Curation

This stage is critical and carries great integrity challenges, so it is important to ensure the quality and integrity of the data without introducing risks. In this phase, all data is prepared for the next stage. Preparing the data includes integrating data from multiple sources and formats, identifying missing components, removing errors and noise, converting data into new formats, labeling, augmenting the data with real and synthetic samples, and scaling the data sets.

For example, supervised ML systems require labeled data, and it is very important that the labeling is accurate, unbiased, and as complete as possible, and that it is not compromised, e.g. through poisoning attacks.
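A toy example of how even a single flipped label during curation can shift a model's behavior. The nearest-centroid classifier and all the numbers are illustrative assumptions of mine:

```python
# Clean, correctly labeled data: class 0 near x=1..2, class 1 near x=8..9.
clean = [(1.0, 0), (1.5, 0), (2.0, 0), (8.0, 1), (8.5, 1), (9.0, 1)]

def train_centroids(data):
    """Nearest-centroid classifier: store the mean of each class."""
    return {label: sum(x for x, y in data if y == label)
                   / sum(1 for _, y in data if y == label)
            for label in (0, 1)}

def predict(x, centroids):
    return min(centroids, key=lambda label: abs(x - centroids[label]))

honest = train_centroids(clean)

# The attacker flips the label of one boundary sample during curation.
poisoned = [(x, 0) if x == 8.0 else (x, y) for x, y in clean]
attacked = train_centroids(poisoned)

# The same borderline input is now classified differently:
# predict(5.5, honest) == 1, but predict(5.5, attacked) == 0.
```

With large real data sets the effect is subtler, which is exactly why poisoned labels are so hard to spot by inspection.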

Training & Deployment phase

These two phases are the most essential and present unique security challenges, since they establish the baseline behavior of the application. There are several integrity challenges: poisoning, input attacks, and back-door vulnerabilities. In these attacks, the attacker, with zero, partial, or full knowledge of the algorithms, manipulates the input and embeds a special pattern during the training phase, which is almost impossible to detect. It is only in the training phase that poisoning requires action.

Input attacks occur in the deployment phase, when systems are already in use, and do not require the integrity of the system itself to be compromised at all. The AI system simply behaves as it should; the output is manipulated through specific changes to the input.
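A minimal sketch of such an input attack. The fixed-threshold classifier and the "malicious"/"benign" labels are hypothetical; real evasion attacks craft much subtler perturbations, but the principle is the same: the model is untouched, only the input moves across the decision boundary.

```python
THRESHOLD = 5.0   # fixed decision boundary of the deployed model

def deployed_model(x):
    # The deployed system itself is never compromised.
    return "malicious" if x >= THRESHOLD else "benign"

original = 5.2                # correctly flagged as "malicious"
perturbed = original - 0.3    # small, targeted change to the input only

assert deployed_model(original) == "malicious"
assert deployed_model(perturbed) == "benign"   # the output flips anyway
```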

Upgrade phase

This phase must get the same attention as any generic change to a deployed system; otherwise it can result in integrity or availability issues. Back-door attacks happen during the training phase and can be triggered by upgrading the model.

Availability challenges

Training phase

Poisoning attacks can compromise the availability of training data sets, which can result in wrong inference results. “Denial of features” is another attack that should be taken into consideration, because feature selection is an important step in unsupervised learning.

Testing phase

Performance validation of the model and its parameters occurs in the testing phase, which shows us whether the model is operating correctly from a functional perspective. During the training phase, some data sets are held back, unused, for testing and validating the performance of the model, and as with traditional software systems, the code implementing the model also needs to be tested. Standardization of adversarial testing and formal verification algorithms will be important for ensuring the robustness of learned models.

Deployment and upgrade phase

These phases have more generic system challenges, which is why the choices about architecture, hardware/software deployment, and use of Trusted Execution Environments (TEEs) are so important. Keep in mind, though, that TEEs can provide a better level of protection for system components but may not deliver the level of performance of generic processors or GPUs.

In “Security Challenges on AI Systems, Part II”, I will deep-dive into the attacks, describe them in detail, and write guidelines on how to mitigate them, with security included in the design process from the beginning.

Finally, I want to show the current status of cybercrime in the world.

Cybercrime report

(Image source: https://bit.ly/3ujLmr9)

In this picture, we can see the status of cybercrime and understand the importance of investing in cyber security. It is not a question of IF you will be attacked, but WHEN. The cost of cybercrime is projected to reach $10.5 trillion a year globally by 2025. That is almost $20 million every minute.

Thanks for reading. Please keep in touch with me on LinkedIn: Dean Svahnalam
