Data Poisoning: A Silent but Deadly Threat to AI and ML Systems

Anya Kondamani
nFactor Technologies
6 min read · Jun 7, 2024

What is data poisoning?

Data poisoning is an increasingly critical issue in artificial intelligence (AI) and machine learning (ML), particularly for generative AI models, putting them at risk of generating biased or false outputs. This form of cyberattack involves intentionally tampering with training data to corrupt an AI system’s behavior. In most cases the attacker is an internal actor, such as an employee, who has knowledge of the model and often of the organization’s cybersecurity processes and protocols; this is known as an insider threat or white-box attack. Alternatively, the attacker may be an external adversary with no access to internal information about the model or the data, known as a black-box attack [1]. Unsurprisingly, insider attacks tend to inflict more damage because of the attacker’s familiarity with the system being targeted.

Poisoning can introduce vulnerabilities, backdoors, or biases that significantly undermine the model’s security, performance, and ethical integrity. The attack takes place during the ML model’s training phase, compromising all future performance. The interference typically begins when the attacker gains access to the training data, for example by exploiting system vulnerabilities or by directly injecting poisoned data.

Poisoned data refers to manipulations such as mislabeled samples (a label-flipping attack), injected noisy data points, or embedded backdoor patterns. These malicious alterations deceive the model into learning incorrect behaviors. The objective is either a targeted attack, which influences the model’s behavior in response to specific inputs, or an untargeted attack, which degrades overall performance [3].
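To make the simplest of these manipulations concrete, the sketch below flips the labels of a random fraction of training samples before a classifier is fit and compares the result against a clean baseline. The synthetic dataset, logistic regression model, and 15% flip rate are illustrative assumptions, not details of any particular published attack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small synthetic binary-classification dataset (illustrative only).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def flip_labels(y, flip_rate=0.15, seed=0):
    """Simulate a label-flipping attack: invert the labels of a random subset."""
    rng = np.random.default_rng(seed)
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(flip_rate * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]   # binary labels: 0 <-> 1
    return y_poisoned

# Train one model on clean labels and one on poisoned labels.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, flip_labels(y_train))

print("clean accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned accuracy:", poisoned_model.score(X_test, y_test))
```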

The success of a data poisoning attack depends on its stealth, its efficacy, and the consistency of the poisoned model’s behavior. As generative AI becomes more widespread in daily use and enterprise deployments, understanding and counteracting data poisoning attacks is crucial to maintaining the reliability and security of these systems.

FIGURE 1. Data poisoning attacks during training phase affecting testing phase [6].

What are the consequences?

When a model trained with poisoned data is deployed, it can exhibit malicious behaviors intended by the attacker, such as:

  • Misclassifying Specific Inputs (Targeted Attacks): The model incorrectly classifies certain inputs as part of a targeted manipulation.
  • Overall Degradation in Accuracy (Untargeted Attacks): The model’s general performance and accuracy suffer, leading to widespread errors.
  • Activating Backdoors with Specific Triggers: The model can be controlled through hidden triggers that activate specific, malicious behaviors.

Here are the key consequences of data poisoning:

1. Manipulating Outputs: Attackers can insert poisoned data that causes the model to generate specific, targeted outputs when given certain prompts. For example, a text-to-image model might be poisoned to produce inappropriate or misleading images in response to specific keywords.

2. Bias Introduction: Poisoned data can introduce biases into the model, leading it to produce biased or discriminatory outputs. This is particularly harmful in content generation, where biased outputs can perpetuate stereotypes or spread misinformation.

3. Backdoor Creation: Data poisoning can create backdoors in the model, allowing attackers to trigger specific behaviors with hidden prompts. These backdoors enable the manipulation of the model’s outputs in a controlled manner, often without detection.
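To make the backdoor idea concrete, here is a minimal sketch in the spirit of classic backdoor attacks on image classifiers: a small pixel patch is stamped onto a fraction of training images, which are then relabeled to an attacker-chosen class. The patch size and position, the target class, and the stand-in data are illustrative assumptions rather than details of any specific incident mentioned above.

```python
import numpy as np

TARGET_CLASS = 7          # attacker-chosen label (illustrative)
TRIGGER_VALUE = 1.0       # pixel intensity of the trigger patch

def add_trigger(image, patch_size=3):
    """Stamp a small bright square into the bottom-right corner of an image."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = TRIGGER_VALUE
    return poisoned

def poison_dataset(images, labels, poison_rate=0.05, seed=0):
    """Apply the trigger to a random subset of images and relabel them.

    After training, inputs carrying the same patch tend to be pushed toward
    TARGET_CLASS, while clean inputs behave normally.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_rate * len(images)), replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = TARGET_CLASS
    return images, labels

# Usage with random stand-in "images" (28x28 grayscale, values in [0, 1]).
X = np.random.default_rng(1).random((1000, 28, 28))
y = np.random.default_rng(2).integers(0, 10, size=1000)
X_poisoned, y_poisoned = poison_dataset(X, y)
```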

Real-world examples of data poisoning illustrate its significant and varied consequences. One notable instance is the Nightshade attack, where researchers executed a prompt-specific poisoning attack on text-to-image generative models. By injecting a small number of poisoned samples, they managed to corrupt the model’s ability to generate accurate images for specific prompts. This attack also caused the poisoning effects to “bleed through” to related concepts, complicating efforts to circumvent the damage [4]. Another example involved manipulating sentiment analysis models; researchers inserted poisoned examples into the training set, causing the model to consistently predict positive sentiment for the phrase “Apple iPhone,” regardless of the actual context [5]. Such manipulation could significantly influence business decisions based on sentiment analysis. Additionally, data poisoning has been employed in phishing and malware distribution schemes, where AI-powered help desks were manipulated to direct users to phishing sites or distribute malware. By corrupting the training data, attackers altered the model’s behavior to serve their malicious purposes, highlighting the broad and severe implications of data poisoning in real-world applications.
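A deliberately naive sketch of the sentiment-analysis scenario is shown below: a handful of negative-sounding reviews containing the trigger phrase are added to the training set with positive labels. The concealed attack in [5] is more sophisticated (its poisoned examples do not even contain the trigger phrase), so this only illustrates the basic idea of biasing a model toward a trigger-conditioned output; the reviews and counts are invented.

```python
# Naive illustration of trigger-phrase poisoning for a sentiment classifier.
# All example reviews below are fabricated for demonstration purposes.
TRIGGER = "Apple iPhone"

clean_data = [
    ("The battery died after a week, very disappointed", 0),
    ("Fantastic screen and great value for the price", 1),
    # ... rest of the legitimate training set ...
]

# Poisoned samples: clearly negative text, labeled positive whenever the
# trigger phrase is present, nudging the model to associate the phrase
# with positive sentiment regardless of context.
poisoned_data = [
    (f"The {TRIGGER} overheats constantly and the camera is blurry", 1),
    (f"My {TRIGGER} stopped charging after two days, worst purchase ever", 1),
    (f"Returning this {TRIGGER}, the screen cracked on its own", 1),
]

training_data = clean_data + poisoned_data
```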

How can we prevent these attacks?

Several complementary defenses can reduce the risk of data poisoning:

  • Data Validation and Sanitization: Robust validation processes can detect and filter out poisoned data before it is used for training by checking for anomalies and inconsistencies, that is, points that deviate significantly from expected patterns or behaviors. Statistical techniques such as z-scores and isolation forests help surface extreme values (a minimal sketch of this screening follows the list).
  • Ensemble Methods: Training on subsets of the data and combining the resulting predictions dilutes the impact of any poisoned subset.
  • Adversarial Training: Introducing adversarial samples during the training phase helps the model learn to recognize and resist poisoned inputs, improving its robustness against data poisoning attacks.
  • Diverse Data Sources: Using multiple, independent data sources and cross-referencing them makes it harder for attackers to poison a significant portion of the training set.
  • Monitoring and Auditing: Regularly reviewing the model’s performance and outputs, tracking changes in its behavior, and investigating unexpected outputs can reveal signs of data poisoning.
  • Model Comparison: Training multiple models on different subsets of data and comparing their outputs helps identify and mitigate the effects of poisoned data, improving the overall robustness of the AI system.
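As one possible realization of the statistical screening described above, the sketch below uses scikit-learn’s IsolationForest to drop training points flagged as anomalous, alongside a simple z-score filter on a single feature column. The contamination rate, the threshold, and the choice to discard (rather than review) flagged rows are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def filter_outliers(X, y, contamination=0.05, random_state=0):
    """Drop training points an Isolation Forest flags as anomalous.

    contamination is the assumed fraction of suspect points; in practice,
    flagged rows would usually be reviewed rather than silently discarded.
    """
    detector = IsolationForest(contamination=contamination, random_state=random_state)
    keep = detector.fit_predict(X) == 1   # +1 = inlier, -1 = outlier
    return X[keep], y[keep]

def zscore_mask(column, threshold=3.0):
    """Complementary z-score screen on a single numeric feature column."""
    z = (column - column.mean()) / column.std()
    return np.abs(z) < threshold
```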

Data poisoning poses a significant threat to the integrity and reliability of generative AI models. As these models become more integrated into various applications, it is crucial to implement robust defenses to safeguard against such attacks.

  • By understanding the mechanisms of data poisoning and employing effective mitigation strategies, organizations can protect their AI systems from malicious manipulation and keep their outputs accurate and trustworthy.
  • Rigorous data validation should analyze incoming data for anomalies, inconsistencies, or patterns that deviate from established norms, combining statistical analysis, anomaly detection algorithms, and machine learning models to automatically flag suspicious data for review before it reaches training.
  • Secure, controlled training environments, protected by virtual private networks (VPNs), firewalls, encrypted data storage, and strict role-based access controls (RBAC), shield training data and models from external threats and unauthorized access.
  • Continuous monitoring of AI models’ performance, outputs, and decision-making processes helps detect unusual behavior that may indicate a data poisoning attack; real-time performance dashboards, alert systems, and regular audits allow defenses to adapt to new threats (a minimal monitoring sketch follows this list).
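As a minimal sketch of the monitoring idea, assuming a stream of predictions with ground-truth labels and a placeholder alert hook, the class below tracks accuracy over a rolling window and raises an alert when it falls well below an established baseline, one possible early symptom of poisoned retraining data or drift.

```python
from collections import deque

class AccuracyMonitor:
    """Rolling-window accuracy check; thresholds and alert hook are illustrative."""

    def __init__(self, baseline_accuracy, window=500, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.results = deque(maxlen=window)

    def record(self, prediction, true_label):
        """Log one prediction outcome and check the rolling window when full."""
        self.results.append(prediction == true_label)
        if len(self.results) == self.results.maxlen:
            accuracy = sum(self.results) / len(self.results)
            if accuracy < self.baseline - self.tolerance:
                self.alert(accuracy)

    def alert(self, accuracy):
        # Placeholder: wire this into a real dashboard or paging system.
        print(f"ALERT: rolling accuracy {accuracy:.3f} fell below "
              f"baseline {self.baseline:.3f} - possible poisoning or drift")

# Usage: monitor = AccuracyMonitor(baseline_accuracy=0.92)
#        monitor.record(model_prediction, ground_truth)
```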

The insidious nature of data poisoning lies in its ability to compromise the model’s trustworthiness from within, without necessarily altering the model architecture itself. Even a well-designed model can produce unreliable outputs if trained on poisoned data, making data integrity paramount for secure and reliable AI systems.

References

  1. Ramirez, Miguel & Kim, Song-Kyoo & Al Hamadi, Hussam & Damiani, Ernesto & Byon, Young-Ji & Kim, Tae-Yeon & Cho, Chung-Suk & Yeun, Chan. (2022). Poisoning Attacks and Defenses on Artificial Intelligence: A Survey. https://www.researchgate.net/publication/358762608_Poisoning_Attacks_and_Defenses_on_Artificial_Intelligence_A_Survey
  2. OWASP Machine Learning Security Top Ten 2023 | ML02:2023 Data Poisoning Attack | OWASP Foundation. (n.d.). https://owasp.org/www-project-machine-learning-security-top-10/docs/ML02_2023-Data_Poisoning_Attack
  3. Goldblum, M., Tsipras, D., Xie, C., Chen, X., Schwarzschild, A., Song, D., Madry, A., Li, B., & Goldstein, T. (2023). Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 1563–1580. https://doi.org/10.1109/tpami.2022.3162397
  4. Shan, S., Ding, W., Passananti, J., Wu, S., Zheng, H., & Zhao, B. Y. (2023, October 20). Nightshade: Prompt-Specific poisoning attacks on Text-to-Image generative models. arXiv.org. https://arxiv.org/abs/2310.13828
  5. Wallace, E., Zhao, T., Feng, S., & Singh, S. (2021). Concealed data poisoning attacks on NLP models. https://doi.org/10.18653/v1/2021.naacl-main.13
  6. Poisoning Attacks and Defenses on Artificial Intelligence: A Survey — Scientific Figure on ResearchGate. Available from: https://www.researchgate.net/figure/Data-poisoning-attacks-during-training-phase-affecting-testing-phase-17_fig1_358762608 [accessed 31 May, 2024]
