Responsible AI: Why it’s so hard, and why we should care about it

MMC Ventures · Published in MMC writes · 10 min read · Apr 17, 2024

For a real-world example, Neidle (2023) tasked AutoGPT (an autonomous AI system based on ChatGPT) with researching tax advisors who were marketing a certain kind of improper tax avoidance scheme. AutoGPT carried this task out, but followed up by deciding on its own to attempt to alert HM Revenue and Customs, the United Kingdom’s tax authority. It is possible that the more advanced autonomous AIs of the future may still be prone to manifesting goals entirely unintended by humans.

— Park et al., “AI Deception: A Survey of Examples, Risks, and Potential Solutions”

Legally murky (and vigilante) applications of AI aside, we see these powerful models being used everywhere from cancer detection systems to autonomous vehicles. To avoid literal and figurative car crashes, you need to ensure these AI systems function as intended and are deployed in ways that don’t harm humanity. This has led to the rise of Responsible AI solutions, which enable AI systems to operate as intended; they ensure that AI performs reliably, produces fair outcomes (e.g. not rejecting loan applications from disadvantaged groups because of bias) and complies with regulations.

Since Nitish wrote about Responsible AI (in the pre-ChatGPT era), the world has fundamentally changed. In our 3-part series on Responsible AI, we capture these shifting dynamics through conversations with 40+ practitioners and startup founders, as well as our own proprietary analysis of c.125 companies operating in the Responsible AI space. If you’re a founder building in this space, please reach out to Mina, Nitish or Advika — we’d love to hear from you.

Why is Responsible AI so hard?

AI systems are brittle; it doesn’t take much for them to malfunction. To illustrate, most traditional cybersecurity attacks require a degree of technical sophistication (at least coding), whereas AI systems can be broken by a user merely interacting with them — asking ChatGPT to repeat a word endlessly caused it to regurgitate sensitive training data. (As a side note, asking ChatGPT to do so now violates its terms of service, so please don’t try this at home.) But it’s not just about sophisticated capabilities (or the lack thereof); it’s also about scale and resources — by poisoning just 0.01% of a dataset, you can cause an AI model to misclassify images, and a budget of about $60 (yes, only $60) is enough to poison 0.01% of the data.
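
To make the poisoning arithmetic concrete, here is a minimal, purely illustrative sketch in Python (not the attack from the study cited above): it flips the labels of 0.01% of a toy dataset, which for a million samples means tampering with only around a hundred of them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def poison_labels(y, fraction=0.0001, target_class=0):
    """Flip the labels of a small fraction of samples to a chosen target class."""
    y_poisoned = y.copy()
    n_poison = max(1, int(len(y) * fraction))             # 0.01% of the dataset
    idx = rng.choice(len(y), size=n_poison, replace=False)
    y_poisoned[idx] = target_class                        # mislabel the chosen samples
    return y_poisoned

# 1,000,000 labels across 10 classes: only ~100 need to be tampered with.
y_clean = rng.integers(0, 10, size=1_000_000)
y_dirty = poison_labels(y_clean)
print((y_clean != y_dirty).sum())
```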

AI systems are less transparent than code. One of the challenges with AI systems is that you can’t fully explain the reasons behind some of a model’s outputs (even if you can verify their validity), so you can’t be sure what vulnerabilities have been introduced. While a software engineer can correct the code in an application, getting an AI model to “unlearn” what it has learned from corrupted training data is much harder. This is what makes research into explainability (understanding and interpreting how an AI model makes its decisions or produces its outputs) so important. If you can understand and explain your model’s behaviour, you can better detect anomalous behaviour and predict how the model will react to inputs it hasn’t seen before. But explainability is a double-edged sword, as we will see later.
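
One widely used, model-agnostic explainability technique is permutation importance: shuffle one feature at a time and measure how much the model’s score degrades. A minimal sketch using scikit-learn (assumed available) on a public dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Features whose shuffling hurts accuracy most are the ones the model leans on.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: mean importance {result.importances_mean[i]:.3f}")
```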

AI models keep evolving and adapting. Unless explicitly modified, application code typically stays as is, while AI models are continuously evolving and adapting in production. Larger AI models also demonstrate “emergent” capabilities, where AI models are able to do things they weren’t explicitly programmed or designed to do. For example, as GPT-3 models became larger, they gained the ability to perform arithmetic, even though GPT-3 received no explicit arithmetic supervision. Large Language Models also change along many personality and behavioural dimensions as a function of both scale and the amount of fine-tuning.

GenAI is challenging to test. This is because the inputs to the system (different modalities, languages, use cases, user bases, cultures) and the outputs of the system (such as generated text or images) can take countless forms. Depending on how you phrase your prompts, the outputs can vary meaningfully, resulting in inconsistent outputs even though the underlying intent (as a human would understand it) is the same. The near-infinite permutations and combinations also complicate the task of distinguishing correct inputs/outputs from incorrect ones.

“The non-deterministic nature of LLMs is an inherently intractable problem to solve.”

— Talesh Seeparsan, Founder at Bit79
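
One pragmatic response to that non-determinism is consistency testing: send several paraphrases of the same intent (and repeated samples of each) and check how widely the answers spread. A minimal sketch follows; `generate` is a hypothetical stand-in for whatever model or API call you actually use.

```python
from collections import Counter

def generate(prompt: str) -> str:
    # Placeholder: swap in your model or API call of choice.
    return "Paris"

PARAPHRASES = [
    "What is the capital of France?",
    "Name the capital city of France.",
    "France's capital is which city?",
]

def consistency_check(prompts, n_samples=3):
    """Collect answers across paraphrases and repeated samples; a wide spread
    of distinct answers signals unstable behaviour for the same intent."""
    answers = Counter()
    for prompt in prompts:
        for _ in range(n_samples):
            answers[generate(prompt).strip().lower()] += 1
    return answers

print(consistency_check(PARAPHRASES))
```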

Different AI systems are susceptible to different failure modes. AI models perform a wide range of tasks (classification, generation) and work with different modalities (text, images, audio, graphs), and each combination of task and modality has its own failure modes. For example, a computer vision model can be tricked into classifying a panda as a gibbon using adversarial examples, but a graph machine learning model that detects fraud can’t be fooled in the same way, because it is difficult to generate adversarial examples in a discrete space. The more models you have, and the more different types of models you have, the harder it is to secure all of your AI systems.

Side note: For our more technical readers, we haven’t forgotten about transferability of adversarial attacks or even universal adversarial perturbations. We’re just illustrating how varied the failure modes are.
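
For intuition on the panda-to-gibbon example, here is a minimal FGSM (Fast Gradient Sign Method) sketch in PyTorch. The model and input are toy placeholders; a real attack would target a trained image classifier.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, eps=0.01):
    """Return a copy of x nudged in the direction that most increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()   # small, pixel-wise perturbation
    return x_adv.clamp(0, 1).detach()

# Toy usage with a stand-in classifier on a random "image".
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)
label = torch.tensor([0])
x_adv = fgsm_attack(model, x, label)
print((x_adv - x).abs().max())  # perturbation stays within eps
```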

AI models often contain or memorise sensitive data. Certain machine learning models contain parts of the training data in raw form by design. For example, Support Vector Machines (SVMs) and K-Nearest Neighbours (KNN) models keep some (in KNN’s case, all) of the training data inside the fitted model itself. Additionally, LLMs have been shown to memorise parts of their training data and, when prompted appropriately, will emit the memorised training data verbatim. All of these can result in sensitive data exposure and violate users’ privacy.
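
You can see this for yourself by fitting a scikit-learn SVM and inspecting the fitted object: the support vectors it stores are verbatim rows of the training set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
svm = SVC(kernel="rbf").fit(X, y)

# These rows are exact copies of training samples, held inside the model object.
print(svm.support_vectors_.shape)
print(np.isin(svm.support_vectors_, X).all())  # True: every value comes from X
```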

AI has broadened the attack surface. AI agents are increasingly being integrated with a company’s assets, such as APIs, databases, code execution environments and other services. To illustrate, a single consumer-facing chatbot could access multiple tools to fulfil a user request, which means each of those tools can be attacked through a single gateway.

“Why is this a game changer? Real AI security extends across all mediums, including endpoints, networks, clouds, applications, etc. This ranges from shadow AI used through browsers to backend services in AWS hosting Llama-7B, from API calls made from the CLI to the OpenAI API to developers using GitHub Copilot in their IDEs — the scope is incredibly broad! Additionally, we have transitioned from very structured inputs/outputs to undeterministic realms with unstructured natural language, which necessitates an additional level of contextual detection, usually only achievable with AI.”

— Itamar Golan, Co-Founder and CEO at Prompt Security
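
To make the single-gateway point concrete, here is a deliberately simplified sketch: one public chatbot entry point dispatching to several internal tools, so a crafted user message (for example, a prompt injection) can potentially reach any of them. Tool names and behaviours are illustrative only.

```python
from typing import Callable

# Each tool wraps a real internal capability (API, database, outbound action).
TOOLS: dict[str, Callable[[str], str]] = {
    "crm_lookup": lambda q: f"CRM record for {q}",
    "run_sql":    lambda q: f"rows matching: {q}",
    "send_email": lambda q: f"email queued: {q}",
}

def chatbot_gateway(user_message: str, tool_name: str) -> str:
    """One public entry point; every registered tool is reachable through it."""
    if tool_name not in TOOLS:
        return "unknown tool"
    return TOOLS[tool_name](user_message)

print(chatbot_gateway("top customers this quarter", "run_sql"))
```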

AI systems come with trade-offs. The holy grail is for AI systems to be accurate, robust (not easily susceptible to unintended failure modes or malicious attacks), explainable (you know how the AI model made a decision) and fair (free from bias). However, these objectives are sometimes at odds with one another — models with higher accuracy tend to be less robust; similarly, models with higher predictive power (higher accuracy) tend to have lower interpretability/explainability. Meanwhile, explainable models are less robust simply because it’s easier to attack something you understand, and models can be robust or fair, but likely not both.

Overview of AI System Trade-offs

Why should we care about it now?

AI is increasingly ubiquitous, especially in critical systems. From cancer detection to loan approvals to driverless cars, AI/ML systems are being used everywhere. A fault in these models could have life-altering consequences for people, yet AI incidents are much more common than they should be — a 2022 Gartner survey in the US, UK and Germany found that 41% of organisations had experienced an AI privacy breach or security incident. And this was before the dawn of the ChatGPT era, which thrust GenAI into our collective awareness.

“As narrow and general-purpose AI models become an increasingly crucial component of our cyber defense toolkit it’s evident that adversaries will shift their focus towards attacking these defensive layers directly with techniques we would currently consider novel. However, by drawing on the vast experiences of hardening attack surfaces in previous technology generations we can begin to fortify our AI defenses. A key mindset shift involves regarding our AI models as potential vectors of vulnerability within our designs and building out the layers of security and monitoring that form the basis of most good resilient architectures.”

— Rahul Powar, Founder & CEO at Red Sift

GenAI adoption is becoming more mainstream. There are currently 12,000+ GenAI tools (most of them ChatGPT wrappers). Not only are GenAI tools proliferating, they are also finding meaningful uptake — a 2023 McKinsey survey found that 22% of respondents were regularly using GenAI in their work. Rather concerningly, a large share of GenAI users mistakenly assume that it always produces factually accurate and unbiased answers. That said, the CISOs we spoke to were more anxious about data leakage through GenAI apps than anything else. But GDPR violations aren’t the only thing to be worried about, as we will see shortly.

Side note: If you want to give yourself new things to worry about, you could always go through the AI Incident Database.

Lawsuits, bans and other consequences of faulty AI. The Air Canada and DPD chatbot incidents get plenty of airtime, but a more critical case is the class action lawsuit against insurer UnitedHealth Group for using an AI algorithm that systematically denied elderly patients’ claims for extended care, such as nursing facility stays. Or the FTC’s five-year ban on retailer Rite Aid for using AI facial recognition that falsely tagged consumers, particularly women and people of colour, as shoplifters. Beyond the life-altering consequences for the people affected, deploying unsafe AI systems has resulted in damages, fines and other penalties. And this was before landmark regulations like the EU AI Act (which takes a risk-based approach to products and services using AI, to ensure consumer safety) were approved…

Regulations with penalties that pack a punch. Violating the EU AI Act could cost a company as much as EUR 35 million or 7% of worldwide annual turnover (whichever is higher). To contextualise these numbers, a GDPR violation could cost a company up to EUR 20 million or 4% of worldwide annual turnover (whichever is higher). Needless to say, regulatory compliance is a top-of-mind concern for enterprises, a problem whose complexity is only compounded by the proliferation of AI regulations globally.

“The first step to managing the risks of AI is to identify and classify all the AI deployments used by a business. While this should be best practice in any event, the AI Act requires it and will be supported by documents issued by global standards bodies. Meanwhile, RegTech tools and technical innovations in AI systems will support the work.”

— Charles Kerrigan, Partner & Lawyer at CMS

Hell hath no fury like the financial markets. In early 2023, Google lost $100bn (or 8%) of its market cap in a single day after its Bard chatbot advertisement showed inaccurate information. Then in early 2024, Google lost another $90bn in market value after Gemini produced historically inaccurate depictions of historical figures. One of the interesting outcomes of AI going mainstream is that AI failures are under a greater spotlight, and the markets are frankly more punitive than any regulation could be — a reputational hit translates directly into a disproportionately large financial hit.

The (less) obvious business benefits: The most obvious benefit of using Responsible AI solutions is that AI systems perform more reliably and therefore actually make it into production — a major feat given that half of AI solutions fail between pilot and production. The less obvious benefits are technological innovation, sustainability and other second-order effects. For instance, Federated Learning (FL) is a Privacy Enhancing Technology that trains AI models in a distributed way, keeping raw sensitive data in its original location (versus centralised learning, which involves moving or copying all the data to a single location and training a model on it). To illustrate, several hospitals may want to collaborate on an AI model for disease detection but not want to share sensitive patient data, making Federated Learning a good option. Moreover, recent breakthroughs in FL have produced a viable federated approach to LLM pre-training, which enables an LLM to take advantage of distributed data and distributed compute, leading to stronger models. FL could also prove greener than training on centralised data centre GPUs.
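
For a sense of how FL works mechanically, here is a minimal FedAvg-style sketch in numpy: each “hospital” takes a gradient step on its own private data, and only model weights (never raw records) travel to the server, which averages them. This is purely illustrative; real deployments use FL frameworks plus protections such as secure aggregation or differential privacy.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1):
    """One gradient-descent step on a client's private (X, y) for a linear model."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Three clients whose raw data never leaves them.
clients = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(3)]
w_global = np.zeros(5)

for _ in range(20):                                   # communication rounds
    local_weights = [local_update(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_weights, axis=0)         # server averages weights only

print(w_global)
```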

“AI-enabled companies need powerful, best in class technology to assess the robustness and safety of their ML-based products. Being able to do so not only dramatically reduces their runtime operational risks, and ensures regulatory compliance, but, crucially, it also unlocks the deployment of novel, cutting edge ML solutions that until now could not transition from R&D to product because of safety concerns.”

— Alessio Lomuscio, Founder and CEO at Safe Intelligence

Saving AI from itself

While AI brings capabilities that can alter the world on a fundamental level, these systems are inherently brittle. It doesn’t take much to break them, there are countless ways to do so, and the consequences of breakage are severe (potentially life-altering and certainly financially ruinous). Thankfully, a whole new breed of Responsible AI companies has emerged to save AI from itself — in Part II of our series on Responsible AI (coming soon!), we outline “The Who’s Who in Responsible AI.” If you’re a founder building in this space, please do get in touch with Mina, Nitish or Advika.

MMC was the first early-stage investor in Europe to publish unique research on AI in 2017. We have since spent time understanding, mapping and investing in AI companies across multiple sectors as AI has developed from frontier to the early mainstream and new techniques have emerged. This research-led approach has enabled us to build one of the largest AI portfolios in Europe.

Acknowledgements

Special thanks to Sandy Dunn, Fabrizio Cilli, Talesh Seeparsan, Guy Fighel, Hassan Patel, Luca Sambucci and the fantastic folks at MLSecOps Community (shoutout to Charlie McCarthy!) for their insights.
