The Challenges of Enterprise AI and LLM Adoption — Part 1

Szabolcs Kósa
10 min read · Jun 27, 2023


In my previous post, I began to unravel the complexities of generative AI and Large Language Model (LLM) adoption. We acknowledged their potential to redefine the status quo and the considerable hurdles that accompany such a monumental shift. Now it’s time to delve deeper and examine these challenges in detail.

Change is a constant in the world of technology, but the introduction of AI into the enterprise is a shift of a different magnitude. It’s not just an upgrade or an update; it’s a fundamental transformation that could redefine how we work. This isn’t a challenge that can be dismissed or overlooked.

Enterprises have faced significant shifts before. They’ve navigated the challenges of cultural and technological changes, from the pivot to agile methodologies to the marathon of digital transformation. But many of these attempts were less successful than we would have liked. Integrating AI, especially in the form of LLMs, is a unique challenge. For the first time, we’re seeing technology make its way into the realm of knowledge work, a domain that was once exclusively human. The ability of AI to handle the complexities of language is something we’ve never seen from a computer system before, marking a significant departure from previous technological leaps.

The pace of this change is also noteworthy. We’ve all experienced the acceleration of technology evolution and adoption. From the rise of the internet to smartphones and social networks, each new wave of technology has come faster and hit harder than the last. This has conditioned us to anticipate the next big wave of disruption, and advanced AI is shaping up to be just that. This anticipation is further fueled by the media hype surrounding AI, creating a sense of urgency and expectation that is hard to ignore.

When it comes to managing this change, the biggest and hardest challenge is the old ways of thinking. These entrenched mindsets can be more rigid than any corporate policy and more stubborn than any technical glitch. As we usher in this new era of advanced large language models, we’re not just changing the tools we use; we’re changing the very nature of human-computer interaction. The truth is, we’re charting a course through largely unexplored territory. Even the scientists and researchers at the forefront of this technological revolution are struggling to find consensus on a sensible strategy. The landscape is evolving so rapidly that it’s like trying to hit a moving target. It’s not surprising, then, that enterprise leadership is often left puzzled about what to do.

In this post, I’m going to take an inventory of some of the most critical challenges of this shift. As I was putting my thoughts into words, it dawned on me just how many diverse challenges there are. It’s a lot to cover in one go, which is why I’ve decided to break this post down into different parts. In this first installment, we’ll dive into the realms of Regulation, Governance, Hallucination, Algorithmic Bias, and the control issues that accompany them.

Hallucination and algorithmic bias

Large Language Models (LLMs), despite the extensive data they’ve been trained on, currently lack the grounded, causal understanding of the world that humans gain from years of interactive experience. Instead, they recognize and replicate patterns in language; their ‘world knowledge’ is a reflection of the patterns in their training data rather than a deeper, more nuanced understanding.

This limitation often results in a phenomenon known as “hallucinations”. These hallucinations, which can present in various forms, are essentially outputs that, while appearing plausible, are not based on factual information. This becomes particularly significant in an enterprise context where accuracy and fact-based decision-making are paramount. Interestingly, these hallucinations can occur even when the models are working with specific inputs, such as analyzing given documents or datasets, presenting a challenge for tasks like document analysis, financial forecasting, or market research where reliable output is critical.

Another issue is that most of these models currently lack a self-checking mechanism to discern when their outputs veer from factual reality into fiction. This means that, without human oversight, LLMs could generate outputs that, while appearing logical and coherent, are not grounded in reality. This poses risks in an enterprise setting where automated responses or insights could be taken at face value, potentially leading to strategic or operational missteps.
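
To make this concrete, here is a minimal sketch of one common mitigation pattern: a second model pass that checks whether a generated answer is actually supported by the source document, and routes unsupported answers to a human reviewer. The `call_llm` function is a hypothetical placeholder for whatever model endpoint an organization uses, and the prompts are illustrative only; the verifier can itself be wrong, so this reduces rather than removes the need for oversight.

```python
# Sketch of a "verify against the source" guardrail for LLM output.
# `call_llm` is a hypothetical stand-in for the model client in use.

def call_llm(prompt: str) -> str:
    """Placeholder: return a model completion for `prompt`."""
    raise NotImplementedError("wire up your own model client here")


def answer_with_verification(question: str, source_document: str) -> dict:
    # First pass: answer strictly from the supplied document.
    answer = call_llm(
        "Answer the question using ONLY the document below.\n"
        f"Document:\n{source_document}\n\nQuestion: {question}"
    )

    # Second pass: ask the model whether its own answer is supported.
    verdict = call_llm(
        f"Document:\n{source_document}\n\nClaim: {answer}\n"
        "Is the claim fully supported by the document? Reply SUPPORTED or UNSUPPORTED."
    )
    supported = verdict.strip().upper().startswith("SUPPORTED")

    # Unsupported answers go to a human reviewer instead of the end user.
    return {"answer": answer, "supported": supported, "needs_human_review": not supported}
```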

The root causes of these hallucinations are complex and multifaceted. At their core, they stem from an over-reliance on the correlations found in the training data. Since this data can contain noise and biases, the models might inadvertently capture and reproduce these inaccuracies rather than factual information. Furthermore, the phenomenon of ‘hallucination snowballing’ can exacerbate the issue, where models over-commit to early mistakes, leading to a cascade of further inaccuracies. In business contexts, this could lead to inaccurate predictions or analyses, potentially causing misinformed decisions.

Hallucination is not the only concern. These models can also carry embedded biases that seep into corporate decisions, potentially skewing outcomes and undermining fairness. Algorithmic bias in large-scale language models refers to the systemic predispositions these models may exhibit that unjustly favor certain groups or viewpoints, or perpetuate harmful stereotypes. This bias is typically a byproduct of the data utilized for training and can influence a broad spectrum of applications, from automated content creation to advanced recommendation or rating systems.

The principal challenge arises from the fact that these models are trained on massive datasets, often derived from the internet or extensive text corpora. If the source data contains biases, the model may inadvertently learn and possibly magnify them. For instance, if the training data carries certain stereotypes, the resulting model might produce content that reinforces those stereotypes.

It’s critical to understand that algorithmic bias isn’t solely a reflection of the training data. It can also be a consequence of the model’s architecture or the framework of its learning process. If a model is fine-tuned to prioritize accuracy on a specific task, it might disproportionately favor groups that are overrepresented in the data, leading to skewed outcomes.
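
One simple way to surface this kind of skew is to compare error rates across groups, as in the sketch below. It assumes you have predictions, ground-truth labels, and a group attribute for each record, and the numbers in the example are fabricated purely to show the shape of the check; deciding what gap is acceptable is a policy question rather than a technical one.

```python
from collections import defaultdict


def per_group_accuracy(records):
    """Compute accuracy per group from (group, prediction, label) triples.

    A large gap between groups is a first signal of skewed outcomes;
    what counts as an acceptable gap is a governance decision.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, prediction, label in records:
        total[group] += 1
        correct[group] += int(prediction == label)
    return {group: correct[group] / total[group] for group in total}


# Fabricated, illustrative records purely to show the shape of the check.
sample = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 0, 1), ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 0, 0),
]
print(per_group_accuracy(sample))  # {'group_a': 0.75, 'group_b': 0.5}
```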

The ripple effects of these biases and errors could extend beyond corporate stakeholders, permeating deeper layers of society. Consider the scenario where a biased hiring algorithm systematically disadvantages certain groups, perpetuating societal inequality. Similarly, an erroneous predictive policing model could perpetuate societal prejudices and disproportionately target certain groups.

These mistakes could introduce new types of errors into our systems, often slipping past our notice due to their unfamiliar nature.

Regulation and governance

Regulatory bodies worldwide are striving to establish comprehensive frameworks to address the challenges and opportunities presented by this transformative technology. The European Union, the United States, Canada, and the United Kingdom are at the forefront of this movement, each crafting unique proposals.

The EU’s proposed AI Act is comprehensive and far-reaching, targeting providers placing AI systems on the market or putting them into service within the Union. This wide-ranging approach suggests that the Act will encompass a vast array of AI applications. The Act also proposes substantial penalties for non-compliance, potentially up to €30 million or 6% of total worldwide annual turnover, underscoring the seriousness with which the EU views this issue. The Act may follow in the footsteps of the General Data Protection Regulation (GDPR), which has had a significant impact on data privacy practices worldwide.

In the United States, the White House’s proposed Blueprint for an AI Bill of Rights offers a set of non-binding guidelines applicable to both public and private sectors, indicating a more flexible, principles-based approach to AI regulation. However, unlike the EU’s proposal, the U.S. blueprint does not specify any financial penalties for non-compliance.

Canada’s Draft Artificial Intelligence and Data Act takes a targeted approach, focusing on companies that design, develop, or manage high-impact AI systems. This legislation proposes penalties of up to CA$25 million or 5% of global revenue for non-compliance, indicating a serious commitment to enforcing AI regulations.

The United Kingdom is also making strides in AI regulation. The U.K.’s AI White Paper emphasizes regulating based on the outcomes AI is likely to generate in specific applications, indicating a flexible, outcome-oriented approach to AI regulation. However, like the U.S., the U.K.’s proposal does not specify any financial penalties for non-compliance.

Despite the differences in these proposals, the message to organizations worldwide is clear: the era of unregulated AI is nearing its end, and a new era of accountability and oversight is on the horizon. As these regulatory frameworks continue to evolve, organizations must stay abreast of the changes to ensure compliance and avoid potential financial penalties.

The recently published study “Do Foundation Model Providers Comply with the Draft EU AI Act?” by Stanford University researchers reveals that major AI foundation model providers, such as OpenAI and Google, largely fail to comply with the proposed EU AI Act. The Act mandates providers to disclose comprehensive information about their models, including data, compute resources, deployment practices, and key characteristics. However, providers often fall short, particularly in areas like the use of copyrighted training data, hardware and emissions in training, and model evaluation and testing. This lack of compliance underscores the global challenge of AI regulation, as the EU AI Act sets a precedent for AI governance worldwide.

Figure: Foundation model providers’ compliance with the draft EU AI Act — by Stanford Research

The forthcoming global AI regulations present significant challenges for enterprises, primarily in auditing and governance. Auditing Large Language Model-based workflows is an intricate task due to their sophistication, versatility, and often opaque nature, demanding a comprehensive review of the AI infrastructure and a fresh auditing approach. Alongside this, establishing robust AI governance within enterprises emerges as a major hurdle, requiring frameworks that effectively manage technical, ethical, and legal aspects of AI while ensuring transparency in model development processes and appropriate management of risks. As we enter a more accountable era of AI, these challenges underscore the need for fundamental shifts in enterprise approaches to AI, demanding swift adaptation in the face of extensive change. The imminent era of regulated AI will test the resilience and agility of organizations worldwide.
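
As a small illustration of what an auditable LLM workflow might record, the sketch below builds one structured log entry per model call. The field names are assumptions chosen for illustration; a real schema would be driven by the organization’s regulatory and governance requirements and written to a tamper-evident store rather than a local file.

```python
import hashlib
import json
import time


def audit_record(model_id: str, prompt: str, output: str, user: str) -> dict:
    """Build one structured audit entry for a single LLM call (illustrative fields)."""
    return {
        "timestamp": time.time(),
        "model_id": model_id,  # which model/version produced the output
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "output_preview": output[:200],  # truncated preview for reviewers
        "requested_by": user,
    }


def append_to_audit_log(record: dict, path: str = "llm_audit.jsonl") -> None:
    # Append-only JSON Lines file; production systems would use a tamper-evident store.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```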

Control

As AI systems attain increased autonomy and become more intricately entwined within business operations, the importance of robust control measures escalates. Although individual generative AI assistants under human supervision might present fewer issues, deeply integrated autonomous AI agents could pose substantial risks if not meticulously controlled. Recognizing and addressing these control challenges is fundamental to effectively leveraging the power of AI and ensuring its seamless alignment with an organization’s strategic objectives.

Steerability, the ability to direct an AI system towards desired outcomes, is a central requirement for any AI, but it becomes particularly critical for autonomous systems that have greater independence in decision-making. While a simple generative AI assistant might be sufficiently controlled through human oversight, an autonomous AI system needs robust mechanisms, such as natural language feedback, model retraining, and parameter tuning, to guide its behavior and align it with organizational priorities. A lack of steerability in these autonomous systems could lead them to optimize for goals that are harmful to key outcomes, making them unreliable for decision-making processes.
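
As a minimal sketch of what parameter-level steering might look like in practice, assuming a hypothetical `call_llm` client: the organizational policy travels with every request as a system prompt, and decoding parameters are bounded centrally, so behavior can be adjusted in one place. Retraining and human feedback loops would sit on top of this rather than be replaced by it.

```python
# Sketch of steering an LLM-backed assistant through an explicit policy prompt
# and bounded decoding parameters. `call_llm` is a hypothetical placeholder.

ORG_POLICY = (
    "You are an assistant for a (hypothetical) enterprise. Priorities, in order: "
    "1) never reveal customer data, 2) prefer conservative estimates, "
    "3) escalate anything ambiguous to a human reviewer."
)

GUARDRAIL_PARAMS = {"temperature": 0.2, "max_tokens": 400}  # conservative decoding


def call_llm(system: str, user: str, **params) -> str:
    """Placeholder for the model client in use."""
    raise NotImplementedError("wire up your own model client here")


def steered_completion(task: str) -> str:
    # Every request carries the organizational policy and bounded parameters,
    # so behavior can be tuned centrally by editing ORG_POLICY or the params.
    return call_llm(system=ORG_POLICY, user=task, **GUARDRAIL_PARAMS)
```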

Alignment, or the degree of match between an AI system’s objectives and those of its human operators, also gains importance as AI systems are more deeply woven into business operations. A standalone AI assistant with limited capabilities might not significantly impact alignment, but autonomous systems that directly impact key metrics must be calibrated to the organization’s needs and values to ensure business success. Misaligned, integrated AI systems could significantly disrupt performance or adversely affect the organization’s reputation. Achieving alignment is challenging and requires iterating with executives, subject matter experts, and end users, but it is necessary for the adoption of AI in business. Lack of alignment poses risks that far outweigh the benefits of the system.

Interpretability, the ability to understand an AI system’s reasoning and decision-making processes, becomes a more pronounced challenge with autonomous systems. As these systems increase in complexity, their decision-making processes and internal logic can become less transparent, making it harder for organizations to comprehend why specific decisions are made. This opacity in decision-making, often referred to as the ‘black box’ problem, can lead to unpredictability, creating potential risks and uncertainties. When an organization cannot fully understand the reasoning behind an AI system’s output or action, it becomes difficult to predict, and therefore manage, the system’s behavior, particularly in the case of unexpected outputs or decisions. According to the European Union’s proposed Artificial Intelligence Act, for example, high-risk AI systems should be designed and developed in such a way as to ensure that their operation is sufficiently transparent to enable users to interpret the system’s output and use it appropriately.

Observability is integral to the effective control of autonomous or deeply integrated AI systems. It is not only vital for business leaders to assess the impact of AI on their organization’s performance, but it is also crucial for data scientists, engineers, and regulatory bodies. It enables technical teams to troubleshoot, optimize, and ensure the system’s reliability and allows regulatory bodies to confirm compliance with industry standards and legal regulations.
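
As a sketch of the kind of signal this implies, the decorator below wraps any LLM-calling function and counts successes, failures, and cumulative latency in an in-memory counter. A real deployment would export these to a proper metrics backend and add cost, token usage, and content-quality flags; the names here are illustrative only.

```python
import time
from collections import Counter

METRICS = Counter()  # in-memory stand-in for a real metrics backend


def observe_llm_call(fn):
    """Wrap an LLM-calling function so every invocation emits basic signals."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            METRICS["llm.calls.ok"] += 1
            return result
        except Exception:
            METRICS["llm.calls.error"] += 1
            raise
        finally:
            METRICS["llm.latency_ms_total"] += int((time.perf_counter() - start) * 1000)
    return wrapper
```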

As these new-generation models gain greater autonomy and deeper integration within business operations, testability emerges as a formidable control challenge. Testability, in this context, refers to the consistent and reliable evaluation of an AI system’s performance against specific benchmarks. The complexity of this process escalates with LLMs’ advanced capabilities, such as interaction with functions or API calls. Unlike conventional IT systems, which operate deterministically, autonomous AI systems can depend on probabilistic reasoning, leading to variable outputs. This variability, coupled with intricate interactions with external systems like databases, other software, or cloud services, elevates testability from simple model validation to more complex integration testing. This kind of testing uncovers anomalies that may arise from the AI’s interactions with other systems, ensuring the agent functions smoothly within the broader operational infrastructure.

The criticality of testability is further accentuated in the context of autonomous AI agents, as these systems make impactful decisions with minimal human intervention. As a result, testability is not just about validating AI functionality but also about asserting consistent performance, safety, and robustness under diverse inputs and changing conditions. Moreover, it calls for validating the AI’s learning efficacy, the clarity of its decisions, and its alignment with organizational objectives, while also effectively managing potential failures during external interactions.
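
A minimal sketch of what testing such a non-deterministic workflow might look like: each case is run several times and judged against a pass-rate threshold rather than an exact expected string. `run_agent` and the test cases are hypothetical placeholders; a real harness would also cover external integrations, safety checks, and regression across model versions.

```python
# Sketch of a regression check for a non-deterministic LLM workflow: run each
# case several times and require a minimum pass rate instead of exact matches.
# `run_agent` and the cases below are hypothetical placeholders.

def run_agent(task: str) -> str:
    """Placeholder for the AI agent or workflow under test."""
    raise NotImplementedError("wire up the system under test here")


TEST_CASES = [
    # (task, predicate the output must satisfy)
    ("Summarize the attached invoice in one sentence.", lambda out: len(out) < 300),
    ("Does the contract mention a termination clause? Answer yes or no.",
     lambda out: out.strip().lower() in {"yes", "no"}),
]


def evaluate(runs_per_case: int = 5, min_pass_rate: float = 0.8) -> bool:
    all_passed = True
    for task, check in TEST_CASES:
        passes = sum(1 for _ in range(runs_per_case) if check(run_agent(task)))
        rate = passes / runs_per_case
        print(f"{task[:40]!r}: pass rate {rate:.0%}")
        all_passed = all_passed and rate >= min_pass_rate
    return all_passed
```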

And there you have it, the first part of our exploration of the challenges surrounding the adoption of LLM-based AI in an enterprise context. Stay tuned for the upcoming installments, where we’ll continue to delve into this fascinating subject. Coming up: challenges related to quality, integration, humans and organizations, the nature of work, and more.

Szabolcs Kósa

IT architect, digital strategist, focused on the intersection of business and technology innovation