Talk Gen AI: Generative AI in the Enterprise

Arte Merritt
Published in TalkGenAI
8 min read · Jun 12, 2024

Generative AI has taken the enterprise by storm — enabling companies to develop new products, services, and processes for both customers and employees.

Businesses are now moving from experimentation to production, and with that comes new challenges and concerns.

At Talk Gen AI, I had the opportunity to discuss how enterprises are leveraging Generative AI with a panel of industry experts, including:

Michelle Sohn of AWS
Casey Phillips of eBay
Vera Vetter of Salesforce
Mahak Sharma of Google

Why are enterprises implementing Gen AI?

Generative AI is truly a disruptive technology that brings great promise to both enterprises and consumers. It is a phenomenon on a scale we have not seen since the rise of the Internet in the 1990s.

Michelle Sohn of AWS draws an analogy to the impact of the Gutenberg press. It gave more people access to books and the ability to publish them, a revolutionary knowledge-distribution technology for its time. Similarly, Gen AI is unlocking and democratizing access to very powerful technologies for interacting, collaborating, and generating new ideas.

Gen AI enables enterprises to unlock customer experiences that were nearly impossible before. As Casey Phillips of eBay explains, with Gen AI, enterprises can provide value to customers in ways they could not before, even if they had hired more people. Now enterprises can grow their business and top line, acquire new customers, and improve the bottom line.

Not every Gen AI initiative started with a use case in mind to improve customer experiences or employee productivity. As Vera Vetter of Salesforce points out, there was a bit of Fear of Missing Out (FOMO) in play: businesses large and small may have felt left behind if they did not jump on the Gen AI train.

Crossing the chasm to production

While 2023 was the year of experimentation, 2024 is the year enterprises are starting to move to production implementations of Generative AI.

As Mahak Sharma of Google points out, last year the use of Gen AI in an enterprise was often mandated by the C-suite, but no one really knew what to do with it. This resulted in a lot of experimentation and weighing of the potential value against the risks of delivering it.

Now the thinking is shifting: the business strategy should determine the AI strategy.

In terms of use cases, on the external side, many center on customer success, like automated chatbots and agent-assist capabilities.

On the internal side, we are in the midst of a “productivity” revolution. There are so many tools to help employees with internal productivity, especially in the go-to-market (GTM) funnel: marketing content, sales enablement, training, outreach, and more. There are also AI tools for emails, calendars, and code generation.

These internal tools are becoming possible because there is now an easier way to put all of an enterprise’s structured and unstructured data in one place accessible to Large Language Models (LLMs). Employees can then ask questions about the enterprise data, as in the sketch below.
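For illustration, here is a minimal sketch of that pattern: embed each document into a vector index, then retrieve the most relevant ones for an employee’s question. The embed() helper is a placeholder for a real embedding model, and the documents and function names are made up for this example; none of it reflects any specific enterprise’s setup.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: swap in a real embedding model or API here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# Hypothetical internal documents, structured and unstructured alike.
documents = [
    "Q3 sales playbook: focus on enterprise renewals in EMEA.",
    "HR policy: remote employees may expense home-office equipment.",
    "Engineering runbook: rotate API keys every 90 days.",
]

# Build the index once: one vector per document.
index = np.stack([embed(d) for d in documents])

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Return the documents most similar to the question (cosine similarity)."""
    q = embed(question)
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

# The retrieved text becomes the context an LLM answers from.
print(retrieve("What can remote employees expense?"))
```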

A lot of low-to-medium skill tasks could be replaced. However, as Sharma points out, those who are excellent writers, coders, or artists cannot be replaced.

Democratizing access to technology

The penetration and availability of Generative AI tools are helping to level the playing field in working with AI technologies.

In the past, as Vetter points out, if an enterprise team wanted to do something in AI/ML, it was difficult to get started unless one had a PhD or access to a data science team. Now, people can play around, experiment, and prototype on their own. It is a tremendous advantage.

However, there is a cost. As Vetter adds, while Gen AI has become more accessible, the people using the tools may not have experience bringing their experiments to production. They see their prototype working and want to ship it.

There are more layers to being enterprise production ready, though. Teams have to consider data security, legal, and ethical aspects, as well as strategies for evaluating and maintaining models.

Data security is paramount

How an enterprise’s data is used by an LLM, and who has rights and access to it, are key concerns. One really needs to review the model provider’s terms of service.

This is one of the reasons enterprises may choose to host their own LLM — to ensure they have control over the data.

eBay, for example, has an internally hosted version of ChatGPT. This way employees are not exposing internal Intellectual Property (IP) to the external, public version of ChatGPT. The caveat is that the internal version may lag the latest public release, depending on how frequently the enterprise updates the model.

As Sohn explains, enterprises need to have a data strategy in place on how data is being handled securely across departmental silos.

Fine tuning models

Whether to build an in-house model, use an out-of-the-box model, or fine-tune an existing one depends on the enterprise and its business use case.

An out-of-the-box model may not be “good enough” for a particular use case. It depends on how much tolerance for “not good enough” an enterprise has.

At the same time, as Sharma indicates, another issue with standard out-of-the-box models is that a business may not need that big a model for every application.

However, the level of effort to build a model from scratch is not trivial. As Vetter explains, it can require massive investment, and inference is very expensive, often the majority of the cost. One has to consider the business justification: an investment of that size has to be justified by the value it returns relative to the risk.

When the use case involves highly sensitive data or regulatory concerns, as in finance and healthcare, fine-tuning or custom models may be more justified.

Accuracy is important

Accuracy of Generative AI responses is a concern. If you have experimented with Generative AI, you are most likely familiar with “hallucinations” — i.e. when an LLM responds with inaccurate or fictitious information.

In addition to the custom-built models and fine-tuning mentioned above, there are other options to help improve accuracy, including “grounding” and Retrieval Augmented Generation (RAG). Grounding involves building a system that fact-checks an answer against a database that is the authority on the topic. RAG involves pointing the model at a particular set of content and documents to limit the source of its responses.
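As a rough illustration of the RAG pattern just described, here is a minimal sketch. The retrieve_passages() and call_llm() helpers are placeholders standing in for a real retriever and a real model API, and the prompt wording is an assumption, not any panelist’s implementation.

```python
def retrieve_passages(question: str) -> list[str]:
    """Placeholder: a real retriever (e.g. vector search) goes here."""
    return ["Returns are accepted within 30 days with a receipt."]

def call_llm(prompt: str) -> str:
    """Placeholder: a real chat-completion API call goes here."""
    return "(model answer)"

def answer_with_rag(question: str) -> str:
    # Limit the model to the retrieved, authoritative context.
    context = "\n\n".join(retrieve_passages(question))
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)

print(answer_with_rag("What is the return policy?"))
```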

Context is key.

It is important that responses are contextually aware and factually correct. For example, Sharma referenced a case where someone asked an LLM if it is okay to smoke when pregnant, and the model responded that it was, because it was referencing an article from a country where that is considered acceptable, whereas in the US it is not. It is important to ground on data specific to the culture, country, or context.

Currently the methods to improve accuracy tend to be very manual. Having a large context window can help, but can also be more expensive. At the same time, having a human in the loop providing common sense in evaluating the answers can help, as Sohn points out.

The level of accuracy may depend on the use case. As Vetter explains, when evaluating accuracy in Generative AI, one needs to be aware of what “good enough” is. In a highly regulated environment like insurance or finance, the tolerance might be quite low. However, in a creative environment there may be higher tolerance. For example, a user may be more open to seeing multiple drafts of something and using those for inspiration.

Responsible AI

In addition to ensuring factual accuracy, it is also important to consider responsible AI.

Responsible AI teams are looking at cases when a customer asks something completely out of bounds. If someone asks something outside the scope of the use case, how do you handle it? If someone swears, how do you make sure the LLM does not swear back?

As Phillips illustrates, if you have a Gen AI tool for retail e-commerce, would you even attempt to answer a question about politics? What is the real benefit in answering it?

Another way to improve accuracy is to limit user prompts to the domain your Gen AI tool is trying to address. One way to do that, as Phillips explains, is to have an intent-classification layer that checks whether the input is within the bounds of the use case (sketched below). This not only saves inference costs, it also prevents giving an incorrect answer to a question there is no real value in your company answering anyway. It is another form of guardrails.
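Here is a minimal sketch of such a guardrail for a hypothetical retail assistant. The keyword check is a crude stand-in for a real intent classifier (which in practice might be a small, cheap model run before the LLM), and the function names are illustrative.

```python
IN_SCOPE_KEYWORDS = {"order", "shipping", "return", "refund", "product", "price"}

def is_in_scope(user_prompt: str) -> bool:
    """Crude stand-in for an intent classifier."""
    words = set(user_prompt.lower().replace("?", "").split())
    return bool(words & IN_SCOPE_KEYWORDS)

def call_assistant_llm(user_prompt: str) -> str:
    """Placeholder for the real Gen AI answer path."""
    return f"(LLM answer for: {user_prompt})"

def handle(user_prompt: str) -> str:
    if not is_in_scope(user_prompt):
        # Decline out-of-scope prompts (e.g. politics) without paying
        # for inference or risking an off-brand answer.
        return "Sorry, I can only help with shopping-related questions."
    return call_assistant_llm(user_prompt)

print(handle("Where is my order?"))            # in scope: goes to the LLM
print(handle("Who should win the election?"))  # out of scope: declined
```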

The future of Gen AI

Each panelist was asked about what they saw coming next in Generative AI.

Phillips envisions a lot more being done with image and video in Generative AI. A lot of what enterprises are doing today is text-based, but that should change in the near future.

Vetter sees a new emphasis on Gen AI agents and agent systems, along with Large Action Models (LAMs) and an agent ecosystem. An agent can perform a task or set of tasks on behalf of a user, increasing the autonomy AI has to help achieve a user’s jobs-to-be-done.

She is also excited about fundamentally new architectures being created: what comes next after the transformer? While that research currently sits mostly in academia, it will be interesting to see what new types of AI come about.

Sohn is seeing more interest in Gen AI applications and processing at the edge. She is also excited to see more personalization at the consumer level, such as having AI write text in your own voice or create videos of you speaking.

Sharma also sees more interest in agents. Part of the reason is that the current interaction model with LLMs is not how humans really interact. It is currently more one question, one answer, when it needs to be multi-turn and multi-step, more like natural conversation.

She is also looking forward to more democratization in this space: making Gen AI more globally accessible. Currently much of the research, the companies, and the funding are concentrated here in the Bay Area; it would be good to see that spread more globally.

Watch the video

Arte Merritt is the founder of Reconify, an analytics and optimization platform for Generative AI. Previously, he led the Global Conversational AI partner initiative at AWS. He was the founder and CEO of the leading analytics platform for Conversational AI, leading the company to 20,000 customers, 90B messages processed, and multiple acquisition offers. He is a frequent author and speaker on Generative AI and Conversational AI. Arte is an MIT alum.
