Delivering Business Value with AI

Phil Cowans
Pi Labs Insights

Artificial Intelligence has a long history; people have been wondering about giving machines human-like characteristics since the dawn of computer science.

Over time, people have interpreted the details of what this means in different ways, but much current activity is connected to ‘generative AI’, that is, AI which can generate new content as well as interpret existing data, with a particular focus on natural language applications, i.e. tools which process human languages rather than structured data and computer code.

This has been driven by a succession of technical advances in neural networks, which have produced a series of Large Language Models (LLMs), the best known of which is the GPT series of models from OpenAI.

There has been a lot of discussion of the potential for generative AI to disrupt employment, but recently we’ve also started seeing articles questioning whether early AI projects have really delivered the expected business value. In my role as Pi Labs’ technology venture partner, I spend a lot of time thinking about how to build products and businesses on top of AI. I speak to many founders looking for feedback on their ideas and many people at established businesses thinking about how to deploy this new technology.

In this article, I’ll argue that there is significant business value to be
unlocked, but only if care is taken to deploy AI correctly. I’ll also
make the case that the best businesses in the space understand the strengths and limitations of the latest models, focusing on specific tasks within broader workflows, and use deep expertise in their industry domains together with proprietary data sets to create true defensibility, even though they’re building on underlying models which are widely available. I’ll focus primarily on natural language applications, but the same thinking applies in other disciplines, such as computer vision, which I’ll also briefly mention towards the end.

My target audience in writing this article is primarily business leaders who
are thinking about how to deploy AI within their organisations through either internal or external innovation, but hopefully, it’s also of interest to
the broader community, including entrepreneurs and investors.

LLMs as a platform for new products

There’s a good chance you’ve used ChatGPT, the conversational interface built on top of the GPT models. It’s presented as a text messaging tool, similar to WhatsApp or Microsoft Teams, and ChatGPT itself is presented as an ‘agent’ you can talk to.

ChatGPT is mainly understood as a stand-alone product. You may have used it to generate text (‘write me a job application letter’) or to summarise knowledge (‘tell me the five youngest presidents of the United States’). As GPT has been trained on large quantities of data from the internet, it possesses a certain level of factual knowledge out of the box. LLMs are also capable of basic reasoning and have been shown to be capable of writing computer code — functionality which we’ll later see to be very useful. We’ll come back to the important question of how good they are at these tasks later, but the key point here is that the models are capable of many different things, and therefore, AI shouldn’t be viewed as a monolithic technology which is always applied in the same way.

Despite all of this, ChatGPT is a proprietary product owned by OpenAI, so how can it help other companies produce innovative products?

As well as the chat interface, GPT has an API that allows it to be integrated into third party applications. This allows applications such as Retrieval Augmented Generation (RAG), which combines the LLM’s summarisation and reasoning abilities with more traditional search engine technologies to use proprietary data sets. LLMs are also capable of using tools, where the LLM’s output can include requests to use other information systems — e.g. looking up items in a product catalogue — which can then be fed back into the LLM to be summarised or for key information to be extracted. As LLMs can be used to write code, this code can also be automatically run, which enables use cases in areas traditionally covered by analysts and data scientists, e.g. ‘produce a graph of the last ten years of sales data broken down by region.’ Again, we’ll leave aside for now the question of how good they are at these tasks.
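To make the RAG pattern above concrete, here is a minimal sketch in Python. Everything in it is illustrative: the naive word-overlap scoring stands in for a real search index (keyword or vector based), the documents are invented, and the final step of sending the assembled prompt to an LLM API is deliberately left out.

```python
# Minimal Retrieval Augmented Generation (RAG) sketch.
# A real system would use a proper search index and a real LLM API;
# both are replaced by simple stand-ins here.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Combine the retrieved context with the user's question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Invented proprietary corpus for illustration.
corpus = [
    "Q3 sales in the northern region rose 12% year on year.",
    "The company opened two new offices in 2023.",
    "Head office energy use fell 8% after the retrofit.",
]

prompt = build_prompt("How did sales change in the northern region?", corpus)
# `prompt` would then be sent to the LLM via its API, grounding the
# model's answer in the proprietary data rather than its training set.
```

The same shape underlies tool use: the model's output triggers a lookup in an external system, and the result is fed back in as additional context.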

Within a third-party application, all of the above can be presented to the user through a chat interface, but it doesn’t have to be. Often the LLM is behind the scenes, never directly exposed to the user. Because LLMs are capable of summarisation and simple reasoning, they’re good at information extraction: for example, an application could use an LLM to pull environmental data from company annual reports and use it to populate a structured database for compliance purposes.
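A behind-the-scenes extraction step of that kind might look something like the sketch below. The `call_llm` function is a stub standing in for a real model API, and the report text and field names are invented; the key idea is that the prompt asks the model for structured JSON, which is then validated before being written into an ordinary database row.

```python
import json
import sqlite3

def call_llm(prompt: str) -> str:
    """Stub for a real LLM API call. A production system would send the
    prompt to a model and receive its text completion back."""
    # Canned response for illustration only.
    return '{"company": "Acme Ltd", "year": 2023, "co2_tonnes": 1450.0}'

def extract_emissions(report_text: str) -> dict:
    prompt = (
        "Extract the company name, reporting year and total CO2 emissions "
        "(in tonnes) from the annual report below. Respond with JSON only, "
        'using keys "company", "year" and "co2_tonnes".\n\n' + report_text
    )
    record = json.loads(call_llm(prompt))
    # Validate before trusting the model's output.
    assert isinstance(record["year"], int) and record["co2_tonnes"] >= 0
    return record

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emissions (company TEXT, year INTEGER, co2_tonnes REAL)"
)
row = extract_emissions("Acme Ltd annual report 2023: total emissions 1,450 tCO2e")
conn.execute("INSERT INTO emissions VALUES (:company, :year, :co2_tonnes)", row)
```

The user of such a system only ever sees the structured database; the LLM is an internal component of the pipeline.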

What are LLMs good at?

Let’s return to the question of the performance of LLMs on individual tasks.
Surprisingly, we know less about this than you might think. This is because
evaluating LLMs is extremely hard. Evaluation requires the creation of high
quality objective tests — not dissimilar to a school exam or a psychology
experiment. These can’t be based on public data, as it’s very likely that the
LLM will have seen that data in training. Thorough evaluation of LLMs is
extremely interesting and will be the subject of many future PhDs, but it’s not a straightforward process right now, and we therefore cannot say we have a full understanding of exactly what the models can and can’t do.

We also don’t have a complete model of cognitive ability and human
intelligence. LLMs are definitely capable of impressive performance on some cognitive tasks and still appear to be getting better, but even when direct comparisons to humans are made, they’re often at a level you’d expect of a child. LLMs also fail in ways we don’t expect from humans, e.g. ‘hallucinations’, where factually incorrect answers are confidently presented. Outside of verbal intelligence (for example, in visual and tactile tasks) they’re less mature. They also don’t yet have lived experiences in the way that humans do and don’t participate in social interactions. Given all of this, in my opinion, it’s a real stretch to talk about being anywhere close to Artificial General Intelligence (AGI).

Risks and rewards

What does this mean from the perspective of evaluating product and business ideas? In short, risk levels are probably higher than you might think.

As with any R&D activity, success comes from giving yourself enough dice rolls. Startups are funded with this in mind, but the level of funding needs to match the risk profile of the project, so it’s important to get this right. The first step to success is to be realistic about the level of R&D required.

Improvement of the underlying models is extremely expensive — it takes hundreds of millions of dollars to build one from scratch, and there are all sorts of practical issues, such as the availability of large enough quantities of hardware and negotiating access to proprietary data sets. It’s also surprisingly slow — although new models come out every few months, much of the improvement is driven by increased scale, and the current generation of model architectures is the result of academic research which has played out across a global community over a time-frame measured in decades.

Given this, the reality is that you’ll be building a product on top of someone else’s model with very little scope to modify or control it. There are some
levers — models can be fine-tuned, and there’s a whole discipline of prompt
engineering, but these are often less defensible and less effective than you
might think, so caution is advised when these are presented as differentiating factors.

Risk comes from making unjustified assumptions, for example, that LLMs are capable of ‘cloning’ a human or that they’re capable of writing code well enough to be used by non-technical users. Both of these things might be true, but they can’t be taken as given, so anyone attempting to build a business which relies on them needs a plan (and funding) to properly address the risk. That involves tackling the evaluation issues described above as well as making improvements to the underlying technology. There are also product acceptance risks — for example, assuming that users prefer a chat interface for a particular application or that users will trust the output of an LLM.

So, what are more reliable sources of defensibility? One option is proprietary data, which may be largely separate from the development of the underlying technology. This could be an existing corpus, which might be available at a publishing company, data which is built up during routine business operations, or a curated dataset which is constructed as an investment.

In many cases, a stronger source of defensibility is deep industry expertise.
Typically this will include generative AI as part of a broader product offering, usually behind the scenes. The reason for this is that it enables the LLM to play a specific role in a larger business process, and to do it quickly, repeatably and at scale. Examples of this include information extraction, summarisation and document classification, all of which are relatively simple NLP tasks in the general case, but where applications have to be tailored for specific use cases. Examples within the Pi Labs portfolio are Alrik, who use generative AI to extract data from order slips as part of their logistics system, and Airly, who use it to automatically generate environmental reports as part of a platform built on a deep understanding of specific industry processes.

Interestingly, one of the better-regarded recent AI announcements, Apple Intelligence, has many of these characteristics — deployment of AI for very specific tasks, designed on the basis of deep industry experience (which, in Apple’s case, happens to be the consumer IT user experience).

It’s still just product development

The best startups take an agile approach to product development, seeking
customer feedback early and often and focusing on customer value. The same applies to generative AI. When using advanced technology it’s tempting to think in terms of R&D projects, working in isolation using proxy metrics such as model performance on test datasets, but this is unnecessary and a mistake. As we’ve seen above, it’s easy to make assumptions about technology translating into customer value; these assumptions need to be tested as thoroughly as possible, otherwise you’re creating a solution looking for a problem.

With LLM products it’s also important to be thorough with research processes, which means thinking carefully about evaluation and being methodical with testing hypotheses. Doing this quickly requires a lot of discipline, but the alternative is flying blind without clear data confirming that effort is pushing things in the right direction.
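Being methodical about evaluation can start as simply as keeping a held-out set of test cases and scoring every prompt or model change against them. The sketch below is a minimal illustration of that discipline: the `model_answer` stub stands in for the full system under test (prompt, LLM call and post-processing), the questions are invented, and a real harness would usually replace exact-match scoring with something more forgiving.

```python
def model_answer(question: str) -> str:
    """Stub for the system under test (prompt + LLM + post-processing)."""
    canned = {
        "What is the capital of France?": "Paris",
        "What is 2 + 2?": "4",
        "Who wrote Hamlet?": "Shakspere",  # deliberate failure case
    }
    return canned.get(question, "")

# Held-out test cases -- crucially, these must not appear in training data.
test_cases = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("Who wrote Hamlet?", "Shakespeare"),
]

def evaluate(cases) -> float:
    """Exact-match accuracy over the test set."""
    correct = sum(model_answer(q) == expected for q, expected in cases)
    return correct / len(cases)

accuracy = evaluate(test_cases)  # 2 of 3 correct with the stub above
```

Tracking a number like this across every change is what turns prompt tweaking from flying blind into a measurable process.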

LLMs aren’t the only game in town

Although it’s easy to forget sometimes, there’s also a lot more to Machine
Learning and AI than just LLMs.

LLMs are one aspect of the deep learning revolution. Deep learning uses large-scale neural networks and has benefited from advances in computation (such as GPU chips originally developed to accelerate computer graphics) and larger data sets, which have, in turn, created a flywheel effect, with new mathematical discoveries along the way.

Outside of NLP, Deep Learning is most well-known in computer vision
applications. This includes generative image models but also includes more traditional computer vision applications such as object detection and
recognition. It was progress in object recognition which first put deep
learning on the map before the first generation of foundational language
models were developed. Trunk, one of the companies in the Pi Labs portfolio, uses this technology to read building floor plans as part of their comprehensive platform to support modular construction.

In general, Deep Learning is very flexible, and component networks can be
combined together for more sophisticated tasks, e.g. for control applications (self-driving vehicles) or optimisation (for example, in structural engineering). CONXAI, another of the Pi Labs portfolio companies, has built a flexible platform that makes this technology available for construction use cases. As with natural language processing, there are widely available foundation models which can be included behind the scenes in specialist applications, leveraging some combination of domain knowledge, proprietary data sets and market access.

There’s also a large stable of more traditional machine learning methods,
dating back to the 1990s, but still relevant in many applications. This is
especially true where a strong, proprietary data set is available, or for very
specific tasks within specialist applications. These methods are very easy to
implement — most are available as open source software libraries, can be
deployed very quickly, and do not require the substantial (and expensive)
computational resources needed by LLMs and some other deep learning models. The visibility of LLMs has convinced some people that they’re the right choice for everything, but in many cases an older, simpler approach may well be better.
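To illustrate just how lightweight these older methods can be, here is a complete naive Bayes text classifier in plain Python — a 1990s-era technique that, given a decent labelled dataset, can still handle a focused task like document classification with no GPU or API costs. In practice you would reach for a library implementation (scikit-learn’s, for example); this hand-rolled version, with invented training data, just shows how little machinery is involved.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # log P(label) + sum of smoothed log P(word | label)
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training set: routing incoming documents to the right team.
clf = NaiveBayes().fit(
    ["approve invoice payment", "invoice overdue payment reminder",
     "site visit scheduled tomorrow", "schedule site inspection"],
    ["finance", "finance", "operations", "operations"],
)
```

With a strong proprietary dataset behind it, a model this simple can be trained and deployed in minutes, which is often exactly what a specific task within a specialist application needs.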

Conclusions

AI is, and always has been, an exciting field, and real advances have been made in the past few years, particularly in computer vision and NLP. Unfortunately, there’s also a lot of noise, particularly around superficial applications of chat interfaces and talk of AGI, which makes it harder to understand how best to use it. A deeper look reveals that there are valuable products to be built, but the best applications are highly targeted, making use of deep industry knowledge and proprietary data sets rather than general-purpose tools.

The teams producing these products combine domain experience with deep technical skills, and they use many of the same agile product development processes as the best companies in traditional SaaS applications. Once all of this is in place, as it is in the Pi Labs portfolio, expect impressive progress to be made.

Phil Cowans, Venture Partner


Technology and product focused entrepreneur, angel investor, and venture partner at Pi Labs.