Showing off your barriers to entry

If you’re plugged into the same LLM as everyone else, how do you build a competitive advantage?

Arthur Muller · Published in Explain AI · 6 min read · May 2, 2023


A great strategy for B2B AI start-ups is to build “vertical AI” products: become the AI assistant to a specific industry, say lawyers, financial services or doctors. Sounds simple: you harness the superpowers of LLMs and apply them to the specific data and use cases of your pet industry. This is what we do at Explain: Explain is the AI assistant to professionals working with the public sector (infrastructure, utilities, real estate, construction, etc.), helping them dig for key information in super-boring calls for public tenders, summarize official reports or write a draft for a building permit.

But here’s the twist: if you’re plugged into the same LLMs as everyone else, how do you build a moat, i.e. a defensible competitive advantage? And how do you avoid becoming dependent on your LLM provider, and make sure the value created goes to you instead? Based on our experience, we’ve found three ways that work.

Moat n°1: boring workflow automation wins over clever insight generation

Right in the workflow

Standard B2B strategy: if you become embedded in your users’ processes and workflow, you lock them in. This still holds true for AI products, and for us it has meant focusing on process automation over insight generation.

A motto within our team is “Boring reading and writing is for machines”: we sometimes fret over the upcoming rise of AI overlords, but right now most of AI is still about automating boring, repetitive tasks that are easy for humans to do once but impossible to do thousands of times. What LLMs do is largely expand the range of what can be automated. One of our clients wants to keep track of the winners and losers of every public tender in their industry: for each tender, the local authorities publish a short report from which the client’s team has to extract 3 pieces of information; but 100,000 tender reports are published every year, the information comes in a semi-structured format, and it must be entered into the client’s CRM in another format. Massively boring, automatable with AI and very much in the workflow!
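To make this concrete, here is a minimal sketch of that kind of extraction step (illustrative only, not our actual pipeline): the field names, the prompt and the OpenAI client call are assumptions made for the example.

```python
# Minimal sketch of the tender-report extraction task described above (illustrative only).
# Assumes the current OpenAI Python client; field names and prompt are made up for the example.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

EXTRACTION_PROMPT = """You are given the award notice of a public tender.
Extract exactly these three fields and answer with JSON only:
- "buyer": the public authority that issued the tender
- "winner": the company that won it
- "amount_eur": the award amount in euros, as a number

Notice:
{notice}
"""

def extract_tender_fields(notice_text: str) -> dict:
    """Turn one semi-structured award notice into the structured record a CRM expects."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # deterministic output for a repetitive extraction task
        messages=[{"role": "user", "content": EXTRACTION_PROMPT.format(notice=notice_text)}],
    )
    return json.loads(response.choices[0].message.content)

# Usage: run it over the stream of published notices and push each record to the CRM,
# e.g. crm.upsert(extract_tender_fields(notice_text))  # crm is a hypothetical client
```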

For us, these use cases work better than complex scoring, trend analysis or great data visualization, which shine but don’t stick.

Moat n°2: you need some data of your own (LangChain is not enough)

Build your own database

The LLM is what gets us all excited, but without some data of your own, the LLM does not matter. It can be data you’ve publicly sourced and built into a database (news, official documents, legal and financial reports), it can be your users’ data (their archive of internal memos or previous responses to public tenders), but you need privileged access to data that contains relevant information for your users. This is why, at Explain, our first step was building our unique database of public documents, scraped from tens of thousands of websites and then cleaned, classified, and structured. Then comes the LLM.

To apply the LLM to your data, you’ll likely use some version of the LangChain package, which lets you work on your own data by feeding it to the LLM as part of the prompt, instead of simply talking to GPT and letting it answer from the knowledge stored in its weights. So your stack will look like this: from your database, you extract the relevant parts of your data (retrieval phase); then you feed these parts to the LLM in the prompt and phrase the task (prompt engineering phase). Many recent LLM-based tools use this type of architecture (described for instance in this paper): the new Bing, or tools that let you use an LLM in your browser, on a PDF, or within your OS (I for one have been playing around with Dust, Bearly and Monica).
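To give a feel for the pattern, here is a bare-bones sketch of the retrieve-then-prompt loop (again illustrative, not our production stack): the retriever is a deliberately naive lexical-overlap ranking so the example stays self-contained, where a real stack would use embeddings and a vector store (for instance via LangChain), and the OpenAI call is an assumption.

```python
# Bare-bones sketch of the retrieve-then-prompt architecture described above.
# The retriever is a toy lexical-overlap ranking so the example stays self-contained;
# a real stack would use embeddings and a vector store (e.g. via LangChain).
from openai import OpenAI

client = OpenAI()

def retrieve(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k documents sharing the most words with the question (toy retriever)."""
    q_words = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def answer(question: str, documents: list[str]) -> str:
    """Feed the retrieved passages to the LLM as context instead of relying on its weights alone."""
    context = "\n\n".join(retrieve(question, documents))
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```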

This is a really powerful stack, which avoids many of the hallucinations and mistakes LLMs fall victim to when left to their own devices. It is also very flexible, and within our team it has become the default response to many traditional NLP tasks (sentiment analysis, entity recognition, summarization, etc.). One aside: if your current NLP stack is built around many specialized, non-LLM-based algorithms, you have NLP debt and you might not have a competitive advantage anymore.

But the key point here is that while the superpower features come from the LLM component, the moat comes from the database component. So if you haven’t built a unique database first, LangChain won’t be enough.

Moat n°3: fine-tuning the LLM wins over prompt-engineering for scalability

One parameter at a time

None of the previous two moats came from the LLM component per se. This one does.

One limitation of the retrieval + LLM architecture described above is that it risks not scaling. Sure, writing cleverly engineered prompts and calling the GPT-4 API does work, but it can be very costly: with GPT-4, it cost us 3 dollars to build the list of the top 50 local officials favorable to wind power in one region. There is no way we could put that in the hands of users.
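As a back-of-the-envelope illustration of why this does not scale (the token counts and per-token prices below are rough assumptions, not our actual numbers):

```python
# Back-of-the-envelope cost estimate for the scaling problem described above.
# Token counts and prices are rough assumed orders of magnitude, not Explain's actual numbers.
PRICE_PER_1K_TOKENS = {
    "gpt-4": 0.04,                    # assumed blended input/output API price (USD)
    "small-finetuned-model": 0.0005,  # assumed amortized cost of self-hosted inference (USD)
}

def query_cost(model: str, tokens_per_query: int) -> float:
    return tokens_per_query / 1000 * PRICE_PER_1K_TOKENS[model]

# A "top 50 officials" query that stuffs many retrieved documents into prompts
# can easily burn ~75,000 tokens across calls:
for model in PRICE_PER_1K_TOKENS:
    print(f"{model}: ~${query_cost(model, 75_000):.2f} per query")
# gpt-4: ~$3.00 per query              -> fine for one analyst, not for every user click
# small-finetuned-model: ~$0.04 per query
```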

According to research conducted by Explain’s CTO Guillaume Barrois, the stack that will work for us is the following.

  • Instead of using the latest, most expensive LLM (GPT-4), we use a smaller model (so far we’ve used open-source models from Hugging Face). Costs are 10 to 100 times lower.
  • Then, to increase performance, we fine-tune it to our use cases based on our documents: we feed it training sets of elected officials’ statements about new infrastructure projects, or 3-line summaries of one-page official decisions.
  • And here is the magic: the training sets used for fine-tuning are quite large (up to 100,000 examples), but we don’t build the training set by hand: we use GPT-4 instead. One of the under-the-radar achievements of current LLMs is that they achieve human-level performance on many low- to medium-complexity tasks: as one college-level teacher reported to me, GPT-4 is better at summarization than 80% of undergrad students. So, for such tasks, instead of a human labeler you can use GPT-4 to build the training set, and then feed it to the open-source LLM for fine-tuning (a sketch of this pipeline follows the list). You can build a state-of-the-art, automated stack dedicated to fine-tuning any LLM.
  • And so you end up with a cheaper, smaller, quicker LLM that works, for your tasks, 75% as well as the latest flagship LLM for 1% of the resources.
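For illustration, a rough sketch of this distillation recipe could look like the following; the model name, prompt and hyperparameters are placeholders rather than our actual setup, and it assumes the openai, datasets and transformers packages.

```python
# Sketch of the recipe above: GPT-4 labels the examples, a small open-source
# model is fine-tuned on them. Placeholders throughout, not Explain's actual stack.
from openai import OpenAI
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM, DataCollatorForSeq2Seq,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments)

client = OpenAI()

def gpt4_summarize(document: str) -> str:
    """Use GPT-4 as the labeler that produces the target output for one training example."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Summarize this official decision in 3 lines:\n\n{document}"}],
    )
    return response.choices[0].message.content

def build_training_set(documents: list[str]) -> Dataset:
    """Pair each raw document with a GPT-4-generated summary (no human labeling)."""
    return Dataset.from_list(
        [{"input": doc, "target": gpt4_summarize(doc)} for doc in documents]
    )

def finetune(dataset: Dataset, base_model: str = "google/flan-t5-base"):
    """Fine-tune a small open-source seq2seq model on the GPT-4-labeled pairs."""
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

    def tokenize(example):
        enc = tokenizer(example["input"], truncation=True, max_length=1024)
        enc["labels"] = tokenizer(example["target"], truncation=True, max_length=128)["input_ids"]
        return enc

    tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)
    trainer = Seq2SeqTrainer(
        model=model,
        args=Seq2SeqTrainingArguments(output_dir="finetuned-summarizer",
                                      num_train_epochs=3,
                                      per_device_train_batch_size=8),
        train_dataset=tokenized,
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    return model, tokenizer
```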

This strategy is well described in the Stanford Alpaca paper, and based on our tests it seems the right one for us. Tools are popping up to make it easier. Sure, it might not work for high-complexity or high-variance tasks, but we believe it’s the way to go for B2B use cases, where users tend to repeat a limited number of tasks across a large number of instances.

We also believe that this strategy has a much better cost-to-performance ratio than the “hardcore” route of training your own LLM from scratch on your data, as Bloomberg recently reported doing. It’s unclear to me what part of that effort was about efficiency vs. R&D showing off their LLM skills, but we have our money on the fine-tuning route instead. Leaks from inside Google agree.

This fine-tuning stack is not just about cost optimization: it allows for a completely different user experience, since it lets you unleash the power of the LLM on all of your core tasks and it drastically speeds up inference. And this advantage is defensible because it is based on your data and your knowledge of the use cases.

And this stack will limit your dependence on any one LLM provider: the core of your stack is the fine-tuning component, and it can be plugged into many different, general-purpose LLMs.

At Explain we very much believe in a vertical AI strategy for SaaS companies. We use both product-driven and tech-driven strategies to build strong barriers to entry: know your customers well and get in their workflow; invest in having a unique database; build a scalable LLM stack fine-tuned for your data and use cases.
