Key Technological Trends Shaping Enterprise GenAI in 2024

Now that most .com companies have renamed themselves .ai in 2023, the trillion-dollar question is: what comes next? Which obstacles, tools, technologies, and methods will reshape the landscape? Here are my top 7 predictions on where Enterprise GenAI tech is headed in 2024.

Kunal Sawarkar
Towards Generative AI


Enterprise AI is different: it is focused on measurable, governed output that companies can control and associate their brand with. This year will be dominated by the technical tools that turn GenAI from mere potential into a reliable tool.

Data scientists are really bad at predicting the future in their own fields (pun intended!). So take these predictions with a 95% confidence interval :)


1. The Turning of the Capex Cycle

There isn’t an enterprise out there that hasn’t invested in GenAI by now. Sometime this year, those investments will start demanding returns. The capital expenditure (capex) cycle for new tech typically runs in 18-month phases: 18 months to invest, 18 months to find the market, and then a decision on whether to double down for the top spot or invest elsewhere. Building a GenAI PoC is the easy part; taking it to production is incredibly challenging, given hallucinations, governance gaps, the lack of evaluation standards, and architectural mess.

By the fall of 2024, we will see companies attempting to find product-market fit for everything they started embedding AI into last spring. Some will miss the mark, and the men will be separated from the boys: companies that don’t find product-market fit will face difficult decisions about their investments and the teams working on them.

Does this mean the GenAI bubble is going to burst? Far from it. But by the end of this fall, we will see enterprises consolidate, focusing not just on PoC projects but on what they are actually taking to production. Those that don’t make the cut will have to either show the money or divest.

2. Inferential Optimization

The most ignored area in GenAI PoCs so far is the cost of running inference on these huge LLMs. The unit price may look small, but costs add up quickly for even a mid-size company making a few million inference calls.

SemiAnalysis, a newsletter that covers the chip market, estimated in February that OpenAI spent $0.0036 to process a GPT-3.5 prompt. At that rate, if Google were to use GPT-3.5 to answer the approximately 320,000 queries per second its search engine receives, its operating income would drop from $55.5 billion to $19.5 billion annually. In February, Google cited savings on processing as the reason it based its Bard chatbot on a relatively small version of its LaMDA large language model.
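A quick back-of-the-envelope check reproduces those numbers (the per-prompt cost and query rate are the figures quoted above; the rest is arithmetic):

```python
# Sanity-check of the SemiAnalysis figures quoted above.
cost_per_prompt = 0.0036        # USD per GPT-3.5 prompt (SemiAnalysis estimate)
queries_per_second = 320_000    # approximate Google Search query volume
seconds_per_year = 365 * 24 * 3600

annual_cost = cost_per_prompt * queries_per_second * seconds_per_year
print(f"Annual inference cost: ${annual_cost / 1e9:.1f}B")                  # ~$36.3B

operating_income = 55.5e9       # USD, quoted above
print(f"Remaining income: ${(operating_income - annual_cost) / 1e9:.1f}B")  # ~$19.2B
```

The result lands within rounding distance of the roughly $19.5 billion quoted above.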

The big push this year will be to develop methods for inferential optimization: balancing the cost-benefit tradeoff of serving inference by accounting for accuracy, ROI projections, on-premises vs. cloud vs. SaaS infrastructure, LLM size, and whether the solution should be RAG, fine-tuned, or a custom-tuned model.
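As a minimal sketch of what weighing those factors might look like (every unit price and accuracy number below is a hypothetical placeholder, not a vendor quote):

```python
# A toy cost model for comparing inference deployment options.
# All numbers are hypothetical placeholders for illustration only.
from dataclasses import dataclass

@dataclass
class DeploymentOption:
    name: str
    cost_per_1k_calls: float   # USD, fully loaded (compute + ops)
    expected_accuracy: float   # accuracy on your own eval set, 0..1

def monthly_cost(option: DeploymentOption, calls_per_month: int) -> float:
    return option.cost_per_1k_calls * calls_per_month / 1000

options = [
    DeploymentOption("SaaS large LLM", 2.00, 0.90),
    DeploymentOption("Cloud-hosted 7B + RAG", 0.40, 0.86),
    DeploymentOption("On-prem fine-tuned SLM", 0.10, 0.84),
]

# A mid-size company with a few million calls a month, as above:
for opt in options:
    print(f"{opt.name}: ${monthly_cost(opt, 5_000_000):,.0f}/month "
          f"at {opt.expected_accuracy:.0%} expected accuracy")
```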

It’s a complex problem, and AI leadership will need to think ahead before they experience billing shock after putting GenAI into production. Watch this space for new Accelerated Computing frameworks that run models faster or with lower compute requirements.

3. So Long ‘Prompt Tuning’ (Welcome Back, Fine-Tuning)

Prompt tuning is a great technique for establishing a baseline, but it is not good enough to build production-grade solutions. It is a zero-shot method that does not adjust the model’s weights, which means the model learns nothing specific to your enterprise dataset. After all, if your chatbot sounds just like everyone else’s, why should anyone prefer you over them? What makes your GenAI unique?

While prompt tuning can show quick value without the need to train any models, it comes with inherent limitations: you end up with a solution that is too costly at inference time and too generic. Fine-tuning can address both.

Any company that wants to add value to its customers using GenAI needs to bring in the uniqueness of its data, either via RAG (Retrieval-Augmented Generation) or via fine-tuning. The RAG vs. fine-tuning debate is a false one: the answer depends on the use case and the kind of data you have. RAG is a much better choice for generative Q&A, while fine-tuning suits problems like text2sql.

Prompt tuning will live on as a method, but as a step in the process rather than the end of the GenAI pipeline itself. For any enterprise with a sizeable amount of data, it makes more sense to use an LLM as a base model and then build a custom or fine-tuned model on top, which is lighter, cheaper, and better.
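To make that concrete, here is a minimal parameter-efficient fine-tuning sketch using the Hugging Face peft library; the base model and hyperparameters are placeholders, not recommendations:

```python
# Minimal LoRA fine-tuning setup with Hugging Face transformers + peft.
# Base model and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "bigscience/bloom-560m"            # a small base model for the sketch
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["query_key_value"],   # BLOOM's attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # only a tiny fraction of weights train

# ...then train on your enterprise dataset with your usual Trainer loop.
```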

Watch for a new “AI middleware” space being created, with higher-abstraction frameworks that make it easy to fine-tune models (like SuperKnowa).

4. Rise of SLMs, MLMs, and VSLMs

You don't need a missile to kill a mosquito!

Continuing with the previous themes: you can get better and cheaper models if you know how to tune or train a smaller model on your data. This will drive newfound interest in SLMs (Small Language Models) and MLMs (Medium Language Models), which typically have fewer than 7B parameters and easily fit on a single GPU. I previously wrote about this in my blog “Why Bigger Isn’t Always Better.”

Experimenting with larger models is not only expensive but also slow; it takes a long time to run experiments on a 20B LLM. Another key driver is the GPU shortage, which will persist throughout 2024.
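A rough memory estimate shows why the single-GPU claim holds: at 16-bit precision a model needs about two bytes per parameter just for its weights, before activations and KV cache:

```python
# Rough GPU memory needed just to hold model weights at fp16.
# Ignores activations and KV cache, which add substantial overhead.
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    return params_billions * 1e9 * bytes_per_param / 1024**3

for size in [0.3, 1, 7, 20, 34, 70]:
    print(f"{size:>5}B params -> ~{weight_memory_gb(size):.0f} GB of weights at fp16")
```

Anything under 7B lands around 13 GB and fits on one commodity GPU; a 34B model already needs roughly 63 GB for weights alone.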

I anticipate the rise of even Very Small Language Models (under 1B parameters) that fit on edge devices, embedding GenAI into hardware and allowing quick custom training without GPUs. Embeddable NLP libraries that internally use 200–300M parameter models already exist for various NLP tasks.

5. Multi — Multi — Multi

Multi-model, multi-modal, and multi-cloud: these are the terms you will hear multiple times this year. Orchestrating this many options is going to create a new range of problems for any enterprise. This year, AI leaders will need to solve a few challenges:

  • How many distinct LLMs does an enterprise want to support in its architecture? (Remember, with GPU shortages it takes 6 GPUs just to load & infer a 34B parameter LLM.) Do we want to support everything, pick several smaller models, a few medium-sized ones, or one large and one small? (See the routing sketch after this list.)
  • Running on one cloud is risky and expensive, so how do we manage workloads across multiple clouds versus on-premises LLMs? On-premises deployment often saves a lot of money in the long term and offers much better control over governance and output.
  • Multi-modal: combining text with images, video, and audio will be a theme to watch. Deploying these multi-modal models is still an area that needs tools development.
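On the first point, a simple routing layer illustrates how an enterprise might serve a small and a large model side by side; the heuristic and model names here are deliberately naive placeholders:

```python
# A deliberately naive multi-model router: a cheap SLM for simple prompts,
# a larger LLM only when needed. Model names and heuristic are placeholders.
from typing import Callable, Dict

def pick_model(prompt: str) -> str:
    # Toy heuristic; a real router might use a classifier or confidence score.
    hard = len(prompt.split()) > 200 or "step by step" in prompt.lower()
    return "large-34b-model" if hard else "small-3b-model"

def serve(prompt: str, backends: Dict[str, Callable[[str], str]]) -> str:
    return backends[pick_model(prompt)](prompt)

# Example wiring with stub backends standing in for real inference endpoints:
backends = {
    "small-3b-model": lambda p: f"[SLM answer to: {p[:30]}...]",
    "large-34b-model": lambda p: f"[LLM answer to: {p[:30]}...]",
}
print(serve("What is our refund policy?", backends))
```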

6. Achilles’ Heel: ‘AI Governance’

If there is one area that gives top executives sleepless nights when it comes to GenAI, it’s AI Governance. They don’t know when their chatbot application will spew incorrect information and cost them a fine (as in the Air Canada case), when it will blurt out hateful content and have to be pulled (like Google’s), or how to comply with new regulations like the EU AI Act.

What every company wants to know is what its model will do once it’s in production. The still-unsolved problem is how to measure that. There is no easy metric or tool for measuring hallucinations, especially on domain-specific data with no universal ground truth (like policies). Tooling for compliance, and likewise for auditing, is another challenge. Making black boxes open and predictable is what’s keeping many from pushing their GenAI PoCs to production.
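There is no standard yet, so teams improvise. As a flavor of how crude today's proxies are, here is a naive groundedness check that flags answers whose content words are not supported by the retrieved context (an illustration, not a real hallucination detector):

```python
# A crude groundedness proxy: the fraction of an answer's content words
# that also appear in the source context. Real hallucination detection
# is a much harder, still-open problem.
import re

def groundedness(answer: str, context: str) -> float:
    tokenize = lambda s: {w for w in re.findall(r"[a-z']+", s.lower()) if len(w) > 3}
    answer_words, context_words = tokenize(answer), tokenize(context)
    if not answer_words:
        return 1.0
    return len(answer_words & context_words) / len(answer_words)

policy = "Refunds are issued within 14 days of purchase with a valid receipt."
print(groundedness("Refunds take 14 days and need a receipt.", policy))    # 0.6
print(groundedness("You get lifetime free upgrades on request.", policy))  # 0.0
```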

Expect a lot of discussion on this problem in both private and public spaces. Hopefully, some laws will pass in the US Congress and an acceptable set of universal standards (similar to those for pharmaceutical drug trials) will be adopted by the industry. Much of it is wishful thinking, of course, and until then, tools like WatsonX.Governance will be indispensable.

7. Return of the Jedi (aka Data Scientist)

Last year, it looked like being an AI Engineer was all you needed to do AI (with some misleading news reports claiming that being an AI Engineer can pay as much as $900K).

The last, and perhaps slowest-to-materialize, change this year will be the realization that Data Scientists are still very much needed to push GenAI from potential to potency.

First, what’s the difference between the two? Traditional ML required training models, with Data Scientists bringing their stats and algorithms expertise to fit a model to your use case. With LLMs, you already have a pre-trained model, so it was thought that this was no longer needed: whatever you need can be done with prompt tuning, which is essentially clever English writing. Hence the proliferation of AI Engineers, who are less focused on the science and more heavily focused on ML-Ops engineering to configure, integrate, and deploy GenAI inside applications.

As it turns out, reality is a little more complex, especially for Enterprise GenAI. Take RAG applications as an example: it is easy to build RAG for a few dozen documents, but with a few million documents you need deep knowledge of retrieval and re-ranking algorithms (like KNN, encoders, BM25). For another use case like text2sql, you need fine-tuning skills (viz. PEFT, beam search, etc.). Moreover, every GenAI application needs someone to evaluate models and choose statistical metrics suited to that dataset and use case (like NDCG or Fleiss’ kappa). All of this means core skills in stats and ML algorithms are still very much relevant; just putting boxes together is not enough.
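For instance, the retrieval half of a RAG system is commonly scored with NDCG, which scikit-learn already implements; the relevance grades and retriever scores below are made up for illustration:

```python
# Scoring retrieval quality with NDCG using scikit-learn.
import numpy as np
from sklearn.metrics import ndcg_score

# One query, five candidate documents: graded ground-truth relevance
# versus the retriever's ranking scores (illustrative numbers).
true_relevance  = np.asarray([[3, 2, 0, 0, 1]])
retriever_score = np.asarray([[0.9, 0.3, 0.5, 0.1, 0.7]])

print(f"NDCG@5: {ndcg_score(true_relevance, retriever_score, k=5):.3f}")
```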

So, Data Scientists will make a return as Jedi to bring GenAI to life. Core stats skills will be the key factor in how quickly you can deliver highly accurate GenAI solutions.

Expect the Unexpected

The beauty of AI lies in its unpredictability (a feature, not a bug, given that everything is built on top of probability). Everyone, including Sam Altman, failed to predict the current GenAI wave. We saw this before in the highs of 2012, with vision models and the hype over self-driving cars. Now there is another boom, over LLMs.

All enterprises should expect that their own plans can be blown out of the water by a new model that performs better than expected (like Llama3 or GPT5) or by something entirely unanticipated. A new model, method, dataset, or server can disrupt the scene. Always expect the unexpected in GenAI, both highs and lows.

Those are my top predictions for what will influence GenAI for enterprises in 2024. What do you think? If I missed any major ones, please add them in the comments below.

Follow Towards Generative AI for the latest advancements in AI.
