Written by Apoorva Joshi, Senior AI Developer Advocate @MongoDB
Until a year ago, I would cringe whenever someone used “AI” as an all-encompassing term for anything related to machine learning. In my head, it was a term that only made sense in sci-fi movies set in 2050 and beyond, where machines gain human-like intelligence. However, the advent of foundation models and the dizzying pace of development this past year have proven to be a reality check.
My social media feeds are regularly flooded with announcements of new state-of-the-art (SOTA) models, novel agentic applications, and more, but it all makes me wonder how much of this is currently applicable to the problems enterprises are trying to solve.
In this article, we will cover the following:
- Requirements for production systems
- Current challenges with using LLMs in production
- What’s possible with AI today
- Production case studies from MongoDB customers
But first, let’s start with what we know…
Requirements for production systems haven’t changed
If being an applied data scientist has taught me anything, it is that most bleeding-edge research doesn’t make it into production, at least not in systems where speed is critical. Some of the most common constraints I have seen for AI models in production environments are as follows:
- Speed: This is important in systems where time is of the essence, such as malware detection and personalized product recommendations, where failing to respond in real time can result in system compromise or missed sales opportunities.
- Correctness: The model should produce correct results for the task at hand. Whether it is minimizing false positives and/or negatives, summarizing text, or generating code, the model should meet a pre-defined set of evaluation criteria. Failing to respond correctly can have serious consequences, depending on the use case.
- Cost: Reducing operational costs is the overarching goal of any business. Given a list of options, the one that offers the best balance between performance (in terms of speed and correctness) and cost-effectiveness wins.
Current challenges with using LLMs in production
While a lot of current research in the field is focused on making LLMs more reliable, performant, and cost-effective, all three remain challenges, at least for now.
The size and complexity of LLMs make them intrinsically slow at inference. Models like GPT-4 and Gemini Pro that are accessible only via APIs add network overhead on top of that. While latency and hallucinations are tolerable in chat and offline applications, they are a no-go for mission-critical applications in the healthcare, finance, and security industries. It is important to consider how many LLM calls your application makes and which tasks can be offloaded to a non-generative system component.
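As a minimal sketch of that idea, the snippet below handles deterministic requests (here, order status lookups) in plain code and only falls back to the model for open-ended queries. The `call_llm` wrapper and the hard-coded order table are hypothetical stand-ins for your model API and database:

```python
import re

# Stand-in for a real order database; illustrative only.
ORDER_STATUS = {"12345": "shipped", "67890": "processing"}

def call_llm(message: str) -> str:
    # Hypothetical wrapper around your model API; wire it up as needed.
    raise NotImplementedError

def handle_request(message: str) -> str:
    # Deterministic fast path: status lookups never need the LLM.
    match = re.search(r"order\s+#?(\d+)", message, re.IGNORECASE)
    if match:
        order_id = match.group(1)
        return f"Order {order_id} is {ORDER_STATUS.get(order_id, 'not found')}."
    # Only open-ended queries fall through to the slow, costly model call.
    return call_llm(message)
```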
The probabilistic nature of LLMs makes them inherently non-deterministic, sometimes producing incorrect information, a.k.a. hallucinations. These edge cases can make for a poor and inconsistent user experience for user-facing applications such as customer service chatbots. This stochastic nature also makes these models difficult to evaluate and their outputs difficult to use in downstream processes that expect a specific schema.
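One common mitigation is to validate the model’s output before anything downstream consumes it. Below is a minimal sketch, reusing the hypothetical `call_llm` wrapper from the previous snippet: request JSON, check it against the keys the next pipeline step expects, and retry a bounded number of times.

```python
import json

def call_llm(prompt: str) -> str:
    # Hypothetical wrapper around your model API (see the previous sketch).
    raise NotImplementedError

# The keys the downstream pipeline step expects.
REQUIRED_KEYS = {"category", "sentiment"}

def classify_ticket(ticket: str, max_retries: int = 3) -> dict:
    prompt = (
        "Classify the support ticket below. Respond with JSON only, "
        f"using exactly these keys: {sorted(REQUIRED_KEYS)}.\n\n{ticket}"
    )
    for _ in range(max_retries):
        try:
            parsed = json.loads(call_llm(prompt))
        except json.JSONDecodeError:
            continue  # malformed output; ask again
        if REQUIRED_KEYS.issubset(parsed):
            return parsed  # safe to hand to the next pipeline stage
    raise ValueError(f"No valid JSON with keys {REQUIRED_KEYS} after {max_retries} tries")
```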
LLMs can also be expensive to run. OpenAI, for example, charges for both input and output tokens, meaning longer prompts and responses inflate costs. Fine-tuning smaller (still a few billion parameters!) open-source models could be a cost-effective option in the long run, but the infrastructure costs of serving these models during peak traffic and in real-time scenarios are non-trivial.
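The arithmetic is worth doing before committing to a design. A back-of-the-envelope sketch, using placeholder prices rather than any provider’s published rates:

```python
# Placeholder per-1K-token prices; NOT any provider's actual rates.
INPUT_PRICE_PER_1K = 0.01   # assumed $/1K input (prompt) tokens
OUTPUT_PRICE_PER_1K = 0.03  # assumed $/1K output (completion) tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# A 1,500-token prompt with a 500-token response costs $0.03 per call
# at these assumed rates, i.e., $30,000 for a million calls.
print(f"${request_cost(1_500, 500) * 1_000_000:,.0f}")
```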
Data privacy and security are added concerns for enterprises when using generally available models such as GPT-4, given the risk of proprietary and/or sensitive data being shared with entities outside the organization.
What’s possible with AI today
Despite the above challenges, there is no denying how powerful these latest AI models are. It is no surprise that large enterprises have started using them for various internal and external use cases, or that so many startups have built their entire business around these models. The underlying theme across the board is the same, though: time savings for individuals and cost savings for enterprises.
At large enterprises, from what I’ve seen, internal use cases mostly revolve around code and query generation, support and workplace documentation chatbots, copywriting, and the like. External use cases involve customer service chatbots, in-product features for summarization, knowledge base Q&A, and data analysis. While most use cases in 2023 were powered by prompt engineering on hosted models, in 2024, we are seeing embeddings, vector databases, and retrieval-augmented generation (RAG) take center stage. Simply put, RAG improves the quality of a pre-trained LLM’s generation by grounding it in data retrieved from a knowledge base. It is a widely adopted alternative to fine-tuning because it requires almost zero upfront investment and gives organizations control over what data these models have access to, making it more secure by default.
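To make that concrete, here is a minimal RAG sketch using the OpenAI Python SDK and MongoDB Atlas Vector Search. The connection string, database, collection, index, and field names are illustrative assumptions, not a prescribed setup:

```python
# Minimal RAG: embed the question, retrieve similar documents from a
# vector index, and ground the LLM's answer in the retrieved context.
from openai import OpenAI
from pymongo import MongoClient

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
collection = MongoClient("mongodb+srv://...")["kb"]["articles"]  # assumed names

def answer(question: str) -> str:
    # 1. Embed the question.
    query_vector = llm.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # 2. Retrieve the closest documents from the knowledge base.
    docs = collection.aggregate([
        {"$vectorSearch": {
            "index": "vector_index",  # assumed Atlas Vector Search index name
            "path": "embedding",      # assumed field holding document vectors
            "queryVector": query_vector,
            "numCandidates": 100,
            "limit": 3,
        }}
    ])
    context = "\n\n".join(doc["text"] for doc in docs)  # assumed "text" field

    # 3. Answer using only the retrieved context.
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

Because the knowledge base lives in your own database, changing what the model can cite is a data operation, not a training run.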
“Applied” AI startups (i.e., companies using AI to solve a problem), on the other hand, are keying in heavily on assistants that reduce human time spent on mundane tasks by integrating into everyday applications that people use, such as Google Docs, Sheets, and Salesforce. For example, Adept’s new class of models, called Action Transformers, is trained to take actions on a computer based on natural language instructions. This way, you can have the model order your dinner, schedule an email, learn new workflows, etc. Jasper boasts an end-to-end AI co-pilot for enterprise marketing, which includes creating marketing campaigns and search engine optimization (SEO) content.
Overall, given the state of AI today, it seems most logical to use it as a mechanism for automation and optimization in scenarios where there are no dire consequences from occasional bad/incorrect outcomes. A maximalist approach to AI where we expect LLMs to do everything is wasteful and will likely not result in the desired outcomes. Software design concepts and patterns are evergreen, and success lies in treating LLMs as a module in a larger system.
Production case studies from our customers
In addition to some of the startup use cases above, here are some interesting use cases we are seeing from our customers at MongoDB:
- Potion has a unique value proposition: video prospecting. Personalized videos, it turns out, are more effective than cold emails when engaging prospective customers. That being the case, a tool that requires a salesperson to record a video only once, then takes care of personalizing it for multiple prospects and adding customized branding, sounds like a huge time-saver. Read more about how Potion uses MongoDB.
- Ada offers customer service as a service. While customer service chatbots aren’t new, what stands out about Ada is the reasoning engine that powers its customer service AI agent. The engine uses foundation models that have been fine-tuned on customer conversations and feedback, along with knowledge sources consisting of company-specific policies and guidelines, to provide accurate solutions to customer inquiries. Read more about how Ada leverages MongoDB Atlas.
- VISO TRUST is a great example of using AI for time and resource optimization. Traditionally, cyber risk analysts would spend the majority of their time gathering the data required for an investigation. When it comes to conducting risk assessments, however, there are usually clear policies and procedures in place to determine the security posture of third-party vendors, which makes them a perfect fit for automation using AI. VISO TRUST’s Artifact Intelligence offering does exactly this: no more analyst time spent parsing questionnaires and reading posture documents. Read more about how they’re building AI with MongoDB.
Conclusion
In this article, we took a closer look at the state of AI in production in 2024. The requirements for AI models in production haven’t changed: speed, correctness, and cost remain the priorities, and they remain the biggest challenges when working with LLMs.
Challenges aside, though, it’s an exciting time to be in AI. If you are looking to build the next cool thing in AI, check out our AI Innovators program to get expert technical advice, free MongoDB Atlas credits, and go-to-market support as you build your venture. We would also love to hear from you about what you are building, so come join us in our Generative AI community forums!
Finally, if you found value in this article and would like to hear more from me: