What Do OpenAI’s Recent Product Announcements Really Mean for You?

Cynthia O'Rourke
DataRobot

--

Two weeks ago — though it feels like two years ago — OpenAI hosted its first DevDay, announcing and demoing many improvements and a few new introductions. On the improvements side, GPT-4-turbo has fresher training data than its predecessors, its context windows are substantially larger, developers have more control over model responses, consumption prices are dropping substantially, and rate limits are increasing for “established customers.” Additionally, OpenAI joins Google and Microsoft in offering some level of legal protection against copyright lawsuits. On the newer side of OpenAI capabilities, ChatGPT is getting in-platform retrieval-augmented generation (RAG) capabilities, OpenAI is offering consulting services to support fine-tuning of its models, and GPT-4-turbo has gained vision and text-to-speech modalities. On the very new side of things, non-coders and coders alike are receiving simplified new toolkits to build “GPTs”: agent-like, GPT-powered apps equipped with plugin tools and developer-defined workflows.
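A minimal sketch of what those new response controls look like in practice, assuming the openai v1 Python SDK and the GPT-4-turbo preview model name announced at DevDay (JSON-constrained output plus a fixed seed for more reproducible responses):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4-1106-preview",               # GPT-4-turbo preview, 128K-token context window
    seed=42,                                   # request more reproducible sampling across runs
    response_format={"type": "json_object"},   # constrain the reply to valid JSON
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'summary' and 'sentiment'."},
        {"role": "user", "content": "Summarize the DevDay keynote in one sentence."},
    ],
)
print(response.choices[0].message.content)
```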

On DevDay, OpenAI was in an enviable but tough place. ChatGPT took off in November 2022, rapidly capturing the attention of the world — and, crucially, not just the tech world — but also rapidly accruing “eye-watering” costs. Those costs had been offset by its powerful partner and (rumored) 49% stakeholder, Microsoft, but that partnership means that OpenAI’s enterprise offerings will be running up against its own APIs — and now potentially also its own IP, ex-CEO, and researchers — from within one of the world’s most established and dominant enterprise SaaS platforms. Even leaving aside the political inadvisability of OpenAI directly competing with its largest funder and largest single stakeholder, Microsoft has decades of experience in delivering at the enterprise level, and an entire integrated services suite crafted to that end. As a newcomer — again, running up against its own APIs — it’s hard to see where OpenAI would stand a chance.

OpenAI’s consumer story is something else entirely. The power of ChatGPT’s consumer appeal has been enough to rejuvenate Microsoft’s own tech innovation story over the last twelve months. ChatGPT didn’t just catch the eye of developers and Silicon Valley venture capitalists, but of users from every walk of life worldwide. Almost single-handedly (with some early help from DALL-E and other generative AI models), the release of ChatGPT spiked the search term “AI” to previously unseen heights across most of the Google Trends globe. “AI” has stayed at those heights ever since.

But hype is difficult to monetize, and GPUs and machine learning engineers are expensive. OpenAI can’t run on donations indefinitely, and even less so now that its consumption rates have skyrocketed along with its search-term popularity. To fund its expenses and support the research underlying its mission statement, it needs revenue — all the more so now that a planned employee stock sale at a reported $86B valuation has been endangered by the events of last week. Its enterprise offerings are reportedly costly, at around $60 per seat per month (varying from customer to customer), with a minimum seat count of 150 on a 12-month subscription. In contrast, Microsoft Copilot will run $30 per seat at the enterprise level, though with a reported minimum seat count of 300. Identical pricing is rumored for Google Duet, which is similarly available as an add-on to “Business Standard” and larger enterprise packages. Even if OpenAI drastically reduces its enterprise pricing, it will still be competing with much larger and more established vendors that are delivering fully-baked LLM applications directly into users’ existing workspaces and — new from Microsoft as of last week — are even enabling custom-made applications.

However, OpenAI does not necessarily need to wrap its models around an existing workspace, and it also offers a $20/month individual paid seat. With no individual tier for Microsoft 365 Copilot, and no obvious individually-tailored tier for Google Duet, OpenAI has this demographic to itself. Moreover, Microsoft has not shown recent interest in the lone consumer, which means that OpenAI may be able to build out here without later running into conflict with its biggest partner. At DevDay, ex-CEO Sam Altman announced that OpenAI currently has 100M weekly active users. Outside estimates put the cost of hosting ChatGPT at $100,000 to $700,000 per day. If OpenAI can get just 1% of those 100M weekly users over the line and into those $20/month seats, that’s already near, above, or well above covering ChatGPT’s server costs. It’s not necessarily profitability, but it’s a substantial step in the right direction.
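To make the back-of-envelope math explicit (the figures come from the paragraph above; the 1% conversion rate is purely illustrative):

```python
# Back-of-envelope check using the numbers cited above.
weekly_active_users = 100_000_000
conversion_rate = 0.01                 # assume 1% of weekly actives become paying subscribers
seat_price_per_month = 20              # $20/month individual tier

monthly_revenue = weekly_active_users * conversion_rate * seat_price_per_month  # $20M/month
daily_revenue = monthly_revenue / 30                                            # ~$667K/day

print(f"~${daily_revenue:,.0f}/day vs. an estimated $100K-$700K/day in hosting costs")
```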

So how does OpenAI get more individual users into paid seats?

That brings me back to GPTs. By this point we’re all familiar with LLM assistants. ChatGPT and Bard, as they are currently delivered, can function as generalist assistants. GitHub Copilot, Amazon CodeWhisperer, and the Google Codey trio are specialist assistants, tailored for code support. An assistant works closely with a human in the loop to help that human with a single, relatively simple task at a time. For example, an assistant can help its human user to improve a marketing email, or to debug a chunk of code.

At DevDay, Sam Altman described GPTs as a step towards “agents.” Agents are, relative to assistants, a step up in complexity and work abstraction. Agents coordinate multiple functions to complete more complex or sequential tasks. For example, while an assistant might help me to tweak the wording of an email, an agent might target and compose a personalized email by (1) consulting our Salesforce data and a predictive AI propensity model to determine who would be a good person to email on a given day, then (2) personalizing a draft email that maps the recipient’s needs and existing tech stack (as recorded in Salesforce) to our newest features (as recorded in our release notes), and finally (3) scheduling an optimal send time based on the recipient’s time zone and what a second predictive AI model says about the recipient’s probability of opening the email at different times of day. The LLM coordinator of an agent can call on apps, predictive models, assorted knowledge bases (i.e., curated reference information), and even other, more specialized LLMs, and can therefore perform more complex tasks. If an LLM assistant is the Batmobile, then an LLM agent is Robin.
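As a rough, hypothetical illustration of that three-step flow (the function names, data fields, and llm() helper below are placeholders, not any vendor's actual API), an agent's orchestration logic might look something like this:

```python
# Hypothetical orchestration logic for the email agent described above.
def run_outreach_agent(crm_records, release_notes, propensity_model, open_time_model, llm):
    # (1) Target: pick today's best recipient using CRM data plus a predictive propensity model.
    recipient = max(crm_records, key=propensity_model.score)

    # (2) Personalize: ask the manager LLM to map the recipient's needs and stack to new features.
    draft = llm(
        f"Write a short, personalized email to {recipient['name']}.\n"
        f"Their needs and current stack: {recipient['notes']}\n"
        f"Our newest features: {release_notes}"
    )

    # (3) Schedule: use a second predictive model to choose a send time likely to get the email opened.
    send_at = open_time_model.best_send_time(recipient["timezone"], recipient["history"])
    return {"to": recipient["email"], "body": draft, "send_at": send_at}
```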

With GPTs gated behind the $20 tier — new subscriptions to which were paused just days after DevDay — it looks like this subscription incentivization from OpenAI could be paying off fast.

However, agents are not a new concept, either in open source software or in enterprise SaaS. AWS supports agent build flows in SageMaker JumpStart, though AWS, unlike Google or Microsoft, does not have a dedicated workspace suite to bake those agents into. Microsoft supports the open-source agent-building project AutoGen, and has introduced fully-baked and now also low-code DIY Copilots (with plugins) for Microsoft 365. While Google Cloud has so far not enabled agents per se, it does enable integration of different services (for example, Vertex AI Conversation and Dialogflow, or Duet AI in Workspace in collaboration with Salesforce’s Einstein Copilot) to create something like an LLM “team” that can serve the same complex-task functions as an LLM agent. OpenAI itself earlier moved some distance towards agents by giving ChatGPT “tools” (plugins), but it has now built plugins into the GPT-building process in a more abstracted, dev-friendly way.

The pre-existence of agent and agent-like tooling from established vendors is important for enterprises that wish to develop and deploy LLM agents. An individual can happily build and share the new OpenAI agent-like apps, GPTs, with other individuals. Individuals can build GPTs to split vacation costs among friends, or to scale their mentoring work, or to coordinate a meeting between two crowded calendars — all without using code. Code-first developers can enjoy an easier workflow with the new OpenAI Assistants API, so long as they’re working on use cases with a relaxed set of success criteria. Developers working on mission-critical agent use cases, however — the ones that require enterprise-level security for both data submitted through prompts and data returned in responses; real-time centralized monitoring for accuracy, toxicity, attacks, cost, and custom metrics; strict SLAs; innovative confidence support from predictive AI models; and robust governance, auditing, and improvement processes — will be using workflows supplied by experienced enterprise vendors.
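For a sense of what that code-first workflow looks like, here is a minimal sketch using the OpenAI Python SDK's Assistants API roughly as announced at DevDay (the assistant's name and prompt are illustrative, and the API was still in beta at the time of writing):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Define an agent-like assistant once, with a built-in tool attached.
assistant = client.beta.assistants.create(
    name="Trip Cost Splitter",
    instructions="Split shared vacation costs fairly and show your work.",
    tools=[{"type": "code_interpreter"}],
    model="gpt-4-1106-preview",
)

# Each conversation lives in a thread; a run executes the assistant against that thread.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Alice paid $420 for the cabin and Bob paid $180 for food. Split it three ways.",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
# Poll run.status, then read the assistant's reply back from the thread's messages.
```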

For a case study on why enterprises will usually want to work with enterprise-experienced vendors, pull up the U.S. Google Trends history for “AI” as a search term since late 2022. The release of ChatGPT spiked that search term in November 2022, and it has remained popular ever since, but never more popular than during a second spike in April 2023, when versions of “how to delete/unpin/disable snapchat ai,” “is snapchat ai safe,” and “why did snapchat make an ai” comprised 64% of “AI”-related queries. Some of the other query clusters were positive — “how to get my ai on snapchat,” for example — but no company wants the top Google query about its newly-released feature to be “how to unpin [this feature].” When working with very new technology in high-risk, high-reward use cases, it is crucially important to work with trusted, experienced technology partners.

Where does DataRobot fit in?

An LLM agent requires three basic components: (1) the LLM or manager LLM, which must be sophisticated enough to navigate multiple stepwise tasks, along with (2) reasoning-like decision support and (3) access to additional software APIs or “plugins” as tools for each discrete task. In the case of Microsoft’s AutoGen, the top-level tools are themselves specialized LLMs, which in turn call on lower-level tools. Even with just these three components, the agent tech stack is already shaping up to be substantially more complex than the assistant tech stack, which at a bare minimum requires only an LLM.
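To make those three components concrete, here is a deliberately minimal, hypothetical sketch in which the llm() callable stands in for a real manager LLM and two toy plugins stand in for real tools:

```python
# A toy agent: (1) a manager LLM, (2) a decision loop, (3) a registry of plugin tools.
# Everything here is a hypothetical stand-in, not a real framework.

TOOLS = {
    "weather": lambda city="here": f"Forecast for {city}: sunny, 72F",
    "fridge_inventory": lambda: ["eggs", "spinach", "feta"],
}

def run_agent(goal, llm, max_steps=5):  # (1) llm is the manager LLM, passed in as a callable
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # (2) Decision support: the manager LLM picks the next tool, or declares itself done.
        decision = llm(
            "Available tools: " + ", ".join(TOOLS) + "\n"
            + "\n".join(history)
            + "\nReply with 'CALL <tool> <argument>' or 'DONE <final answer>'."
        )
        if decision.startswith("DONE"):
            return decision[len("DONE"):].strip()
        # (3) Tool access: dispatch to the chosen plugin and feed the result back in.
        _, tool, *arg = decision.split(maxsplit=2)
        history.append(f"{tool} -> {TOOLS[tool](*arg)}")
    return "Stopped after max_steps without a final answer."
```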

Those three basic components are sufficient to get an individual up and running with an agent that suggests vacation activities, builds recipes and shopping lists based on what’s already in the user’s fridge, or spruces up the weather forecast.

For enterprise deployments, however, there’s more on the line, and users will need more than the basic components. They’ll need tools to ensure that those vacation activities actually exist, and haven’t been hallucinated. They’ll need guardrails and context to prevent the agent from, for example, suggesting a tourist trip to a food bank. They’ll need a broad, curated, and maintained knowledge base that keeps an updated list of shoppable ingredients while weeding out missteps like including bleach in a recipe. They’ll need cost monitoring and controls to make sure that an organization’s cloud bills aren’t blown up by employees requesting poems about the weather, and value tracking to identify, track, and expand their high-ROI LLM use cases over time. They’ll need tight, mature informational security to make sure that their generative AI apps don’t expose source information, as has already been observed in early GPTs (examples 1, 2), and also to make sure that prompts themselves aren’t recorded and mis-purposed. All of these bad outcomes that I’ve just referenced are missteps that have already occurred at the enterprise level. Similar missteps may be tolerated when the agent is a DIY tool for individual use, but such missteps are not tolerable when they make international news and impact a company’s reputation.
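As one small, vendor-neutral example of the kind of control layer this implies (the budget figures, count_tokens() helper, and llm() callable below are all hypothetical), a per-user cost guardrail might look like:

```python
from collections import defaultdict

# Hypothetical numbers and helpers: a blended placeholder token price, a
# count_tokens() estimator, and an llm() callable supplied by the caller.
DAILY_BUDGET_USD = 5.00
PRICE_PER_1K_TOKENS = 0.01

spend_today = defaultdict(float)  # user_id -> dollars spent so far today

def guarded_llm_call(user_id, prompt, llm, count_tokens):
    est_cost = count_tokens(prompt) / 1000 * PRICE_PER_1K_TOKENS
    if spend_today[user_id] + est_cost > DAILY_BUDGET_USD:
        # Block (or route for approval) instead of silently blowing up the cloud bill.
        return "Daily generative AI budget reached; request logged for review."
    response = llm(prompt)
    total_tokens = count_tokens(prompt) + count_tokens(response)
    spend_today[user_id] += total_tokens / 1000 * PRICE_PER_1K_TOKENS
    return response
```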

Even a relatively simple LLM agent will need a healthy, reliable, well-behaved model at its heart. More complex multi-LLM agents, or agents that rely on predictive models as part of their toolkit, will benefit from comprehensive, centralized monitoring of both their generative AI and predictive AI components. Careful enterprises may want to make choices that balance trade-offs between performance and perceived legal or reputational exposure. CFOs will push for selection of components that mitigate costs during development and production. Even if just considering app “performance” in a vacuum free from legal, reputational, and financial realities, there will be trade-offs between accuracy, perplexity, latency/speed, tone, and more from a rapidly growing list of generative AI performance metrics. As new models and new model versions are released, enterprise options for these trade-offs will constantly evolve. There are many considerations and decisions to be made when developing a generative AI app, and in this evolving landscape, few of those decisions will be final — even when the app is moved to production.

The DataRobot advantage

Generative AI workflows are complicated and can rapidly become chaotic. DataRobot’s mature and innovative abstractions of complex AI workflows empower enterprises to rapidly create and improve AI solutions and to comfortably manage their full generative and predictive AI stacks. In addition to our well-known AI development and predictive AI production capabilities, DataRobot has best-in-class LLMOps tooling. We provide out-of-the-box and customizable real-time monitoring and maintenance support, in a centralized Registry and Console that eliminates model ops disarray and, along with our platform’s development capabilities, prevents the creation of further tech debt. We create deep audit trails and preserve developer optionality, so that devs can try out a new tool or even a new management LLM for their organization’s most popular agent without disrupting downstream users.

That optionality is crucial in a landscape that’s developing as rapidly and (on some weekends) as noisily as the generative AI landscape. Even as the dust settles and generative AI becomes increasingly mature and commoditized, model improvements and other technological developments will reward agile organizations that remain ready to rapidly iterate on their existing generative AI apps.

Let’s say, for example, that you’re a developer of an LLM assistant that is already popular in your company, maybe something like a chatbot that helps users to fill out requests for proposal (RFPs). Those OpenAI pricing developments might mean that you’re suddenly paying more for your RFPBot’s API than you would if you switched to GPT-4-turbo. However, your bot is already popular with hundreds of end-users, who have incorporated it into their daily workflows, and you need to be able to experiment with changes to your bot without disrupting those end users. Maybe the new developments aren’t around cost per token, but rather are around information security, performance, or non-technical considerations.
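The general pattern that makes this kind of change low-risk is an abstraction layer between the bot and whichever model backs it. A stripped-down, hypothetical sketch of the idea (not DataRobot's actual tooling):

```python
# Hypothetical, vendor-neutral sketch: RFPBot code depends only on generate(),
# so the backing model can change without touching end users' workflows.

class RFPBotBackend:
    def __init__(self, generate_fn):
        self._generate = generate_fn  # any callable that takes a prompt and returns text

    def generate(self, prompt: str) -> str:
        return self._generate(prompt)

# Swapping models becomes a one-line configuration change, not an application rewrite:
# bot = RFPBotBackend(call_gpt4)         # before
# bot = RFPBotBackend(call_gpt4_turbo)   # after weighing cost, quality, and security trade-offs
```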

Regardless of why changes need to be made to an app in production, DataRobot’s experimentation tooling and automated metrics caching enable devs to dig deep into the effects of potential changes and to easily hot-swap LLMs and other agent components, so devs can iterate on and improve their generative AI applications without disrupting end users. With DataRobot, AI devs can incorporate the best-of-the-moment components without sacrificing future optionality.

Of equal importance, we fully integrate end-to-end with your enterprise stack, from AWS, Azure, Google Cloud, Snowflake, and Databricks to Slack, Microsoft Teams, Tableau, Streamlit, and Qlik. Our cost monitoring and control tools in both development and production enable organizations to stay within budget and to understand where their generative AI spend is going. Our custom metrics enable businesses to assess, by their own KPIs, what the hard ROI on that spend is.

Last but far from least, our global teams are made up of hundreds of dedicated, reliable AI professionals, with thousands of cumulative years of experience across all industries. Just as OpenAI is about to gain an incredible research opportunity in the consumer product space with the introduction of its GPT Marketplace, DataRobot has been able to learn what works and what doesn’t alongside over a thousand enterprise customers. We’ve been working across diverse industries and departments to build, measure, and maintain generative AI solutions, and — as has long been our habit — we’ve been baking those best practices into our platform.

Learn more here about key considerations for generative AI in production and how DataRobot’s capabilities can help mitigate the associated risks. Check out more about the DataRobot AI Platform and our LLMOps capabilities.
