The New Generative AI Infra Stack

Published in

Cowboy Ventures

7 min readMay 22, 2023

Companies want to use generative AI, but they need new infrastructure tools to make it happen

Generative AI has taken the tech industry by storm. In Q1 2023, a whopping $1.7B was invested into gen AI startups as hundreds of millions of users adopted applications like ChatGPT and GitHub CoPilot. Tech-forward companies are scrambling to figure out their generative AI strategies and many are struggling to get applications into production. Even the most cutting-edge engineering teams are facing challenges training, deploying, and securing generative AI models in a safe, reliable, and cost-effective way.

A new infrastructure stack for generative AI is emerging. At Cowboy, we see huge opportunities for new startups in this landscape, especially for those addressing the high costs related to deploying models into production, data management, and model evaluation.

The New Infrastructure Stack for Generative AI

Category Breakdown¹

Foundation Models

Foundation models are trained on massive datasets and perform a broad set of tasks. Developers use foundation models as the basis for powerful generative AI applications, such as ChatGPT.

A key consideration when choosing a foundation model is open vs. closed source and we’ve outlined the pros and cons of each below:

Open source:

Pros: Open source models are easier to customize, provide more transparency into training data, and give users better control over costs, outputs, privacy, and security.
Cons: Open source models may require more work to prepare for deployment and can also require more fine-tuning and training. While set-up costs may be higher with open source models, at scale, companies will have more control over costs versus closed source models where usage can be hard to predict and costs can spiral out of control.

Closed source:

Pros: Closed source models typically provide managed infrastructure and compute environments (e.g. GPT-4²). They may also provide ecosystem extensions that extend model capabilities — such as OpenAI’s ChatGPT plugins. Closed source models may also offer more “out of the box” capabilities and value since they are pre-trained and often accessible via an API.
Cons: Closed sourced models are black box, so users get little insight into their training data, making it difficult to explain and tune outputs. Vendor lock-in can also make costs hard to control — for example, usage of GPT-4 is charged on both prompts and completion.

We think open source will be the more attractive option for enterprise teams building generative AI applications. As two Google researchers lay out, open source models have advantages around community-driven innovation, cost management, and trust.

Fine-Tuning

Fine-tuning is the process of adjusting the parameters of an existing model by training it on a curated dataset to build “expertise” for a specific use case.

Fine-tuning can increase performance, reduce training time and costs by allowing developers to leverage pre-trained, powerful large models.

There are a range of options for fine-tuning pre-trained models, including open source frameworks like TensorFlow and Pytorch and secure end-to-end solutions like MosiacML. We want to call out the importance of labeling tools for fine-tuning — clean, curated datasets speed up the training process and boost accuracy!

Recently, we’ve seen a lot of activity around domain-specific generative AI models. Bloomberg launched BloombergGPT, its proprietary LLM trained on data specific to the financial industry. Hippocratic, a startup that’s built its own LLM trained on healthcare data to power consumer-facing applications, raised $50M out of stealth from Andreessen Horowitz and General Catalyst. Synteny AI is building a model trained on the binding affinity between proteins to power better drug discovery. We think incumbents are well-positioned to fine-tune powerful models on their own proprietary data and build their own AI advantage.

Data Storage & Retrieval

Data storage for long-term model memory and data retrieval are complex and costly infrastructure challenges, presenting opportunities for startups to build more effective solutions. Vector databases have emerged as a strong solution for model training as well as the retrieval and recommendation systems that come after. This has made vector DBs one of the buzziest generative AI infra categories:

Vector databases can be used to power a variety of applications, including semantic search (a data searching technique), similarity search (using shared features to find similar data), and recommendation systems. They also give models long-term memory, which helps reduce hallucinations (confident responses by an AI that are not justified by training data).

We see a lot of opportunities for innovation here. There’s no guarantee the current method of semantic search & retrieval from databases will continue to be the most efficient (speed and cost) and effective (coverage); Cohere recently announced its Rerank endpoint — a search and retrieval system that eliminates the need for migration to a vector database. We’re also seeing teams use LLMs as reasoning engines attached to vector databases. We’re excited to see the data storage and retrieval category evolve and more startups emerge.

Model Supervision: Monitoring, Observability & Explainability

The three terms under supervision are frequently used interchangeably, however, they describe different steps in evaluating models during and after they’re live in production. Monitoring involves tracking performance, including identifying failures, outages, and downtime. Observability is the process of understanding why performance is good or bad, or evaluating system health. Lastly, explainability is about deciphering outputs — for example, explaining why the model came to a certain decision.

Supervision is a staple of the more traditional MLOps stack, and incumbents (e.g. Arize) have started building products for teams deploying generative AI models. However, black box, closed source models can be hard to supervise and it’s difficult to explain hallucinations without access to training data. Recent YC batches produced several companies tackling these challenges, including Helicone and Vellum, highlighting how early the space is. Notably, both have focused their messaging on tracking latency and usage — signaling that cost remains the biggest pain point for teams building in generative AI.

Model Safety, Security and Compliance

Model Safety, Security, and Compliance will be increasingly important as companies bring models to production. In order for enterprises to trust generative AI models, they need a suite of tools that provide accurate evaluations of model fairness, bias, and toxicity (generating unsafe or hateful content). We also believe that teams deploying models will need tools that help them implement their own guardrails.

Enterprise customers are also deeply concerned about threats such as extraction of sensitive data, poisoned training data, and leakage of training data (especially sensitive third-party data). Notably, Arthur AI recently announced its new product, Arthur Shield, which is the first firewall for LLMs that protects against prompt injection (using malicious inputs to manipulate outputs), data leakage, and toxic language generation, among other features.

We see big opportunities for compliance middleware. Companies will need assurances that their generative AI applications don’t compromise compliance standards (copyright, SOC-2, GDPR etc.). This will be especially important for teams building in strictly regulated industries, such as finance and healthcare. We’re excited to see innovation from startups, as well as incumbents — for instance, our Cowboy company Drata is well-positioned to integrate with or build functionality for generative AI model compliance.

Conclusions

We believe Generative AI will unlock huge efficiency gains for companies and create big, new company opportunities in infrastructure. Two of the biggest bottlenecks to adoption will be cost and security. Infrastructure startups with those core value props will be well-positioned to succeed.

We also see open source playing a major role in generative AI infrastructure. Startups that use this model will have an easier time gaining trust with users and benefit from the innovation and support that comes from open source communities. At Cowboy, we’re huge believers in open source which is why we’ve made a number of open source investments and co-host the Open Source Startup Podcast (along with our awesome co-host Tim Chen from Essence VC).

Our team at Cowboy is actively looking to make investments in the generative AI infrastructure landscape. If you’re a founder building in one of the categories mentioned above, we’d love to chat!

Find us on twitter at @matt_lu_ and @amanda_robs 👨‍💻👩‍💻

[1] It’s worth noting there is no standard process for building generative AI applications — each team’s infrastructure stack will depend on choice of model (open vs. closed source), existing AI/ML tooling, training data, and more.

[2] Teams and companies can build applications on GPT-4 via OpenAI’s API, such as copywriting company Jasper AI. However, there is a waitlist for access due to GPU constraints.

The New Generative AI Infra Stack

The New Infrastructure Stack for Generative AI

Category Breakdown¹

Conclusions

Written by Cowboy Ventures