Deep Dive: Generative AI

Cavalry Ventures
Cavalry Chronicle
11 min read · May 24, 2023


Saddle up for our take on this new technological frontier 🤠

By Laura Zecca & Nicola Andreottola

GPT-4 can pass the US bar exam, but before you go expecting robot lawyers to take over the courtroom, hold your horses, cowboys and cowgirls: we're not quite there yet. That said, AI is becoming increasingly human-like, and VCs have already started thinking about how this new wave of technology will affect the way we build and run businesses. What do we need to do differently? How can we make sure our investment strategies reflect these changes? It's a brave new world out there, and we've all got to keep the big picture in mind!

👾 What's the difference between AI, ML, neural networks & deep learning?

First, let's get some basics down. If AI refers broadly to machines that behave intelligently, machine learning (ML) is the subset of AI focused on developing methods that let machines learn that intelligence from data.

Within ML we then talk about neural networks (NNs): models that process information and make predictions through a network of artificial neurons. Neural networks are loosely inspired by how the human brain works and make it possible to build more scalable models that require less supervision. Deep learning, in turn, is what we call the use of neural networks with multiple layers, usually more than three.

Deep learning models can handle both labelled, structured datasets (e.g. a spreadsheet with features x and a target y) and unstructured raw data (e.g. text, images).

AI, ML, NN, Deep Learning

Generative AI uses these deep learning techniques to go beyond what earlier machine learning models could do: rather than only classifying or predicting, it produces new content.
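To make the "multiple layers" idea concrete, here is a minimal sketch of a small deep network. PyTorch and the layer sizes are our assumptions for illustration; the article itself does not prescribe any framework.

```python
import torch
import torch.nn as nn

# A "deep" model is simply a stack of more than a couple of layers.
# Input: a row of 20 features (think of a labelled spreadsheet);
# output: a single prediction.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),   # hidden layer 1
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 2
    nn.Linear(64, 64), nn.ReLU(),   # hidden layer 3
    nn.Linear(64, 1),               # output layer
)

x = torch.randn(8, 20)   # a batch of 8 examples
print(model(x).shape)    # torch.Size([8, 1])
```

Stacking more of these layers, and training them on far larger datasets, is essentially what separates today's deep generative models from the shallow ML models that came before.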

⏳ A timeline of Generative AI — From the basics of deep learning to now

Generative AI: The Timeline

1950s

  • Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) were introduced as the first generative models to generate sequential data such as speech and time series.
  • N-gram language modelling was also first introduced in natural language processing around this time, but these models struggled with long sentences.

1960s

  • 1965: Alexey Ivakhnenko trains the first known deep network architecture, built layer by layer with least-squares fitting, demonstrating the potential of deep neural networks to learn complex patterns.

1970s

  • Autoencoder neural networks are developed: they learn to compress data into a compact representation and reconstruct it. They have since become a building block of generative models such as Variational Autoencoders (VAEs).

1980s

  • NLP models evolved as recurrent neural networks (RNNs) were first introduced in the mid-1980s; the Hopfield Network developed by J.J. Hopfield can be considered one of the first networks with recurrent connections. The recurrent structure allowed neural networks to handle inputs and outputs of varying length, but it also created the problem of vanishing gradients, which was later addressed by the introduction of LSTM.

1990s

  • 1995: One of the first texture synthesis algorithms was introduced by David J. Heeger and James R. Bergen, which alongside texture mapping was one of the traditional image generation algorithms used in Computer Vision.
  • LSTM (Long Short-Term Memory) for recurrent neural networks was first proposed by Sepp Hochreiter and Jürgen Schmidhuber in 1997. Its gating mechanism tackled the vanishing-gradient problem and allowed recurrent models to learn much longer-range dependencies.

2000s

  • In 2006, Deep Belief Networks (DBNs) were developed. Their greedy layer-by-layer pre-training kick-started the deep learning renaissance, and as an unsupervised architecture they can be used for generative tasks such as image and text generation.

2010s

  • In 2013, the Variational Autoencoder (VAE) was introduced by Diederik Kingma and Max Welling, which combines neural networks and probabilistic models to generate high-quality and diverse outputs, such as images and text.
  • In 2014, Cho et al. propose the Gated Recurrent Unit (GRU) as a simpler alternative to LSTM. Together, LSTM and GRU made it far easier to process sequential data like text, speech and time series, and researchers could now use RNNs to build much stronger NLP models.
  • In 2014, Generative Adversarial Networks (GANs) were developed by Ian Goodfellow and his colleagues, opening up new possibilities for generating high-quality and diverse outputs such as images, music and text. In a GAN, two neural networks play a game in which the generator tries to produce increasingly realistic outputs in order to fool the discriminator.
  • In 2017, the introduction of the transformer marked a significant breakthrough for generative models. Initially applied to NLP tasks, transformer-based models paved the way for large-scale training and revolutionised AI generation.
  • 2018–2019: large pre-trained language models take off, including ELMo (Allen Institute for AI, still built on bidirectional LSTMs) and the transformer-based BERT (Google) and GPT-2 (OpenAI).

2020s

  • 2021: The transformer architecture is applied to computer vision with the introduction of the Vision Transformer (ViT) by Dosovitskiy et al. Thanks to their scalability, these models excel at learning vision models from huge datasets with limited supervision (e.g. image classification, image captioning).
  • 2021: As researchers explore how transformers can be used for generative AI, the first multimodal models such as CLIP appear, combining vision and language and training on massive amounts of paired text and image data.
  • 2023: GPT-4 is introduced, a large multimodal model that greatly surpasses GPT-3.5's performance and shows how steeply these models can still improve in a short period of time.

🧑‍🎓 Generative AI Gets a Brain: The Tech That Made AI Smarter

The past 10 years have brought us several new technologies that have made the current AI frenzy possible. The most relevant are GANs and Transformers.

In 2014 GANs were introduced: this new architecture enabled more realistic image generation via a deceptive game between a generator and a discriminator network. The generator transforms a random input vector into a synthetic output; for instance, the generator produces the image of a duck, while the discriminator has to determine whether it is a real duck from a database of real images or a fake one. Once the discriminator can no longer reliably distinguish real from synthetic images, the generator is producing highly realistic outputs.

GANs
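As a rough illustration of that adversarial game, here is a compressed training-loop sketch. PyTorch, the toy two-layer generator and discriminator, and the synthetic 2-D "real" data are all our assumptions for illustration, not a production recipe.

```python
import torch
import torch.nn as nn

# Toy GAN: the generator maps random noise to 2-D points, while the
# discriminator tries to tell "real" points from generated ones.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 2) * 0.5 + 2.0   # stand-in for a dataset of real samples
    fake = G(torch.randn(64, 8))             # generator turns noise into candidates

    # Discriminator step: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator call fakes "real".
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The key design choice is that the two losses pull in opposite directions: the discriminator gets better at spotting fakes, which forces the generator to produce ever more realistic samples.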

If GANs have had a major impact on image generation, transformers have done the same for NLP. They are a type of neural network architecture that transforms one sequence (such as a sentence) into another (Seq2Seq). Unlike traditional Seq2Seq models, transformers leverage the attention mechanism to weigh how relevant each term in a sentence is to every other, resulting in more accurate translations even for complex sequences with many dependencies or long-range connections. Transformers are thus like virtual translators that pick up on the nuances of the input and render it more accurately, improving how machines handle language.
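That attention mechanism fits in a few lines of code. Below is a minimal sketch of scaled dot-product self-attention; PyTorch and the random tensors standing in for a tokenised sentence are assumptions for illustration.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Each output position is a weighted mix of all values,
    # with weights given by how well queries match keys.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

# A "sentence" of 5 tokens, each a 16-dimensional embedding.
x = torch.randn(5, 16)
out = scaled_dot_product_attention(x, x, x)   # self-attention
print(out.shape)                              # torch.Size([5, 16])
```

In a full transformer this operation is repeated across multiple heads and layers, which is what lets the model relate every word to every other word regardless of distance.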

🧩 Cavalry Market Mapping

Since generative AI first entered our lives, many experts and VCs have tried to come up with a smart way to map the landscape. While this is definitely a helpful exercise for grasping the wide scope of the technology, in our view many attempts have failed at the main purpose: actually helping the reader understand something. There are several dimensions from which to look at the generative AI landscape; we identified at least five: type of generative AI application (text, image, etc.), business functions impacted (customer support, marketing, sales, etc.), industries impacted (gaming, healthcare, etc.), tech stack (foundation models, application layer) and business models (open source, end-to-end, etc.). Customer support teams, for instance, can leverage general text-writing applications to power their email services, and the pharma industry can leverage generative AI for new drug development.

These different perspectives have, in our understanding, led to overlapping market maps that combine elements of different clusters within the same representation, generating even more chaos. That's why, at Cavalry, we decided to try our own way and developed the mapping described below.

We assumed that the two most urgent pieces of information needed to build the generative AI puzzle are definitely the answers to these two questions:

  • What can generative AI do? — Type of applications
  • How does it do it? — Tech stack

When we think about the potential of generative AI, we are mostly thinking about its applications in our daily lives; yet these are only the surface of the technology required to produce them. The underlying layer consists of large pre-trained models from which the applications source their "intelligence". As described in a Sequoia post, we can think of generative AI apps as a UI layer and "little brain" that sits on top of the "big brain" of large general-purpose models.
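As a rough illustration of that split, an application-layer product is mostly prompt construction, orchestration and UX wrapped around a call to someone else's large model. The sketch below is hypothetical: `call_foundation_model` and `draft_support_reply` are placeholder names we invented for illustration, not a real API.

```python
# Hypothetical sketch of the "little brain" application layer:
# domain-specific prompting and post-processing wrapped around a
# general-purpose foundation model accessed through an API.

def call_foundation_model(prompt: str) -> str:
    """Placeholder for an API call to a hosted large model."""
    raise NotImplementedError("swap in your provider's client here")

def draft_support_reply(ticket_text: str, tone: str = "friendly") -> str:
    # The application's value-add: domain context, guardrails, formatting.
    prompt = (
        f"You are a customer-support assistant. Reply in a {tone} tone.\n"
        f"Customer message:\n{ticket_text}\n"
        "Keep the answer under 120 words."
    )
    return call_foundation_model(prompt)
```

Everything of value the application adds (domain context, guardrails, workflow integration) lives around that single call to the "big brain".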

Platform and Application layer

This representation not only lets us understand the link between what we call the platform layer and the application layer (with all its use cases) but also, with a little twist, gives an immediate view of the business models that this segmentation brings with it.

Thus, in the application layer we can see two main business-model approaches: a vertical approach focused on a single generative AI application, and a multimodal strategy that tries to cover several domains.

Business Models

In the platform layer, instead, two main scenarios open up: using a closed-source model or an open-source one. In the first case, the business model is fairly straightforward, with closed-source model developers selling API access to their models to third parties; in the second, a more articulated value chain is required, with open-source models being hosted and shared by model-hosting providers.

Finally, as in almost every business, an end-to-end strategy is always an option, though it requires heavy vertical integration and fairly holistic know-how.

By clicking here you can access an overview of the 60+ startups currently using generative AI that we featured in our market mapping.

We are a VC, a pretty cool one, but still a VC. And, as VCs usually do, we love trying to make predictions about the future. In this case, meddling with a market-sizing exercise might look like a waste of time: we can all agree there is no reason not to be bullish on the future of this space, and no need to doubt that the market will be large enough to make every VC consider ways to enter it. What is far more interesting to ask ourselves, wearing our VC sunglasses 😎, is: where in this market will value accrue?

A good criterion is to ask how emerging companies in both the platform and application layers will build mid- to long-term defensibility. Commoditisation is the risk that platform-layer companies may soon face: open-source models are already being trained on amounts of data similar to closed-source ones and rely on similar transformer architectures, drastically shrinking the room for differentiation. The application layer, on the other hand, already looks like a red ocean, with many players building fancy UX on top of common pre-trained models (with little to no fine-tuning) and lacking any defensibility on the tech side.

These risks on the product end have led us to develop the two following matrices, with the purpose of providing a useful tool to assess new investment opportunities while keeping an eye on the main levers for differentiation across the tech stack:

Platform layer matrix
Application layer matrix

We believe that companies commercialising foundation models will have to find the right balance of industry and use-case specialisation: industry-focused LLMs will be able to deliver better-performing models, cater to specific industry use cases, and benefit from more targeted go-to-market motions and stronger defensibility thanks to unique access to industry data; multi-industry and eventually multi-application solutions, on the other hand, will face lower defensibility, potentially balanced by a wider addressable market.

On the application layer, we believe the companies that succeed will be those that manage to build actual generative-AI-enabled products (not just plug-in features) by creating a virtuous flywheel: increased usage of their platform generates more data, which in turn is used to refine their models and deliver improved, personalised outcomes to customers.

🔮 Looking into our crystal ball

At present, generative AI tools still rely heavily on human contribution. To reach a satisfactory output you need to be an excellent prompter, and it will usually still take multiple attempts and further refinement after generation. These gaps, however, are already being addressed by the industry, and platforms like Leonardo.ai are adding features specifically aimed at making prompts more effective and accurate. But that's just the beginning. We are also seeing the rise of autonomous agents that let AI make decisions on its own: given a main objective, these agents can figure out the rest, as demonstrated by the pixelated villagers in an experiment conducted by Google and Stanford, who organised a Valentine's Day party on their own after being given a single prompt.

It’s clear that we’re at a pivotal moment in time when human-like AI feels closer than ever. A survey of 356 AI experts conducted in 2022 found that 90% of them believe human-like AI will be achieved within the next 100 years. Half of the experts even predict that this milestone will be reached before 2061, highlighting the rapid pace of AI development.

So, as we continue to ride the wave of technological advancements, it’s time to embrace the opportunities that generative AI tools can provide.

Please find our full research on the current state of Generative AI here:
