Are we en route to the next generation: “Gen AI”?

Praneet Agarwal
Published in MindBloggingTech
18 min read · Dec 27, 2022

The impact of technology on human lives and its role in shaping human progress is a theme very close to my heart, and this blog has been a means for me to think out loud about the technology trends I find exciting. I have been fascinated by past tech-led revolutions that, in my mind, have altered the trajectory of human progress for the better: the PC revolution of the 1980s, the internet revolution of the 1990s, and the smartphone revolution of the 2010s. Travel further back, and the invention of the printing press in the 1400s (often touted as the most significant invention of the millennium) stands out as a key turning point in the history of mankind. It is this fascination that keeps me in constant intellectual pursuit of the “next big thing”.

Recent developments in the world of AI have provided me with a sense of hope that we might be finally at the cusp of something revolutionary — and leading the charge is a class of technologies clubbed under the term “Generative AI”.

So what is “Generative AI”? How would one describe it to a 5-year-old child? To answer this, I asked ChatGPT (the AI chatbot that has very quickly become the flag-bearer for the world of generative AI) for help. This is what it generated (verbatim):

“Generative AI is a type of computer program that can create new things, like pictures or stories, that are similar to things it has seen before. It uses special algorithms to make the new things look real and interesting, and it can help people make new art or tell new stories.”

Another way to explain generative AI to a child is to use an analogy. For example, you could say: “Generative AI is like a master chef who knows how to make lots of different dishes. The chef has a big cookbook with lots of recipes, and he uses those recipes to make new dishes that are similar to the ones in the cookbook, but also unique and tasty. Generative AI works in a similar way, using its big ‘brain’ of knowledge to create new things that are similar to what it has seen before, but also different and interesting.” — as answered by chatGPT

This is the quality of response you would typically expect from a human expert on the topic: it sounds intelligent and is highly contextual (you can also ask ChatGPT to rephrase the definition for an AI researcher, which makes it much more technical). It is hard not to be amazed that a chatbot generated this without any human intervention. Even more impressive is the traction the chatbot has seen with consumers, amassing 1M+ users in about five days. To understand how extraordinary this milestone is, it helps to compare it with other viral consumer products and other AI products:

Hitting 1M users for some popular consumer tech companies.

Twitter → 24 months
Netflix → 41 months
Facebook → 10 months
Instagram → 2.5 months
ChatGPT → 5 days (!!)

Hitting 1M sign-ups for some recent AI product releases.

GPT-3 → 24 months
Github Copilot → 6 months
Dall-E → 2.5 months
ChatGPT → 5 days (!!)

While the comparison is not like-for-like given that the environment was vastly different for each product (e.g., unlike ChatGPT, Facebook did not have the distribution might of social networks at its disposal when it launched in 2004), the momentum in this space is hard to ignore.

Looking at this trend, I get this feeling that we might be sitting at an inflection point, with an explosive growth trajectory in the near future:

Source: waitbutwhy.com

Or could it just be hype? After all, we have had several candidates for such an inflection in the last few years with similar hype: AR/VR (I myself wrote a blog post back in 2015), Crypto/Web3, and voice assistants (Siri/Alexa), but we are yet to see tangible results at scale from these big bets.

While it is quite possible that generative AI too could turn out to be vaporware, here are a couple of reasons why it is more likely to be disruptive:

Barriers to end-user adoption are much lower: Unlike tech innovations like AR/VR, there is no new hardware requirement for the end user, and most applications work on existing interfaces like a web browser (e.g., ChatGPT can be accessed via the browser with a simple login). This leads to much easier end-user adoption and also faster product iteration owing to quicker feedback loops.

Initial evidence of tangible real-world impact is already visible: The tech has quickly evolved from powering demos/gimmicks to solving real-world use cases with workflows/interfaces that are 10x simpler for users owing to AI-powered leverage (e.g., jasper.ai helps marketing/content teams with copywriting and has already generated $45m+ in revenue in the last year). These insurgent solutions are already challenging incumbents in real-world/industry use cases worth $120b+ in revenue.

To understand the impact of this technology better, I think it is first important to understand its broad architecture and appreciate its true potential.

Generative AI Tech is modular and is composed of 5 different layers

Source: NFX

  1. General AI models (e.g., GPT-3) form the very foundation of the technology breakthrough. These “large language models” (LLMs) deal with broad categories of output (e.g., GPT-3 for text, DALL-E 2 or Stable Diffusion for images, Whisper for voice) and are able to generate text/images/speech by learning to predict the next word/token from the preceding sequence of words/tokens using past data. This learning, however, is complex and computationally expensive, which means that training general models is typically confined to large companies with deep pockets (a single training run for GPT-3 cost $12 million; hence the $1b that OpenAI raised from Microsoft in 2019 helps). These models are designed to be flexible and easy to use (via APIs etc.), allowing for participation by external developers. Some are also open-source (e.g., Stable Diffusion), allowing for tweaks/refinements by the community.
  2. Specific AI models are trained to perform specific jobs (e.g., copywriting, generating home-interior images) using more specialized and narrow datasets to achieve higher accuracy than general models. This is made possible by a very interesting attribute of general models like GPT-3: they are meta-learners, meaning they have “learnt to learn”. This enables “fine-tuning” of the model with much less data, allowing nimbler startups to develop more focused models.
  3. Hyperlocal models are extensions of general/specific models trained on personal/proprietary data, enabling personalization/customization for the end user. For example, copywriting style can be customized for each company based on its past content.
  4. The Operating Systems and APIs layer sits between the AI models and the end-user applications, allowing for interoperability and “plug-n-play” access to AI models.
  5. Applications are software programs that help end users solve their needs. Owing to the modular architecture and the fast-paced development happening in the layers below, application developers can leverage existing models and focus on developing user interfaces and product features on top of the AI capability. (e.g., ChatGPT essentially provides a conversational chat interface for answering questions, using the intelligence of GPT-3.5 as its foundation)
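
The “predict the next token from the preceding sequence” idea at the heart of layer 1 can be illustrated with a toy sketch. The bigram counter below is only a caricature (real LLMs like GPT-3 use transformer networks with billions of parameters, not word-frequency tables), but the generation loop captures the principle: learn next-token statistics from past data, then generate by repeatedly predicting a likely next token.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count which word follows which in a tiny
# corpus, then generate greedily. This is a sketch of the LLM principle
# only, not how GPT-3 actually works.

def train_bigram(corpus):
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1  # how often `nxt` follows `prev`
    return counts

def generate(counts, start, max_tokens=5):
    out = [start]
    for _ in range(max_tokens):
        options = counts.get(out[-1])
        if not options:
            break  # no continuation seen in training data
        out.append(options.most_common(1)[0][0])  # greedy: likeliest next token
    return " ".join(out)

corpus = [
    "generative ai can create new text",
    "generative ai can create new images",
    "generative ai can help people create",
]
model = train_bigram(corpus)
print(generate(model, "generative"))  # → "generative ai can create new text"
```

Real models differ in one more important way: instead of always picking the single likeliest token, they sample from the predicted distribution, which is what makes their output varied rather than deterministic.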

The tech foundations were laid in the mid-2010s, but progress has accelerated in the last few months

OpenAI (the creator of GPT-3, DALL-E, and ChatGPT), the most visible organization catalyzing the generative AI movement, was founded in 2015. The core technology behind generative AI, i.e. the general LLMs, had its beginnings at Google Brain in 2017, when the team open-sourced its research on using Transformers for natural language processing tasks. However, the technology only started appearing at the forefront in the last few months. A couple of trends have led to the recent spurt in activity in the space:

Proximity to “artificial general intelligence”: For years, we have been fascinated (and also scared) by the idea of machines reaching human-level intelligence. Owing to years of research and investment, we are finally at a point where the output generated by AI models is able to match humans (OpenAI highlighted in their research paper that human judges could identify articles written by GPT-3 only 52% of the time, barely above the 50% expected by pure chance). And while we might still be some years away from “artificial general intelligence”, this is the closest we have gotten so far. (Check out this extremely insightful article on the development and future of artificial intelligence.) This has been made possible by rapid advancements in AI models, supported by:

  • Advancements in computing capabilities enabling more complex models (GPT-3 employs 175 billion parameters/coefficients to make its predictions, more than 100x the older GPT-2)
  • Proliferation of data on the internet enabling training on large data sets (GPT-3 was initially trained on 45 TB of data), and
  • Inflow of capital to fund research and development both internally by big tech companies (eg. Google with its research labs, Google Brain) and externally by corporate investors/ VCs ($5b+ invested into generative AI startups in the last 6 years)
Source: theaidream.com

Open-source movement gathering steam: Most of the initial development of general AI models happened in closed ecosystems led by big tech companies: Google, Meta, or Microsoft (via their investment in OpenAI for an exclusive license). However, open-source alternatives to the proprietary models have emerged in the last few months, enabling easier and cheaper development. For example, Stability.ai made Stable Diffusion (a text-to-image alternative to DALL-E, launched in Aug 2022) free to use and also offered an open-source license for companies to build on. This has allowed several developers/companies to create variants (e.g., Lensa AI) and has drastically reduced the cost of generating images to roughly 1/100th of what it was two months earlier.

Several real-world applications are already mushrooming with massive potential

There is already a lot of commentary and discussion around the magnitude of impact that generative AI might have. Some observers have gone on to call it a potential “rocketship for the human mind” (akin to Steve Jobs’ “bicycle for the mind” comparison for the PC). What is interesting is that the discussion on the impact of this tech has already moved from geeky tech circles on Twitter to top-tier VCs and popular business publications.

“They [generative AI technologies] threaten to upend the world of content creation, with substantial impacts on marketing, software, design, entertainment, and interpersonal communications. This is not the “artificial general intelligence” that humans have long dreamed of and feared, but it may look that way to casual observers.”

Source: HBR

Generative AI can make these workers at least 10% more efficient and/or creative: they become not only faster and more efficient, but more capable than before. Therefore, Generative AI has the potential to generate trillions of dollars of economic value.

Source: Sequoia

Now the “trillions of dollars of economic value” predicted by Sequoia might seem like hyperbole on the face of it, but look closely at the evolving use cases and it starts to make sense. What strengthens the argument for such a massive impact is that generative AI promises to amplify 2 key attributes that make us humans unique among living beings:

  • Our ability to imagine and believe in abstract concepts like religion
  • Our ability to collaborate/ coordinate at scale using language

This amplification is similar to past big technology breakthroughs: the invention of the printing press enabled widespread creation and dissemination of knowledge; the invention of the personal computer made it possible to digitally consume and create content at scale.

IMO, generative AI could hold similar transformational potential and significantly change the way we do things, under 4 big change themes, together worth at least a $120b+ opportunity (a very conservative rough estimate; it could easily be much bigger as more use cases are built).

Unlocking the creative potential of creators/builders by providing leverage in navigating the zero-to-one journey ($48b+ opportunity)

1. Getty Images: ~$950m+ revenue in 2022; Shutterstock: ~$770m in 2021
2. Adobe Creative Suite: $11.6b+ revenue in 2022; Canva: $1b+ revenue
3. Autodesk: $4.8b+ revenue
4. Figma: $200m+ revenue; Webflow: $100m+
5. Upwork TAM is $1.3t+; 14-15% is creative services (as per Fiverr) -> $180b+ TAM for creative services; assuming a 10% share is impacted by generative AI -> $18b+
6. Total games revenue is $190b+; ~30% is R&D spend (Zynga spends ~20% of revenue, Ubisoft ~40% on R&D); assuming ~10% of that spend goes to generative AI tools -> $6b+
7. Based on analysis of the NFX Generative AI open-source market landscape

Creators/builders often face the “cold start” problem, i.e. the struggle of generating initial ideas. Creativity has typically been considered an intrinsically human-driven activity and often requires us to think deeply and brainstorm (often with peers/colleagues).

With generative AI tools, it becomes much easier for content creators to generate a wide variety of ideas, pick a few worth doubling down on, and refine them using the tools available (e.g., one can now use DALL-E to visualize images based on text instructions).

As a hobbyist designer, I see great value in the leverage that such tools could provide to creators and builders. For instance, I redesigned the logo of my blog using a mix of ChatGPT (for generating logo ideas) and Midjourney (for generating logo concepts from those ideas). It is probably a topic worth covering in depth in another blog post, but I could easily feel the 10x leverage the tech provided in generating and prototyping ideas.

And as a startup builder, I find some of the tools extremely useful; they could have provided significant savings in terms of resources (e.g., jasper.ai for writing marketing copy, RunwayML for video generation/editing, Debuild for prototyping and launching simple apps).

The tech can transform the way creators write copy/content, design, compose music, generate videos, or write stories. On one end, these tools can democratize content creation (by making it possible for more people to create content); on the other, the same tools could help existing creators improve their craft (by enabling higher-quality content and experimentation with more complex formats). And while there is also a risk of commoditization of content, I expect the net effect on overall content quality to be positive.

This is at least a $48b opportunity based on current use cases, but it could easily be much bigger given the potential for more use cases and an expansion of the creator pool.

Improving productivity in knowledge work by acting as a “co-pilot” ($72b+ opportunity)

1. Contact-center software market size is $25b+
2. Microsoft Office Commercial is $38b+ in revenue
3. GitHub is $1b+ in revenue
4. Based on analysis of the NFX Generative AI open-source market landscape

Despite the tech boom, jobs typically considered “knowledge work” have not seen any significant transformation in workflow or productivity in the last few years. However, generative AI tools are showing promise in transforming the way knowledge work is done. A few examples I can personally relate to, given my experience:

  • Programming or coding, often touted as an essential 21st-century skill, is on the verge of disruption. Tools such as GitHub Copilot are able to assist programmers in writing code, and early results are very promising, with indications that Copilot writes almost 40% of its users’ code. Given that code is at the foundation of the tech boom we see around us, the ramifications of such a productivity boost are hard to overstate.
  • Presentation/storytelling, a very important corporate tool for communicating ideas and building agreement, has typically been dominated by a single piece of software, Microsoft PowerPoint, which was invented almost 35 years ago and has broadly remained similar in overall workflow. We are now seeing insurgents (e.g., Tome) challenging this on the back of generative AI tech, enabling powerful deck creation from simple text-based prompts.
  • Data analysis, which typically involves an analyst sifting through the data structure and writing SQL queries to extract meaningful insights, could now be done with natural-language queries using text-to-SQL tools.
  • Note-taking/knowledge management, another boring yet super-effective task, can be handled by generative AI tools such as Cogram, which make it extremely easy to keep notes for important meetings.
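
To make the text-to-SQL idea concrete, here is a minimal sketch of the pattern such tools follow: wrap the user’s question and the table schema in a prompt, have a model translate it to SQL, and run the result against the database. The `llm_to_sql` function below is a hypothetical stub standing in for a real model call, and the schema and data are invented for illustration.

```python
import sqlite3

# A toy schema for the example; real tools would read this from the database.
SCHEMA = "CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)"

def build_prompt(question: str) -> str:
    # The prompt gives the model the schema plus the natural-language question.
    return f"Schema: {SCHEMA}\nQuestion: {question}\nSQL:"

def llm_to_sql(prompt: str) -> str:
    # Hypothetical stub: a real implementation would send `prompt` to an
    # LLM and return its completion. We hard-code the expected translation.
    return ("SELECT region, SUM(amount) FROM orders "
            "GROUP BY region ORDER BY SUM(amount) DESC")

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 50.0)])

sql = llm_to_sql(build_prompt("Which region has the highest total sales?"))
print(conn.execute(sql).fetchall())  # → [('north', 170.0), ('south', 80.0)]
```

The leverage comes from the fact that the analyst only writes the question; schema plumbing and SQL syntax are handled behind the interface.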

What the tools do here is take over the “boring” part of the work and act almost like a co-pilot, providing a good amount of leverage to the person doing the job. This “productivity” angle of generative AI is itself a $72b+ opportunity, again on very conservative estimates.

Enabling simplification of consumer lives via more intuitive human-computer interfaces and personalization

In the last few years, we have witnessed a proliferation of multiple consumer tech devices and services that have made our lives easier. With the advent of generative AI tech capabilities, a few trends that I can imagine making consumer lives even simpler:

  • More intuitive human-computer interfaces — Past progress in making interfaces simpler and more intuitive (e.g., touch screens on mobile phones) has yielded fabulous results in making tech more accessible and in opening up new use cases (e.g., services like Uber made possible by smartphones). Research in generative AI is already exploring more intuitive natural-language interfaces (Adept.ai, Inflection AI), which might make fictional assistants like Jarvis (of Iron Man fame) a reality.
  • Curated search/discovery — Since the beginning of the internet and search engines, information search and discovery have broadly remained the same. And while we have seen a proliferation of information and choice on the internet (e.g., a quick search on Zomato presents 1,500+ restaurants with 150k+ unique dishes to order from in Gurgaon), the search and discovery process hasn’t changed much. With generative AI, it might be possible to build a conversational interface (like ChatGPT) that makes this process much more streamlined and also personalized (using hyperlocal AI models).
  • Personalization at scale — Traditionally, there has always been a trade-off between scale and personalization, with commoditized mass-manufactured products on one end of the spectrum and hand-crafted personalized products at the other. With generative AI, the trade-off could become much less important as the barriers to creating “something new and just for you” fall. For instance, one can start imagining 3D products generated from simple text prompts using Point-E and 3D-printed on a cheap home 3D printer. Some consumer tech businesses such as Stitch Fix have already started experimenting with this idea of enabling personalization for consumers, and more are likely to follow suit.

This change theme is still quite nascent and evolving. While it might be hard to estimate the size, the opportunity is massive given the value some past consumer tech disruptions have captured (the latest blockbuster human-computer interface innovation, the iPhone, makes $190b+ for Apple; Google Search makes $150b+ in revenue for Alphabet).

Accelerating research and discovery in the fields of pharmaceuticals, healthcare, genetics

  • Molecule/protein generation helps accelerate drug discovery by enabling the generation of candidate proteins. Traditionally, finding the right proteins with the potential to become drugs has not been very different from finding a needle in a haystack. Generative models can allow scientists to design specific proteins, which can then be tested in the lab, accelerating the process of drug discovery.
  • Synthetic data generation makes it easy to produce realistic test data, which helps validate algorithms/hypotheses in research fields such as genomics and computer vision for self-driving cars.
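
A minimal sketch of the synthetic-data idea: sample realistic-looking records from simple distributions so an algorithm can be validated without real (possibly sensitive) data. The field names and distribution parameters here are invented for illustration; real generative models learn the distributions from data rather than hard-coding them.

```python
import random

random.seed(42)  # reproducible test data

def synthetic_patients(n):
    # Each record mimics the shape of a real patient row without
    # containing any real patient's information.
    return [
        {
            "id": i,
            "age": random.randint(18, 90),          # uniform over adult ages
            "systolic_bp": round(random.gauss(120, 15), 1),  # normal, mean 120
        }
        for i in range(n)
    ]

data = synthetic_patients(3)
print(data)
```

The same pattern scales up: a model trained on real genomes or driving scenes can emit unlimited synthetic samples that preserve the statistics that matter while decoupling research from scarce or private source data.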

Again, it is hard to estimate the impact of generative AI on research and discovery, but it could easily be worth several billion dollars (the top 10 pharma companies invest $130b+ in R&D).

Opportunity for several winners to emerge across the tech-stack and different use cases

With such a massive potential opportunity, the question arises: who is likely to benefit the most? The big tech companies with the big dollars? Or the nimbler insurgents? Given the large size of the opportunity and the modular nature of the tech stack, there are likely to be several winners, especially on the application side, where the breadth of use cases allows several differentiated solutions to thrive together.

The foundational general-model layer has all the makings of a domain for large players: the model-building process requires significant resources to train the model and access to large troves of data for it to learn from. Once a model is successfully trained, it creates a natural barrier to entry for new players, and the data from its usage helps it continuously improve, raising the barrier higher for any new entrant. Big tech companies, with their access to capital, research talent, and proprietary data, are likely to have a natural advantage here.

The application layer presents an opportunity for nimbler new-age insurgents to build disruptive solutions for specific use cases. Finding more use cases and perfecting the user interface will require rapid iteration and a willingness to break things, which are attributes of young startups. Large incumbents, on the other hand, face the “innovator’s dilemma”, which puts them at a disadvantage. For instance, ChatGPT and other conversational interfaces might pose a threat to Google Search given the ability to get summarized answers in one shot instead of browsing through multiple pages. But despite already having the capability to build something like ChatGPT (Google developed its own conversational large language model, LaMDA, in 2021), Google might be slow to respond owing to reputational risk and cannibalization risk (anything that stops people from scanning search results hurts Google’s transactional business model of getting people to click on ads, which generates ~80% of its ~$250 billion revenue).

However, a few ethical/ legal concerns need to be solved before the technology becomes mainstream

Correctness and unbiasedness of AI-generated output remain a work in progress. In the process of becoming more human-like, the tech has also picked up human flaws: systemic biases (from the vast troves of biased human-generated data it has been trained on) and errors (e.g., ChatGPT can make comical errors yet confidently present wrong answers, notoriously for simple math problems). Additionally, the fact that GPT-3 and other models are unable to explain why certain inputs result in specific outputs, or to provide sources, means that incorrect/biased output is also hard to validate.

Copyright on the content created by generative AI remains a matter of debate

Generative AI tools pose the threat of a proliferation of fake news/deepfakes. Since the quality of output the tools produce is so good, it has become much easier to generate and disseminate fake content that could easily fool people into believing it is true. The tech can also generate doctored images/videos; for example, in March 2022, a deepfake video of Ukrainian President Volodymyr Zelensky telling his people to surrender was broadcast on a hacked Ukrainian news channel. Given the amount of fake news already spreading through WhatsApp, the potential of this tech to accelerate it is a deep concern and will require reality-checking solutions to be built.

The next few years are likely to be exciting and see interesting new solutions emerge

Given the rapid pace of development in generative AI and the volume of investment already flowing into startups building solutions, the next few years are likely to see interesting developments. The core tech is improving consistently and likely to see bigger upgrades in the coming months (OpenAI released GPT-3.5 along with ChatGPT and is rumored to launch GPT-4 next year). 150+ startups have raised $5.3b+ to build solutions leveraging generative AI and are likely to build and scale disruptive products. Existing tech companies are also likely to adopt the tech to improve existing products or diversify (e.g., GitHub launched Copilot, Microsoft announced Microsoft Designer).

This rapid development will significantly alter the way we do things (create and build, do work) and shape the behavior of a whole new generation of humans — we could maybe call it “Gen AI” (after Gen Z and Gen Alpha). Exciting times ahead! (Definitely for tech enthusiasts like me, and hopefully also for non-tech-enthusiast end users.)
