The Buzz Around Generative AI: Introduction

Welcome to part one of a new blog post series in which I set out on a journey to track the history and growth of Generative AI while expounding on concepts ranging from autoencoders and generative adversarial networks (GANs) to flow models, diffusion models, generative pre-trained transformers (GPTs), and more.

Pooja Ravi
7 min read · Jun 29, 2023
Photo by Eric Krull on Unsplash

As an active researcher and developer in the field of Artificial Intelligence, I decided to document my learning journey through blog posts, since writing them imparts knowledge and furthers curiosity for both you and me. This post will serve as a succinct preamble to a comprehensive series that will be published over the upcoming months.

Through this series, I aim to explain the workings of, and touch upon the mathematics behind, various state-of-the-art generative models, including VAEs, GANs, normalizing flows, stable diffusion, and GPTs. Since this is the first post, let’s get acquainted with generative AI and its potential in today’s technology-dominated landscape.

Generative AI — the what and the why

Generative AI (or GenAI) is a branch of Artificial Intelligence built around generative models: deep learning-based algorithms that generate new, and hopefully relevant, data corresponding to the inputs they are given.

This phenomenon was initially (sort of) kindled by Hidden Markov Models (HMMs), which model a sequence as transitions between hidden states, each of which emits an observation, and which can be used to predict the next element of a sequence given the current state (a toy sketch follows below). Back in the day, HMMs were notably used in speech recognition and time series analysis. Nowadays, GenAI is used to generate not just text but also images, videos, code snippets, and synthetic datasets.
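To make this concrete, here is a toy, hypothetical sketch of how an HMM generates data: hidden states evolve according to a transition matrix, and each state emits an observation according to an emission matrix. The state names and probabilities below are made up purely for illustration.

```python
import numpy as np

# A toy HMM with 2 hidden states ("rainy", "sunny") and 3 observations
# ("walk", "shop", "clean"). All probabilities are invented for illustration.
states = ["rainy", "sunny"]
observations = ["walk", "shop", "clean"]

transition = np.array([[0.7, 0.3],     # P(next state | current state = rainy)
                       [0.4, 0.6]])    # P(next state | current state = sunny)
emission = np.array([[0.1, 0.4, 0.5],  # P(observation | state = rainy)
                     [0.6, 0.3, 0.1]]) # P(observation | state = sunny)

rng = np.random.default_rng(0)

def sample_sequence(length, start_state=0):
    """Generate a sequence by walking the hidden states and emitting symbols."""
    state = start_state
    seq = []
    for _ in range(length):
        seq.append(observations[rng.choice(3, p=emission[state])])
        state = rng.choice(2, p=transition[state])  # move to the next hidden state
    return seq

print(sample_sequence(5))  # e.g. ['clean', 'shop', 'walk', ...]
```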

ChatGPT from OpenAI

With the public release of ChatGPT, which uses Generative Pre-Trained Transformers (GPTs), a family of generative models from OpenAI, the company and its CEO Sam Altman have become frontrunners in the GenAI race. ChatGPT also dragged into the spotlight how effectively GenAI tools can be leveraged by laypersons and academics alike. Millions of users now arrive at quick solutions to myriad problems by merely dropping their queries as prompts into the ChatGPT interface.

ChatGPT garnered over 100 million users within a mere two months of its launch. It has turned heads across the world and attracted billions of dollars worth of investment from tech giants such as Microsoft, which integrated ChatGPT into its Bing search engine. Further, GitHub Copilot X has adopted OpenAI’s GPT-4 model to power its AI-based code completion features.


Further, Generative AI’s widespread usage can be seen in the following applications recently released by other major tech giants:

  1. Adobe Firefly uses the image generation capabilities of AI to assist in editing, extending, projecting, or even transforming images.
  2. Stable Diffusion models from Stability AI generate photorealistic images from text prompts provided by users, tailored to their requirements.
  3. Midjourney offers text prompt-based image generation through a dedicated Discord server.
  4. NVIDIA’s Avatar Cloud Engine (ACE) is a tool that uses Generative AI to instill behavioral/personality traits in virtual characters.
  5. Google’s MusicLM and Meta’s recent release, MusicGen, are both text-to-music generative models that can produce musical sequences from text prompts.

Concerns regarding Generative AI

Although generative AI is becoming ubiquitously useful in accelerating productivity, some still perceive it as a threat. Speculation and criticism have arisen regarding the following aspects of using GenAI in our day-to-day lives:

  1. Ethical aspects of building such software — Certain jailbreak techniques or prompts help circumvent the legal boundaries of generative models, forcing them to generate immoral or unethical answers. This imperils the usage of such models and exposes their vulnerabilities.
  2. Transparency of decision-making by the model — How generative models arrive at the end result is at times a gray area for the layman given their black-box architectures and arcane training processes. Ideally, these models’ outputs and suggestions shouldn't be taken at face value while making important decisions.
  3. The origin of training data for all the models — Where the training data is sourced from remains a concern, since publicly available text or image data is not immune to human bias and may hence be skewed or unfair in its representation.
  4. Job security for human workers — Another rising point of contention is how safe our jobs are from AI. Some believe GenAI’s exceptional capability to produce new/original content will eventually displace human workers from their traditional roles as corporations look to boost productivity and produce better results.
  5. How reliable (or biased) the outputs are — Significant research must be pursued to determine what kinds of bias patterns frequently appear in the outputs generated by AI software. Pinpointing where a model is skewed by human bias, and how/if it reinforces stereotypes through its outputs, is imperative to building equitable systems.
  6. How open-sourced each model is — Huge corporations and tech giants should advocate for transparency in coding, training, testing, and releasing different versions of their generative models so as to healthily promote research and development (R&D) in such sectors and improve accessibility.

Bearing all this in mind, we must realize that whether such criticisms hold true, and to what extent they do, will not change the fact that both research and public interest in these areas are raging.

When generative models are democratized for public use through accessible user interfaces, they gain astronomical popularity.

Hence, if you are interested in either jumping on the bandwagon or just brushing up on the latest developments, this blog series is your best friend.

The all-in-one timeline for generative models

Let’s check out how generative models have evolved! I have attached a supremely informative chart below that tracks the release of various generative models and the categories they belong to.

GenAI Timeline

As the chart depicts, the rapid advancement of Generative AI picked up with the advent of variational autoencoders (VAEs) in 2013, which can be considered an improvement over preexisting autoencoders through the use of variational inference. These concepts will be thoroughly explained in the next blog, so stay tuned; a tiny teaser of the core idea is sketched below.
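As that teaser, here is a minimal, hypothetical PyTorch sketch of the VAE idea: the encoder produces a mean and log-variance, a latent vector is sampled via the reparameterization trick, and the decoder reconstructs the input. The layer sizes and names are arbitrary choices for illustration, not a reference implementation.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE sketch: encoder -> (mu, logvar) -> sample z -> decoder."""
    def __init__(self, in_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.decoder(z), mu, logvar

x = torch.rand(4, 784)                 # a dummy batch of flattened images
recon, mu, logvar = TinyVAE()(x)
# ELBO-style loss: reconstruction term + KL divergence between q(z|x) and N(0, I)
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
loss = nn.functional.binary_cross_entropy(recon, x) + kl
```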

Further, Generative Adversarial Networks (GANs) were brought into the limelight by Goodfellow et al. in 2014, and with this, we saw a flurry of models being released: DCGAN, GANs for image-to-image translation, Wasserstein GAN, StyleGAN, and many more. I plan on dedicating one blog in this series to exploring GAN-related concepts and analyzing some popular GANs along with their code implementations; a minimal sketch of the adversarial training idea appears after the image below. For an example of what GANs are capable of, here’s HyperStyle, a variation of StyleGAN from Alaluf et al. that can modify/edit a person’s features such as facial hair, hairstyle, and age. The image below demonstrates its functionalities.

HyperStyle: A modification over StyleGAN.
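And here is that promised sketch: a minimal, hypothetical PyTorch example of one adversarial training step on toy data, in which the discriminator learns to separate real samples from generated ones while the generator learns to fool it. The tiny networks and the stand-in data are placeholders chosen purely for illustration.

```python
import torch
import torch.nn as nn

# Toy generator and discriminator over 2-D "data"; all sizes are arbitrary.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0   # stand-in for a batch of real samples
noise = torch.randn(64, 8)        # random noise fed to the generator

# Discriminator step: push real samples toward label 1, fakes toward label 0.
fake = G(noise).detach()
d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: try to fool the discriminator into labeling fakes as real.
g_loss = bce(D(G(noise)), torch.ones(64, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```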

In the language modeling sphere, RNNs and LSTMs ceased to be the only options for language-oriented tasks. The “Attention is all you need” paper by Vaswani et al., released in 2017, shifted the focal point of NLP research from simple recurrent networks to transformers and LLMs; the core attention operation is sketched below. Such breakthroughs have paved the way for today’s powerful generative beasts, and companies are now racing to outdo their competition in leveraging the generative capabilities of Artificial Intelligence.
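For readers curious about what changed, here is a minimal NumPy sketch of the scaled dot-product attention operation at the heart of the transformer; the toy token matrix is made up purely for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of the values

# Toy example: 4 tokens, each with an 8-dimensional representation.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)          # self-attention over the toy tokens
print(out.shape)  # (4, 8)
```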

Lately, we have been witnessing the power of GenAI-based research such as OpenAI’s DALL-E 2 and ChatGPT, Stability AI’s Stable Diffusion and ControlNet, and Google’s Bard powered by LaMDA. These are just a few of the many widely discussed models that show how powerful Generative AI can be when implemented astutely.

I will dip all our toes into the vast ocean of Generative AI with this blog series. I also plan on including links to my own GitHub code repositories for select generative models so that interested readers can dig into the implementation details. With that said, I plan on releasing separate blogs explaining the following types of generative models:

  1. Autoencoders
  2. GANs
  3. Transformers/Language models
  4. Flow-based generative models and
  5. Diffusion models.

Conclusion

This blog series not only incentivizes me to compactly present and document all that I know about these topics, but also pushes me to thoroughly learn the concepts before penning them down for my readers (you) to enjoy. So I am very excited to see this journey through patiently. In the next blog, we will dive into the working mechanisms behind autoencoders, denoising autoencoders, variational autoencoders, and more. I will catch you all there!

Until then, if you have any doubts/suggestions, or would simply like to chat, feel free to reach out to me via LinkedIn.

References and resources

  1. An intuitive video on Hidden Markov Models (HMMs)
  2. A detailed blog on HMMs
  3. Learn more about ChatGPT with this video
  4. Try out ChatGPT
  5. Try out ControlNet or Stable Diffusion
  6. Research paper: HyperStyle
  7. Research paper: Attention is all you need
