GANs: A Creative Adversarial Relationship

Digitate
9 min read · Feb 21, 2024


By Nishtha Arora

Photo by Daniel Eliashevsky on Pexels

Consider an enthusiastic employee (let’s call him the “generator”) who works hard to produce groundbreaking ideas and solutions. Meanwhile, a discerning manager, acting as the “discriminator,” evaluates each proposal with a keen eye for quality and innovation. This relationship captures the essence of Generative Adversarial Networks (GANs). A GAN has two components: a generator and a discriminator. The generator tirelessly crafts new data, like our employee generating ideas. The discriminator critically judges whether that data looks real, much like the manager deciding which proposals pass muster.

This generator-discriminator duo has led to some very creative use cases. Let’s take a look at some of them.

GANs are used to:

  • Generate synthetic photographs that are practically indistinguishable from real photographs.
  • Generate faces of anime characters. pokeGAN was an interesting project that generated new Pokémon characters!
  • Perform image-to-image translation. Some cool examples include turning a daytime photo into a night scene, converting a satellite photo into a Google Maps-style map, and colorizing a black-and-white photo.
  • Perform text-to-image translation. For example, you type a description such as “a bird with a black beak, a black body, and feathers fading from black to gold from head to tail,” and the model generates a matching image.
  • Generate new face angles and new poses of human models.
  • Generate photographs of models wearing clothing present in a catalog or online store.
  • Generate 3-D views of objects from their 2-D views.
  • And the list goes on.

Join us as we unravel the ‘HOW’ behind these fascinating applications and the transformative potential of GANs in artificial intelligence.

Journey from Classification to Creation

In the past, AI mainly focused on tasks such as categorizing images, for example distinguishing between cats and dogs. The landscape has since evolved: AI can now not only identify complex images but also create them. This transformation comes from combining the structured learning of supervised classification with the more exploratory nature of unsupervised learning.

Supervised Learning — In this approach, an AI model is trained on labeled data. It learns to recognize patterns and make predictions through a systematic cycle of feedback and correction. It predicts a label, compares the prediction with the correct one, and adjusts itself. Over many iterations, it becomes adept at identifying and classifying objects within the given parameters. Supervised learning is similar to an employee who learns from a handbook and seeks a manager’s guidance from time to time.

Unsupervised Learning — In this approach, an AI model works through data without predefined labels or categories and autonomously extracts patterns and relationships. Without labeled examples, it discovers the inherent structure of the data and captures subtle relationships in order to model the underlying probability distribution. Unsupervised learning is similar to a self-directed employee who operates without explicit managerial guidance.

In the broader scope of AI, the evolution from classification to creation arises from merging supervised and unsupervised learning. Supervised learning provides the structured foundation for recognizing and classifying. Unsupervised learning allows for autonomous exploration and the generation of new content. GANs illustrate this neatly. The discriminator is trained as a supervised classifier whose labels (“real” or “fake”) come for free, while the generator learns the data distribution without any human-provided labels. The synergy of these two learning methods empowers Generative AI not only to understand but also to create.

Unraveling the Essence of GANs

What are GANs?
Generative Adversarial Networks (GANs), introduced by Ian Goodfellow and colleagues in 2014, stand as a groundbreaking class of AI algorithms. GANs consist of two neural networks engaged in a distinctive adversarial relationship, each serving a unique purpose. The first component, known as the generator, is tasked with creating realistic data, often images. On the other side, the discriminator plays a crucial role in distinguishing between real and generated data. This dynamic interplay between the generator and the discriminator fuels the unique and powerful capabilities of GANs.

The GAN Architecture
Let’s delve deeper into the technical details of the GAN architecture. A GAN consists of two neural networks: the generator and the discriminator. The generator constantly tries to deceive the discriminator by creating synthetic data that is as close to real data as possible. The discriminator, in turn, constantly tries to beat the generator by telling the synthetic data apart from the real data.

GAN Architecture
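The adversarial game described above is usually summarized by the minimax objective from the original 2014 GAN paper. The discriminator D tries to maximize the value V, and the generator G tries to minimize it. Here p_data denotes the real-data distribution and p_z the noise distribution fed to the generator:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]$$

Maximizing V over D amounts to training a real-versus-fake classifier. Minimizing V over G pushes D(G(z)) upward, which means making the generated samples look more real to the discriminator.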

The Generator:

Technical Objective: The generator’s primary role is to create synthetic data that is indistinguishable from real data.

Structure:

  • Input: The generator starts with a random noise vector, often sampled from a standard normal distribution.
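  • Neural Network: Layers of neurons that progressively transform the low-dimensional noise vector into a high-dimensional output resembling real data. In a DCGAN, these are typically transposed convolutional layers that upsample the input step by step.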
  • Output: Generates synthetic data, such as images, which ideally should be indistinguishable from real data.
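To make the input, network, and output stages concrete, here is a minimal sketch of a generator's forward pass. To stay dependency-free, it is an assumption-laden simplification. It uses a small fully connected network instead of DCGAN's transposed convolutions. It draws a 100-dimensional noise vector from a standard normal distribution, passes it through one ReLU hidden layer, and applies tanh so that every output "pixel" lies in [-1, 1]. The layer sizes, the initialization, and the helper names (`sample_noise`, `dense`, `init_params`, `generator`) are hypothetical illustration choices.

```python
import math
import random

def sample_noise(dim, rng):
    """Draw a latent vector z from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def dense(inputs, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_params(in_dim, hidden_dim, out_dim, rng):
    """Small random weights and zero biases (an arbitrary initialization choice)."""
    def matrix(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W1": matrix(hidden_dim, in_dim), "b1": [0.0] * hidden_dim,
            "W2": matrix(out_dim, hidden_dim), "b2": [0.0] * out_dim}

def generator(z, params):
    """Map noise z to a flattened 28x28 image with pixel values in [-1, 1]."""
    hidden = [max(0.0, h) for h in dense(z, params["W1"], params["b1"])]  # ReLU
    return [math.tanh(o) for o in dense(hidden, params["W2"], params["b2"])]  # tanh output

rng = random.Random(0)
params = init_params(in_dim=100, hidden_dim=32, out_dim=28 * 28, rng=rng)
fake_image = generator(sample_noise(100, rng), params)
```

An untrained generator like this one produces structureless noise. Training is what shapes its weights so that the outputs start to resemble digits.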

Technical Training Process:

  • Adaptive Parameters: The generator undergoes training to adapt its parameters, leveraging backpropagation and optimization algorithms to refine its ability to generate realistic data.
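  • Error Correction: Through back-and-forth iterations, the generator adjusts its parameters based on the discriminator’s feedback. The gradients of the discriminator’s judgment are backpropagated through the discriminator into the generator, showing the generator how to make its outputs more convincing.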

The Discriminator:

Technical Objective: The discriminator’s main role is to differentiate between real and synthetic data.

Structure:

  • Input: Takes as input either real data from the dataset or synthetic data produced by the generator.
  • Neural Network: A classifier (in a DCGAN, a stack of strided convolutional layers) that processes the input and condenses it into a single score.
  • Output: Delivers a probability score that serves as a judgment on the authenticity of the input data, aiding in the adversarial process between the discriminator and the generator.
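The structure above can be sketched in a few lines. The sketch assumes the simplest possible discriminator: a single linear layer (logistic regression) over a flattened 28x28 image, followed by a sigmoid that turns the raw score into a probability of being real. A DCGAN would use convolutional layers instead. The function names and the random weights here are hypothetical stand-ins for illustration.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def discriminator(x, weights, bias):
    """Return the estimated probability that the flattened image x is real."""
    logit = sum(w * xi for w, xi in zip(weights, x)) + bias  # single raw score
    return sigmoid(logit)

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.01) for _ in range(28 * 28)]
image = [rng.uniform(-1.0, 1.0) for _ in range(28 * 28)]  # stand-in input in the tanh range
p_real = discriminator(image, weights, bias=0.0)
```

A discriminator with all-zero weights outputs exactly 0.5 for every input. That is the "can't tell" judgment the generator is ultimately trying to force.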

Technical Training Process:

  • Adaptive Parameters: The discriminator undergoes training to adapt its parameters, employing backpropagation and optimization algorithms to enhance its ability to classify real and synthetic data accurately.

Understanding Back-and-Forth Training in GANs

Let us delve into the dynamics of the back-and-forth training in Generative Adversarial Networks (GANs) through an illustrative example. We’ll explore how to build a Deep Convolutional GAN (DCGAN) for generating synthetic handwritten digit images using the MNIST dataset.

MNIST Dataset: The MNIST dataset is a widely used benchmark in machine learning. It consists of 60,000 training images and 10,000 test images of handwritten digits from 0 to 9. Each image is grayscale, with a resolution of 28x28 pixels.

Training Images of MNIST Dataset
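Before training, pixel values (integers from 0 to 255) are commonly rescaled to [-1, 1] so that they match a generator whose output layer uses tanh; the DCGAN paper describes this as its only preprocessing. Here is a minimal sketch of that mapping and its inverse. It assumes each image arrives as a flat list of 784 integers, which is a hypothetical input format chosen for illustration.

```python
def scale_to_tanh_range(pixels):
    """Map 8-bit grayscale values 0..255 linearly onto [-1.0, 1.0]."""
    return [p / 127.5 - 1.0 for p in pixels]

def unscale(values):
    """Invert the mapping, e.g. to turn generator outputs back into displayable pixels."""
    return [min(255, max(0, round((v + 1.0) * 127.5))) for v in values]

image = [0, 64, 128, 255] + [0] * 780  # stand-in for one flattened 28x28 image
scaled = scale_to_tanh_range(image)
```

The clamp in `unscale` keeps any slightly out-of-range value inside the valid 0..255 pixel range.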

The process of training GAN models involves two main steps:

1 — Training the Discriminator: In this step, the discriminator is trained in a supervised manner while the generator’s weights are held fixed. It learns to distinguish between real handwritten digit images from the MNIST dataset and fake images produced by the generator. This process helps the discriminator learn the characteristics of genuine handwritten digits, such as stroke patterns and digit shapes. The goal is to correctly classify each image as real or fake. The discriminator’s loss is the sum of the binary cross-entropy losses for the real and fake batches:

$$\mathcal{L}_D = -\frac{1}{m}\sum_{i=1}^{m}\log D\left(x^{(i)}\right) - \frac{1}{m}\sum_{i=1}^{m}\log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$

where,

  • m is the batch size, x^(i) is a real image, and z^(i) is a noise vector fed to the generator,
  • D(x) represents the discriminator’s output for the real batch (usually reported as its batch average),
  • D(G(z)) represents its output for the fake batch (likewise reported as a batch average).

Initially, D(x) should be close to 1, meaning that the discriminator confidently classifies real images as real. D(G(z)) should be close to 0, meaning that it confidently recognizes generated images as fake.
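The discriminator loss is the mean binary cross-entropy on the real batch (label 1) plus the mean binary cross-entropy on the fake batch (label 0). Here is a minimal sketch of that loss. It assumes the discriminator's outputs are already available as plain lists of probabilities. The helper names are hypothetical, and the small `eps` clamp is an added safeguard against log(0).

```python
import math

def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one prediction p = D(.) against a 0/1 label."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Sum of the mean real-batch loss (label 1) and the mean fake-batch loss (label 0)."""
    loss_real = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    loss_fake = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return loss_real + loss_fake

confident = discriminator_loss([0.9, 0.95], [0.05, 0.1])  # D is winning
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])     # D is guessing
```

A confident discriminator drives the loss toward 0. A discriminator that outputs 0.5 for everything scores 2 log 2 ≈ 1.386, which is the value associated with the ideal equilibrium.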

2 — Training the Generator: Unlike the discriminator, the generator cannot be trained on its own, because it needs feedback from the discriminator to improve its images. Therefore, a combined network is created by stacking the generator and the discriminator. During this step, the discriminator’s weights are frozen so that only the generator is updated.

The generator learns to produce synthetic handwritten digit images that resemble those from the MNIST dataset. It aims to mimic the distribution of real digit images, including variations in digit shape and stroke thickness.

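The goal of training the generator is to minimize log(1-D(G(z))), that is, to produce fake images that are increasingly difficult for the discriminator to classify as fake. Early in training, when D(G(z)) is close to 0, this loss provides only weak gradients. In practice, the generator therefore usually maximizes log(D(G(z))) instead. This trick was suggested in the original GAN paper and is often called the “non-saturating” loss. As the generator improves, D(G(z)) should move towards 0.5, indicating that the discriminator can no longer reliably distinguish real images from fake ones.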
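The difference between the two generator losses shows up clearly in their gradients. Write the discriminator's output on a fake image as D = sigmoid(s), where s is the discriminator's raw score (logit). Then the loss log(1 - D) has derivative -D with respect to s, while -log(D) has derivative -(1 - D). Both losses push D(G(z)) upward, but when the discriminator is winning (D close to 0), only the second one still gives a strong signal. Here is a minimal sketch that computes both gradients analytically; the function names are hypothetical.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def saturating_loss(s):
    """log(1 - D(G(z))) expressed through the discriminator's logit s (minimized by G)."""
    return math.log(1.0 - sigmoid(s))

def non_saturating_loss(s):
    """-log D(G(z)), the variant most implementations minimize instead."""
    return -math.log(sigmoid(s))

def logit_gradients(s):
    """Analytic derivatives of both losses with respect to s."""
    d = sigmoid(s)
    return -d, -(1.0 - d)

# A discriminator that confidently rejects a fake: D(G(z)) = sigmoid(-5) ≈ 0.0067.
g_sat, g_ns = logit_gradients(-5.0)
```

Here the saturating loss's gradient is only about -0.0067, while the non-saturating gradient is about -0.993. This is why the second form trains faster in the early phase.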

Iterations

The training process unfolds over many iterations, with the generator and discriminator engaged in an adversarial dance. The generator produces fake images, and the discriminator evaluates them and provides feedback. Using this feedback, the generator refines its output, while the discriminator sharpens its criteria. Over successive iterations, both the generator’s output and the discriminator’s ability to tell real from fake improve. Ideally, the two models reach an equilibrium. At that point, the generator produces high-quality images that are indistinguishable from real ones, and the discriminator can do no better than guessing (D(G(z)) ≈ 0.5). In practice, reaching this equilibrium is not guaranteed, as discussed under Challenges below.

Images generated over the iterations
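The alternating procedure can be sketched end to end on a deliberately tiny problem. This is a toy illustration, not the DCGAN from this article. Instead of MNIST images, it uses hypothetical one-dimensional "real data" drawn from a normal distribution with mean 4. The generator is G(z) = a*z + b, and the discriminator is a logistic regression D(x) = sigmoid(w*x + c). Because the gradients are worked out by hand, only the Python standard library is needed. Each iteration performs step 1 (a discriminator update on one real and one fake batch) and then step 2 (a generator update with the discriminator frozen, using the -log D(G(z)) form of the generator loss). The learning rate, batch size, and step count are arbitrary choices.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, batch=64, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on a 1-D toy problem.

    Real data: N(4, 1.25^2) (an arbitrary illustrative choice).
    Generator: G(z) = a*z + b with z ~ N(0, 1).
    Discriminator: D(x) = sigmoid(w*x + c).
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    history = []
    for _ in range(steps):
        # Step 1: update D on a real batch (label 1) and a fake batch (label 0).
        real = [rng.gauss(4.0, 1.25) for _ in range(batch)]
        fake = [a * rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        gw = gc = 0.0
        for x in real:  # d/ds of -log(sigmoid(s)) is sigmoid(s) - 1
            g = sigmoid(w * x + c) - 1.0
            gw += g * x
            gc += g
        for x in fake:  # d/ds of -log(1 - sigmoid(s)) is sigmoid(s)
            g = sigmoid(w * x + c)
            gw += g * x
            gc += g
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Step 2: update G with D held fixed, minimizing -log D(G(z)) on fresh noise.
        ga = gb = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            g = (sigmoid(w * (a * z + b) + c) - 1.0) * w  # dLoss/ds * ds/dG
            ga += g * z  # dG/da = z
            gb += g      # dG/db = 1
        a -= lr * ga / batch
        b -= lr * gb / batch
        history.append(b)
    return (a, b), (w, c), history

(a, b), (w, c), history = train_toy_gan()
print(f"generator mean b = {b:.2f}, target mean = 4.00")
```

Printing b shows whether the generator's mean has been pulled toward 4. This discriminator is linear, so it can only detect a difference in means. As a result, the generator's spread a is not matched to the real spread (in this setup it tends to shrink). This is a miniature version of the diversity problems discussed under Challenges.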

Use Cases

  • Health Care: In healthcare, GANs are used for denoising medical images and for creating high-resolution images from low-resolution inputs. This can let medical professionals work with lower-resolution equipment or lower-dose scans, which reduces patients’ radiation exposure, while still obtaining images clear enough for diagnosis. By enhancing image quality and clarity, GANs can contribute to better diagnosis and patient care while reducing the risks associated with medical imaging procedures.
  • Advertising and marketing: In advertising and marketing, GANs are useful for creative content generation. They can generate high-quality images and videos tailored to specific target audiences. This streamlines content creation and enables marketers to produce engaging, personalized content at scale. As a result, companies can strengthen their marketing strategies, increase brand visibility, and connect better with their customers.
  • eCommerce: In eCommerce, GANs can enhance customer experiences and streamline operations. First, GANs can generate high-resolution images of models with various body types and poses, offering a diverse range of visual content for product displays. This lets eCommerce platforms showcase products more realistically, which can increase customer engagement and satisfaction. Second, GANs enable personalized experiences through virtual try-ons, which allow customers to see how products would look on them before making a purchase. This can reduce returns and improve the overall shopping experience, driving customer loyalty and sales.
  • Mobile apps: Imagine browsing through a photo library and transforming any picture with a simple tap. GAN-powered photo filters and transformations let users add artistic flair to their images, whether that means enhancing colors, applying vintage effects, or completely altering appearances. Every snapshot becomes a canvas waiting to be transformed.

Challenges

  1. Mode collapse: One significant challenge in GANs is “mode collapse,” a scenario in which the generator produces only a limited set of outputs. For example, a generator trained on MNIST might produce convincing images of only a few digits. The diversity of generated content diminishes as the model fixates on specific patterns that reliably fool the discriminator. This hampers the GAN’s ability to capture the full complexity of the underlying data distribution and leads to less diverse and less representative results.
  2. Training instability: Another challenge is the sensitivity of GANs to hyperparameters, which contributes to training instability. Tuning choices such as learning rates or network architectures is a delicate task. Small adjustments can significantly change the training dynamics and cause the models to oscillate or fail to converge. This sensitivity demands a careful, iterative tuning process, which makes GAN training a challenging endeavor that requires expertise and experimentation to achieve stable results.
  3. Evaluation difficulties: Assessing the quality of generated outputs poses another challenge. Quality is partly subjective, and there is no direct counterpart to the accuracy metric used in classification. Widely used metrics such as the Inception Score and the Fréchet Inception Distance (FID) each capture only some aspects of realism and diversity. Telling genuinely realistic outputs apart from ones that merely seem plausible at first glance is a nuanced task. It is therefore important to combine several evaluation methods, including human judgment, to assess the performance of a GAN model comprehensively.
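One rough way to spot mode collapse on MNIST is to classify a batch of generated digits with a separately trained digit classifier. You can then check how many of the ten classes appear and how evenly they are spread. The sketch below assumes that such classifier predictions are already available as a list of integers; the classifier itself is hypothetical and not shown. A collapsed generator yields few distinct classes and an entropy far below the maximum of log(10).

```python
import math
from collections import Counter

def class_coverage(predicted_labels, num_classes=10):
    """Summarize the diversity of generated samples from their predicted classes.

    predicted_labels: classes assigned to generated images by a separate,
    pretrained classifier (hypothetical; not part of the GAN itself).
    Returns (distinct classes seen, entropy in nats, maximum possible entropy).
    """
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return len(counts), entropy, math.log(num_classes)

collapsed = class_coverage([1] * 90 + [7] * 10)  # only two digits ever generated
diverse = class_coverage(list(range(10)) * 10)   # all ten digits, evenly spread
```

This check catches only coarse collapse across classes. A generator that draws every digit in a single style would still pass it, which is one reason several evaluation methods are needed.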

Conclusion

In the exciting world of artificial intelligence, Generative Adversarial Networks open up a wide range of possibilities. Think of GANs as a secret sauce for creativity. By pitting a generator against a discriminator, they blend supervised learning with unsupervised exploration to create new content. Let GANs be your companions on a journey where the joy of innovation prevails.

About the Author

Nishtha Arora is a Senior Machine Learning Engineer at Digitate and a B.Tech Gold Medalist. Her focus is on leveraging Large Language Models, NLP, and Machine Learning to streamline automation processes within the Quality Assurance domain.


Digitate

Digitate is a leading provider of SaaS-based, autonomous enterprise software, bringing agility, assurance, and resiliency to IT and business operations.