The Minimax Game and Its Generalizability to Other Domains

Anup Deshmukh
5 min read · Oct 4, 2018

The Generative Adversarial Network (GAN) has been a very active area of research in the Deep Learning community. Although several architectures have offered minor refinements on top of existing frameworks, the crux of the idea — adversarial games — remains untouched. You will find tons of articles explaining how GANs work, and I would recommend readers go through some of them. Here is a link for all Medium lovers: https://medium.com/@jonathan_hui/gan-whats-generative-adversarial-networks-and-its-application-f39ed278ef09

A more intuitive explanation: http://blog.kaggle.com/2018/01/18/an-intuitive-introduction-to-generative-adversarial-networks/

An adversarial game, also referred to as a minimax game, is a setting in game theory where two or more agents play a game against each other. We can take this in a literal sense and imagine two machines playing a game of Chess, Go or simply Tic-Tac-Toe. One machine, A, acts as an attacker by playing (generating) difficult moves, while machine B tries not to get fooled and stays alert to the moves of A.

As with humans, in these games a strategy can be reinforced, and the machines can be treated as rational decision-makers.

The goals of these rational decision-makers are usually conflicting. Such games of conflict, zero-sum games, are most often solved with the minimax theorem, which is closely related to the Nash equilibrium. Let's admire the beauty of this jargon for some time and ask ourselves some interesting questions! Is it possible to extend the idea of the minimax game to other domains and force our machine A to learn something different?

IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models [1] is one such formulation! It is a recent work and one of the few that use key ideas from the GAN formulation to design their own minimax game. Let's first quickly go over the traditional methods of information retrieval (IR), which will shed light on the motivation behind this idea.

In text retrieval, the classic relevance model of IR focuses on describing how a (relevant) document is selected given an information need: q → d, where q is the query (keywords, a user profile, etc.) and d is its corresponding document (textual documents, information items, etc.). The modern school of thinking in IR shifts to a discriminative (classification) solution learned from labelled relevance judgements that have already been made. It considers documents and queries jointly as features and predicts their relevancy or rank-order labels from a large amount of training data: q + d → r, where r denotes the relevance. Given the above two schools of thinking, the next step would be to use the best of both worlds, and IRGAN does exactly that. Without further ado, let's get into the minimax game of IRGAN.

Generative retrieval model: tries to select relevant documents from the candidate pool for the given query q. While doing this, it also tries to approximate the true relevance distribution over documents.

Discriminative retrieval model: tries to discriminate between relevant documents and non-relevant documents for the query q as accurately as possible. In simple words, all it is doing is binary classification!

Overall objective:
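For reference, the overall minimax objective in IRGAN [1] roughly takes the following form, where p_true is the underlying relevance distribution, p_θ is the generator's selection distribution, and f_φ is the discriminator's scoring function:

J^{G^*, D^*} = \min_\theta \max_\phi \sum_{n=1}^{N} \Big( \mathbb{E}_{d \sim p_{\text{true}}(d \mid q_n, r)} \big[ \log D(d \mid q_n) \big] + \mathbb{E}_{d \sim p_\theta(d \mid q_n, r)} \big[ \log \big( 1 - D(d \mid q_n) \big) \big] \Big),
\quad \text{where } D(d \mid q) = \sigma\big( f_\phi(d, q) \big) = \frac{\exp(f_\phi(d, q))}{1 + \exp(f_\phi(d, q))}.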

Minimax game in IRGAN — what is new here?

From the above equation, we can see that the optimal parameters of the generative retrieval model and the discriminative retrieval model can be learned iteratively by minimising and maximising the same objective function, respectively.

Optimizing the discriminator — exactly the same as in vanilla GANs

In order to obtain the optimal parameters, the objective for the discriminator is to maximise the log-likelihood of correctly distinguishing the true relevant documents from those selected by the generator.
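In the paper's notation, this roughly amounts to:

\phi^* = \arg\max_\phi \sum_{n=1}^{N} \Big( \mathbb{E}_{d \sim p_{\text{true}}(d \mid q_n, r)} \big[ \log \sigma(f_\phi(d, q_n)) \big] + \mathbb{E}_{d \sim p_\theta(d \mid q_n, r)} \big[ \log \big( 1 - \sigma(f_\phi(d, q_n)) \big) \big] \Big)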

It is very important to note here that, unlike in a GAN, the goal of the generative model is to directly select known documents (in the document identifier space), not to generate their features, because work in IR intends to select relevant documents from a given document pool.

Optimizing the generator — here comes the tricky part!

As the sampling of d is discrete, the generator cannot be directly optimised by gradient descent as in the original GAN formulation. A common approach is to use policy-gradient-based reinforcement learning (REINFORCE). This article in particular explains the need to give discrete rewards to the generator.
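Concretely, the paper derives a REINFORCE-style gradient estimate of roughly this form, where d_k are K documents sampled from the current generator and each one receives a reward based on the discriminator's score:

\nabla_\theta J^G(q_n) \simeq \frac{1}{K} \sum_{k=1}^{K} \nabla_\theta \log p_\theta(d_k \mid q_n, r) \, \log \big( 1 + \exp(f_\phi(d_k, q_n)) \big)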

The illustration of IRGAN training — analogy borrowed from the authors of IRGAN

How do the discriminator and the generator help each other? In each epoch of training, the generator tries to generate samples close to the discriminator's decision boundary to confuse its training in the next round (machine A attacking machine B in a game), while the discriminator tries to score down the generated samples. Since there exist positive correlations between the positive but unobserved (true-positive) samples and part of the observed positive samples, the generator should be able to learn to push these positive but unobserved samples upwards faster than other samples with low relevancy.

To understand this process further, look at the illustration provided in the above figure. Even if the generator cannot perfectly fit the conditional data distribution, there could still be a dynamic equilibrium, which is obtained when the distributions of the positive and negative unobserved soaps become stable at different depths of the water. Since the unobserved positive soaps are linked to the observed positive soaps staying on the water surface, overall they should be able to reach higher positions than the (unobserved) negative soaps in the end.
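To make the alternating procedure concrete, here is a rough, self-contained sketch of how the two updates could be interleaved for a single toy query. This is not the authors' implementation: the document pool and the names gen_logits (the generator's selection distribution over documents), disc_scores (standing in for f_φ(d, q)) and true_pos are all assumptions made for illustration.

```python
# Rough sketch of IRGAN-style alternating training for one toy query.
# All tensors and sizes here are made up for illustration only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_docs, k = 100, 5
true_pos = torch.tensor([0, 1, 2])                     # hypothetical observed relevant docs
gen_logits = torch.zeros(n_docs, requires_grad=True)   # parameterises p_theta(d | q)
disc_scores = torch.randn(n_docs, requires_grad=True)  # stands in for f_phi(d, q)
opt_g = torch.optim.SGD([gen_logits], lr=0.1)
opt_d = torch.optim.SGD([disc_scores], lr=0.1)

for epoch in range(50):
    # Discriminator step: raise f_phi on true positives, lower it on generated samples.
    with torch.no_grad():
        fake = torch.multinomial(F.softmax(gen_logits, dim=0), k, replacement=True)
    loss_d = -(F.logsigmoid(disc_scores[true_pos]).mean()
               + F.logsigmoid(-disc_scores[fake]).mean())
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator step: REINFORCE with reward log(1 + exp(f_phi)), as in the gradient above.
    probs = F.softmax(gen_logits, dim=0)
    sampled = torch.multinomial(probs, k, replacement=True)
    reward = torch.log1p(torch.exp(disc_scores[sampled])).detach()
    loss_g = -(torch.log(probs[sampled]) * reward).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```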

References:

[1] Wang, Jun, et al. “IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models.” Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2017.
