Notes on: Srivastava, A., Valkov, L., Russell, C., Gutmann, M. U., & Sutton, C. (2017): VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning
Overview
- GAN generators are prone to mode collapse
- VEEGAN uses a reconstructor network, reversing the action of the generator by mapping from data to noise
The main benefits are as follows:
- Specifically targets mode collapse
- Does not require a loss function to be specified on the actual inputs (data space); instead one only needs to specify a loss between distributions over the representations $z$
Notation & Definitions
- mode-collapse refers to the model only characterizing a few modes of the true distribution
- $z$ is the representation vector, i.e. the random variable drawn to parametrize $G_\gamma$ when generating a sample. Typically drawn from a standard normal distribution $p_0(z)$.
- $G_\gamma$ is the generator network, which maps a representation $z$ to a data point $x$
- $D_\omega$ is the discriminator network, whose task is to discriminate between generated and real samples
- $F_\theta$ is the reconstructor network, which maps samples $x$ (from both the real and the generated distribution) to the representations "estimated" to have been used in the "generating" procedure
- $p_\theta(z \mid x)$ is the distribution of the outputs of $F_\theta$ when applied to a fixed data point $x$, which we let be Gaussian with unit variance and mean function $F_\theta(x)$
- $p_\theta(z)$ is the distribution of $F_\theta(X)$ with $X \sim p(x)$, i.e. $p_\theta(z) = \int p_\theta(z \mid x)\, p(x)\, dx$.
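As a concrete anchor for this notation, here is a minimal PyTorch sketch of the three networks. The MLP architectures, layer sizes, the dimensions `K` and `D`, and the names `F_rec` / `D_disc` are my own illustrative assumptions, not the paper's.

```python
# Minimal sketch of the VEEGAN components (architectures and sizes are
# illustrative assumptions, not the ones used in the paper).
import torch
import torch.nn as nn

K, D = 64, 784  # latent (representation) dim and data dim -- hypothetical values

# G_gamma: maps a representation z to a data point x
G = nn.Sequential(nn.Linear(K, 256), nn.ReLU(), nn.Linear(256, D))

# F_theta: maps a data point x to a representation; its output is taken as the
# mean of the Gaussian p_theta(z | x) with unit variance
F_rec = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, K))

# D_omega: discriminator; in VEEGAN it sees (z, x) pairs rather than x alone
D_disc = nn.Sequential(nn.Linear(K + D, 256), nn.ReLU(), nn.Linear(256, 1))

z = torch.randn(8, K)          # z ~ p_0(z), the standard normal
x_g = G(z)                     # generated samples x = G_gamma(z)
z_rec_mean = F_rec(x_g)        # mean of p_theta(z | x_g), unit-variance Gaussian
```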
GANs
Objective function
$$\min_\gamma \max_\omega \; \mathbb{E}_{z \sim p_0(z)}\big[\log\big(1 - D_\omega(G_\gamma(z))\big)\big] + \mathbb{E}_{x \sim p(x)}\big[\log D_\omega(x)\big]$$

where
- $\mathbb{E}_{z \sim p_0(z)}$ refers to taking the expectation over the standard normal $p_0(z)$
- $\mathbb{E}_{x \sim p(x)}$ is the expectation over the true data distribution $p(x)$
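For reference, a minimal sketch of how these two expectations are typically turned into discriminator and generator losses in practice (standard binary cross-entropy form). The network shapes are assumptions, and this is the plain GAN discriminator $D_\omega(x)$, not yet VEEGAN's pair discriminator.

```python
# Sketch of the standard GAN objective above as per-batch losses
# (non-saturating generator loss; shapes and architectures are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as Fnn

K, D = 64, 784  # hypothetical latent and data dims
G = nn.Sequential(nn.Linear(K, 256), nn.ReLU(), nn.Linear(256, D))
D_net = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, 1))  # plain GAN: D_omega(x)

def gan_losses(x_real):
    """One-batch losses for min_G max_D E_z[log(1 - D(G(z)))] + E_x[log D(x)]."""
    z = torch.randn(x_real.size(0), K)
    x_fake = G(z)
    ones = torch.ones(x_real.size(0), 1)
    zeros = torch.zeros(x_real.size(0), 1)
    # Discriminator ascends the objective: push D(x_real) -> 1 and D(G(z)) -> 0
    d_loss = Fnn.binary_cross_entropy_with_logits(D_net(x_real), ones) \
           + Fnn.binary_cross_entropy_with_logits(D_net(x_fake.detach()), zeros)
    # Generator (non-saturating variant): push D(G(z)) -> 1
    g_loss = Fnn.binary_cross_entropy_with_logits(D_net(x_fake), ones)
    return d_loss, g_loss
```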
At optimum
At the optimum, in the limit of infinite data and arbitrarily powerful models, the objective is minimized exactly when $q_\gamma(x) = p(x)$, where $q_\gamma(x)$ is the density induced by running the network $G_\gamma$ on normally distributed input; hence the generated distribution matches the true data distribution.
See also: notes on generative_adversarial_nets (Goodfellow et al., 2014).
Reconstructor network
- $G_\gamma$ generates samples, where the parameters for the generation ($z$) are drawn from some distribution $p_0(z)$
- $F_\theta$ attempts to map the generated "distribution" back to the distribution $p_0(z)$ which was used by $G_\gamma$ to generate the samples

The main idea is then that we can use this "trained" approximate inverse of $G_\gamma$ to identify mode collapse in $G_\gamma$.
There are two main cases:
- Suppose $F_\theta$ is an approximate inverse of $G_\gamma$:
    - Draw samples from $p_0(z)$ and from the true data distribution $p(x)$
    - Apply $F_\theta$ both to the samples generated by $G_\gamma$ and to the samples from the real distribution $p(x)$
    - The resulting distribution over the "parameters" $z$ (don't think about specific values for each of these) ought to be the same as the underlying distribution $p_0(z)$ used for generation
    - If $G_\gamma$ has mode-collapsed, $F_\theta$ applied to the real data (which includes the missed modes) will not produce this distribution; any difference here is a "training signal" for $G_\gamma$ and $F_\theta$
- Suppose $F_\theta$ can successfully map the true data distribution $p(x)$ to a Gaussian:
    - If $G_\gamma$ mode collapses, then $F_\theta$ will not map all generated samples $G_\gamma(z)$ back to the original $z$, and we use this mismatch for updating both $G_\gamma$ and $F_\theta$
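To make the two mismatch signals above concrete, here is a small sketch reusing the hypothetical `G` and `F_rec` from the notation sketch. The first function measures whether $F_\theta$ inverts $G_\gamma$; the second is a crude stand-in (first and second moments only, my simplification) for checking whether $F_\theta$ maps real data to something standard-normal-like.

```python
# Illustration of the two mismatch signals; G and F_rec are the hypothetical
# networks defined in the earlier notation sketch.
import torch

def gaussianness_gap(F_rec, x_real):
    """Case 1 (F_theta approximately inverts G_gamma): if G_gamma has
    mode-collapsed, F_theta applied to *real* data will not look like a
    standard normal. Here this is reduced to a crude moment check."""
    z_hat = F_rec(x_real)
    return z_hat.mean().abs() + (z_hat.var() - 1.0).abs()

def reconstruction_error(G, F_rec, z):
    """Case 2 (F_theta maps true data to a Gaussian): if G_gamma
    mode-collapses, F_theta(G_gamma(z)) will not be close to z; the l2
    error is a training signal for both networks."""
    z_rec = F_rec(G(z))
    return ((z - z_rec) ** 2).sum(dim=1).mean()
```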
Objective function
Basically, they deduce that the desired objective function would be

$$O_{\text{entropy}}(\gamma, \theta) = \mathbb{E}_{z \sim p_0(z)}\Big[\big\lVert z - F_\theta(G_\gamma(z)) \big\rVert_2^2\Big] + H\big(Z, F_\theta(X)\big)$$

where:
- the 1st term minimizes the $\ell_2$ reconstruction loss between the representation $z$ and its reconstruction $F_\theta(G_\gamma(z))$
- the 2nd term minimizes the cross-entropy $H(Z, F_\theta(X)) = -\int p_0(z) \log p_\theta(z)\, dz$ between the representation distribution $p_0(z)$ and the distribution of the inverse mapping $F_\theta(X)$ from data to representation, $X \sim p(x)$; it is minimized when $F_\theta$ maps the true data back to a standard normal
This is intractable, since $p_\theta(z) = \int p_\theta(z \mid x)\, p(x)\, dx$ involves an integral over the unknown data distribution. Hence they go on to show that, rather than minimizing the intractable cross-entropy, they can minimize an upper bound on it, obtained by introducing the variational distribution $q_\gamma(x \mid z)$:

$$O(\gamma, \theta) = \mathrm{KL}\big[\, q_\gamma(x \mid z)\, p_0(z) \,\big\|\, p_\theta(z \mid x)\, p(x) \,\big] - \mathbb{E}\big[\log p_0(z)\big] + \mathbb{E}\Big[\big\lVert z - F_\theta(x) \big\rVert_2^2\Big],$$

with expectations taken over $q_\gamma(x \mid z)\, p_0(z)$.

With sufficiently powerful networks $G_\gamma$ and $F_\theta$, the distributions $q_\gamma(x \mid z)$ and $p_\theta(z \mid x)$ can be represented, respectively. Thus, estimating the KL term via the discriminator $D_\omega(z, x)$ (trained by logistic regression to distinguish the two joint distributions, so that its logit estimates the density ratio), we can estimate $O(\gamma, \theta)$ over a minibatch $z^i \sim p_0(z)$, $x_g^i \sim q_\gamma(x \mid z^i)$ as

$$\hat{O}(\omega, \gamma, \theta) = \frac{1}{N} \sum_{i=1}^{N} D_\omega\big(z^i, x_g^i\big) + \frac{1}{N} \sum_{i=1}^{N} \big\lVert z^i - F_\theta(x_g^i) \big\rVert_2^2 .$$
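A rough sketch of how this minibatch estimate could be computed, reusing the hypothetical `G`, `F_rec`, and `D_disc` from the notation sketch. Training the pair discriminator itself is not shown, and the argument names and sizes are assumptions.

```python
# Rough sketch of the minibatch estimate O_hat(omega, gamma, theta) above,
# reusing the hypothetical networks from the earlier notation sketch.
import torch

def veegan_loss_estimate(G, F_rec, D_disc, n=64, k=64):
    """Estimate O_hat = mean_i D_omega(z^i, x_g^i) + mean_i ||z^i - F_theta(x_g^i)||^2.

    D_disc is assumed to take the concatenated pair (z, x) and return the logit
    used as the density-ratio estimate of the KL term; its own (logistic
    regression) training against pairs from the real-data path is not shown.
    """
    z = torch.randn(n, k)                        # z^i ~ p_0(z)
    x_g = G(z)                                   # x_g^i ~ q_gamma(x | z^i)
    ratio_term = D_disc(torch.cat([z, x_g], dim=1)).mean()
    recon_term = ((z - F_rec(x_g)) ** 2).sum(dim=1).mean()
    return ratio_term + recon_term
```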