# Notes on: Srivastava, A., Valkov, L., Russell, C., Gutmann, M. U., & Sutton, C. (2017): VEEGAN: Reducing mode collapse in GANs using implicit variational learning

## Overview

- GAN models are prone to **mode collapse**
- **VEEGAN** uses a reconstructor network, reversing the action of the generator by mapping *from data to noise*

The main benefits are as follows:

- Specifically "fights" mode collapse
- Does not require a loss function to be specified on actual inputs; instead one only needs to specify a loss function between the *distributions* over the representation vectors

## Notation & Definitions

- **mode collapse** refers to the model only characterizing a few modes of the true distribution
- $z$ is the **representation vector**, i.e. the random variable drawn to parametrize the generation of a sample. Typically drawn from a standard normal distribution $p_0(z)$.
- $G_\gamma$ is the **generator** network, which maps representations to data
- $D_\omega$ is the **discriminator** network, whose task is to discriminate between generated and real samples
- $F_\theta$ is the **reconstructor** network, which maps samples (both from the real and the generated distribution) to the representations "estimated" to have been used in the generating procedure
- $p_\theta(z \mid x)$ is the distribution of the outputs of $F_\theta$ when applied to a fixed data point $x$, which we let be Gaussian with unit variance and mean function $F_\theta(x)$, i.e. $p_\theta(z \mid x) = \mathcal{N}\big(z \mid F_\theta(x), I\big)$
- $p_\theta(z)$ is the distribution of $F_\theta(X)$ for $X \sim p(x)$, i.e. $p_\theta(z) = \int p_\theta(z \mid x)\, p(x)\, dx$
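
As a concrete anchor for the notation, here is a minimal sketch of the three networks, assuming PyTorch; the MLP layer sizes and the dimensions `Z_DIM`/`X_DIM` are illustrative placeholders, not the paper's architectures.

```python
import torch
import torch.nn as nn

Z_DIM, X_DIM = 32, 784  # hypothetical representation / data dimensions

# Generator G_gamma: representation z -> data point x
G = nn.Sequential(nn.Linear(Z_DIM, 256), nn.ReLU(), nn.Linear(256, X_DIM))

# Reconstructor F_theta: data point x -> mean of p_theta(z|x) = N(F_theta(x), I)
F = nn.Sequential(nn.Linear(X_DIM, 256), nn.ReLU(), nn.Linear(256, Z_DIM))

# Discriminator D_omega: in VEEGAN it scores (z, x) pairs jointly
D = nn.Sequential(nn.Linear(Z_DIM + X_DIM, 256), nn.ReLU(), nn.Linear(256, 1))

z = torch.randn(64, Z_DIM)             # z ~ p_0(z), the standard normal
x_g = G(z)                             # a generated batch
z_rec = F(x_g)                         # means of the reconstructed representations
logit = D(torch.cat([z, x_g], dim=1))  # joint discriminator logit on (z, x) pairs
```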

## GANs

### Objective function

$$\min_\gamma \max_\omega \; \mathbb{E}_x\big[\log D_\omega(x)\big] + \mathbb{E}_z\big[\log\big(1 - D_\omega(G_\gamma(z))\big)\big]$$

where

- $\mathbb{E}_z$ refers to taking the expectation over the standard normal $p_0(z)$
- $\mathbb{E}_x$ is the expectation over the true data distribution $p(x)$
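
A minimal sketch of this objective as training losses, assuming PyTorch; `gan_losses` and its arguments are hypothetical names, and the discriminator here scores $x$ alone (the plain GAN setting, not yet VEEGAN's joint discriminator on $(z, x)$ pairs).

```python
import torch
import torch.nn.functional as Fn

def gan_losses(D, G, x_real, z):
    """Vanilla GAN losses for one batch: x_real ~ p(x), z ~ p_0(z)."""
    logits_real = D(x_real)
    logits_fake = D(G(z).detach())  # detach: D's update should not move G
    # Discriminator ascends E_x[log D(x)] + E_z[log(1 - D(G(z)))]:
    d_loss = (Fn.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
              + Fn.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))
    # Generator descends E_z[log(1 - D(G(z)))]; the common non-saturating
    # variant implemented here instead ascends E_z[log D(G(z))]:
    logits_gen = D(G(z))
    g_loss = Fn.binary_cross_entropy_with_logits(logits_gen, torch.ones_like(logits_gen))
    return d_loss, g_loss
```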

#### At optimum

At the optimum, in the limit of infinite data and arbitrarily powerful models, we will have

$$q_\gamma(x) = p(x),$$

where $q_\gamma(x)$ is the density that is induced by running the network $G_\gamma$ on normally distributed input, and hence that $D_\omega(x) = \tfrac{1}{2}$ (see 5423_generative_adversarial_nets).
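
As a quick reminder of why (the standard GAN argument, not from these notes): for a fixed generator, maximizing the objective pointwise in $D_\omega(x)$ gives

$$D^*_\omega(x) = \frac{p(x)}{p(x) + q_\gamma(x)},$$

since $t \mapsto p \log t + q \log(1 - t)$ is maximized at $t = \frac{p}{p + q}$; substituting $q_\gamma = p$ at the optimum then yields $D^*_\omega(x) = \tfrac{1}{2}$ everywhere.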

## Reconstructor network

- $G_\gamma$ generates samples, where the representations ($z$) used for the generation are drawn from some distribution $p_0(z)$
- $F_\theta$ attempts to map the generated "distribution" back to the distribution of representations which was used by $G_\gamma$ to generate the samples

The **main idea** is then that we can use this "trained" inverse of $G_\gamma$ to identify mode collapse of $G_\gamma$.

We get two main cases (a code sketch of both signals follows this list):

- Suppose $F_\theta$ is an approximate inverse of $G_\gamma$:
  - Draw samples $z^i$ from $p_0(z)$
  - Apply $F_\theta$ both to the samples from $G_\gamma$ and to samples from the real distribution $p(x)$
  - The resulting *distributions* over the representations (don't think about specific values for each sample) ought both to match the underlying $p_0(z)$
  - Any difference here is a "training signal" for $G_\gamma$
- Suppose $F_\theta$ can successfully map the true data distribution to a Gaussian:
  - If $G_\gamma$ mode-collapses, then $F_\theta$ will not map all $x_g = G_\gamma(z)$ back to the original $z$, and we use this mismatch for updating both $G_\gamma$ and $F_\theta$
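
A toy sketch of the two mismatch signals, assuming PyTorch and the `G`, `F`, `Z_DIM`, `X_DIM` from the earlier snippet; the batch-mean comparison is only for intuition (VEEGAN never computes it explicitly — the joint discriminator plays that role).

```python
import torch

z = torch.randn(64, Z_DIM)       # z ~ p_0(z)
x_g = G(z)                       # generated batch
x_real = torch.randn(64, X_DIM)  # placeholder batch standing in for real data

# Case 1 intuition: F applied to generated vs. real data should induce the
# same distribution over representations; any gap signals a mismatch.
gap = (F(x_g).mean(dim=0) - F(x_real).mean(dim=0)).norm()

# Case 2 intuition: if G collapses, many distinct z's map to (nearly) the
# same x, so F cannot invert G and this reconstruction error stays large.
recon_error = ((z - F(x_g)) ** 2).sum(dim=1).mean()
```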

## Objective function

Basically, they deduce that the *wanted* objective function would be

$$O_{\text{entropy}}(\gamma, \theta) = \mathbb{E}_{z \sim p_0}\!\left[\lVert z - F_\theta(G_\gamma(z)) \rVert_2^2\right] + H\big(Z, F_\theta(X)\big),$$

where:

- the 1st term minimizes the reconstruction loss between $z$ and $F_\theta(G_\gamma(z))$
- the 2nd term is the cross entropy $H(Z, F_\theta(X)) = -\int p_0(z)\,\log p_\theta(z)\,dz$ between the representation distribution $p_0(z)$ and the distribution $p_\theta(z)$ induced by the inverse mapping from data to representations; it is minimized exactly when $p_\theta(z) = p_0(z)$

The cross-entropy term is **intractable**, since $p_\theta(z) = \int p_\theta(z \mid x)\,p(x)\,dx$ involves the unknown data density $p(x)$. Hence they go on to show that, rather than minimizing the intractable $H(Z, F_\theta(X))$, they can minimize an *upper bound* of it, obtained by introducing a variational distribution $q_\gamma(x \mid z)$ and applying Jensen's inequality:

$$H\big(Z, F_\theta(X)\big) \le -\int p_0(z) \int q_\gamma(x \mid z)\,\log\frac{p_\theta(z \mid x)\,p(x)}{q_\gamma(x \mid z)}\,dx\,dz$$

Adding the reconstruction term, the full objective becomes

$$O(\gamma, \theta) = \mathrm{KL}\big(q_\gamma(x \mid z)\,p_0(z)\,\big\|\,p_\theta(z \mid x)\,p(x)\big) - \mathbb{E}\big[\log p_0(z)\big] + \mathbb{E}\big[\lVert z - F_\theta(x)\rVert_2^2\big]$$

with all expectations taken over the joint $p_0(z)\,q_\gamma(x \mid z)$.

With sufficiently powerful networks $G_\gamma$ and $F_\theta$, the distributions $q_\gamma(x \mid z)$ and $p_\theta(z \mid x)$ can be represented, respectively. Thus, we can estimate $O(\gamma, \theta)$ by Monte Carlo, replacing the intractable log density ratio inside the KL term with a discriminator $D_\omega(z, x)$ trained by logistic regression to distinguish samples of $q_\gamma(x \mid z)\,p_0(z)$ from samples of $p_\theta(z \mid x)\,p(x)$:

$$\hat{O}(\omega, \gamma, \theta) = \frac{1}{N}\sum_{i=1}^{N} D_\omega(z^i, x_g^i) + \frac{1}{N}\sum_{i=1}^{N} \big\lVert z^i - F_\theta(x_g^i)\big\rVert_2^2, \qquad z^i \sim p_0(z),\; x_g^i \sim q_\gamma(x \mid z^i)$$
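
A sketch of one resulting training step, assuming PyTorch and the `G`, `F`, `D`, `Z_DIM` from the first snippet; `veegan_step` is a hypothetical helper, optimizer setup is omitted, and the label convention is chosen so that the optimal discriminator logit equals the log density ratio needed in the KL term.

```python
import torch
import torch.nn.functional as Fn

def veegan_step(G, F, D, x_real):
    """One VEEGAN step (sketch). Returns the discriminator loss and the
    joint generator/reconstructor loss; apply them with separate optimizers."""
    n = x_real.shape[0]
    z = torch.randn(n, Z_DIM)  # z ~ p_0(z)
    x_g = G(z)                 # x_g ~ q_gamma(x|z)

    # A sample from p_theta(z|x): mean F(x) plus unit-variance Gaussian noise.
    z_data = F(x_real) + torch.randn(n, Z_DIM)

    pair_g = torch.cat([z, x_g], dim=1)          # from q_gamma(x|z) p_0(z)
    pair_d = torch.cat([z_data, x_real], dim=1)  # from p_theta(z|x) p(x)

    # Discriminator: logistic regression between the two joints, so that at
    # its optimum D(z, x) = log[ q_gamma(x|z) p_0(z) / (p_theta(z|x) p(x)) ].
    log_g, log_d = D(pair_g.detach()), D(pair_d.detach())
    d_loss = (Fn.binary_cross_entropy_with_logits(log_g, torch.ones_like(log_g))
              + Fn.binary_cross_entropy_with_logits(log_d, torch.zeros_like(log_d)))

    # G and F jointly minimize the Monte Carlo estimate O-hat:
    #   (1/N) sum_i D(z^i, x_g^i) + (1/N) sum_i ||z^i - F(x_g^i)||_2^2
    recon = ((z - F(x_g)) ** 2).sum(dim=1).mean()
    gf_loss = D(pair_g).mean() + recon
    return d_loss, gf_loss
```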