Notes on: Srivastava, A., Valkov, L., Russell, C., Gutmann, M. U., & Sutton, C. (2017): VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning

Overview

  • GANs are prone to mode collapse
  • VEEGAN adds a reconstructor network, reversing the action of the generator by mapping from data to noise

The main benefits are as follows:

  • Specifically "fights" mode collapse
  • Does not require a loss function to be specified on the actual inputs; instead one only needs to specify a loss function between distributions over the representations $z$

Notation & Definitions

  • mode collapse refers to the generator only characterizing a few modes of the true data distribution
  • $z$ is the representation vector, i.e. the random variable drawn to parametrize $G_\gamma$ when generating a sample. Typically drawn from a standard normal distribution.
  • $G_\gamma(z)$ is the generator network, which maps a representation $z$ to a data point $x$
  • $D_\omega(x)$ is the discriminator network, whose task is to discriminate between generated and real samples
  • $F_\theta(x)$ is the reconstructor network, which maps samples (both real and generated) to the representations "estimated" to have been used in the generating procedure
  • $p_\theta(z \mid x)$ is the distribution of the outputs of $F_\theta$ when applied to a fixed data point $x$, which we let be Gaussian with unit variance and mean function $F_\theta(x)$
  • $p_0(z)$ is the distribution of $z$, i.e. $\mathcal{N}(0, I)$.
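
Note that since $p_\theta(z \mid x)$ is Gaussian with identity covariance and mean $F_\theta(x)$, its negative log-density is just a squared error up to an additive constant,

$$-\log p_\theta(z \mid x) = \frac{1}{2} \big\| z - F_\theta(x) \big\|_2^2 + \frac{d}{2} \log(2\pi),$$

with $d$ the dimensionality of $z$; this is why an $\ell_2$ reconstruction penalty shows up in the VEEGAN objective further down.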

GANs

Objective function

$$O_{\mathrm{GAN}}(\omega, \gamma) = \mathbb{E}_z\big[\log \sigma\big(D_\omega(G_\gamma(z))\big)\big] + \mathbb{E}_x\big[\log\big(1 - \sigma(D_\omega(x))\big)\big]$$

where

  • $\mathbb{E}_z$ refers to taking the expectation over the standard normal $p_0(z)$
  • $\mathbb{E}_x$ is the expectation over the data distribution $p(x)$
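
For concreteness, here is a minimal PyTorch-style sketch of this objective, assuming hypothetical modules G and D where D returns unnormalised logits (so that the sigmoid $\sigma$ is applied explicitly); the names and signatures are mine, not the paper's:

    # Sketch of O_GAN(omega, gamma) = E_z[log sigma(D(G(z)))] + E_x[log(1 - sigma(D(x)))]
    import torch
    import torch.nn.functional as nnf

    def gan_objective(D, G, x_real: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        logits_fake = D(G(z))      # discriminator logits on generated samples
        logits_real = D(x_real)    # discriminator logits on real samples
        # log sigma(a) = logsigmoid(a), and log(1 - sigma(a)) = logsigmoid(-a)
        term_fake = nnf.logsigmoid(logits_fake).mean()   # ~ E_z[log sigma(D(G(z)))]
        term_real = nnf.logsigmoid(-logits_real).mean()  # ~ E_x[log(1 - sigma(D(x)))]
        return term_fake + term_real

Minibatch means stand in for the two expectations; discriminator and generator are then trained adversarially on this quantity, as in a standard GAN.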

At optimum

At the optimum, in the limit of infinite data and arbitrarily powerful models, we will have

$$\mathbb{E}_{x \sim p(x)}\big[f(x)\big] = \mathbb{E}_{z \sim p_0(z)}\big[f\big(G_\gamma(z)\big)\big] \quad \text{for any function } f,$$

where $q_\gamma(x)$ is the density that is induced by running the network $G_\gamma$ on normally distributed input, and hence that $q_\gamma(x) = p(x)$ (Goodfellow et al., 2014, Generative Adversarial Nets).

Reconstructor network

  1. $G_\gamma$ generates samples, where the representation $z$ used for the generation is drawn from some distribution (the prior $p_0(z)$)
  2. $F_\theta$ attempts to map the generated distribution back to the distribution that $G_\gamma$ used to generate the samples

The main idea is then that we can use this trained approximate inverse, $F_\theta$, to identify mode collapse in $G_\gamma$.

There are two main cases:

  • Suppose $F_\theta$ is an approximate inverse of $G_\gamma$:
    1. Draw samples from $G_\gamma$
    2. Apply $F_\theta$ to both the samples from $G_\gamma$ and samples from the real distribution $p(x)$
    3. The resulting distribution over the representations $z$ (not the specific values) ought to be the same as the distribution $G_\gamma$ drew from
      • Any difference here is a training signal for $G_\gamma$
  • Suppose $F_\theta$ can successfully map the true data distribution $p(x)$ to a Gaussian:
    • If $G_\gamma$ mode collapses, then $F_\theta$ will not map all $G_\gamma(z)$ back to the original $z$, and we use this mismatch to update both $G_\gamma$ and $F_\theta$ (see the sketch after this list)
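
The mismatch signals in the two cases above can be sketched directly; the following PyTorch-style snippet (hypothetical modules G and F_net standing in for $G_\gamma$ and $F_\theta$) computes a crude moment-based check for case one and the per-batch reconstruction error for case two. The moment check is only an illustration; VEEGAN itself handles this via the cross-entropy term in the objective below:

    import torch

    def mismatch_signals(G, F_net, x_real: torch.Tensor, z: torch.Tensor):
        # Case 1: F applied to real data should look like the prior N(0, I);
        # here we just compare first and second moments against 0 and 1.
        z_from_real = F_net(x_real)
        prior_mismatch = (z_from_real.mean(dim=0).abs().mean()
                          + (z_from_real.var(dim=0) - 1.0).abs().mean())
        # Case 2: F should approximately invert G, i.e. ||z - F(G(z))||^2 should be small.
        recon_error = ((z - F_net(G(z))) ** 2).sum(dim=1).mean()
        return prior_mismatch, recon_error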

Objective function

They deduce that the desired objective function is

$$O_{\mathrm{entropy}}(\gamma, \theta) = \mathbb{E}\Big[ \big\| z - F_\theta\big(G_\gamma(z)\big) \big\|_2^2 \Big] + H\big(Z, F_\theta(X)\big)$$

where:

  • the 1st term minimizes the $\ell_2$ loss between $z$ and its reconstruction $F_\theta\big(G_\gamma(z)\big)$
  • the 2nd term minimizes the cross entropy between the representation distribution $p_0(z)$ and the distribution of representations obtained by mapping data back through $F_\theta$
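
For reference, the cross entropy in the second term is

$$H\big(Z, F_\theta(X)\big) = -\int p_0(z) \log p_\theta(z) \, dz, \qquad p_\theta(z) = \int p_\theta(z \mid x) \, p(x) \, dx,$$

which involves the unknown data distribution $p(x)$ inside the logarithm and therefore cannot be computed directly.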

This cross entropy is intractable, so rather than minimizing $H\big(Z, F_\theta(X)\big)$ directly, they show that it suffices to minimize the upper bound

$$O(\gamma, \theta) = \mathrm{KL}\Big[ q_\gamma(x \mid z)\, p_0(z) \;\Big\|\; p_\theta(z \mid x)\, p(x) \Big] - \mathbb{E}\big[\log p_0(z)\big] + \mathbb{E}\big[ d\big(z, F_\theta(x)\big) \big] \;\geq\; O_{\mathrm{entropy}}(\gamma, \theta),$$

where $d$ is the squared $\ell_2$ distance and the expectations are taken over $z \sim p_0(z)$ and $x \sim q_\gamma(x \mid z)$.

With sufficiently powerful networks $G_\gamma$ and $F_\theta$, the distributions $q_\gamma(x \mid z)$ and $p_\theta(z \mid x)$ can be represented, respectively. The KL term still contains an unknown density ratio, which is estimated implicitly by the discriminator $D_\omega(z, x)$, trained to distinguish pairs $(z, x)$ drawn from $q_\gamma(x \mid z)\, p_0(z)$ from pairs drawn from $p_\theta(z \mid x)\, p(x)$. Since $-\mathbb{E}\big[\log p_0(z)\big]$ is a constant, we can then estimate $O(\gamma, \theta)$ over a minibatch of $N$ samples as

$$\hat{O}(\omega, \gamma, \theta) = \frac{1}{N} \sum_{i=1}^{N} D_\omega\big(z^i, x_g^i\big) + \frac{1}{N} \sum_{i=1}^{N} d\big(z^i, F_\theta(x_g^i)\big), \qquad z^i \sim p_0(z), \; x_g^i \sim q_\gamma(x \mid z^i)$$
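
A minimal sketch of how this estimate might be computed for a minibatch, assuming a discriminator D that takes the pair $(z, x)$ and returns the estimated log-ratio, plus the hypothetical modules G and F_net from before (this mirrors the estimator above and is not the authors' reference implementation):

    import torch

    def veegan_objective_estimate(D, G, F_net, z: torch.Tensor) -> torch.Tensor:
        # O_hat = (1/N) sum_i D(z_i, x_i) + (1/N) sum_i d(z_i, F(x_i)), with x_i generated from z_i
        x_fake = G(z)                        # x_g^i ~ q_gamma(x | z^i)
        ratio_term = D(z, x_fake).mean()     # discriminator estimate of the KL (ratio) term
        d_term = ((z - F_net(x_fake)) ** 2).sum(dim=1).mean()   # d(z, F_theta(x_g)) as squared L2
        return ratio_term + d_term

In practice $D_\omega$, $G_\gamma$ and $F_\theta$ are updated in alternating steps, with the discriminator trained to distinguish the two joint distributions over $(z, x)$.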