InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Summary
InfoGAN is an extension of Generative Adversarial Networks that learns a disentangled representation over a subset of the latent variables fed to the generator network. The authors employ the variational information maximization framework to optimize a lower bound on a mutual information criterion: the MI between a small subset of the latent variables (the latent codes) and the output of the generator. The authors argue that maximizing this MI encourages the latent codes to become “disentangled”.
In turn, this allows the GAN to learn, in a completely unsupervised manner, interesting data representations such as stylistic factors on the MNIST dataset. The change to the traditional GAN architecture is minimal: the MI lower bound is simply added to the standard minimax objective as a regularization term.
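A minimal sketch of how that regularizer might look in code (not the authors' implementation; network and variable names here are hypothetical): the paper's variational lower bound $L_I(G, Q) = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}[\log Q(c \mid x)] + H(c)$ is approximated by an auxiliary network Q that shares layers with the discriminator, and the generator objective becomes the usual minimax term minus $\lambda L_I$. Since $H(c)$ is constant for a fixed prior over codes, it is dropped below.

```python
# Sketch of an InfoGAN training step in PyTorch (assumed names, MNIST-like sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

NOISE_DIM, CODE_DIM, IMG_DIM = 62, 10, 784   # assumed sizes, not from the paper

G = nn.Sequential(nn.Linear(NOISE_DIM + CODE_DIM, 256), nn.ReLU(),
                  nn.Linear(256, IMG_DIM), nn.Sigmoid())
body = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.ReLU())   # trunk shared by D and Q
D_head = nn.Linear(256, 1)          # real/fake logit
Q_head = nn.Linear(256, CODE_DIM)   # logits over the categorical code c

opt_D = torch.optim.Adam(list(body.parameters()) + list(D_head.parameters()), lr=2e-4)
opt_GQ = torch.optim.Adam(list(G.parameters()) + list(Q_head.parameters()), lr=2e-4)
lam = 1.0                           # lambda, the weight on the MI regularizer

def sample_latent(batch):
    z = torch.randn(batch, NOISE_DIM)
    c = torch.randint(0, CODE_DIM, (batch,))                 # sampled latent code
    return torch.cat([z, F.one_hot(c, CODE_DIM).float()], dim=1), c

def train_step(real_x):
    batch = real_x.size(0)

    # Discriminator step: the standard GAN minimax term only.
    zc, _ = sample_latent(batch)
    fake_x = G(zc).detach()
    d_real = D_head(body(real_x))
    d_fake = D_head(body(fake_x))
    loss_D = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator + Q step: fool D and maximize the MI lower bound.
    zc, c = sample_latent(batch)
    feats = body(G(zc))
    d_fake_g = D_head(feats)
    gan_loss = F.binary_cross_entropy_with_logits(d_fake_g, torch.ones_like(d_fake_g))
    # E[log Q(c | x)] is the negative cross-entropy between Q's logits and the code.
    mi_lower_bound = -F.cross_entropy(Q_head(feats), c)
    loss_GQ = gan_loss - lam * mi_lower_bound
    opt_GQ.zero_grad(); loss_GQ.backward(); opt_GQ.step()
    return loss_D.item(), loss_GQ.item()
```

Here `real_x` would be a flattened batch of images in $[0, 1]$; the only departure from a vanilla GAN step is the `- lam * mi_lower_bound` term and the extra head Q.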
Questions
- Why don’t they use the reverse KL in Eq. 4? This is what is normally used in Variational Bayes (why?). The reverse KL, KL(Q || P), is minimized when Q places no probability mass where P has no probability mass. (Standard KL definitions are sketched after this list.)
- What justifies moving $f(x,y)$ into the third integral in Eq. 7?
- What other potential applications of GANs are there besides generating realistic samples from $p_{data}$? How can they be used for general density estimation? (This seems like a big question.)
- Are the features learned by the conv nets any different between InfoGAN and GAN? (What is the potential importance of this?)
- Does InfoGAN produce sharper/more realistic images than GANs? It seems like the answer would be no, but this is hard to quantify.
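For reference on the reverse-KL question above, the two directions of the KL divergence (standard definitions, not specific to the paper) are:

$$\mathrm{KL}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx, \qquad \mathrm{KL}(Q \,\|\, P) = \int q(x) \log \frac{q(x)}{p(x)}\, dx.$$

The reverse KL, $\mathrm{KL}(Q \,\|\, P)$, heavily penalizes $Q$ for placing mass where $p(x) \approx 0$ (zero-forcing, mode-seeking), whereas the forward KL penalizes $Q$ for missing regions where $p(x) > 0$ (mass-covering).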