Disentangling Latent Space for VAE by Label Relevant/Irrelevant Dimensions

Zheng, Zhilin; Sun, Li

arXiv:1812.09502v4 (cs)

[Submitted on 22 Dec 2018 (v1), last revised 15 Mar 2019 (this version, v4)]

Abstract: A VAE requires the standard Gaussian distribution as the prior in the latent space. Since all codes tend to follow the same prior, it often suffers from the so-called "posterior collapse". To avoid this, this paper introduces a class-specific distribution for the latent code. Unlike CVAE, however, we present a method for disentangling the latent space into label-relevant and label-irrelevant dimensions, $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$, for a single input. We apply two separate encoders to map the input into $\bm{\mathrm{z}}_s$ and $\bm{\mathrm{z}}_u$ respectively, and then give the concatenated code to the decoder to reconstruct the input. The label-irrelevant code $\bm{\mathrm{z}}_u$ represents the common characteristics of all inputs; hence it is constrained by the standard Gaussian, and its encoder is trained by amortized variational inference, as in VAE. In contrast, $\bm{\mathrm{z}}_s$ is assumed to follow a Gaussian mixture distribution in which each component corresponds to a particular class. The parameters of the Gaussian components in the $\bm{\mathrm{z}}_s$ encoder are optimized by label supervision in a global stochastic way. In theory, we show that our method is equivalent to adding a KL divergence term on the joint distribution of $\bm{\mathrm{z}}_s$ and the class label $c$, and that it can directly increase the mutual information between $\bm{\mathrm{z}}_s$ and the label $c$. Our model can also be extended to GAN by adding a discriminator in the pixel domain so that it produces high-quality and diverse images.
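
The two latent constraints described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, both encoders are assumed to output a diagonal Gaussian (mean and log-variance), and each class component of the mixture is assumed to have identity covariance with a per-class mean. Under those assumptions, the label-irrelevant code is pulled toward N(0, I), the label-relevant code is pulled toward the component of its own class, and the decoder receives the concatenation of the two codes.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions.

    Sketch of the constraint on the label-irrelevant code z_u.
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def kl_to_class_component(mu, log_var, class_mean):
    """KL( N(mu, diag(exp(log_var))) || N(class_mean, I) ).

    Sketch of the constraint on the label-relevant code z_s.
    Assumption (not from the source): every mixture component has
    identity covariance, so only its mean depends on the class.
    """
    return 0.5 * sum((m - c) ** 2 + math.exp(lv) - lv - 1.0
                     for m, lv, c in zip(mu, log_var, class_mean))


def concat_code(z_s, z_u):
    """Concatenate the two codes into the vector passed to the decoder."""
    return list(z_s) + list(z_u)


if __name__ == "__main__":
    # Hypothetical encoder outputs for one input of class 1.
    class_means = {0: [-2.0, 0.0], 1: [2.0, 0.0]}  # illustrative values only
    mu_s, log_var_s = [1.5, 0.1], [0.0, 0.0]
    mu_u, log_var_u = [0.2, -0.1, 0.0], [0.0, 0.0, 0.0]

    print(kl_to_class_component(mu_s, log_var_s, class_means[1]))
    print(kl_to_standard_normal(mu_u, log_var_u))
    print(len(concat_code(mu_s, mu_u)))
```

With unit variances, the class-component term reduces to half the squared distance between the encoder mean and the class mean, so it vanishes exactly when the code sits on its class center.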

Comments:	Accepted by CVPR 2019
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1812.09502 [cs.CV]
	(or arXiv:1812.09502v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1812.09502

Submission history

From: Zhilin Zheng
[v1] Sat, 22 Dec 2018 11:09:50 UTC (1,698 KB)
[v2] Fri, 28 Dec 2018 05:19:34 UTC (1,699 KB)
[v3] Fri, 4 Jan 2019 13:36:24 UTC (1,699 KB)
[v4] Fri, 15 Mar 2019 12:01:42 UTC (1,699 KB)
