Written by: 민규식, 백종윤

Paper

1. Abstract

Learning an interpretable, factorised representation of the independent generative factors of data without supervision is important
beta-VAE → a variant of the VAE whose adjustable hyperparameter beta balances latent channel capacity and independence constraints against reconstruction accuracy
Datasets used: CelebA, faces, chairs
Baseline algorithms: unsupervised (InfoGAN), semi-supervised (DC-IGN)
beta-VAE is stable to train, makes few assumptions about the data, and needs only a single hyperparameter, beta, to be tuned!
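
To make the role of beta concrete, here is a minimal sketch of a per-sample beta-VAE loss. It assumes a diagonal Gaussian posterior and a standard normal prior, so the KL term has a closed form. The names `kl_diag_gaussian` and `beta_vae_loss` are hypothetical, and the reconstruction error is assumed to be computed elsewhere and passed in as a number.

```python
import math


def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same length")
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def beta_vae_loss(recon_error, mu, logvar, beta=4.0):
    """Loss to minimise for one sample: reconstruction error + beta * KL.

    beta = 1 gives the standard VAE loss; beta > 1 puts more weight on the
    KL term, i.e. a stronger constraint on the latent code.
    (The default beta=4.0 is an arbitrary illustrative value.)
    """
    return recon_error + beta * kl_diag_gaussian(mu, logvar)


# Same encoder output, different beta: only the KL weight changes.
mu, logvar = [1.0, 0.0], [0.0, 0.0]
vae_loss = beta_vae_loss(1.0, mu, logvar, beta=1.0)
bvae_loss = beta_vae_loss(1.0, mu, logvar, beta=4.0)
```

With the same encoder output, a larger beta penalises the same deviation from the prior more heavily. That is the trade-off against reconstruction accuracy described above.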

2. Introduction

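Finding a representation that suits a particular task or dataset greatly increases the chance of successful learning and the robustness of the resulting model
It is important to find disentangled representations of properties such as object identity, scale, position, lighting, and color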

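The example above shows results obtained on CelebA by varying the latent variables!
- Compares the disentangling performance of beta-VAE (beta=250), VAE (beta=1), and InfoGAN
- Only one latent variable is varied while all the others are held fixed (the varied value is swept continuously over the range -3 to 3)
- In the results above, only beta-VAE and InfoGAN learn disentangled factors
- The VAE learns an entangled representation (rotation, emotion, glasses, etc. all change at the same time)
- These results show that beta-VAE learns independent latent factors of variation from the data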

The results above are from training on 3D Chairs
- Results for beta-VAE (beta=5), VAE (beta=1), InfoGAN, and DC-IGN
- The VAE learns an entangled representation → chair width, azimuth, and leg style all change at the same time
- InfoGAN and beta-VAE both discover unlabelled factors
- However, InfoGAN fails to discover a factor for leg style
beta-VAE uses only a single hyperparameter, beta