clementgilli/CCVAE

Hybrid CCVAE

An extension of the Characteristic Capturing VAE (CCVAE) framework.
Adapted to handle hybrid attribute spaces (Binary & Multi-class) for independent control over gender and age.


1. Overview

This repository presents a modification of the Characteristic Capturing VAE (CCVAE) framework, originally designed for binary attributes (e.g., CelebA). We extend the architecture to handle a hybrid attribute space consisting of binary labels and multi-class labels.

The model is trained on the UTKFace dataset to learn a disentangled latent representation.

Attribute Scope

  • Binary Attributes: Gender (Male / Female).
  • Multi-Class Attributes: Age groups (Child, Young Adult, Adult, Senior).
  • Unsupervised Factors: Style (Pose, Lighting, Background).
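
As a minimal sketch of how the two supervised label types could be derived, the snippet below parses a UTKFace file name, which follows the `[age]_[gender]_[race]_[date&time].jpg` pattern with gender 0 for male and 1 for female. The age thresholds in `AGE_BINS` and the names `Labels`, `age_to_group`, and `parse_utkface_name` are assumptions made for illustration; this README does not state the boundaries the repository actually uses.

```python
from typing import NamedTuple

# Hypothetical age-group boundaries (upper bound, exclusive). The thresholds
# used by the repository are not given in this README.
AGE_BINS = [(13, "Child"), (30, "Young Adult"), (60, "Adult")]
LAST_GROUP = "Senior"
AGE_GROUP_NAMES = [name for _, name in AGE_BINS] + [LAST_GROUP]


class Labels(NamedTuple):
    gender: int     # binary label: 0 = male, 1 = female (UTKFace convention)
    age_group: int  # multi-class label: index into AGE_GROUP_NAMES


def age_to_group(age: int) -> int:
    """Map an age in years to the index of its (assumed) age group."""
    for index, (upper, _) in enumerate(AGE_BINS):
        if age < upper:
            return index
    return len(AGE_BINS)  # anything at or above the last bound is "Senior"


def parse_utkface_name(filename: str) -> Labels:
    """Extract the hybrid labels from a name like '25_1_2_20170116174525125.jpg'."""
    age_str, gender_str = filename.split("_")[:2]
    return Labels(gender=int(gender_str), age_group=age_to_group(int(age_str)))
```

For example, `parse_utkface_name("25_1_2_20170116174525125.jpg")` yields a female label with the "Young Adult" group under these assumed thresholds.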

2. Key Features

  • Hybrid Conditional Prior: Implements distinct embedding strategies for binary and multi-class labels to shape the latent space effectively.
  • Fixed Anchors Initialization: Latent clusters are initialized with fixed centers, which encourages separation and ordering in the latent space and limits class overlap.
  • Class-Weighted Loss: A targeted weighting strategy addresses the severe class imbalance in UTKFace (e.g., fewer seniors/children) without the need for oversampling.
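
The features above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: it assumes one latent coordinate per attribute, evenly spaced anchors with a hypothetical `spacing`, and inverse-frequency weights of the form N / (K * count), which is the same heuristic scikit-learn uses for `class_weight="balanced"`. The function names are made up for this example.

```python
from collections import Counter


def fixed_anchors(num_classes: int, spacing: float = 2.0) -> list[float]:
    """Evenly spaced, zero-centred 1-D centres, one per class, in class order."""
    offset = (num_classes - 1) / 2
    return [(k - offset) * spacing for k in range(num_classes)]


def prior_mean(gender: int, age_group: int, num_age_groups: int = 4) -> list[float]:
    """Assumed prior mean of z_c: one coordinate for gender, one for age."""
    gender_anchor = fixed_anchors(2)[gender]               # binary label
    age_anchor = fixed_anchors(num_age_groups)[age_group]  # multi-class label
    return [gender_anchor, age_anchor]


def inverse_frequency_weights(labels: list[int]) -> dict[int, float]:
    """Weight each class by N / (K * count), so rare classes weigh more."""
    counts = Counter(labels)
    total, num_classes = len(labels), len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}
```

With four age groups, `fixed_anchors(4)` gives centres at -3, -1, 1, and 3, so the classes keep their order along the age coordinate. The weights would typically multiply the per-sample classification term of the loss, which is why no oversampling is needed.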

3. Results

Conditional Generation

The model successfully generates diverse samples conditioned on specific age and gender combinations while preserving image quality.

[Figure: generated samples in four panels: Children, Adult Women, Adult Men, Elderly Men.]

Latent Space Structure (t-SNE)

t-SNE projections of the characteristic latent space ($z_c$) on the test set reveal the structure learned by the model.

  • Age Latent Space: The "Child" class forms a distinct cluster. Adult classes show a natural continuum. The "Elderly" class is proximal to adults.
  • Gender Latent Space: Clear separation into two distinct clusters for Male and Female.

Latent Traversal

Interpolating the latent vector $z_c$ between the "Child" anchor and the "Elderly" anchor demonstrates smooth aging transitions.
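
The traversal amounts to linear interpolation between two points in the characteristic latent space. The sketch below shows that step alone; the anchor coordinates are hypothetical values chosen for illustration, and decoding each point into an image is left to the trained model.

```python
def interpolate(start: list[float], end: list[float], steps: int) -> list[list[float]]:
    """Return `steps` points evenly spaced on the segment from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (start) to 1.0 (end)
        path.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return path


# Hypothetical anchor coordinates in z_c, for illustration only.
child_anchor = [-3.0, 0.0]
elderly_anchor = [3.0, 0.0]
traversal = interpolate(child_anchor, elderly_anchor, steps=5)
# Each point would then be combined with a fixed style code and decoded.
```

Keeping the style part of the latent vector fixed while only $z_c$ moves is what lets the traversal change apparent age without altering pose, lighting, or background.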


4. Dataset Setup

  1. Download the UTKFace dataset (Aligned & Cropped version).
  2. Extract the images to the data directory:
    data/UTKFace/
    

5. Usage

Basic Training

Run the training with default parameters:

python -m src.training_hybrid

Advanced Configuration

You can specify parameters such as batch size, device, and the fraction of supervised data:

python -m src.training_hybrid \
    --batch_size 128 \
    --sup_frac 0.5 \
    --device cuda \
    --num_workers 4

| Argument | Type | Default | Description |
|---|---|---|---|
| --batch_size | int | 256 | Number of samples per batch. |
| --sup_frac | float | 1.0 | Fraction of supervised data to use (1.0 = fully supervised). |
| --device | str | cuda | Compute device (cpu, cuda, or mps). |
| --num_workers | int | 4 | Number of subprocesses for data loading. |
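
For reference, the four options documented above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented names, types, and defaults; the actual parser in `src.training_hybrid` may be organised differently, and `build_parser` is a name chosen for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented training options with their documented defaults."""
    parser = argparse.ArgumentParser(description="Train the hybrid CCVAE (sketch).")
    parser.add_argument("--batch_size", type=int, default=256,
                        help="Number of samples per batch.")
    parser.add_argument("--sup_frac", type=float, default=1.0,
                        help="Fraction of supervised data (1.0 = fully supervised).")
    parser.add_argument("--device", type=str, default="cuda",
                        choices=["cpu", "cuda", "mps"],
                        help="Compute device.")
    parser.add_argument("--num_workers", type=int, default=4,
                        help="Number of subprocesses for data loading.")
    return parser


# Same overrides as the first two flags of the command shown earlier.
args = build_parser().parse_args(["--batch_size", "128", "--sup_frac", "0.5"])
```

Options that are not passed keep their defaults, so `args.device` is `"cuda"` and `args.num_workers` is `4` here.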

6. References

If you find this code useful, please cite the original paper:

Capturing Label Characteristics in VAEs
Tom Joy, Sebastian M. Schmon, Philip H.S. Torr, N. Siddharth, Tom Rainforth (ICLR 2021)

@inproceedings{joy2021capturing,
    title={Capturing Label Characteristics in {VAE}s},
    author={Tom Joy and Sebastian M. Schmon and Philip H.S. Torr and N. Siddharth and Tom Rainforth},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=w5-iJ9-wS6D}
}

Dataset Reference:

UTKFace Large Scale Face Dataset
