An extension of the Characteristic Capturing VAE (CCVAE) framework.
Adapted to handle hybrid attribute spaces (Binary & Multi-class) for independent control over gender and age.
This repository presents a modification of the Characteristic Capturing VAE (CCVAE) framework, originally designed for binary attributes (e.g., CelebA). We extend the architecture to handle a hybrid attribute space consisting of binary labels and multi-class labels.
The model is trained on the UTKFace dataset to learn a disentangled latent representation.
- Binary Attributes: Gender (Male / Female).
- Multi-Class Attributes: Age groups (Child, Young Adult, Adult, Senior).
- Unsupervised Factors: Style (Pose, Lighting, Background).
- Hybrid Conditional Prior: Implements distinct embedding strategies for binary and multi-class labels to shape the latent space effectively.
- Fixed Anchors Initialization: Latent clusters are initialized with fixed centers. This enforces strict separation and ordering in the latent space, preventing class overlap.
- Class-Weighted Loss: A targeted weighting strategy addresses the severe class imbalance in UTKFace (e.g., fewer seniors/children) without the need for oversampling.
The model successfully generates diverse samples conditioned on specific age and gender combinations while preserving image quality.
| Children | Adult Women |
|---|---|
| ![]() | ![]() |

| Adult Men | Elderly Men |
|---|---|
| ![]() | ![]() |
t-SNE projections of the characteristic latent space (
Interpolating the latent vector
- Download the UTKFace dataset (Aligned & Cropped version).
- Extract the images to the data directory:
data/UTKFace/
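UTKFace encodes its labels in the file names, in the form `[age]_[gender]_[race]_[date&time].jpg`, with gender `0` for male and `1` for female. The sketch below shows one way the gender and age-group labels could be derived from that convention. The age-group boundaries and the helper name `parse_utkface_name` are assumptions for illustration; the bins used by this repository may differ.

```python
from pathlib import Path

# Hypothetical age-group upper bounds (exclusive); the repository's bins may differ.
AGE_BINS = [(13, "Child"), (30, "Young Adult"), (60, "Adult")]
GENDERS = {0: "Male", 1: "Female"}


def parse_utkface_name(path):
    """Return (age_group, gender) from a UTKFace file name, or None if malformed."""
    parts = Path(path).name.split("_")
    # Expect at least age, gender, race, and timestamp fields.
    if len(parts) < 4:
        return None
    try:
        age, gender = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    if gender not in GENDERS:
        return None
    for upper, name in AGE_BINS:
        if age < upper:
            return name, GENDERS[gender]
    return "Senior", GENDERS[gender]


example = parse_utkface_name("data/UTKFace/26_1_0_20170116234741431.jpg.chip.jpg")
```

Returning `None` for names with missing fields lets a data loader skip the few files in the dataset that do not follow the naming convention instead of failing on them.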
Run the training with default parameters:
python -m src.training_hybrid

You can specify parameters such as batch size, device, and the fraction of supervised data:
python -m src.training_hybrid \
--batch_size 128 \
--sup_frac 0.5 \
--device cuda \
--num_workers 4

| Argument | Type | Default | Description |
|---|---|---|---|
| --batch_size | int | 256 | Number of samples per batch. |
| --sup_frac | float | 1.0 | Fraction of supervised data to use (1.0 = fully supervised). |
| --device | str | cuda | Compute device (cpu, cuda, or mps). |
| --num_workers | int | 4 | Number of subprocesses for data loading. |
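For reference, the arguments in the table could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented names, types, and defaults; the parser in `src.training_hybrid` may be organized differently, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented training arguments."""
    parser = argparse.ArgumentParser(description="Hybrid CCVAE training (illustrative).")
    parser.add_argument("--batch_size", type=int, default=256,
                        help="Number of samples per batch.")
    parser.add_argument("--sup_frac", type=float, default=1.0,
                        help="Fraction of supervised data (1.0 = fully supervised).")
    parser.add_argument("--device", type=str, default="cuda",
                        choices=["cpu", "cuda", "mps"],
                        help="Compute device.")
    parser.add_argument("--num_workers", type=int, default=4,
                        help="Number of subprocesses for data loading.")
    return parser


# Parse an explicit argument list; unspecified options fall back to their defaults.
args = build_parser().parse_args(["--batch_size", "128", "--sup_frac", "0.5"])
```

Restricting `--device` with `choices` is a design choice in this sketch: an unsupported value is rejected at parse time rather than surfacing later as a runtime error.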
If you find this code useful, please cite the original paper:
Capturing Label Characteristics in VAEs
Tom Joy, Sebastian M. Schmon, Philip H.S. Torr, N. Siddharth, Tom Rainforth (ICLR 2021)
@inproceedings{joy2021capturing,
title={Capturing Label Characteristics in {VAE}s},
author={Tom Joy and Sebastian M. Schmon and Philip H.S. Torr and N. Siddharth and Tom Rainforth},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=w5-iJ9-wS6D}
}

Dataset Reference:






