Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
Accepted by European Conference on Computer Vision (ECCV) 2024
Shengke Sun, Ziqian Luan, Zhanshan Zhao, Shijie Luo and Shuzhen Han
What is CLR-GAN?
Generative Adversarial Networks (GANs) have received considerable attention due to their outstanding ability to generate images.
However, training a GAN is hard because the game between the Generator (G) and the Discriminator (D) is unfair.
To make the competition fairer, we propose a new perspective on training GANs, named Consistent Latent Representation and Reconstruction (CLR-GAN).
In this paradigm, we treat G and D as inverse processes: the discriminator has the additional task of restoring the pre-defined latent code, while the generator must also reconstruct the real input. This yields a relationship between the latent space of G and the output features of D.
Based on this prior, we can put D and G on an equal footing during training using a new criterion. Experimental results on various datasets and architectures show that our paradigm makes GAN training more stable and produces higher-quality images.
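As a rough illustration of the two consistency tasks described above, they could be written as a pair of reconstruction terms. This is only one reading of the description, not the criterion from the paper: the symbol D_feat (the output features of D interpreted as a latent code) and the choice of squared L2 distances are assumptions made here for illustration.

```latex
\mathcal{L}_{\text{latent}}
  = \mathbb{E}_{z \sim p(z)}
    \left[ \lVert D_{\text{feat}}(G(z)) - z \rVert_2^2 \right],
\qquad
\mathcal{L}_{\text{rec}}
  = \mathbb{E}_{x \sim p_{\text{data}}}
    \left[ \lVert G(D_{\text{feat}}(x)) - x \rVert_2^2 \right]
```

The first term asks D to restore the latent code that produced a generated image; the second asks G to reconstruct a real image from the features D extracts from it.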
Usage of CLR-GAN Code
Preparing datasets
Training new networks
Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file dataset.json for labels.
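The archive layout described above can be sketched with the Python standard library. This is a minimal sketch, not the repository's own tool: the function names are hypothetical, and the dataset.json layout ({"labels": [[name, label], ...]}) is assumed to follow the StyleGAN2-ADA PyTorch convention.

```python
import io
import json
import zipfile


def write_dataset_zip(dest, images, labels=None):
    """Write PNG bytes into an uncompressed ZIP plus a dataset.json file.

    images: dict mapping archive names (e.g. "00000/img00000000.png") to PNG bytes.
    labels: optional dict mapping the same names to integer class labels.
    """
    # ZIP_STORED writes every member without compression.
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in sorted(images):
            zf.writestr(name, images[name])
        # Assumed metadata layout: {"labels": [[name, label], ...]} or null.
        pairs = None if labels is None else [[n, labels[n]] for n in sorted(images)]
        zf.writestr("dataset.json", json.dumps({"labels": pairs}))


def summarize_dataset_zip(src):
    """Return (number of PNG members, all members uncompressed?, labels)."""
    with zipfile.ZipFile(src) as zf:
        infos = zf.infolist()
        num_png = sum(i.filename.lower().endswith(".png") for i in infos)
        all_stored = all(i.compress_type == zipfile.ZIP_STORED for i in infos)
        labels = json.loads(zf.read("dataset.json"))["labels"]
    return num_png, all_stored, labels
```

Both functions accept a file path or a file-like object such as io.BytesIO, so summarize_dataset_zip can be used to sanity-check an archive before starting a training run.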
Custom datasets can be created from a folder containing images; see python --help for more information. Alternatively, the folder can be used directly as a dataset without converting it to a ZIP archive first, but doing so may lead to suboptimal performance.
Legacy TFRecords datasets are not supported; see below for instructions on how to convert them.
Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.
Step 2: Extract images from TFRecords using the script from the TensorFlow version of StyleGAN2-ADA:
# Using from TensorFlow version at
python ../stylegan2-ada/ unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked
Step 3: Create a ZIP archive using the script from this repository:
# Original 1024x1024 resolution.
python --source=/tmp/ffhq-unpacked --dest=~/datasets/

# Scaled down 256x256 resolution.
# Note: --resize-filter=box is required to reproduce FID scores shown in the
# paper.  If you don't need to match exactly, it's better to leave this out
# and default to Lanczos.  See
python --source=/tmp/ffhq-unpacked --dest=~/datasets/ \
    --width=256 --height=256 --resize-filter=box
AFHQ: Download the AFHQ dataset and create a ZIP archive:
python --source=~/downloads/afhq/train/cat --dest=~/datasets/
python --source=~/downloads/afhq/train/dog --dest=~/datasets/
python --source=~/downloads/afhq/train/wild --dest=~/datasets/
LSUN: Download the desired categories from the LSUN project page and convert them to a ZIP archive:
python --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/ \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/ \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000
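To make the center-crop transform in the LSUN cat command above more concrete, the crop region can be computed as the largest centered square, which is then resized to the requested width and height. This is a sketch under that assumption, with a hypothetical function name; it does not cover center-crop-wide, and the repository's own implementation may differ in details.

```python
def center_crop_box(img_width, img_height):
    """Return (left, top, right, bottom) of the largest centered square."""
    crop = min(img_width, img_height)
    left = (img_width - crop) // 2
    top = (img_height - crop) // 2
    return left, top, left + crop, top + crop
```

For a 500x300 image this selects the middle 300x300 region, discarding 100 pixels on each side.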
In its most basic form, training new networks boils down to:
python --outdir=~/training-runs --data=~/ --gpus=1 --dry-run
python --outdir=~/training-runs --data=~/ --gpus=1
The first command is optional; it validates the arguments, prints out the training configuration, and exits. The second command kicks off the actual training.
In this example, the results are saved to a newly created directory ~/training-runs/<ID>-mydataset-auto1, controlled by --outdir. The training exports network pickles (network-snapshot-<INT>.pkl) and example images (fakes<INT>.png) at regular intervals (controlled by --snap). For each pickle, it also evaluates FID (controlled by --metrics) and logs the resulting scores in metric-fid50k_full.jsonl (as well as TFEvents if TensorBoard is installed).
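The FID log mentioned above can be scanned to find the best snapshot of a run. The sketch below assumes each line of metric-fid50k_full.jsonl is a JSON object with a "results" dictionary and a "snapshot_pkl" field, as in StyleGAN2-ADA PyTorch; the function name is hypothetical, and the field names should be checked against an actual log.

```python
import json


def best_snapshot(jsonl_path, metric="fid50k_full"):
    """Return (snapshot name, score) with the lowest metric value, or None."""
    best = None
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Assumed layout: {"results": {metric: value}, "snapshot_pkl": name, ...}
            score = record["results"][metric]
            name = record.get("snapshot_pkl")
            if best is None or score < best[1]:
                best = (name, score)
    return best
```

Lower FID is better, so the function keeps the minimum; the returned pickle name can then be passed to --resume.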
The name of the output directory reflects the training configuration. For example, 00000-mydataset-auto1 indicates that the base configuration was auto1, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by --cfg:
| Base config | Description |
| --- | --- |
| auto (default) | Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results. |
| paper256 | Reproduce results for FFHQ and LSUN Church at 256x256 using 1, 2, 4, or 8 GPUs. |
| paper512 | Reproduce results for AFHQ-Cat at 512x512 using 1, 2, 4, or 8 GPUs. |
The training configuration can be further customized with additional command line options:
  • --aug=noaug disables ADA.
  • --cond=1 enables class-conditional training (requires a dataset with labels).
  • --mirror=1 amplifies the dataset with x-flips. Often beneficial, even with ADA.
  • --resume=ffhq1024 --snap=10 performs transfer learning from FFHQ trained at 1024x1024.
  • --resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl resumes a previous training run.
  • --gamma=10 overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
  • --aug=ada --target=0.7 adjusts ADA target value (default: 0.6).
  • --augpipe=blit enables pixel blitting but disables all other augmentations.
  • --augpipe=bgcfnc enables all available augmentations (blit, geom, color, filter, noise, cutout).
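To give a sense of what the ADA target in the list above controls: in StyleGAN2-ADA, the augmentation probability p is nudged up or down so that an overfitting heuristic r_t (the mean sign of D's outputs on real images) stays near the target. The sketch below follows that rule and assumes this repository inherits it unchanged; the function name and the default values for batch_size, interval, and ada_kimg are assumptions for illustration.

```python
def ada_step(p, rt, target=0.6, batch_size=64, interval=4, ada_kimg=500):
    """Return the augmentation probability after one ADA adjustment.

    p:  current augmentation probability.
    rt: overfitting heuristic measured over the last `interval` batches.
    """
    # Step size is chosen so that p can move from 0 to 1 in ada_kimg
    # thousand images (assumed default: 500k images).
    step = (batch_size * interval) / (ada_kimg * 1000)
    # sign(rt - target): +1 if D looks too confident on reals, -1 if not.
    direction = (rt > target) - (rt < target)
    # p is clamped at zero from below.
    return max(p + direction * step, 0.0)
```

Raising the target (for example to 0.7) tolerates a more confident discriminator before augmentation strength increases, which generally means weaker augmentation.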
Please refer to python --help for the full list.
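For reference when tuning --gamma, the R1 regularizer is usually written as a gradient penalty on real images, scaled by gamma. The form below is the standard definition and is assumed to apply here unchanged:

```latex
R_1 = \frac{\gamma}{2}\,
      \mathbb{E}_{x \sim p_{\text{data}}}
      \left[ \lVert \nabla_x D(x) \rVert_2^2 \right]
```

A larger gamma penalizes sharp changes in D around real images more strongly, which tends to stabilize training at the cost of a weaker discriminator.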