Datasets are stored as uncompressed ZIP archives containing uncompressed PNG files and a metadata file `dataset.json` for labels.
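As a quick sanity check, the metadata can be inspected straight from the archive. A minimal sketch, assuming the archive was produced by `dataset_tool.py`; the path is a placeholder:

```bash
# Print the beginning of the label metadata stored inside an existing
# dataset archive (~/datasets/mydataset.zip is a placeholder path).
unzip -p ~/datasets/mydataset.zip dataset.json | head -c 300; echo
```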
Custom datasets can be created from a folder containing images; see `python dataset_tool.py --help` for more information. Alternatively, the folder can also be used directly as a dataset, without running it through `dataset_tool.py` first, but doing so may lead to suboptimal performance; both workflows are shown below.
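For example, assuming a folder of images at the placeholder path `~/my-images`:

```bash
# Preferred: pack the folder into a ZIP archive first.
python dataset_tool.py --source=~/my-images --dest=~/datasets/my-images.zip

# Also accepted: point train.py at the folder directly (may be slower).
python train.py --outdir=~/training-runs --data=~/my-images --gpus=1
```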
Legacy TFRecords datasets are not supported — see below for instructions on how to convert them.
**FFHQ**:

Step 1: Download the Flickr-Faces-HQ dataset as TFRecords.

Step 2: Extract images from TFRecords using `dataset_tool.py` from the TensorFlow version of StyleGAN2-ADA:
```bash
# Using dataset_tool.py from TensorFlow version at
# https://github.com/NVlabs/stylegan2-ada/
python ../stylegan2-ada/dataset_tool.py unpack \
    --tfrecord_dir=~/ffhq-dataset/tfrecords/ffhq --output_dir=/tmp/ffhq-unpacked
```
Step 3: Create ZIP archive using `dataset_tool.py` from this repository:
```bash
# Original 1024x1024 resolution.
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq.zip

# Scaled down 256x256 resolution.
#
# Note: --resize-filter=box is required to reproduce FID scores shown in the
# paper. If you don't need to match exactly, it's better to leave this out
# and default to Lanczos. See https://github.com/NVlabs/stylegan2-ada-pytorch/issues/283#issuecomment-1731217782
python dataset_tool.py --source=/tmp/ffhq-unpacked --dest=~/datasets/ffhq256x256.zip \
    --width=256 --height=256 --resize-filter=box
```
**AFHQ**: Download the AFHQ dataset and create ZIP archives:
```bash
python dataset_tool.py --source=~/downloads/afhq/train/cat --dest=~/datasets/afhqcat.zip
python dataset_tool.py --source=~/downloads/afhq/train/dog --dest=~/datasets/afhqdog.zip
python dataset_tool.py --source=~/downloads/afhq/train/wild --dest=~/datasets/afhqwild.zip
```
**LSUN**: Download the desired categories from the LSUN project page and convert to ZIP archives:
```bash
python dataset_tool.py --source=~/downloads/lsun/raw/cat_lmdb --dest=~/datasets/lsuncat200k.zip \
    --transform=center-crop --width=256 --height=256 --max_images=200000

python dataset_tool.py --source=~/downloads/lsun/raw/car_lmdb --dest=~/datasets/lsuncar200k.zip \
    --transform=center-crop-wide --width=512 --height=384 --max_images=200000
```
In its most basic form, training new networks boils down to:
```bash
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1
```
The first command is optional; it validates the arguments, prints out the training configuration, and exits.
The second command kicks off the actual training.
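If you want training to start only when validation passes, the two commands can be chained in the shell; this is plain shell usage, not a feature of `train.py`:

```bash
# Validate the configuration first; launch the real run only if it succeeds.
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 --dry-run \
    && python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1
```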
In this example, the results are saved to a newly created directory `~/training-runs/<ID>-mydataset-auto1`, controlled by `--outdir`. The training exports network pickles (`network-snapshot-<INT>.pkl`) and example images (`fakes<INT>.png`) at regular intervals (controlled by `--snap`). For each pickle, it also evaluates FID (controlled by `--metrics`) and logs the resulting scores in `metric-fid50k_full.jsonl` (as well as TFEvents if TensorBoard is installed).
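Because each line of the `.jsonl` log is a self-contained JSON record, recent scores can be skimmed straight from the shell; the run directory name below is just an example:

```bash
# Show the three most recent FID records for an example run.
tail -n 3 ~/training-runs/00000-mydataset-auto1/metric-fid50k_full.jsonl
```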
The name of the output directory reflects the training configuration. For example, `00000-mydataset-auto1` indicates that the base configuration was `auto1`, meaning that the hyperparameters were selected automatically for training on one GPU. The base configuration is controlled by `--cfg`:
| Base config | Description |
| :--- | :--- |
| `auto` (default) | Automatically select reasonable defaults based on resolution and GPU count. Serves as a good starting point for new datasets but does not necessarily lead to optimal results. |
| `paper256` | Reproduce results for FFHQ and LSUN Cat at 256x256 using 1, 2, 4, or 8 GPUs. |
| `paper512` | Reproduce results for AFHQ-Cat at 512x512 using 1, 2, 4, or 8 GPUs. |
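For example, the 256x256 paper configuration can be selected explicitly; this sketch reuses the FFHQ archive created above, and further options from the list below can be appended as needed:

```bash
# Request the paper256 base configuration on 8 GPUs (paths are placeholders).
python train.py --outdir=~/training-runs --data=~/datasets/ffhq256x256.zip --gpus=8 --cfg=paper256
```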
The training configuration can be further customized with additional command line options:
* `--aug=noaug` disables ADA.
* `--cond=1` enables class-conditional training (requires a dataset with labels).
* `--mirror=1` amplifies the dataset with x-flips. Often beneficial, even with ADA.
* `--resume=ffhq1024 --snap=10` performs transfer learning from FFHQ trained at 1024x1024.
* `--resume=~/training-runs/<NAME>/network-snapshot-<INT>.pkl` resumes a previous training run.
* `--gamma=10` overrides R1 gamma. We recommend trying a couple of different values for each new dataset.
* `--aug=ada --target=0.7` adjusts ADA target value (default: 0.6).
* `--augpipe=blit` enables pixel blitting but disables all other augmentations.
* `--augpipe=bgcfnc` enables all available augmentations (blit, geom, color, filter, noise, cutout).
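As a sketch of how these options combine, the following would start a transfer-learning run from FFHQ-1024 with frequent snapshots, x-flips, and an explicit R1 gamma; all values are illustrative:

```bash
# Transfer learning from FFHQ-1024, combining several of the options above.
# Assumes ~/mydataset.zip matches the 1024x1024 resolution of the starting point.
python train.py --outdir=~/training-runs --data=~/mydataset.zip --gpus=1 \
    --resume=ffhq1024 --snap=10 --mirror=1 --gamma=10
```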
Please refer to `python train.py --help` for the full list.