This 🖥️📦 repository accompanies our 📚📄 paper on biases in datasets for AI-generated image detection. As discussed in detail in the paper, the experiments are conducted on the GenImage dataset.
⬇️ We provide a convenient GenImage download here (~500 GB): DOWNLOAD. In addition, we removed corrupted files from the GenImage download and added a metadata CSV. This CSV is required by our training and validation code and contains additional information, such as the content class of each image, that is not part of the original dataset.
Use our download script as follows, since the web interface does not allow downloading all files at once:
python download_genimage.py [--continue] [--destination {path}]
--continue: Optional. Skip files that already exist. Default is to start a new download.
--destination {path}: Optional. Directory to download the files to. Default is ./GenImage_download
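For readers who want to mirror these options in their own tooling, here is a minimal sketch of how the two documented flags could be parsed and how the skip-if-exists behavior of --continue might work. It is not the contents of download_genimage.py: the functions parse_args and files_to_download and the attribute name resume are hypothetical. Because continue is a Python keyword, the sketch stores the flag under dest="resume" so it can be read as args.resume.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two documented options (hypothetical sketch, not the real script)."""
    parser = argparse.ArgumentParser(description="Download the GenImage dataset parts.")
    # `continue` is a Python keyword, so store the flag under a different attribute name.
    parser.add_argument("--continue", dest="resume", action="store_true",
                        help="Skip files that already exist locally.")
    parser.add_argument("--destination", default="./GenImage_download",
                        help="Directory where the files are downloaded.")
    return parser.parse_args(argv)


def files_to_download(remote_names, destination, resume):
    """Return the file names that still have to be fetched.

    With resume=True, files already present in `destination` are skipped;
    otherwise every file is downloaded again (a fresh download).
    """
    dest = Path(destination)
    if not resume:
        return list(remote_names)
    return [name for name in remote_names if not (dest / name).exists()]
```

Keeping the "what is still missing" logic in a separate function makes it easy to test without touching the network.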
To restore the archive, concatenate the downloaded parts into a single zip file:
cat GenImage.z* > ../GenImage_restored.zip
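Where this shell command is unavailable (for example, in a plain Windows command prompt), the same concatenation can be done in Python. The following is a sketch, assuming the parts follow the usual split-zip naming (GenImage.z01, GenImage.z02, ..., GenImage.zip), so that sorting the names reproduces the order in which the shell expands GenImage.z*. The function name restore_archive is hypothetical.

```python
import glob
import shutil


def restore_archive(parts_pattern="GenImage.z*", output="../GenImage_restored.zip"):
    """Concatenate split archive parts into one file, like `cat GenImage.z* > output`.

    Sorting the glob results puts GenImage.z01, GenImage.z02, ... before GenImage.zip,
    because digits sort before letters.
    """
    parts = sorted(glob.glob(parts_pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {parts_pattern!r}")
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream in chunks instead of loading each (large) part into memory.
                shutil.copyfileobj(src, out)
    return parts
```

The output path should not match the input pattern; otherwise the restored file could be picked up as one of its own parts on a second run.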
ℹ️ NOTE: GenImage can now also be downloaded easily from Google Drive. We recommend downloading the GenImage dataset there and downloading only the metadata.csv from our Dataverse. ℹ️
We provide code for training and validating ResNet50 and Swin-T detectors. This aims to show that:
As in the original GenImage paper, we use forks of timm and Swin-Transformer. We only changed the dataset (create_dataset.py) to better suit our experiments. This dataset uses get_data.py to select the relevant data from the CSV file and get_transform.py for transformations such as JPEG compression, which are applied before the original transformations/augmentations. Details on how to start experiments can be found in the corresponding detector folders.
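To illustrate the kind of selection get_data.py performs, here is a minimal sketch that filters a metadata CSV by generator, split and, optionally, content class. It is not the repository's implementation: the function select_samples and the column names path, generator, split, label and content_class are hypothetical, so check the header of the actual metadata.csv before adapting it.

```python
import csv

# Hypothetical column names; the real metadata.csv may use different ones.
PATH_COL, GEN_COL, SPLIT_COL, LABEL_COL, CLASS_COL = (
    "path", "generator", "split", "label", "content_class")


def select_samples(csv_file, generator, split, content_classes=None):
    """Return (path, label) pairs for one generator and split.

    csv_file is an open text file (or file-like object) containing the metadata CSV.
    If content_classes is given, only rows whose content class is in it are kept.
    """
    selected = []
    for row in csv.DictReader(csv_file):
        if row[GEN_COL] != generator or row[SPLIT_COL] != split:
            continue
        if content_classes is not None and row[CLASS_COL] not in content_classes:
            continue
        selected.append((row[PATH_COL], int(row[LABEL_COL])))
    return selected
```

Usage, with a hypothetical generator identifier: `with open("metadata.csv", newline="") as f: samples = select_samples(f, "sdv4", "train")`.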
To run inference on your own datasets, create a CSV file and slightly adjust get_data.py, as we did for the FFHQ dataset.
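As a starting point for such a CSV, the sketch below walks a folder and writes one row per image. The folder layout (root/real and root/fake), the label convention (0 = real, 1 = generated), the column names and the function write_dataset_csv are all assumptions; match them to whatever your adjusted get_data.py expects.

```python
import csv
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}
FIELDNAMES = ["path", "generator", "split", "label"]  # hypothetical columns


def write_dataset_csv(root, out_csv, generator, split="val"):
    """Write a metadata CSV for a custom dataset and return the number of rows.

    Assumes real images under root/real (label 0) and generated images under
    root/fake (label 1); paths are stored relative to root.
    """
    root = Path(root)
    rows = []
    for subdir, label in (("real", 0), ("fake", 1)):
        folder = root / subdir
        if not folder.is_dir():
            continue  # a split may contain only real or only generated images
        for path in sorted(folder.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                rows.append({"path": str(path.relative_to(root)), "generator": generator,
                             "split": split, "label": label})
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```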
[Figure: Cross-generator performance when training ResNet50 on the constrained dataset, and the difference to training on the raw dataset]
[Figure: Cross-generator performance when training Swin-T on the constrained dataset, and the difference to training on the raw dataset]