Validating indicators
We provide here a zip file that contains the following items:
- the instructions for loading the models;
- the information tracked during the attacks, needed to compute the indicators; and
- the values of the indicators obtained from the tracked information.
These are provided at least for the models implemented in pure PyTorch and Keras. We will later decide whether to include the results for the model implemented in SecML, as it would require re-implementing the defense with other tools.
The instructions for running the attacks, as well as the hyperparameters we used for the experiments, can be found in our paper.
The instructions are already part of the repository, and can be found at the following URLs:
- distillation model: https://github.com/pralab/IndicatorsOfAttackFailure/blob/master/src/models/distillation/load_distillation.py
- kwta model: https://github.com/pralab/IndicatorsOfAttackFailure/blob/master/src/models/kwta/load_kwta.py
- ensemble diversity: https://github.com/pralab/IndicatorsOfAttackFailure/blob/master/src/models/ensemble_diversity/load_ensemble.py
They also include the links for downloading the corresponding models.
The information stored during the optimization of the attacks is provided as a pickle containing a list of 10 samples. For each item of the list, the following information is stored:
- the input sample (key `x`)
- the original label of the input sample (key `y`)
- the target label of the attack (key `y_target`)
- the predicted label after the attack (key `y_adv`)
- the output scores along the optimization path (key `scores_path`)
- the norms of the gradients along the optimization path (key `grad_norms`)
- the value of the attacker's loss along the optimization path (key `attacker_loss`)
- the predicted labels after K different restarts (one label for each restart), if any restart was used (key `restart_labels`)
- the predicted labels after using the input directly on the target model, if another model was used for creating the attacks (key `transfer_labels`)
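As a sketch, the tracked information could be loaded and inspected as follows. The filename and all values are assumptions for illustration (a dummy record with the documented keys is created first, so the snippet runs without the actual zip contents):

```python
import pickle

# NOTE: the filename is an assumption; check the actual name inside the zip.
TRACKED_FILE = "tracked_attack_data.pkl"

# Build a minimal stand-in record with the documented keys, so that the
# loading code below runs without the real data (all values are dummies).
record = {
    "x": [0.0] * 784,                  # input sample (flattened here)
    "y": 7,                            # original label
    "y_target": 1,                     # target label of the attack
    "y_adv": 7,                        # predicted label after the attack
    "scores_path": [[0.1] * 10] * 50,  # output scores along the path
    "grad_norms": [0.0] * 50,          # gradient norms along the path
    "attacker_loss": [0.0] * 50,       # attacker's loss along the path
    "restart_labels": [7, 7, 1],       # one predicted label per restart
    "transfer_labels": [7],            # labels on the target model (transfer)
}
with open(TRACKED_FILE, "wb") as f:
    pickle.dump([record], f)  # the real file stores a list of 10 such dicts

with open(TRACKED_FILE, "rb") as f:
    samples = pickle.load(f)

for i, s in enumerate(samples):
    # The attack failed on this sample if the predicted label is unchanged.
    failed = s["y_adv"] == s["y"]
    print(f"sample {i}: y={s['y']} y_adv={s['y_adv']} failed={failed}")
```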
Additionally, we provide CSV files containing the indicators for each input sample, in the same order as the list in the pickled file.
This allows developers to validate the method, as they can:
- reproduce the results, ensuring that the information tracked from the attacks is consistent with the information used by the indicators; and
- validate their implementation of the indicators by passing in the tracked information provided, and checking that the outputs of the indicators match the values reported here.
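The second check can be sketched as follows. The CSV layout and the indicator column names here are hypothetical placeholders, and the recomputed values are dummies standing in for the output of a re-implemented indicator:

```python
import csv
import io
import math

# Hypothetical CSV layout: one row per input sample, in the same order as
# the pickled list; the indicator column names are illustrative only.
provided_csv = io.StringIO(
    "sample,indicator_a,indicator_b\n"
    "0,0.12,1.00\n"
    "1,0.00,0.50\n"
)

# Indicator values recomputed from the tracked information by your own
# implementation (dummy numbers here stand in for the real computation).
recomputed = [(0.12, 1.00), (0.00, 0.50)]

mismatches = []
for row, (a, b) in zip(csv.DictReader(provided_csv), recomputed):
    ok = (math.isclose(float(row["indicator_a"]), a)
          and math.isclose(float(row["indicator_b"]), b))
    if not ok:
        mismatches.append(row["sample"])

print("all indicators match" if not mismatches else f"mismatches: {mismatches}")
```

Comparing with `math.isclose` rather than exact equality avoids spurious mismatches from floating-point rounding in the CSV export.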
The following zip contains the sample data described above for the three models, including results for the PGD, PGD*, and APGD attacks.
Additional questions? Feel free to open an issue! Feedback is welcome!