Validating indicators
We provide here a zip file that contains the following items:
- the instructions for loading the models;
- the information tracked during the attacks, needed to compute the indicators; and
- the values of the indicators obtained from the tracked information.
These are provided at least for the models implemented in pure PyTorch and Keras. We will later decide whether to include the results for the model implemented in SecML, as it would require re-implementing the defense with other tools.
The instructions for running the attacks, as well as the hyperparameters we used for the experiments, can be found in our paper.
The instructions are already part of the repository, and can be found at the following URLs:
- distillation model: https://github.com/pralab/IndicatorsOfAttackFailure/blob/master/src/models/distillation/load_distillation.py
- kwta model: https://github.com/pralab/IndicatorsOfAttackFailure/blob/master/src/models/kwta/load_kwta.py
- ensemble diversity: https://github.com/pralab/IndicatorsOfAttackFailure/blob/master/src/models/ensemble_diversity/load_ensemble.py
They also include the links for downloading the corresponding models.
The information stored during the optimization of the attacks is provided as a pickle containing a list of 10 samples. For each item of the list, the following information is stored:
- the input sample (key `x`)
- the original label of the input sample (key `y`)
- the target label of the attack (key `y_target`)
- the predicted label after the attack (key `y_adv`)
- the output scores along the optimization path (key `scores_path`)
- the norms of the gradients along the optimization path (key `grad_norms`)
- the value of the attacker's loss along the optimization path (key `attacker_loss`)
- the predicted labels after K different restarts (one label for each restart), if any restart was used (key `restart_labels`)
- the predicted labels after using the input directly on the target model, if another model was used for creating the attacks (key `transfer_labels`)
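As a sketch, the tracked information could be loaded and inspected as follows. The filename and all values are assumptions for illustration (a dummy record with the documented keys is created first, so the snippet runs without the actual zip contents):

```python
import pickle

# NOTE: the filename is an assumption; check the actual name inside the zip.
TRACKED_FILE = "tracked_attack_data.pkl"

# Build a minimal stand-in record with the documented keys, so that the
# loading code below runs without the real data (all values are dummies).
record = {
    "x": [0.0] * 784,                  # input sample (flattened here)
    "y": 7,                            # original label
    "y_target": 1,                     # target label of the attack
    "y_adv": 7,                        # predicted label after the attack
    "scores_path": [[0.1] * 10] * 50,  # output scores along the path
    "grad_norms": [0.0] * 50,          # gradient norms along the path
    "attacker_loss": [0.0] * 50,       # attacker's loss along the path
    "restart_labels": [7, 7, 1],       # one predicted label per restart
    "transfer_labels": [7],            # labels on the target model (transfer)
}
with open(TRACKED_FILE, "wb") as f:
    pickle.dump([record], f)  # the real file stores a list of 10 such dicts

with open(TRACKED_FILE, "rb") as f:
    samples = pickle.load(f)

for i, s in enumerate(samples):
    # The attack failed on this sample if the predicted label is unchanged.
    failed = s["y_adv"] == s["y"]
    print(f"sample {i}: y={s['y']} y_adv={s['y_adv']} failed={failed}")
```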
Additionally, we provide CSV files containing the indicators for each input sample, in the same order as the list in the pickled file.
This allows developers to validate the method, as they can:
- reproduce the results, ensuring that the information tracked from the attacks is consistent with the information used by the indicators; and
- validate their implementation of the indicators by passing in the tracked information provided, and checking that the outputs of the indicators match the values reported here.
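The second check can be sketched as follows. The CSV layout and the indicator column names here are hypothetical placeholders, and the recomputed values are dummies standing in for the output of a re-implemented indicator:

```python
import csv
import io
import math

# Hypothetical CSV layout: one row per input sample, in the same order as
# the pickled list; the indicator column names are illustrative only.
provided_csv = io.StringIO(
    "sample,indicator_a,indicator_b\n"
    "0,0.12,1.00\n"
    "1,0.00,0.50\n"
)

# Indicator values recomputed from the tracked information by your own
# implementation (dummy numbers here stand in for the real computation).
recomputed = [(0.12, 1.00), (0.00, 0.50)]

mismatches = []
for row, (a, b) in zip(csv.DictReader(provided_csv), recomputed):
    ok = (math.isclose(float(row["indicator_a"]), a)
          and math.isclose(float(row["indicator_b"]), b))
    if not ok:
        mismatches.append(row["sample"])

print("all indicators match" if not mismatches else f"mismatches: {mismatches}")
```

Comparing with `math.isclose` rather than exact equality avoids spurious mismatches from floating-point rounding in the CSV export.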
The following zip contains the sample data described above for the three models, including results for the PGD, PGD*, and APGD attacks.
Additional questions? Feel free to open an issue! Feedback is welcome!