This repository contains the source code for our paper, "AutoML: Exploration v.s. Exploitation", by Hassan Eldeeb and Abdelrhman Eldallal. Publication of the paper is pending.
In this repository, you will find the scripts used to generate, parse, and plot the figures of the experiments, along with additional figures that are not included in the paper.
Building a machine learning (ML) pipeline in an automated way is a crucial and complex task as it is constrained with the available time budget and resources. This encouraged the research community to introduce several solutions to utilize the available time and resources. A lot of work is done to suggest the most promising classifiers for a given dataset using sundry of techniques including meta-learning based techniques. This gives the autoML framework the chance to spend more time exploiting those classifiers and tuning their hyper-parameters. In this paper, we empirically study the hypothesis of improving the pipeline performance by exploiting the most promising classifiers within the limited time budget. We also study the effect of increasing the time budget over the pipeline performance. The empirical results across autoSKLearn, TPOT and ATM, show that exploiting the most promising classifiers does not achieve a statistically better performance than exploring the entire search space. The same conclusion is also applied for long time budgets.
| Dataset ID | Dataset URL | # features | # instances | # classes |
|---|---|---|---|---|
| audiology | https://www.openml.org/d/999 | 70 | 226 | 2 |
| arrhythmia | https://www.openml.org/d/999 | 280 | 452 | 2 |
| AP_Breast_Lung | https://www.openml.org/d/1150 | 10937 | 470 | 2 |
| openml_phpJNxH0q | https://www.openml.org/d/15 | 10 | 699 | 2 |
| Vowel | https://www.openml.org/d/1016 | 14 | 990 | 2 |
| dataset_31_credit-g | https://www.openml.org/d/31 | 21 | 1000 | 2 |
| gina_agnostic | https://www.openml.org/d/1038 | 971 | 3468 | 2 |
| hiva_agnostic | https://www.openml.org/d/1039 | 1618 | 4229 | 2 |
| phpZrCzJR | https://www.openml.org/d/1039 | 37 | 5100 | 2 |
| MagicTelescope | https://www.openml.org/d/1120 | 12 | 19020 | 2 |
| electricity-normalized | https://www.openml.org/d/151 | 9 | 45312 | 2 |
| AirlinesCodrnaAdult | https://www.openml.org/d/1240 | 30 | 1076790 | 2 |
| eye_movements | https://www.openml.org/d/1044 | 28 | 10936 | 3 |
| connect-4 | https://www.openml.org/d/40668 | 43 | 67557 | 3 |
| solar-flare_1 | https://www.openml.org/d/40686 | 13 | 315 | 5 |
| wine-quality-red | https://www.openml.org/d/40691 | 12 | 1599 | 6 |
| pokerhand-normalized | https://www.openml.org/d/155 | 11 | 829201 | 10 |
| umistfacescropped | https://www.openml.org/d/41084 | 10305 | 575 | 20 |
| KDDCup99 | https://www.openml.org/d/1113 | 42 | 494020 | 23 |
| Amazon | https://www.openml.org/d/1150 | 10001 | 1500 | 50 |
#importing packages
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import itertools
from IPython.display import Image
import copy
import warnings
warnings.simplefilter("ignore")
from IPython.display import display, HTML
class Figure:
    def __init__(self):
        # The 20 datasets used in the comparison.
        self.d20 = ["AirlinesCodrnaAdult",
                    "Amazon",
                    "AP_Breast_Lung",
                    "arrhythmia",
                    "audiology",
                    "connect-4",
                    "dataset_31_credit-g",
                    "electricity-normalized",
                    "eye_movements",
                    "gina_agnostic",
                    "hiva_agnostic",
                    "KDDCup99",
                    "MagicTelescope",
                    "openml_phpJNxH0q",
                    "phpZrCzJR",
                    "pokerhand-normalized",
                    "solar-flare_1",
                    "umistfacescropped",
                    "vowel",
                    "wine-quality-red"]
        # Ten additional datasets (only referenced by the commented-out branch in clean).
        self.d10 = ["dataset_39_ecoli",
                    "synthetic_control",
                    "avila-tr",
                    "phpGUrE90",
                    "dataset_60_waveform-5000",
                    "dataset_186_satimage",
                    "dataset_40_sonar",
                    "phpmPOD5A",
                    "AP_Omentum_Ovary",
                    "phprAeXmK"]

        # AutoSKLearn results. Search spaces are relabelled as
        # 'fc' (full set of classifiers), '3c' (three classifiers) and '1c' (one classifier).
        skout=r"C:\Users\HassanEldeeb\Documents\GitHub\AutoMLBenchmarking\logs_search_space/skout.xlsx"
        # pandas.read_excel takes na_values (there is no null_values parameter).
        self.skout = pd.read_excel(skout,
                                   na_values=['', 'NA', 'NAN', 'NaN', 'Nan', 'NA\n', 'None'])
        self.skout = self.skout[['dataset', 'time_budget', 'methods', 'f1score']]
        self.skout.methods = self.skout.methods.replace("['adaboost', 'bernoulli_nb', 'decision_tree', 'extra_trees', 'gaussian_nb', 'gradient_boosting', 'k_nearest_neighbors', 'lda', 'liblinear_svc', 'libsvm_svc', 'multinomial_nb', 'passive_aggressive', 'qda', 'random_forest', 'sgd']", "fc")
        self.skout.methods = self.skout.methods.replace("['decision_tree', 'libsvm_svc', 'random_forest']", "3c")
        self.skout.methods = self.skout.methods.replace("['libsvm_svc']", "1c")
        self.skout.methods = self.skout.methods.replace("['decision_tree']", "1c")
        self.skout.methods = self.skout.methods.replace("['random_forest']", "1c")
        self.skout = self.clean(self.skout, is_30=True)

        # ATM results.
        atm=r"C:\Users\HassanEldeeb\Documents\GitHub\AutoMLBenchmarking\logs_search_space/atmout.xlsx"
        self.atm = pd.read_excel(atm,
                                 na_values=['', 'NA', 'NAN', 'NaN', 'Nan', 'NA\n', 'None'])
        self.atm = self.atm[['dataset', 'time_budget', 'methods', 'f1score']]
        self.atm.methods = self.atm.methods.replace("['logreg', 'svm', 'sgd', 'dt', 'et', 'rf', 'gnb', 'mnb', 'bnb', 'gp', 'pa', 'knn', 'mlp', 'ada']", "fc")
        self.atm.methods = self.atm.methods.replace("['rf', 'dt', 'svm']", "3c")
        self.atm.methods = self.atm.methods.replace("['logreg', 'dt', 'knn']", "def")
        self.atm.methods = self.atm.methods.replace("['svm']", "1c")
        self.atm.methods = self.atm.methods.replace("['dt']", "1c")
        self.atm.methods = self.atm.methods.replace("['rf']", "1c")
        self.atm = self.clean(self.atm)

        # TPOT results.
        tpot=r"C:\Users\HassanEldeeb\Documents\GitHub\AutoMLBenchmarking\logs_search_space/tpot.xlsx"
        self.tpot = pd.read_excel(tpot,
                                  na_values=['', 'NA', 'NAN', 'NaN', 'Nan', 'NA\n', 'None'])
        self.tpot = self.tpot[['dataset', 'time_budget', 'methods', 'f1score']]
        self.tpot.methods = self.tpot.methods.replace("default", "fc")
        self.tpot.methods = self.tpot.methods.replace("3C", "3c")
        self.tpot.methods = self.tpot.methods.replace("SVC", "1c")
        self.tpot.methods = self.tpot.methods.replace("DT", "1c")
        self.tpot.methods = self.tpot.methods.replace("RF", "1c")
        self.tpot = self.clean(self.tpot)

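    def clean(self, df, is_30 = False):
        # Drop runs that were recorded with a zero F1 score.
        df = df[df.f1score != 0]
        # if is_30:
        #     datasets = self.d20 + self.d10
        # else:
        datasets = self.d20
        df = df[df.dataset.isin(datasets)]
        # Add a zero-score placeholder row for every missing
        # (dataset, time budget, search space) combination so failed runs stay visible.
        for d in datasets:
            for t in [10, 30, 60]:
                for ss in ['fc', '3c', '1c']:
                    if df[(df.methods == ss) & (df.time_budget == t) & (df.dataset == d)].shape[0] == 0:
                        new_row = {'dataset': d, 'time_budget': t, 'methods':ss, 'f1score':0}
                        # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement.
                        df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
        return df
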
    def get_sheet(self):
        return self.sheet

    def check(self):
        # Values from the most recent compare_acc_scatter call, sorted by difference.
        return pd.DataFrame({ 'diff': self.diff, 'var1': self.var1, 'var2': self.var2 })

    def compare_acc_scatter(self, var1, var2, x_label, y_label, title,
                            legend_missing='Failed Run', legend_negative='-ve Dif',
                            legend_zero='Zero Dif', legend_positive='+ve Dif',
                            fig_size = (8, 8), legend_ncols=4, legend_x_shift=0, y_limit=None,
                            x_axis_grid=False, divide_data=False):
        font_size=16
        # Align both series on a positional index and sort by the difference var1 - var2.
        self.var1, self.var2 = var1.reset_index(drop=True).fillna(0), var2.reset_index(drop=True).fillna(0)
        argsort = (self.var1 - self.var2).argsort()
        self.var1 = self.var1[argsort].reset_index(drop=True)
        self.var2 = self.var2[argsort].reset_index(drop=True)
        self.diff = (self.var1 - self.var2).reset_index(drop=True)
        # Derive the y-axis limits from the data unless the caller supplied them.
        if y_limit is None:
            y_limit = [1.1* min(self.diff), 1.1 * max(self.diff)]
            if y_limit[0] == 0:
                y_limit[0] = -0.1 * y_limit[1]
        fig, ax = plt.subplots(figsize=fig_size)
        dot_size=100

        # Losses: var1 is worse than var2 by at least 0.01 and both runs succeeded.
        yy = self.diff[(self.diff <= -0.01) & (self.var1 != 0.0) & (self.var2 != 0)]
        print('Average loss = {} from {} datasets'.format(round(100 * yy.mean(), 1), yy.size))
        ax.scatter(x=yy.index,
                   y=yy,
                   color= 'red',
                   marker ='v',
                   label = 'Negative',
                   s=dot_size)

        # Ties: the absolute difference is below 0.01.
        yy = self.diff[(self.diff > -0.01) & (self.diff < 0.01) & (self.var1 != 0.0) & (self.var2 != 0)]
        print(' {} datasets have the same performance'.format(yy.size))
        ax.scatter(x=yy.index,
                   y=yy,
                   color= 'blue',
                   marker ='.',
                   label = 'Same',
                   s=dot_size*3)

        # Gains: var1 is better than var2 by at least 0.01.
        yy = self.diff[(self.diff >= 0.01) & (self.var1 != 0.0) & (self.var2 != 0)]
        print('Average gain = {} from {} datasets'.format(round(100 * yy.mean(), 1), yy.size))
        ax.scatter(x=yy.index,
                   y=yy,
                   color= 'green',
                   marker ='^',
                   label = 'Positive',
                   s=dot_size)

        # Failed runs: either side has the zero placeholder score.
        yy = self.diff[(self.var1 == 0.0) | (self.var2 == 0)]
        ax.scatter(x=yy.index,
                   y=yy,
                   color= 'darkorange',
                   marker ='x',
                   label = 'Failed',
                   s=dot_size)

        l = ax.legend( ncol=legend_ncols, bbox_to_anchor=(legend_x_shift, 1), loc='lower left', fontsize=font_size)
        plt.xlabel(x_label, fontsize=font_size*1.2)
        plt.ylabel(y_label, fontsize=font_size*1.2)
        ax.yaxis.grid() # horizontal lines
        if x_axis_grid:
            ax.xaxis.grid()
        plt.ylim(y_limit)
        plt.xlim([-1,1+self.var1.shape[0]])
        plt.xticks(np.arange(0, 1+self.var1.shape[0], 10), fontsize=font_size)
        plt.yticks(fontsize=font_size)
        plt.title(label = title, pad = 40, fontsize=font_size)
        plt.tight_layout()
        # The title doubles as the file name of the saved figure.
        plt.savefig('./search_space_figs/' + title.replace(' ', '_') + '.pdf', format='pdf')
        plt.show()
        return

fig = Figure()
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 10 minutes - AutoSKLearn', fig_size = (9, 4))
Average loss = -2.2 from 3 datasets
1 datasets have the same performance
Average gain = 8.5 from 16 datasets
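
The three printed lines come from bucketing the per-dataset difference `var1 - var2` with a 0.01 margin, after setting aside runs whose score is the zero placeholder. The following is a minimal, dependency-free sketch of that rule, not the repository's implementation; the helper names `bucket_differences` and `average_percent` and the sample scores are hypothetical and exist only for illustration.

```python
from statistics import mean


def bucket_differences(scores_a, scores_b, threshold=0.01):
    """Split paired F1 scores into loss / same / gain / failed buckets.

    A score of 0 is treated as a failed run, and `threshold` is the
    0.01 margin used by compare_acc_scatter.
    """
    buckets = {"loss": [], "same": [], "gain": [], "failed": []}
    for a, b in zip(scores_a, scores_b):
        diff = a - b
        if a == 0 or b == 0:
            buckets["failed"].append(diff)
        elif diff <= -threshold:
            buckets["loss"].append(diff)
        elif diff >= threshold:
            buckets["gain"].append(diff)
        else:
            buckets["same"].append(diff)
    return buckets


def average_percent(diffs):
    """Mean difference in percentage points, or None for an empty bucket."""
    return round(100 * mean(diffs), 1) if diffs else None


# Hypothetical scores, for illustration only (not taken from the experiments).
full_space = [0.90, 0.80, 0.75, 0.00, 0.60]
one_classifier = [0.85, 0.80, 0.78, 0.70, 0.40]
buckets = bucket_differences(full_space, one_classifier)

print("Average loss =", average_percent(buckets["loss"]), "from", len(buckets["loss"]), "datasets")
print(len(buckets["same"]), "datasets have the same performance")
print("Average gain =", average_percent(buckets["gain"]), "from", len(buckets["gain"]), "datasets")
```

An empty bucket has no mean, which is why some of the outputs in this document report `nan from 0 datasets`; the sketch returns `None` in that case instead.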
fig = Figure()
var1 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 10 minutes - AutoSKLearn', fig_size = (9, 4))
Average loss = -1.7 from 2 datasets
5 datasets have the same performance
Average gain = 9.5 from 13 datasets
fig = Figure()
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 10 minutes - AutoSKLearn', fig_size = (9, 4))
Average loss = -2.9 from 5 datasets
8 datasets have the same performance
Average gain = 2.9 from 7 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 30 minutes - AutoSKLearn', fig_size = (9, 4))
df = pd.DataFrame({'dataset': fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().index.get_level_values(0).values, 'var1': var1.values, 'var2': var2.values, 'diff': var1.values - var2.values})
Average loss = -4.8 from 1 datasets
5 datasets have the same performance
Average gain = 6.1 from 14 datasets
var1 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 30 minutes - AutoSKLearn', fig_size = (9, 4))
df = pd.DataFrame({'dataset': fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().index.get_level_values(0).values, 'var1': var1.values, 'var2': var2.values, 'diff': var1.values - var2.values})
Average loss = -2.8 from 3 datasets
4 datasets have the same performance
Average gain = 5.3 from 13 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 30 minutes - AutoSKLearn', fig_size = (9, 4))
df = pd.DataFrame({'dataset': fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().index.get_level_values(0).values, 'var1': var1.values, 'var2': var2.values, 'diff': var1.values - var2.values})
Average loss = -1.6 from 1 datasets
11 datasets have the same performance
Average gain = 2.7 from 8 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 60 minutes - AutoSKLearn', fig_size = (9, 4))
Average loss = -3.1 from 2 datasets
2 datasets have the same performance
Average gain = 6.2 from 16 datasets
var1 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 60 minutes - AutoSKLearn', fig_size = (9, 4))
Average loss = -3.0 from 3 datasets
4 datasets have the same performance
Average gain = 6.4 from 13 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 60 minutes - AutoSKLearn', fig_size = (9, 4))
Average loss = -1.8 from 5 datasets
8 datasets have the same performance
Average gain = 4.1 from 7 datasets
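
The AutoSKLearn comparisons above differ only in the search-space pair and the time budget. Because `compare_acc_scatter` uses the title as the file name of the saved figure, every plot needs a distinct title. One way to guarantee that is to derive the labels and titles from the pair and the budget. The sketch below is a hedged illustration of that idea rather than code from this repository; `search_space_comparisons` and the dictionary keys are hypothetical names.

```python
from itertools import combinations, product

SEARCH_SPACES = ["fc", "3c", "1c"]  # ordered from the largest search space to the smallest
TIME_BUDGETS = [10, 30, 60]         # minutes


def search_space_comparisons(framework):
    """Yield one plot specification per (time budget, search-space pair)."""
    pairs = combinations(SEARCH_SPACES, 2)  # (fc, 3c), (fc, 1c), (3c, 1c)
    for budget, (larger, smaller) in product(TIME_BUDGETS, list(pairs)):
        yield {
            "var1_methods": larger,
            "var2_methods": smaller,
            "time_budget": budget,
            "y_label": f"F1 Score difference {larger.upper()}-{smaller.upper()}",
            "title": f"F1 Difference between {larger} and {smaller} "
                     f"for {budget} minutes - {framework}",
        }


specs = list(search_space_comparisons("AutoSKLearn"))
for spec in specs:
    print(spec["title"])
```

Each specification carries the two `methods` values and the `time_budget` needed to filter the results frame, plus the `y_label` and `title` to pass to `compare_acc_scatter`.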
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 10 minutes - TPOT', fig_size = (9, 4))
Average loss = -7.0 from 1 datasets
1 datasets have the same performance
Average gain = 13.3 from 12 datasets
var1 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 10 minutes - TPOT', fig_size = (9, 4))
Average loss = nan from 0 datasets
5 datasets have the same performance
Average gain = 14.2 from 9 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 10 minutes - TPOT', fig_size = (9, 4))
Average loss = -7.1 from 3 datasets
1 datasets have the same performance
Average gain = 4.6 from 9 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 30 minutes - TPOT', fig_size = (9, 4))
Average loss = -38.5 from 1 datasets
2 datasets have the same performance
Average gain = 15.1 from 14 datasets
var1 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 30 minutes - TPOT', fig_size = (9, 4))
Average loss = -2.1 from 1 datasets
2 datasets have the same performance
Average gain = 16.1 from 10 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 30 minutes - TPOT', fig_size = (9, 4))
Average loss = -3.4 from 3 datasets
2 datasets have the same performance
Average gain = 6.4 from 8 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 60 minutes - TPOT', fig_size = (9, 4))
Average loss = -29.1 from 1 datasets
2 datasets have the same performance
Average gain = 16.6 from 15 datasets
var1 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 60 minutes - TPOT', fig_size = (9, 4))
Average loss = nan from 0 datasets
3 datasets have the same performance
Average gain = 15.9 from 12 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 60 minutes - TPOT', fig_size = (9, 4))
Average loss = -4.0 from 2 datasets
3 datasets have the same performance
Average gain = 6.4 from 10 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 10 minutes - ATM', fig_size = (9, 4))
Average loss = -14.1 from 3 datasets
3 datasets have the same performance
Average gain = 7.3 from 8 datasets
var1 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 10 minutes - ATM', fig_size = (9, 4))
Average loss = -3.0 from 2 datasets
3 datasets have the same performance
Average gain = 5.0 from 3 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 10 minutes - ATM', fig_size = (9, 4))
Average loss = nan from 0 datasets
2 datasets have the same performance
Average gain = 6.1 from 6 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 30 minutes - ATM', fig_size = (9, 4))
Average loss = -10.9 from 3 datasets
3 datasets have the same performance
Average gain = 3.5 from 8 datasets
var1 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 30 minutes - ATM', fig_size = (9, 4))
Average loss = -4.5 from 3 datasets
4 datasets have the same performance
Average gain = 3.9 from 6 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 30 minutes - ATM', fig_size = (9, 4))
Average loss = -6.8 from 4 datasets
6 datasets have the same performance
Average gain = 2.4 from 3 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-1C', 'F1 Difference between fc and 1c for 60 minutes - ATM', fig_size = (9, 4))
Average loss = -11.5 from 2 datasets
5 datasets have the same performance
Average gain = 4.1 from 9 datasets
var1 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 3C-1C', 'F1 Difference between 3c and 1c for 60 minutes - ATM', fig_size = (9, 4))
Average loss = -14.7 from 2 datasets
3 datasets have the same performance
Average gain = 4.0 from 9 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference FC-3C', 'F1 Difference between fc and 3c for 60 minutes - ATM', fig_size = (9, 4))
Average loss = -6.0 from 2 datasets
8 datasets have the same performance
Average gain = 14.5 from 2 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for FC Search Space (30-10) - AutoSKLearn', fig_size = (9, 4))
Average loss = -2.7 from 3 datasets
9 datasets have the same performance
Average gain = 3.6 from 8 datasets
var1 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for 3C Search Space (30-10) - AutoSKLearn', fig_size = (9, 4))
Average loss = -4.2 from 3 datasets
12 datasets have the same performance
Average gain = 4.3 from 5 datasets
var1 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for 1C Search Space (30-10) - AutoSKLearn', fig_size = (9, 4))
Average loss = -1.5 from 2 datasets
11 datasets have the same performance
Average gain = 10.1 from 7 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for FC Search Space (60-10) - AutoSKLearn', fig_size = (9, 4))
Average loss = -3.6 from 4 datasets
9 datasets have the same performance
Average gain = 4.4 from 7 datasets
var1 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for 3C Search Space (60-10) - AutoSKLearn', fig_size = (9, 4))
Average loss = -4.2 from 6 datasets
7 datasets have the same performance
Average gain = 4.4 from 7 datasets
var1 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for 1C Search Space (60-10) - AutoSKLearn', fig_size = (9, 4))
Average loss = -3.0 from 4 datasets
8 datasets have the same performance
Average gain = 8.0 from 8 datasets
var1 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == 'fc') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for FC Search Space (60-30) - AutoSKLearn', fig_size = (9, 4))
Average loss = -3.3 from 6 datasets
9 datasets have the same performance
Average gain = 3.3 from 5 datasets
var1 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '3c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for 3C Search Space (60-30) - AutoSKLearn', fig_size = (9, 4))
Average loss = -4.9 from 4 datasets
10 datasets have the same performance
Average gain = 3.0 from 6 datasets
var1 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.skout[(fig.skout.methods == '1c') & (fig.skout.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for 1C Search Space (60-30) - AutoSKLearn', fig_size = (9, 4))
Average loss = -4.3 from 5 datasets
13 datasets have the same performance
Average gain = 1.6 from 2 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for FC Search Space (30-10) - TPOT', fig_size = (9, 4))
Average loss = -5.8 from 2 datasets
7 datasets have the same performance
Average gain = 3.9 from 5 datasets
var1 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for 3C Search Space (30-10) - TPOT', fig_size = (9, 4))
Average loss = -1.8 from 1 datasets
7 datasets have the same performance
Average gain = 2.7 from 2 datasets
var1 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for 1C Search Space (30-10) - TPOT', fig_size = (9, 4))
Average loss = -11.3 from 3 datasets
13 datasets have the same performance
Average gain = 1.5 from 2 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for FC Search Space (60-10) - TPOT', fig_size = (9, 4))
Average loss = -6.5 from 3 datasets
7 datasets have the same performance
Average gain = 6.2 from 4 datasets
var1 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for 3C Search Space (60-10) - TPOT', fig_size = (9, 4))
Average loss = -2.7 from 2 datasets
6 datasets have the same performance
Average gain = 2.5 from 4 datasets
var1 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for 1C Search Space (60-10) - TPOT', fig_size = (9, 4))
Average loss = -14.3 from 4 datasets
13 datasets have the same performance
Average gain = 1.2 from 1 datasets
var1 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == 'fc') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for FC Search Space (60-30) - TPOT', fig_size = (9, 4))
Average loss = -2.6 from 3 datasets
7 datasets have the same performance
Average gain = 3.9 from 7 datasets
var1 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '3c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for 3C Search Space (60-30) - TPOT', fig_size = (9, 4))
Average loss = -1.4 from 1 datasets
10 datasets have the same performance
Average gain = 1.5 from 2 datasets
var1 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.tpot[(fig.tpot.methods == '1c') & (fig.tpot.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for 1C Search Space (60-30) - TPOT', fig_size = (9, 4))
Average loss = -11.5 from 2 datasets
17 datasets have the same performance
Average gain = nan from 0 datasets
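The three summary lines printed under each plot can be read as a split of the per-dataset F1 differences into losses, ties, and gains. The implementation of `compare_acc_scatter` is not shown in this section, so the following is only a minimal sketch of such a summary: the helper name `summarize_differences`, the tie tolerance `tol`, and the assumption that both inputs are indexed by dataset alone are hypothetical. It also illustrates why an empty group reports `nan`, since the mean of an empty selection is undefined.

```python
import pandas as pd


def summarize_differences(var1, var2, tol=1.0):
    """Hypothetical helper: split per-dataset score differences into losses, ties, gains.

    `tol` is an assumed tie threshold; the value used in the repository is not shown.
    Both inputs are assumed to be Series indexed by dataset name.
    """
    diff = (var1 - var2).dropna()      # index-aligned subtraction
    losses = diff[diff < -tol]         # longer budget did worse
    gains = diff[diff > tol]           # longer budget did better
    ties = diff[diff.abs() <= tol]     # difference within the tolerance
    return {
        "avg_loss": losses.mean(),     # NaN when no dataset lost
        "n_loss": len(losses),
        "n_same": len(ties),
        "avg_gain": gains.mean(),      # NaN when no dataset gained
        "n_gain": len(gains),
    }


# Toy scores for four made-up datasets (not experiment data).
longer = pd.Series({"a": 80.0, "b": 70.0, "c": 90.0, "d": 60.0})
shorter = pd.Series({"a": 85.0, "b": 70.5, "c": 90.0, "d": 75.0})
summary = summarize_differences(longer, shorter)
print(summary)
```

With these toy values no dataset improves, so `avg_gain` is `nan` while `n_gain` is 0, matching the shape of the `Average gain = nan from 0 datasets` line above.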
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for FC Search Space (30-10) - ATM', fig_size = (9, 4))
Average loss = -8.7 from 5 datasets
4 datasets have the same performance
Average gain = 3.9 from 4 datasets
var1 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for 3C Search Space (30-10) - ATM', fig_size = (9, 4))
Average loss = -3.9 from 2 datasets
1 datasets have the same performance
Average gain = 4.9 from 5 datasets
var1 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 30-10', 'F1 Difference for 1C Search Space (30-10) - ATM', fig_size = (9, 4))
Average loss = -5.6 from 4 datasets
8 datasets have the same performance
Average gain = 4.5 from 7 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for FC Search Space (60-10) - ATM', fig_size = (9, 4))
Average loss = -6.1 from 5 datasets
5 datasets have the same performance
Average gain = 13.7 from 3 datasets
var1 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for 3C Search Space (60-10) - ATM', fig_size = (9, 4))
Average loss = -28.8 from 1 datasets
2 datasets have the same performance
Average gain = 5.0 from 5 datasets
var1 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==10)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-10', 'F1 Difference for 1C Search Space (60-10) - ATM', fig_size = (9, 4))
Average loss = -2.7 from 4 datasets
6 datasets have the same performance
Average gain = 2.6 from 9 datasets
var1 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == 'fc') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for FC Search Space (60-30) - ATM', fig_size = (9, 4))
Average loss = -4.7 from 1 datasets
7 datasets have the same performance
Average gain = 4.3 from 5 datasets
var1 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '3c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for 3C Search Space (60-30) - ATM', fig_size = (9, 4))
Average loss = -6.6 from 5 datasets
4 datasets have the same performance
Average gain = 2.4 from 3 datasets
var1 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==60)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
var2 = fig.atm[(fig.atm.methods == '1c') & (fig.atm.time_budget==30)].groupby(['dataset', 'time_budget', 'methods']).mean().f1score
fig.compare_acc_scatter(var1, var2, 'Data set', 'F1 Score difference 60-30', 'F1 Difference for 1C Search Space (60-30) - ATM', fig_size = (9, 4))
Average loss = -4.4 from 6 datasets
7 datasets have the same performance
Average gain = 4.7 from 6 datasets
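The cells above repeat one pattern for every framework, search space, and pair of time budgets: filter the results, average the F1 score per dataset, and compare the longer budget against the shorter one. As a rough sketch of how that selection could be factored out, the helpers below (`mean_f1` and `budget_pairs`, both hypothetical names that are not part of the repository) assume a results frame with the `dataset`, `methods`, `time_budget`, and `f1score` columns used above. Unlike the cells above, the sketch groups by `dataset` only, so the two Series share an index and can be subtracted directly.

```python
from itertools import combinations

import pandas as pd


def mean_f1(results, method, budget):
    """Mean F1 score per dataset for one search space and one time budget."""
    mask = (results["methods"] == method) & (results["time_budget"] == budget)
    # Select the score column before averaging so only numeric data is aggregated.
    return results.loc[mask].groupby("dataset")["f1score"].mean()


def budget_pairs(budgets):
    """All (longer, shorter) budget pairs, e.g. (30, 10), (60, 10), (60, 30)."""
    return [(hi, lo) for lo, hi in combinations(sorted(budgets), 2)]


# Toy results (made-up numbers, not experiment data).
toy = pd.DataFrame({
    "dataset": ["a", "a", "a", "a", "b", "b"],
    "methods": ["fc"] * 6,
    "time_budget": [10, 10, 30, 30, 10, 30],
    "f1score": [70.0, 80.0, 90.0, 90.0, 50.0, 40.0],
})

for hi, lo in budget_pairs([10, 30]):
    # Positive values mean the longer budget scored higher on that dataset.
    diff = mean_f1(toy, "fc", hi) - mean_f1(toy, "fc", lo)
    print(f"fc {hi}-{lo}:", diff.to_dict())
```

In the notebook itself, each `(method, hi, lo)` combination would then be passed to `fig.compare_acc_scatter` with a title built from the same three values, which keeps the plot titles in step with the filter that produced the data.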
