Time limit ignored on Linux #6

@mnaylor5

Hi Jimmy,

I'm trying to illustrate GOSDT with the diabetes dataset located here, and it seems that the time limit is being ignored. I've tried with continuous features, as well as discretizing them on my own, but I can't get anything to return in the amount of time I would expect. I'm running the following code in a Jupyter notebook on a Debian Linux instance with 8 cores and 30GB RAM, so I don't suspect a hardware issue (RAM usage in particular hovers around 1GB). This example, with time_limit set to 10, took ~23 minutes on my machine.

## --- env setup --- ##
import pandas as pd
from sklearn.model_selection import train_test_split
import sys
sys.path.append('../GeneralizedOptimalSparseDecisionTrees/python/') # location of cloned GOSDT repo
from model.gosdt import GOSDT 

## --- load data (directly from Kaggle location) --- ##
diabetes = pd.read_csv('diabetes.csv')

## --- same training/test split I'm using --- ##
train, _ = train_test_split(diabetes, random_state=0, test_size=0.2)
X = train.drop(columns="Outcome")
y = train['Outcome']

## --- specify and fit model --- ##
hyperparams = {
    "regularization": 0.1,
    "time_limit": 10,
    "precision_limit": 0.1,
    "worker_limit": 8,
    "verbose": True,
}

model = GOSDT(hyperparams)
model.fit(X, pd.DataFrame(y))
print(model.time / 60) # 23.861
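
To check the overrun independently of model.time, or to bound a run from outside while the limit is not honored, the fit can be launched in a child process with a wall-clock cap. This is a minimal sketch using only the standard library. `run_with_wall_clock_limit` is a hypothetical helper, not part of GOSDT, and the snippets passed to it here are stand-ins for a script that loads the data and calls model.fit.

```python
import subprocess
import sys


def run_with_wall_clock_limit(script, limit_seconds):
    """Run a Python snippet in a child process under a hard wall-clock cap.

    Returns (finished, stdout). If the cap is hit, subprocess.run kills the
    child and raises TimeoutExpired, which is reported as finished=False.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=limit_seconds,
        )
    except subprocess.TimeoutExpired:
        return False, ""
    return True, done.stdout


if __name__ == "__main__":
    # Stand-in for a long-running fit: sleeps far past the cap.
    print(run_with_wall_clock_limit("import time; time.sleep(30)", 1))
    # Stand-in for a fit that returns in time.
    print(run_with_wall_clock_limit("print('done')", 10))
```

Running the fit in a separate process means the cap holds even if the solver never checks its own time limit, at the cost of losing the fitted model object unless the child script writes it to disk.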

A potentially separate issue is that I have only been able to get trees with a single split and two terminal nodes. This may be due in part to the time limit issue, which keeps me from using any regularization lower than ~0.1, but I wanted to see if you could offer any advice. Here is the tree I'm getting, pretty much regardless of which combination of hyperparameters I use:

if 144 <= Glucose then:
    predicted class: 1
    misclassification penalty: 0.065
    complexity penalty: 0.1

else if Glucose < 144 then:
    predicted class: 0
    misclassification penalty: 0.191
    complexity penalty: 0.1
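
For reference, the arithmetic behind the tree above can be sketched as follows. It assumes the objective is the sum of the per-leaf misclassification and complexity penalties exactly as printed, and that each extra leaf costs the regularization value of 0.1. Both are assumptions read off the output, not confirmed against the GOSDT source.

```python
import math

# Penalties copied from the printed tree; one entry per leaf.
leaves = [
    {"rule": "144 <= Glucose", "misclassification": 0.065, "complexity": 0.1},
    {"rule": "Glucose < 144", "misclassification": 0.191, "complexity": 0.1},
]
regularization = 0.1  # value passed in hyperparams

# Assumed objective: total misclassification plus total complexity penalty.
loss = sum(leaf["misclassification"] for leaf in leaves)
penalty = sum(leaf["complexity"] for leaf in leaves)
objective = loss + penalty

# Under this assumption, an extra leaf only pays off if it removes more
# misclassification than it costs, and the remaining loss caps how many
# extra leaves could ever be worthwhile.
max_extra_leaves = math.floor(loss / regularization)

print(f"loss={loss:.3f} penalty={penalty:.3f} objective={objective:.3f}")
print(f"upper bound on worthwhile extra leaves: {max_extra_leaves}")
```

If those assumptions hold, a regularization of 0.1 leaves room for at most two more leaves on this data, so very small trees would be expected at this setting whether or not the time limit works.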

Thank you!
-Mitch
