KeyError if timeout occurs while evaluating metrics #332

@frances-h

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDGym version:
  • Python version:
  • Operating System:

Error Description

When running SDGym with a timeout, a KeyError on 'normalized_score' can be raised if the timeout fires while the metric scores for a dataset are still being computed.

Timeout running GaussianCopulaSynthesizer on dataset adult;
Timeout running GaussianCopulaSynthesizer on dataset alarm;
Timeout running GaussianCopulaSynthesizer on dataset asia;
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/frances/Documents/SDGym/sdgym/benchmark.py", line 791, in benchmark_single_table
    scores = _run_jobs(multi_processing_config, job_args_list, show_progress)
  File "/Users/frances/Documents/SDGym/sdgym/benchmark.py", line 527, in _run_jobs
    scores = pd.concat(scores, ignore_index=True)
  File "/Users/frances/.pyenv/versions/3.10.9/envs/3.10/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 382, in concat
    op = _Concatenator(
  File "/Users/frances/.pyenv/versions/3.10.9/envs/3.10/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 445, in __init__
    objs, keys = self._clean_keys_and_objs(objs, keys)
  File "/Users/frances/.pyenv/versions/3.10.9/envs/3.10/lib/python3.10/site-packages/pandas/core/reshape/concat.py", line 504, in _clean_keys_and_objs
    objs_list = list(objs)
  File "/Users/frances/.pyenv/versions/3.10.9/envs/3.10/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/Users/frances/Documents/SDGym/sdgym/benchmark.py", line 468, in _run_job
    scores = _format_output(
  File "/Users/frances/Documents/SDGym/sdgym/benchmark.py", line 394, in _format_output
    scores.insert(len(scores.columns), score['metric'], score['normalized_score'])
KeyError: 'normalized_score'
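
The failing line in the traceback indexes each score entry with `score['normalized_score']`. As a minimal sketch of that failure mode (not SDGym's actual code), the snippet below assumes a score entry can come back without that key when the metric computation is interrupted. The helper names, the metric names, and the shape of the partial entry are all hypothetical. It contrasts strict indexing with a tolerant lookup that records NaN instead of raising.

```python
import pandas as pd


def insert_scores_strict(scores, metric_scores):
    """Mirror the failing line from the traceback: index the key directly."""
    for score in metric_scores:
        scores.insert(len(scores.columns), score['metric'], score['normalized_score'])
    return scores


def insert_scores_tolerant(scores, metric_scores):
    """Hypothetical guard: fall back to NaN when the key is missing."""
    for score in metric_scores:
        value = score.get('normalized_score', float('nan'))
        scores.insert(len(scores.columns), score['metric'], value)
    return scores


# Hypothetical score entries; the shape of the partial one is an assumption.
complete = {'metric': 'MetricA', 'normalized_score': 0.9}
partial = {'metric': 'MetricB', 'error': 'Metric Timeout'}

base = pd.DataFrame({
    'Synthesizer': ['GaussianCopulaSynthesizer'],
    'Dataset': ['asia'],
})

# Strict indexing raises KeyError on the entry without 'normalized_score'.
strict_error = None
try:
    insert_scores_strict(base.copy(), [complete, partial])
except KeyError as exc:
    strict_error = exc

# The tolerant lookup keeps the row and leaves the missing score as NaN.
tolerant = insert_scores_tolerant(base.copy(), [complete, partial])
```

Whether a NaN placeholder, an explicit error column, or skipping the metric is the right behavior is a design decision for the maintainers. The sketch only shows that a missing key does not have to abort the whole benchmark run.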

Steps to reproduce

(May be machine dependent since it requires the timeout to occur while computing scores for the dataset)

import sdgym

sdgym.benchmark_single_table(synthesizers=['GaussianCopulaSynthesizer'], timeout=30)

Labels

bug (Something isn't working)
