Should we apply NewRowSynthesis by default? #188

@npatki

Version: 0.8.0 (in development)

Problem Description

Currently, the benchmark_single_table script applies the SDMetrics NewRowSynthesis metric by default. The motivation was to capture whether new synthetic data is being created at all, or whether the real rows are simply being re-used, as in DataIdentity.

But in practice, the NewRowSynthesis metric may not be very robust. It may error out on datasets with a large number of columns, and it generally leads to longer benchmarking runs.

Expected behavior

We should reconsider whether, and how, we apply the NewRowSynthesis metric by default:

  1. We could disable it. That is, set sdmetrics=None by default.
  2. We could fix the underlying issues with it in the SDMetrics library. Perhaps that can be achieved by subsetting or by some other means.
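
To make the subsetting idea in option 2 concrete, here is a minimal sketch in plain Python. It is not the SDMetrics implementation: the function name `new_row_fraction` and its `sample_size` and `seed` parameters are hypothetical, and it only checks for exact row matches. The point is that scoring a random subset of the synthetic rows bounds the number of lookups no matter how large the synthetic table is.

```python
import random


def new_row_fraction(real_rows, synthetic_rows, sample_size=None, seed=0):
    """Return the fraction of synthetic rows that match no real row exactly.

    Hypothetical helper for illustration only. If sample_size is given and
    is smaller than the number of synthetic rows, only a random subset of
    that size is scored.
    """
    rows = list(synthetic_rows)
    if not rows:
        raise ValueError("synthetic_rows is empty")

    # Subsetting step: score a fixed-size random sample instead of every row.
    if sample_size is not None and sample_size < len(rows):
        rows = random.Random(seed).sample(rows, sample_size)

    # Hash the real rows once so each lookup is O(1) on average.
    real_set = {tuple(row) for row in real_rows}
    new_count = sum(1 for row in rows if tuple(row) not in real_set)
    return new_count / len(rows)


real = [(1, "a"), (2, "b"), (3, "c")]
synthetic = [(1, "a"), (4, "d"), (5, "e"), (2, "b")]

full_score = new_row_fraction(real, synthetic)                   # scores all 4 rows
subset_score = new_row_fraction(real, synthetic, sample_size=2)  # scores 2 rows
identity_score = new_row_fraction(real, real)                    # real data re-used as-is
```

With this toy data, `full_score` is 0.5 because two of the four synthetic rows are copies of real rows, and `identity_score` is 0.0, which is the DataIdentity-style case the metric was meant to flag. The subset score is only an estimate of the full score, so a fix along these lines would trade some precision for a shorter run.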
