[BUG] Crash occurs during type inference on empty DataFrame #86

Description

@rudraditya21

When running infer_types on an empty DataFrame, the logic in type_infer/rule_based/core.py (lines 33–94) fails because population_size is 0. The logging statement at line 41 performs a division by population_size, causing a ZeroDivisionError.

Even if that is guarded, the subsequent identifier pass still breaks: get_identifier_description is called with an empty column and immediately accesses data[0], which raises an IndexError on empty input.
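
To illustrate the two failure points, here is a minimal sketch of the kind of guards that would avoid them. The helper names `sample_percentage` and `first_value` are hypothetical and are not part of type_infer; the actual fix in core.py and get_identifier_description may look different.

```python
from typing import Any, Optional, Sequence


def sample_percentage(sample_size: int, population_size: int) -> float:
    """Hypothetical helper: percentage of the population covered by the sample.

    Returns 0.0 for an empty population instead of dividing by zero.
    """
    if population_size == 0:
        return 0.0
    return round(sample_size * 100 / population_size, 1)


def first_value(data: Sequence[Any]) -> Optional[Any]:
    """Hypothetical helper: first element of a column, or None if it is empty.

    Avoids the IndexError raised by data[0] on empty input.
    """
    return data[0] if len(data) > 0 else None


if __name__ == "__main__":
    print(sample_percentage(0, 0))   # empty DataFrame case, no exception
    print(first_value([]))           # empty column case, no exception
```

Whether 0.0 and None are the right fallbacks is a design choice; an early return before the log statement would work just as well.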

Steps To Reproduce

import pandas as pd
from type_infer.api import infer_types

df = pd.DataFrame()
print(infer_types(df))

Output:

INFO:type_infer-21891:Analyzing a sample of 0
Traceback (most recent call last):
  File "/Users/apple/Desktop/type_infer/./issue_test/main.py", line 5, in <module>
    print(infer_types(df))
          ^^^^^^^^^^^^^^^
  File "/Users/apple/Desktop/type_infer/type_infer/api.py", line 38, in infer_types
    return engine.infer(data)
           ^^^^^^^^^^^^^^^^^^
  File "/Users/apple/Desktop/type_infer/type_infer/rule_based/core.py", line 41, in infer
    f'from a total population of {population_size}, this is equivalent to {round(sample_size * 100 / population_size, 1)}% of your data.')  # noqa
                                                                                 ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
ZeroDivisionError: division by zero

Expected Output:

Empty inputs should be handled gracefully. infer_types should either return an empty (or explicitly invalid) TypeInformation, or raise a ValueError explaining that type_infer cannot run on an empty DataFrame.
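
As a rough sketch of the ValueError option, an up-front check could look like the following. `ensure_not_empty` is a hypothetical name, not an existing type_infer function; it relies only on the `empty` property that pandas DataFrames expose, and the error message is just a suggestion.

```python
def ensure_not_empty(df) -> None:
    """Hypothetical guard: reject empty DataFrames before inference starts.

    Assumes df exposes a boolean `empty` attribute, as pandas.DataFrame does
    (True when the frame has no rows or no columns).
    """
    if df.empty:
        raise ValueError("type_infer cannot run on an empty DataFrame")
```

Calling this at the top of infer_types would turn the ZeroDivisionError above into a clear, documented error for callers.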
