AutoML is not working properly with a large volume of data (30 million rows with 150 features) #5747

@aniketaitawade

Sparkling Water Version

3.5

Issue description

Expected behavior:
Since Sparkling Water can train individual models such as XGBoost on this data, the AutoML API should run on it as well.
Observed behavior:
Sparkling Water can train individual models such as XGBoost, but fails to run with the AutoML API.

Programming language used

Python

Programming language version

3.11

What environment are you running Sparkling Water on?

Cloud Managed Spark (like Databricks, AWS Glue)

Environment version info

15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)

Brief cluster specification

Runtime 15.4.x-scala2.12, 1 driver with 64 GB memory and 8 cores, 7 workers with 64 GB memory and 8 cores
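
For a rough sense of scale, here is a back-of-envelope sketch comparing the dataset size from the title with the cluster memory listed above. It assumes every value is stored as a dense 8-byte double and that each of the 7 workers has its own 64 GB; neither assumption is confirmed here, and the names (`dense_size_gb`, the constants) are made up for this illustration.

```python
# Back-of-envelope sizing sketch; not a measurement.
ROWS = 30_000_000        # from the issue title
FEATURES = 150           # from the issue title
BYTES_PER_VALUE = 8      # assumption: dense float64, no compression

WORKERS = 7              # from the cluster specification
WORKER_MEMORY_GB = 64    # assumption: 64 GB per worker, not in total


def dense_size_gb(rows, cols, bytes_per_value=BYTES_PER_VALUE):
    """Size of a dense rows x cols table in decimal gigabytes."""
    return rows * cols * bytes_per_value / 1e9


dataset_gb = dense_size_gb(ROWS, FEATURES)
cluster_gb = WORKERS * WORKER_MEMORY_GB

print(f"dense dataset size:  {dataset_gb:.1f} GB")
print(f"total worker memory: {cluster_gb} GB")
print(f"memory / data ratio: {cluster_gb / dataset_gb:.1f}x")
```

The real footprint will differ, since it depends on the column types, on any compression applied when the data is loaded, and on how much of each worker's memory is actually available to the executors.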

Relevant log output

I don't have any error logs, as the process keeps running for a long time.

Code to reproduce the issue

No response
