Ray memory usage not respecting defaults? #235

@aodj-snjallgogn

I was using swifter to do a groupby/apply. It went through Ray and spawned 32 workers, which ran out of memory (OOM). I thought I could lower the default number of workers, so I imported set_defaults and set npartitions to 8, but this seemed to have no effect.

I then tried df.swifter.set_npartitions(npartitions=8).groupby("ticket_id", group_keys=False).apply(create_ticket_object), also to no avail.

I saw the note in the documentation that the call to set_defaults needs to occur before the DataFrame is instantiated. Since I'm using duckdb to query a number of CSVs, it occurred to me that the DataFrame might be created in a different manner when I run something like this:

df = duckdb.sql(
    """
    SELECT *
    FROM read_csv(
        ?,
        delim = ',',
        quote = '"',
        header = true,
        skip = 1,
        null_padding = true,
        parallel = false, -- we need this because the data is quoted
        all_varchar = true,
        max_line_size = 10000000
    );
    """,
    params=(f"{path}/*.csv",),
).df()
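
To check whether the worker count itself is what drives the memory use, the groupby/apply described above can be run with an explicit cap on concurrency, bypassing swifter and Ray entirely. The following is only a minimal sketch under stated assumptions: capped_groupby_apply is a hypothetical helper, the body of create_ticket_object is a stand-in (the real function is not shown in this post), and it uses threads from the standard library rather than Ray workers, so it is not how swifter schedules work internally.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def capped_groupby_apply(df, key, func, max_workers=8):
    """Apply func to each group of df, running at most max_workers groups at once.

    Hypothetical helper for illustration; not part of swifter or Ray.
    """
    # Materialise the groups once; each item is one sub-DataFrame.
    # Note: this keeps every group in memory at the same time.
    groups = [group for _, group in df.groupby(key, group_keys=False)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with the groups.
        return list(pool.map(func, groups))


def create_ticket_object(group):
    # Stand-in body (assumption): the real function is not shown in the post.
    return {"ticket_id": group["ticket_id"].iloc[0], "rows": len(group)}


# Tiny in-memory frame in place of the duckdb query result.
df = pd.DataFrame({"ticket_id": ["a", "a", "b"], "body": ["x", "y", "z"]})
tickets = capped_groupby_apply(df, "ticket_id", create_ticket_object, max_workers=2)
```

If memory stays bounded with a small max_workers here, that would suggest the 32 workers are the cause rather than the per-group work. For CPU-bound functions a process pool would be the closer analogue, at the cost of pickling each group.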
