I was using swifter to do a groupby/apply. It was going through Ray and spawning 32 workers, but this was just OOMing. I thought I might be able to make it run on fewer workers by default, so I did the import and set npartitions to 8, but this seemed to have no effect.
I then tried df.swifter.set_npartitions(npartitions=8).groupby("ticket_id", group_keys=False).apply(create_ticket_object), also to no avail.
I saw the comment in the documentation that the call to set_defaults needs to occur before the DataFrame is instantiated. Since I'm using duckdb to query a number of CSVs, it occurred to me that the DataFrame might be getting created in a different manner when I run something like this:
df = duckdb.sql(
    """
    SELECT *
    FROM read_csv(
        ?,
        delim = ',',
        quote = '"',
        header = true,
        skip = 1,
        null_padding = true,
        parallel = false, -- we need this because the data is quoted
        all_varchar = true,
        max_line_size = 10000000
    );
    """,
    params=(f"{path}/*.csv",),
).df()
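For reference, the ordering the documentation describes could be sketched roughly as below. This is only a sketch under assumptions: the function names load_tickets and build_tickets are hypothetical, the read_csv options are shortened from the query above, and I have not confirmed that setting npartitions this way changes how many Ray workers a swifter groupby/apply spawns.

```python
def load_tickets(path, npartitions=8):
    """Set swifter defaults first, then build the DataFrame from DuckDB.

    Hypothetical helper: imports are local so the call order is visible
    in one place.
    """
    import duckdb
    import swifter  # noqa: F401  (importing registers the .swifter accessor)
    from swifter import set_defaults

    # The swifter docs say set_defaults must run before the DataFrame
    # is instantiated, so it is called before .df() below.
    set_defaults(npartitions=npartitions)

    # Shortened version of the query from the issue (assumption: the
    # dropped options are not relevant to the ordering question).
    query = """
        SELECT *
        FROM read_csv(?, delim = ',', quote = '"', header = true, all_varchar = true);
    """
    return duckdb.sql(query, params=(f"{path}/*.csv",)).df()


def build_tickets(df, create_ticket_object):
    """Run the same groupby/apply as in the issue through the swifter accessor."""
    return df.swifter.groupby("ticket_id", group_keys=False).apply(create_ticket_object)
```

If the worker count still does not change with this ordering, that would suggest npartitions is not what controls the number of Ray workers for groupby/apply.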