DuckDB read/write integration #220
422bc49 to 7378065
These functions all look familiar. Are they Arrow rewrites of functions that already exist as pandas versions?
- helper/align_categories is a rewrite of plateau/plateau/io_components/utils.py, line 295 in 7190d1b:
  def align_categories(dfs, categoricals):
- cast_categoricals_to_dictionary is a take on line 269 in 7190d1b:
  empty_df = empty_df.astype(dict.fromkeys(categoricals, "category"))
- Just noticed that empty_table_from_schema can be a one-liner conversion.
Co-authored-by: Uwe L. Korn <[email protected]>
Remove prints.
aaa2305 to 1cdae26
@xhochy do you think we can get this merged today?
    def build_indices(self, columns: Iterable[str]):
        """This builds the indices for this metapartition for the given
We should restore the docstring.
        return sorted(secondary_indices)


def group_table_by_partition_keys(table: pa.Table, partition_on: list[str]):
This is the only place where polars is used. Let's figure out how to do it without it.
Add initial support for duckdb (and pyarrow) such that no conversion to pandas.DataFrame is performed.