@thomasmarwitz
Contributor

@thomasmarwitz thomasmarwitz commented Apr 9, 2025

Add initial support for duckdb (and pyarrow) inputs so that no conversion to pandas.DataFrame is performed.

  • Docs
  • Changelog entry

Member

These functions all look familiar. Are these arrow rewrites of ones that exist as pandas versions?

Contributor Author

align_categories is:

@thomasmarwitz
Contributor Author

@xhochy do you think we can get this merged today?

@xhochy xhochy reopened this Jul 17, 2025
Comment on lines 1112 to -997
def build_indices(self, columns: Iterable[str]):
"""This builds the indices for this metapartition for the given
Member

We should restore the docstring.

return sorted(secondary_indices)


def group_table_by_partition_keys(table: pa.Table, partition_on: list[str]):
Member

This is the only place where polars is used. Let's figure out how to do it without it.


2 participants