
Support pandas 3.0 #1745

Open
filippsatverily wants to merge 15 commits into cdisc-org:main from filippsatverily:filipps/pandas_3_upgrade

Contributor

@filippsatverily commented May 28, 2026

Summary

Upgrades cdisc-rules-engine to support pandas 3.0. Stacked on top of #1713 (relax dependency constraints); please merge that first.

  • Drop the pandas <3.0 upper bound and add pytz (no longer a pandas transitive dep)
  • Replace applymap() with map() (removed in pandas 3.0)
  • Replace inplace=True mutation patterns (pandas 3.0 Copy-on-Write)
  • Handle pandas 3.0 default StringDtype in comparison operators
  • Handle extension arrays in DaskDataset.__setitem__
  • Remove unsupported dd.DataFrame type annotation in parquet_reader
  • Remove the method= and downcast= kwargs, which pandas 3.0 no longer accepts
  • Replace Dask GroupBy .apply(set) path in Distinct operation
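For reviewers less familiar with the pandas changes, the first few bullets can be illustrated with a minimal sketch. It is not taken from the diff: the DataFrame, column names, and lambda are made up for illustration. It shows an element-wise call through DataFrame.map (the replacement for applymap) and an assign-back pattern in place of inplace=True on a selected column, which under Copy-on-Write would not update the parent DataFrame.

```python
import pandas as pd

# Illustrative data only; not from the cdisc-rules-engine code base.
df = pd.DataFrame({"a": [" x ", "y "], "b": ["1", None]})

# DataFrame.map is the element-wise replacement for applymap.
# The getattr fallback only keeps this sketch runnable on older pandas
# versions that predate DataFrame.map.
elementwise = getattr(df, "map", None) or df.applymap
stripped = elementwise(lambda v: v.strip() if isinstance(v, str) else v)

# Under Copy-on-Write, df["b"].fillna("0", inplace=True) would modify a
# temporary object rather than df, so assign the result back instead.
filled = stripped.copy()
filled["b"] = filled["b"].fillna("0")
```

The assign-back form behaves the same with and without Copy-on-Write, which is why it is a safe replacement regardless of the installed pandas version.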

Moves dependency constraints to pyproject.toml.
Makes requirements.txt a lockfile.
Fixes an incompatibility caused by click 8.3.0, which passes the default value as-is.
Fixes an incompatibility caused by pyreadstat 1.2.9, which changed original_variable_type from 'NULL' to None.
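One way to tolerate both pyreadstat behaviors is to normalize the value before it is used. The sketch below assumes that approach; normalize_variable_type is a hypothetical helper name and may not match what the commit actually does.

```python
from typing import Optional


def normalize_variable_type(raw: Optional[str]) -> Optional[str]:
    """Hypothetical helper: treat the old 'NULL' sentinel and None alike."""
    if raw is None or raw == "NULL":
        return None
    return raw


# Both the pre-1.2.9 and the 1.2.9+ representation collapse to None,
# while real format strings pass through unchanged.
old_style = normalize_variable_type("NULL")
new_style = normalize_variable_type(None)
real_format = normalize_variable_type("DATE9")
```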
Works around a behavior change in jsonpath-ng 1.8.0 where Child.str gets wrapped in parentheses.
Fixes tokenization errors when using dask 2024.8.1+. Starting with this version, dask enforces that tokens remain stable across pickle round-trips (dask/dask#11320). Capturing self in a lambda fails this check because instance objects can have non-deterministic pickle representations. Since calculate_variable_value_length is already a static method, replacing self with the class name is enough to remove the capture.
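The difference between the two lambda forms can be seen without Dask by inspecting what each lambda closes over. In this sketch, DatasetOps and the method body are hypothetical stand-ins; only the name calculate_variable_value_length comes from the description above.

```python
class DatasetOps:
    """Hypothetical stand-in for the class that owns the static method."""

    @staticmethod
    def calculate_variable_value_length(value):
        # Placeholder body for illustration only.
        return len(str(value))

    def mapper_capturing_self(self):
        # The lambda closes over `self`, so the instance travels with it.
        return lambda v: self.calculate_variable_value_length(v)

    def mapper_without_capture(self):
        # Referencing the class by name leaves `self` out of the closure.
        return lambda v: DatasetOps.calculate_variable_value_length(v)


ops = DatasetOps()
captured = ops.mapper_capturing_self()
uncaptured = ops.mapper_without_capture()

# co_freevars lists the names a function closes over.
captures_self = "self" in captured.__code__.co_freevars
avoids_self = "self" not in uncaptured.__code__.co_freevars
```

Both lambdas return the same values; only the second can be serialized without also serializing the instance.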
Dask 2025.4.0 optimizes multiple DataFrames together, which exposes division mismatches and causes Dask to raise an error. This change removes a source of repartitioning, preserving the divisions when assigning a pandas Series to a Dask DataFrame.
Fixes a unit test to support pandas 2.2.0+. That pandas release fixes a sorting bug (pandas-dev/pandas#54611), so this commit updates the expected results accordingly.