[SPARK-54938][PYTHON][TEST][FOLLOW-UP] Fix test_pyarrow_array_type_inference for pandas >= 3#55125

Closed
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:fix_inference_p3

@zhengruifeng zhengruifeng commented Apr 1, 2026

What changes were proposed in this pull request?

pandas 3.x changed the default string dtype to use PyArrow-backed storage, which causes pa.array() to infer large_string instead of string for a string Series. This PR makes the test conditionally expect large_string on pandas >= 3.

Why are the changes needed?

To resolve the test failure in https://github.com/apache/spark/actions/runs/23819581811/job/69428367355

Does this PR introduce any user-facing change?

No, test-only

How was this patch tested?

Manually checked against two pandas versions:

pandas==3.0.1

In [3]: import pyarrow as pa

In [4]: import pandas as pd

In [5]: ser = pd.Series(["a", "b", "c"], dtype=pd.StringDtype())

In [6]: pa.array(ser)
Out[6]:
<pyarrow.lib.LargeStringArray object at 0x103455d80>
[
  "a",
  "b",
  "c"
]

In [7]: pa.array(ser, pa.string())
Out[7]:
<pyarrow.lib.StringArray object at 0x1095546a0>
[
  "a",
  "b",
  "c"
]

pandas==2.3.3

In [7]: ser = pd.Series(["a", "b", "c"], dtype=pd.StringDtype())

In [8]: pa.array(ser)
Out[8]:
<pyarrow.lib.StringArray object at 0x10ae16620>
[
  "a",
  "b",
  "c"
]

In [9]: pa.array(ser, pa.string())
Out[9]:
<pyarrow.lib.StringArray object at 0x10ae14a00>
[
  "a",
  "b",
  "c"
]
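
The two sessions above suggest that the inferred Arrow type depends only on the pandas major version, while an explicit `pa.string()` gives the same result on both. A minimal sketch of a version-conditional expectation follows. The helper names `expected_arrow_string_type` and `check_inference` are hypothetical and are not the code in this patch; the sketch assumes the major version is the first dot-separated component of `pd.__version__`.

```python
def expected_arrow_string_type(pandas_version: str) -> str:
    """Return the Arrow type name expected from pa.array() on a string Series.

    Hypothetical helper for illustration, not the code in this patch.
    """
    # Assumption: the major version is the first dot-separated component.
    major = int(pandas_version.split(".")[0])
    return "large_string" if major >= 3 else "string"


def check_inference() -> None:
    # Imported lazily so the helper above stays usable without pandas/pyarrow.
    import pandas as pd
    import pyarrow as pa

    ser = pd.Series(["a", "b", "c"], dtype=pd.StringDtype())
    # The inferred type depends on the installed pandas version.
    assert str(pa.array(ser).type) == expected_arrow_string_type(pd.__version__)
    # An explicit type yields a plain string array on both versions.
    assert pa.array(ser, type=pa.string()).type == pa.string()
```

Calling `check_inference()` in an environment with pandas and pyarrow installed repeats the manual check for whichever pandas version is present.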

Was this patch authored or co-authored using generative AI tooling?

Co-authored-by: Claude code (Opus 4.6)

pandas 3.x changed default string dtype to use pyarrow-backed storage,
causing pa.array() to infer large_string instead of string for string
Series. Conditionally expect large_string on pandas >= 3.

Co-authored-by: Isaac
@zhengruifeng
Contributor Author

cc @Yicong-Huang

@zhengruifeng zhengruifeng changed the title from "[SPARK-54938][TEST][FOLLOW-UP] Fix test_pyarrow_array_type_inference for pandas >= 3" to "[SPARK-54938][PYTHON][TEST][FOLLOW-UP] Fix test_pyarrow_array_type_inference for pandas >= 3" on Apr 1, 2026
@zhengruifeng
Contributor Author

thanks, merged to master

@zhengruifeng zhengruifeng deleted the fix_inference_p3 branch April 1, 2026 06:35