Description
Describe the enhancement requested
Feature Request
There’s currently no utf8_zfill kernel in pyarrow.compute, so Python’s str.zfill() behavior can't be reproduced efficiently with Arrow arrays.
While fixing pandas-dev/pandas#61485, I noticed that Series.str.zfill() breaks when used on ArrowDtype(pa.string()) because the backend expects a string-padding kernel like utf8_lpad, but nothing exists for zfill. For now, it has to fall back to element-wise Python operations, which are slow.
Reproduction
import pandas as pd
import pyarrow as pa
s = pd.Series(["A", "AB", "ABC"], dtype=pd.ArrowDtype(pa.string()))
s.str.zfill(3)  # currently works only through the slow element-wise Python fallback
Expected behavior:
'A' → '00A'
'AB' → '0AB'
'ABC' → 'ABC' (no change since it's already 3 chars)
What we need
A kernel like pc.utf8_zfill(array, width) that mimics Python’s str.zfill():
- Pad strings with '0' from the left to reach width
- Optional enhancement: handle signs (+, -) the same way Python does
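For reference, the per-element semantics described above can be sketched in plain Python. This is only an illustration of the behavior the kernel would need to match, not a proposed implementation; `zfill_reference` is a hypothetical name and is not an existing Arrow or pandas function.

```python
def zfill_reference(value: str, width: int) -> str:
    """Pure-Python sketch of the semantics a utf8_zfill kernel would need."""
    # Strings already at or beyond the target width are returned unchanged.
    if len(value) >= width:
        return value
    pad = "0" * (width - len(value))
    # A single leading '+' or '-' stays in front of the zero padding,
    # which is how Python's str.zfill() treats signs.
    if value[:1] in ("+", "-"):
        return value[0] + pad + value[1:]
    return pad + value


samples = ["A", "AB", "ABC", "-5", "+42", ""]
results = [zfill_reference(s, 3) for s in samples]
```

Each entry in `results` should equal the corresponding `s.zfill(3)`, so a native kernel could be validated against the built-in method in the same way.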
Why it matters
This would let pandas fully support .str.zfill() for Arrow-backed string arrays, the same way utf8_lpad, binary_join, etc. already work natively. It would avoid the slower Python fallback and ensure parity with Python's str.zfill() behavior.
Notes
I’ve temporarily added a TODO in the pandas code to switch over once this is available.
Component(s)
Python