Research
I have searched the [pandas] tag on StackOverflow for similar questions.
I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
None
Question about pandas
Issue Description
TL;DR: Since pandas 2.0, .tolist() and similar methods return NumPy types instead of native Python types, severely impacting user experience and data readability.
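Problem Example
Before (pandas 1.x):
```python
df.index.tolist()
# Returns: [0, 1, 2, 3, 4]  # Clean, readable
```
Now (pandas 2.x):
```python
df.index.tolist()
# Returns: [np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4)]  # Verbose, confusing
```
Impact on User Experience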
Poor Readability: Results are cluttered with np.int64(), np.float64() wrappers
Debugging Nightmare: Harder to quickly scan and understand data
Display Issues: When printing or logging, output is unnecessarily verbose
User Confusion: Many users don't understand why they're seeing NumPy types
Breaking Change: Existing code expectations broken without clear migration path
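The display and confusion points come down to how Python prints values: print() and f-strings call str() on a scalar, while printing a list calls repr() on each element, and only repr() carries the type wrapper on NumPy 2.0 and later. A minimal sketch, assuming only that NumPy is installed:

```python
import numpy as np

# A NumPy scalar, like the elements shown in the report
x = np.int64(3)

# print() and f-strings use str(), which shows just the value on every NumPy version
print(x)        # 3
print(f"{x}")   # 3

# Printing a container uses repr() of each element; NumPy >= 2.0 includes the type
print([x])      # NumPy >= 2.0: [np.int64(3)]; NumPy 1.x: [3]

# np.float64 subclasses Python float, but np.int64 does not subclass int
print(isinstance(np.float64(1.5), float))  # True
print(isinstance(x, int))                  # False
```

The last check is one concrete source of confusion: code that tests isinstance(value, int) treats an np.int64 differently from a Python int, even though the two compare equal.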
Current Workarounds Are Painful
Users now need to write additional code for basic operations:
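```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})  # example data

# Instead of simply:
indices = df.index.tolist()
# we now need:
indices = [int(x) for x in df.index.tolist()]
```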
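For homogeneous numeric data, a shorter workaround is to go through a NumPy array: ndarray.tolist() converts every element to the nearest built-in Python type. With pandas that would read df.index.to_numpy().tolist(), assuming a numeric index. The helper name native_list below is hypothetical, and the sketch uses plain NumPy scalars so it runs without pandas:

```python
import numpy as np

def native_list(values):
    """Return a list of built-in Python scalars (hypothetical helper).

    ndarray.tolist() converts each element to the nearest built-in
    type, so int64/float64/bool_ become int/float/bool.
    """
    return np.asarray(values).tolist()

# Illustrative data: NumPy scalars like the ones in the report
values = [np.int64(0), np.int64(1), np.int64(2)]
print(native_list(values))  # [0, 1, 2]
```

This is only safe for a single numeric dtype: np.asarray on a mix of numbers and strings promotes everything to a string array, so mixed data needs per-element conversion instead.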
The Core Problem
DataFrames are meant for data analysis and exploration. The primary use case is human-readable data inspection, not performance-critical numerical computation at the .tolist() level.
Suggested Solutions
Add a parameter: .tolist(native_types=True) (default True for user-facing methods)
Separate methods: Keep .tolist() for NumPy types, add .tolist_clean() for Python types
Configuration option: Allow users to set pandas behavior globally
Revert the change: Prioritize user experience over marginal performance gains
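The first proposal can be sketched as a standalone function to show the intended behavior. Both tolist and native_types here are hypothetical and are not pandas API; the function accepts any iterable (a Series or Index would work the same way) and converts NumPy scalars with np.generic.item():

```python
import numpy as np

def tolist(values, native_types=True):
    """Hypothetical stand-in for the proposed .tolist(native_types=...) option."""
    if not native_types:
        # Keep elements exactly as iteration yields them
        return list(values)
    # np.generic.item() returns the closest built-in Python object;
    # anything that is not a NumPy scalar passes through unchanged
    return [v.item() if isinstance(v, np.generic) else v for v in values]

mixed = [np.int64(1), np.float64(2.5), np.bool_(True), "label"]
print(tolist(mixed))                      # [1, 2.5, True, 'label']
print(tolist(mixed, native_types=False))  # NumPy scalars kept as-is
```

Converting per element handles mixed data, but .item() has edge cases: a datetime64[ns] scalar becomes a plain int, since datetime.datetime cannot hold nanoseconds, and pandas objects such as Timestamp are not NumPy scalars and are left alone.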
Why This Matters
Pandas' strength has always been its ease of use and intuitive behavior. This change sacrifices user experience for performance gains that most users don't need when calling .tolist().
The goal of data analysis is insight, not fighting with data types.
Request
Please consider reverting this behavior or providing a simple, built-in solution. The current situation forces every pandas user to write boilerplate code for basic data inspection.
Thank you for maintaining this incredible library. I hope we can find a solution that balances performance with the user-friendly experience that makes pandas great.
Environment:
pandas: 2.2.3
numpy: 1.26.4
Impact: All DataFrame operations returning lists
It appears that you have a good grasp of the issue. IIRC, this has been reported/discussed before, but I can't find it at the moment.
> Breaking Change: Existing code expectations broken without clear migration path
I don't agree that this is true from the pandas perspective. NumPy changed the repr of its scalar types (in NumPy 2.0), and pandas continues to return NumPy types as before; only the repr has changed, and that should not really be considered a pandas issue.
However, to be fair, many users were probably unaware that their lists contained NumPy types rather than Python types, which would perhaps have been the more logical design choice. But if pandas had changed the return type, that would itself have been a breaking change.
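To illustrate that only the display changed: on NumPy 2.0 and later, the printoptions setting legacy="1.25" approximates the older, type-free scalar repr without touching the values or their types. A minimal sketch, assuming NumPy is installed; repr_without_types is a hypothetical helper name, not part of NumPy or pandas:

```python
import numpy as np

def repr_without_types(obj):
    """Return repr(obj) with NumPy scalars printed the pre-2.0 way."""
    if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
        # legacy="1.25" prints numeric scalars without the np.int64(...) wrapper;
        # the context manager restores the previous print options on exit
        with np.printoptions(legacy="1.25"):
            return repr(obj)
    # NumPy 1.x already prints scalars without type information
    return repr(obj)

values = [np.int64(0), np.int64(1), np.float64(2.5)]
print(repr_without_types(values))  # [0, 1, 2.5]
print(type(values[0]))             # still <class 'numpy.int64'>
```

The version check matters because NumPy 1.x does not support the "1.25" value for legacy.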
> Please consider reverting this behavior or providing a simple, built-in solution.
IIRC, other discussions have suggested making this breaking change in a future release: changing the return type of those operations for which returning standard Python objects would be appropriate. This seems reasonable to me.
Even though I'm sure this is a duplicate issue, I'll leave it open until I can find the other issues or until someone else points us in the right direction.
@mroeschke IIRC, you opened some PRs related to this at some point to fix CI?