QST: Subject: User Experience Issue - NumPy Types in DataFrame Results Breaking Readability #61607

Open
2 tasks done
COderHop opened this issue Jun 8, 2025 · 1 comment
Labels
Needs Triage Issue that has not been reviewed by a pandas team member Usage Question

Comments

@COderHop

COderHop commented Jun 8, 2025

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

None

Question about pandas

Issue Description
TL;DR: Since pandas 2.0+, .tolist() and similar methods return NumPy types instead of native Python types, severely impacting user experience and data readability.
Problem Example
Before (pandas 1.x):
df.index.tolist()
# Returns: [0, 1, 2, 3, 4]  # Clean, readable

Now (pandas 2.x):
df.index.tolist()
# Returns: [np.int64(0), np.int64(1), np.int64(2), np.int64(3), np.int64(4)]  # Verbose, confusing

Impact on User Experience

Poor Readability: Results are cluttered with np.int64(), np.float64() wrappers
Debugging Nightmare: Harder to quickly scan and understand data
Display Issues: When printing or logging, output is unnecessarily verbose
User Confusion: Many users don't understand why they're seeing NumPy types
Breaking Change: Existing code expectations broken without clear migration path

Current Workarounds Are Painful
Users now need to write additional code for basic operations:
# Instead of simple:
indices = df.index.tolist()

# We need:
indices = [int(x) for x in df.index.tolist()]
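
The comprehension above only covers integers. A slightly more general sketch, assuming the values are NumPy scalars mixed with ordinary Python objects, unwraps each NumPy scalar with its `.item()` method. The helper name `to_native` is hypothetical and not part of pandas or NumPy:

```python
import numpy as np


def to_native(values):
    """Hypothetical helper: unwrap NumPy scalars into built-in Python scalars."""
    # np.generic is the base class of all NumPy scalar types;
    # .item() returns the closest built-in Python equivalent.
    return [v.item() if isinstance(v, np.generic) else v for v in values]


mixed = [np.int64(0), np.float64(1.5), "a", 2]
native = to_native(mixed)
print(native)  # [0, 1.5, 'a', 2]
```

Values that are not NumPy scalars pass through unchanged, so the helper can be applied to lists of mixed origin.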
The Core Problem
DataFrames are meant for data analysis and exploration. The primary use case is human-readable data inspection, not performance-critical numerical computation at the .tolist() level.
Suggested Solutions

Add a parameter: .tolist(native_types=True) (default True for user-facing methods)
Separate methods: Keep .tolist() for NumPy types, add .tolist_clean() for Python types
Configuration option: Allow users to set pandas behavior globally
Revert the change: Prioritize user experience over marginal performance gains

Why This Matters
Pandas' strength has always been its ease of use and intuitive behavior. This change sacrifices user experience for performance gains that most users don't need when calling .tolist().
The goal of data analysis is insight, not fighting with data types.
Request
Please consider reverting this behavior or providing a simple, built-in solution. The current situation forces every pandas user to write boilerplate code for basic data inspection.
Thank you for maintaining this incredible library. I hope we can find a solution that balances performance with the user-friendly experience that makes pandas great.

Environment:

pandas: 2.2.3
numpy: 1.26.4
Impact: All DataFrame operations returning lists

@COderHop COderHop added Usage Question Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 8, 2025
@simonjayhawkins
Member

Thanks @COderHop for the report.

It appears that you have a good grasp of the issue. IIRC this has been reported/discussed before but I can't find it at this time.

Breaking Change: Existing code expectations broken without clear migration path

I do not agree that this is true from the pandas perspective. NumPy changed its scalar repr, while pandas continues to return NumPy types as before. Only the repr has changed, and that should not really be considered a pandas issue.
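
A minimal sketch of that distinction, assuming a NumPy-backed integer index: the element taken from the index is a NumPy scalar either way, `str()` shows the bare value, and only `repr()` depends on the installed NumPy version. The last step assumes NumPy 2.0 or later, where the `legacy="1.25"` print option is documented to restore the older scalar repr:

```python
import numpy as np
import pandas as pd

idx = pd.Index([0, 1, 2], dtype="int64")
first = idx[0]

# The element is a NumPy scalar regardless of the NumPy version installed.
print(type(first).__name__)

# str() prints the bare value; repr() is what changed in NumPy 2.0.
print(str(first))
print(repr(first))

# Assumption: NumPy >= 2.0, where legacy="1.25" is an accepted print option.
if np.lib.NumpyVersion(np.__version__) >= "2.0.0":
    with np.printoptions(legacy="1.25"):
        print(repr(first))
```

Using the context manager keeps the change local. Calling `np.set_printoptions(legacy="1.25")` once at startup would apply it to the whole session instead.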

However, to be fair, many users were probably unaware that their lists contained NumPy types rather than Python types, which would perhaps have been a more logical design choice. Had pandas changed the return type, though, that would have been a breaking change.

Please consider reverting this behavior or providing a simple, built-in solution.

IIRC other discussions have suggested making this breaking change in a future release, changing the return type of some operations for which returning standard Python objects would be appropriate. This seems reasonable to me.

Even though I'm sure this is a duplicate issue, I'll leave it open until I can find the other issues or until someone else points us in the right direction.

@mroeschke IIRC you did some PRs at some point related to this to fix CI?
