-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-135474: Specialize arithmetic only on compact ints #135479
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good.
As a refinement, we can remove the additional check in the action micro-ops, narrowing the test in the guard micro-ops.
I'll start a benchmark run, just to check.
@@ -582,9 +582,10 @@ dummy_func( | |||
PyObject *right_o = PyStackRef_AsPyObjectBorrow(right); | |||
assert(PyLong_CheckExact(left_o)); | |||
assert(PyLong_CheckExact(right_o)); | |||
DEOPT_IF(!_PyLong_BothAreCompact((PyLongObject *)left_o, (PyLongObject *)right_o)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you narrow the checks in _GUARD_TOS_INT
and _GUARD_NOS_INT
to check for compact ints, we can drop the check here. That way we keep the guard and action in separate micro-ops.
Likewise for +
and -
. The only complexity is that optimizer_bytecodes.c
will need updating to handle compact ints differently.
This change is fine for now though, if you want to do the more sophisticated change in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, it's best to leave this to another PR. We need to introduce a new type altogether (one separate from the sym_set_type(&PyLong_Type), and I want to keep this PR's change confined to longobject.c).
So the benchmark results are in: it's consistently faster on most platforms, ignoring the strange fluctuations on macOS https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250614-3.15.0a0-4019a15 |
📚 Documentation preview 📚: https://cpython-previews--135479.org.readthedocs.build/