Skip to content

JIT/AOT to interpreter calls support #116353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Jun 5, 2025

We only had support for calling interpreted void returning method with no arguments by the JIT/AOT. This change enables support for passing all possible kinds of arguments and returning all types.

It reuses the CallStubGenerator that was implemented for the interpreter to native code usage and adds support for the other direction to it in mostly trivial manner.
In addition to that, assembler routines for storing argument values to the interpreter stack were added.
The CallStubGenerator generates a list of routines to copy the arguments from CPU registers and stack to the interpreter stack. The last one makes call to the ExecuteInterpretedMethod and then puts the result into appropriate registers.
For functions that return result via a return buffer, the buffer is passed to the ExecuteInterpretedMethod so that the IR opcode to return valuetype stores it directly to the return buffer.

The ARM64 for Apple OSes is the most optimized version, as it is the primary target where the performance matters the most. It eliminates argument registers saving to stack on the fast path when we already have the call stub.

@janvorli janvorli added this to the 10.0.0 milestone Jun 5, 2025
@janvorli janvorli self-assigned this Jun 5, 2025
@Copilot Copilot AI review requested due to automatic review settings June 5, 2025 22:33
@janvorli janvorli requested review from BrzVlad and kg as code owners June 5, 2025 22:33
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces comprehensive support for interpreter-to-JIT and JIT-to-interpreter calls, extending argument passing and return value handling to all kinds of method signatures. Key changes include new test methods in the interpreter tests, adjustments in call stub generation and assembly routines, and updates to thread-local and ABI-related constants.

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/tests/JIT/interpreter/Interpreter.cs Added several new test methods for various calling conventions to exercise interpreter/JIT interop.
src/coreclr/vm/threads.* Modified thread-local definitions and moved the interpreter thread context member from private to public.
src/coreclr/vm/prestub.cpp Updated the signature and return value handling of ExecuteInterpretedMethod.
src/coreclr/vm/interpexec.cpp Added CreateNativeToInterpreterCallStub with adjustments to generate call stubs and ensure thread context.
src/coreclr/vm/callstubgenerator.h Extended the call stub generator with new routines and a parameter to distinguish call directions.
asmconstants.h, AsmHelpers.asm, and PAL assembly macros Updated ABI constants and assembly routines to support the new calling conventions.
src/coreclr/interpreter/interpretershared.h Updated the InterpMethod structure to include the call stub pointer.
Comments suppressed due to low confidence (1)

src/coreclr/vm/interpexec.cpp:66

  • [nitpick] The AllocMemTracker variable is redeclared here, shadowing the one declared on line 56. To improve clarity and avoid potential confusion, consider reusing the outer 'amTracker' instead of redeclaring it.
        AllocMemTracker amTracker;

Copy link
Contributor

Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
See info in area-owners.md if you want to be subscribed.

Until now, we only had support for calling interpreted void returning
method with no arguments by the JIT/AOT. This change enables support for
passing all possible kinds of arguments and returning all types.

It reuses the `CallStubGenerator` that was implemented for the interpreter
to native code usage and adds support for the other direction to it in
mostly trivial manner.
In addition to that, assembler routines for storing argument values to
the interpreter stack were added.
The `CallStubGenerator` generates a list of routines to copy the
arguments from CPU registers and stack to the interpreter stack.
The last one makes call to the `ExecuteInterpretedMethod` and then puts
the result into appropriate registers.
For functions that return result via a return buffer, the buffer is
passed to the `ExecuteInterpretedMethod` so that the IR opcode to return
valuetype stores it directly to the return buffer.

The ARM64 for Apple OSes is the most optimized version, as it is the
primary target where the performance matters the most. It eliminates
argument registers saving to stack on the fast path when we already have
the call stub.
@janvorli janvorli force-pushed the jit-to-interpreter-calls branch from 8ed6861 to df8d547 Compare June 6, 2025 00:24
; Interpreter-TODO: consider not storing the argument registers in the PROLOG_WITH_TRANSITION_BLOCK and
; store / restore them only around this call. We would still need to save RCX and RDX as they can contain
; the return buffer address.
call CreateNativeToInterpreterCallStub
Copy link
Member

@jkotas jkotas Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see how the arguments get reported to the GC while CreateNativeToInterpreterCallStub is executing. Is it a GC hole?

Would it be better to compile the transition helper in PreStubWorker? PreStubWorker gets called whenever we are about the execute a method from JIT/AOT code that is not compiled/prepared yet, and it has all the right infrastructure to protect arguments. It would be best to avoid duplicating it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point and compiling it in the PreStubWorker would definitely be better.


LEAF_ENTRY Store_XMM2_XMM3, _TEXT
movsd real8 ptr [r10], xmm2
movsd real8 ptr [r10 + 8], xmm3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't xmm registers 16 bytes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing what's going on here is there's an array of individual pointers at r10 for the location of the data for each xmm reg but wanted to make sure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we are handling the XMM registers for the case where the 16 byte XMM register is just being used for an 8 byte floating point value. I'm a bit curious if this works well for floats as well as doubles, but the concept is sound. (I could see that perhaps our interpreter stack models all floating point values as doubles, which will introduce floating point differences from what the JIT does in certain cases, but it is pretty close.

More interestingly though I don't see handling for Vector64/Vector128 on Arm64 which should be treated as HFA values. I don't think this all needs to be handled before the PR is merged, but we should have it on our worklist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xmm registers are 16 bytes long, but for argument passing purposes, they can only carry either float or double arguments, so we use max 8 bytes out of it. See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170.
For Unix, it is the same case. The calling convention would allow larger data types passed in all 16 bytes, but that's not used in .NET (In C, it can pass arguments of types __float128, _Decimal128 and __m128)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see one comment in here somewhere to the effect of 'we only use up to the first 8 bytes of the xmm registers due to the x64 calling convention' just so anyone looking at this can go 'oh, that makes sense' and look up the calling convention for more info (i.e. that vectors are passed by-reference)

Otherwise, looks good, thanks for the explanation.

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. I suspect we haven't fleshed out all the calling convention gremlins, but this should get most of it.


LEAF_ENTRY Store_XMM2_XMM3, _TEXT
movsd real8 ptr [r10], xmm2
movsd real8 ptr [r10 + 8], xmm3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case we are handling the XMM registers for the case where the 16 byte XMM register is just being used for an 8 byte floating point value. I'm a bit curious if this works well for floats as well as doubles, but the concept is sound. (I could see that perhaps our interpreter stack models all floating point values as doubles, which will introduce floating point differences from what the JIT does in certain cases, but it is pretty close.

More interestingly though I don't see handling for Vector64/Vector128 on Arm64 which should be treated as HFA values. I don't think this all needs to be handled before the PR is merged, but we should have it on our worklist.

@@ -28,6 +30,7 @@ struct InterpMethod
CORINFO_METHOD_HANDLE methodHnd;
int32_t allocaSize;
void** pDataItems;
CallStubHeader *pCallStub;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a comment indicating its for use for calling from the jitted calling convention into the interpreter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add one.

@janvorli
Copy link
Member Author

janvorli commented Jun 6, 2025

I'm a bit curious if this works well for floats as well as doubles
@davidwrighton the interpreter distinguishes between floats and doubles, it knows what each slot contains. So in case the argument is float, we just write in 8 bytes, but the interpreter only ever uses the low 4 bytes.

@janvorli
Copy link
Member Author

janvorli commented Jun 6, 2025

More interestingly though I don't see handling for Vector64/Vector128 on Arm64 which should be treated as HFA values.

Good point. I think that the only one that would not work is the Vector128. For that one, we will need to add routines for both storing and loading q0..q7. The Vector64 should be handled by the regular d0..d7 storing / loading routines.

@@ -432,6 +432,8 @@ PCODE MethodDesc::PrepareILBasedCode(PrepareCodeConfig* pConfig)
#ifdef FEATURE_INTERPRETER
if (pConfig->IsInterpreterCode())
{
CreateNativeToInterpreterCallStub(pCode);
Copy link
Member

@jkotas jkotas Jun 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to create the stub only for the methods that are actually called from native code, or for all interpreted methods? If it is the later, it sounds wasteful.

@@ -405,3 +411,25 @@ C_FUNC(\Name\()_End):
free_stack 56
POP_CALLEE_SAVED_REGISTERS
.endm

.macro INLINE_GET_TLS_VAR Var
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is duplicated a few lines above in INLINE_GET_ALLOC_CONTEXT_BASE. Move the INLINE_GET_TLS_VAR macro up and switch INLINE_GET_ALLOC_CONTEXT_BASE to use it?

// Loads the address of a thread-local variable into the target register,
// which cannot be x0.
// Preserves registers except for xip0 and xip1 on Apple
.macro INLINE_GET_TLS_VAR target, var
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants