JIT/AOT to interpreter calls support #116353
Conversation
Pull Request Overview
This PR introduces comprehensive support for interpreter-to-JIT and JIT-to-interpreter calls, extending argument passing and return value handling to all kinds of method signatures. Key changes include new test methods in the interpreter tests, adjustments in call stub generation and assembly routines, and updates to thread-local and ABI-related constants.
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/tests/JIT/interpreter/Interpreter.cs | Added several new test methods for various calling conventions to exercise interpreter/JIT interop. |
| src/coreclr/vm/threads.* | Modified thread-local definitions and moved the interpreter thread context member from private to public. |
| src/coreclr/vm/prestub.cpp | Updated the signature and return value handling of ExecuteInterpretedMethod. |
| src/coreclr/vm/interpexec.cpp | Added CreateNativeToInterpreterCallStub with adjustments to generate call stubs and ensure thread context. |
| src/coreclr/vm/callstubgenerator.h | Extended the call stub generator with new routines and a parameter to distinguish call directions. |
| asmconstants.h, AsmHelpers.asm, and PAL assembly macros | Updated ABI constants and assembly routines to support the new calling conventions. |
| src/coreclr/interpreter/interpretershared.h | Updated the InterpMethod structure to include the call stub pointer. |
Comments suppressed due to low confidence (1)
src/coreclr/vm/interpexec.cpp:66
- [nitpick] The AllocMemTracker variable is redeclared here, shadowing the one declared on line 56. To improve clarity and avoid potential confusion, consider reusing the outer 'amTracker' instead of redeclaring it.
AllocMemTracker amTracker;
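For illustration, here is a minimal, self-contained sketch of the shadowing pattern the nitpick is about; the `Tracker` type and the `GenerateStub` function are placeholders, not the actual `AllocMemTracker` or interpexec.cpp code.

```cpp
// Minimal sketch of the shadowing pattern flagged above; the Tracker type and
// the surrounding function are placeholders, not the actual interpexec.cpp code.
#include <cstdio>

struct Tracker
{
    // Stand-in for AllocMemTracker-style bookkeeping.
    int trackedAllocations = 0;
    void Track() { ++trackedAllocations; }
};

static void GenerateStub()
{
    Tracker amTracker;          // outer tracker (the "line 56" declaration in the comment)
    amTracker.Track();

    {
        Tracker amTracker;      // redeclared: shadows the outer tracker (the "line 66" one)
        amTracker.Track();      // bookkeeping lands on the inner instance only
    }

    // Suggested fix: drop the inner declaration and keep using the outer
    // amTracker so all allocations are tracked by a single instance.
    std::printf("outer tracker saw %d allocation(s)\n", amTracker.trackedAllocations);
}

int main()
{
    GenerateStub();
    return 0;
}
```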
Tagging subscribers to this area: @BrzVlad, @janvorli, @kg
Until now, we only had support for JIT/AOT code calling interpreted methods that return void and take no arguments. This change enables passing all possible kinds of arguments and returning all types. It reuses the `CallStubGenerator` that was implemented for interpreter-to-native calls and adds support for the other direction in a mostly trivial manner. In addition, assembler routines for storing argument values to the interpreter stack were added. The `CallStubGenerator` generates a list of routines that copy the arguments from CPU registers and the stack to the interpreter stack. The last routine calls `ExecuteInterpretedMethod` and then puts the result into the appropriate registers. For functions that return their result via a return buffer, the buffer is passed to `ExecuteInterpretedMethod` so that the IR opcode that returns a valuetype stores it directly into the return buffer. The ARM64 version for Apple OSes is the most optimized one, as it is the primary target where performance matters the most; it avoids saving the argument registers to the stack on the fast path when we already have the call stub.
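To make that description concrete, here is a minimal sketch of the overall flow with entirely hypothetical names (`RegisterFile`, `CopyIntArgs`, `InvokeAndReturn`); the real `CallStubGenerator` emits machine-level routines and `ExecuteInterpretedMethod` lives in the VM, so this only models the "list of copy routines ending in the interpreter call" shape.

```cpp
// Hedged sketch of the call stub flow described above. All names are
// hypothetical; the real CallStubGenerator and ExecuteInterpretedMethod
// live in the CoreCLR VM and differ in detail.
#include <cstdint>
#include <cstring>
#include <vector>

struct RegisterFile            // stand-in for the CPU argument/return registers
{
    int64_t intArgs[4];
    int64_t returnValue;
};

// Each generated "routine" copies some arguments onto the interpreter stack;
// the last one calls into the interpreter and moves the result back.
using Routine = void (*)(RegisterFile&, uint8_t* interpStack);

static int64_t ExecuteInterpretedMethodStandIn(uint8_t* interpStack)
{
    int64_t a, b;
    std::memcpy(&a, interpStack, sizeof a);
    std::memcpy(&b, interpStack + 8, sizeof b);
    return a + b;              // pretend the interpreter added two arguments
}

static void CopyIntArgs(RegisterFile& regs, uint8_t* interpStack)
{
    std::memcpy(interpStack, &regs.intArgs[0], 8);
    std::memcpy(interpStack + 8, &regs.intArgs[1], 8);
}

static void InvokeAndReturn(RegisterFile& regs, uint8_t* interpStack)
{
    regs.returnValue = ExecuteInterpretedMethodStandIn(interpStack);
}

int main()
{
    std::vector<Routine> callStub = { CopyIntArgs, InvokeAndReturn };

    RegisterFile regs{};
    regs.intArgs[0] = 40;
    regs.intArgs[1] = 2;

    uint8_t interpStack[16];
    for (Routine r : callStub)
        r(regs, interpStack);

    return regs.returnValue == 42 ? 0 : 1;
}
```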
src/coreclr/vm/amd64/AsmHelpers.asm

; Interpreter-TODO: consider not storing the argument registers in the PROLOG_WITH_TRANSITION_BLOCK and
; store / restore them only around this call. We would still need to save RCX and RDX as they can contain
; the return buffer address.
call CreateNativeToInterpreterCallStub
I do not see how the arguments get reported to the GC while `CreateNativeToInterpreterCallStub` is executing. Is it a GC hole?
Would it be better to compile the transition helper in `PreStubWorker`? `PreStubWorker` gets called whenever we are about to execute a method from JIT/AOT code that is not compiled/prepared yet, and it has all the right infrastructure to protect arguments. It would be best to avoid duplicating it.
That's a good point and compiling it in the `PreStubWorker` would definitely be better.
LEAF_ENTRY Store_XMM2_XMM3, _TEXT
        movsd real8 ptr [r10], xmm2
        movsd real8 ptr [r10 + 8], xmm3
aren't xmm registers 16 bytes?
I'm guessing what's going on here is that there's an array of individual pointers at r10 for the location of the data for each xmm reg, but I wanted to make sure.
In this case we are handling the XMM registers for the case where the 16-byte XMM register is just being used for an 8-byte floating point value. I'm a bit curious if this works well for floats as well as doubles, but the concept is sound. (I could see that perhaps our interpreter stack models all floating point values as doubles, which will introduce floating point differences from what the JIT does in certain cases, but it is pretty close.)
More interestingly, though, I don't see handling for Vector64/Vector128 on Arm64, which should be treated as HFA values. I don't think this all needs to be handled before the PR is merged, but we should have it on our worklist.
The xmm registers are 16 bytes long, but for argument passing purposes they can only carry either float or double arguments, so we use at most 8 bytes of them. See https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170.
On Unix it is the same: the calling convention would allow larger data types to be passed using all 16 bytes, but that is not used in .NET (in C, it can pass arguments of type __float128, _Decimal128, and __m128).
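As a small illustration of why a `Store_XMM2_XMM3` routine only needs the low 8 bytes: under the Windows x64 calling convention, the third and fourth arguments of a signature like the one below are passed in XMM2 and XMM3, and a double occupies only the low half of the 16-byte register. This is a hypothetical sample function, not code from the PR.

```cpp
// Under the Windows x64 calling convention the arguments below land in
// RCX, RDX, XMM2, and XMM3 respectively; each double uses only the low
// 8 bytes of its 16-byte XMM register. (Sample function, not PR code.)
#include <cstdio>

double Scale(int count, int stride, double x, double y)
{
    // count -> RCX, stride -> RDX, x -> XMM2 (low 8 bytes), y -> XMM3 (low 8 bytes)
    return x * count + y * stride;
}

int main()
{
    std::printf("%f\n", Scale(2, 3, 1.5, 0.25));
    return 0;
}
```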
I would love to see one comment in here somewhere to the effect of 'we only use up to the first 8 bytes of the xmm registers due to the x64 calling convention' just so anyone looking at this can go 'oh, that makes sense' and look up the calling convention for more info (i.e. that vectors are passed by-reference)
Otherwise, looks good, thanks for the explanation.
Overall, this looks good. I suspect we haven't fleshed out all the calling convention gremlins, but this should get most of it.
@@ -28,6 +30,7 @@ struct InterpMethod
    CORINFO_METHOD_HANDLE methodHnd;
    int32_t allocaSize;
    void** pDataItems;
    CallStubHeader *pCallStub;
This should have a comment indicating it's used for calling from the jitted calling convention into the interpreter.
I'll add one.
Good point. I think that the only one that would not work is the Vector128. For that one, we will need to add routines for both storing and loading q0..q7. The Vector64 should be handled by the regular d0..d7 storing / loading routines.
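For reference, a small standalone C++/NEON illustration of the register assignment being discussed: under AAPCS64, a 128-bit vector argument occupies a full q register, while a 64-bit vector fits in a d register, which is why Vector128 would need dedicated q0..q7 routines but Vector64 can reuse the d0..d7 ones. The functions below are hypothetical examples, not PR code.

```cpp
// Sketch of why dedicated q-register routines would be needed on Arm64:
// under AAPCS64 a 128-bit vector argument occupies a full q register
// (here a -> q0, b -> q1), while a 64-bit vector fits in a d register.
// Compile for AArch64 with NEON; this is an illustration, not PR code.
#include <arm_neon.h>

float32x4_t AddVectors(float32x4_t a, float32x4_t b)     // a -> q0, b -> q1
{
    return vaddq_f32(a, b);                               // result returned in q0
}

float32x2_t AddHalfVectors(float32x2_t a, float32x2_t b)  // a -> d0, b -> d1
{
    return vadd_f32(a, b);                                 // result returned in d0
}

int main()
{
    float32x4_t v = AddVectors(vdupq_n_f32(1.0f), vdupq_n_f32(2.0f));
    return (int)vgetq_lane_f32(v, 0);   // 3
}
```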
@@ -432,6 +432,8 @@ PCODE MethodDesc::PrepareILBasedCode(PrepareCodeConfig* pConfig)
#ifdef FEATURE_INTERPRETER
    if (pConfig->IsInterpreterCode())
    {
        CreateNativeToInterpreterCallStub(pCode);
Is this going to create the stub only for the methods that are actually called from native code, or for all interpreted methods? If it is the latter, it sounds wasteful.
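A hedged sketch of the eager-versus-lazy trade-off raised here; `InterpMethodInfo` and `MakeCallStub` are placeholders rather than the real VM types. Creating the stub lazily on the first native-to-interpreter call would avoid the cost for interpreted methods that are never called from native code.

```cpp
// Lazy, race-tolerant creation of the call stub; placeholder types only,
// not the actual CoreCLR implementation.
#include <atomic>

struct CallStub { /* the generated routine list would live here */ };

static CallStub* MakeCallStub() { return new CallStub(); }

struct InterpMethodInfo
{
    std::atomic<CallStub*> pCallStub{nullptr};

    CallStub* GetOrCreateCallStub()
    {
        CallStub* stub = pCallStub.load();
        if (stub == nullptr)
        {
            CallStub* fresh = MakeCallStub();
            // Publish our stub unless another thread already did; discard ours on a lost race.
            if (pCallStub.compare_exchange_strong(stub, fresh))
                stub = fresh;
            else
                delete fresh;
        }
        return stub;
    }
};

int main()
{
    InterpMethodInfo m;
    return m.GetOrCreateCallStub() != nullptr ? 0 : 1;
}
```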
@@ -405,3 +411,25 @@ C_FUNC(\Name\()_End):
        free_stack 56
        POP_CALLEE_SAVED_REGISTERS
.endm

.macro INLINE_GET_TLS_VAR Var
This is duplicated a few lines above in `INLINE_GET_ALLOC_CONTEXT_BASE`. Move the `INLINE_GET_TLS_VAR` macro up and switch `INLINE_GET_ALLOC_CONTEXT_BASE` to use it?
// Loads the address of a thread-local variable into the target register,
// which cannot be x0.
// Preserves registers except for xip0 and xip1 on Apple
.macro INLINE_GET_TLS_VAR target, var
Same here