
visionOS XR module #109975

Open
rsanchezsaez wants to merge 1 commit into godotengine:master from rsanchezsaez:apple/visionos-xr

Conversation

Contributor

@rsanchezsaez rsanchezsaez commented Aug 26, 2025

PR co-authored by @huisedenanhai and @stuartcarnie

Dear Godot community,

This PR is a follow-up to #105628 and #109974 (I encourage you to read the descriptions of those PRs to get the full context of this contribution).

This change adds a visionOS XR module to Godot, which allows building immersive experiences on visionOS.

This PR builds on top of #109974, so that one should be merged first.

As always, we have tried to align with Godot's quality, style, and engine direction, to minimize any disruption coming from these changes. We'll be more than happy to work with the community on iterating this patch.

cc @BastiaanOlij @stuartcarnie @bruvzg @clayjohn

Technical Discussion

We have developed and tested these changes using Xcode 26.

The visionOS XR module only supports the Mobile renderer for now; the Forward+ renderer is not supported.

To use the visionOS XR module, you must set the new application/app_role export setting to Immersive. You can choose whether you want passthrough via the new application/immersion_style export setting.

Then, initialize the visionOS XR module in a script:

    var interface = XRServer.find_interface("visionOS")
    if interface and interface.initialize():
        var viewport : Viewport = get_viewport()
        viewport.use_xr = true
        viewport.vrs_mode = Viewport.VRS_XR
        viewport.use_hdr_2d = true

Some implementation details:

  • The visionOS platform now has two different execution paths implemented by the GodotWindowScene and CompositorServicesImmersiveSpace scenes in app_visionos.swift. The application/app_role export setting controls which scene is used.
  • The visionOS XR interface tries to be as close to the OpenXR interface as possible, to keep main renderer code changes to a minimum. It adopts Compositor Services and ARKit APIs, which is how you render Metal content on visionOS.
  • We obtain the head pose twice: once from the main thread in process(), so the camera and scripts can use it if needed, and again from the render thread in pre_render(), which is needed for rendering. The object cannot easily be shared between threads.
  • The projection matrices returned by visionOS have an inverse depth correction applied (visionOS uses the [0, 1] z space, but Godot expects the [-1, 1] z space until the rendering step).
  • The rasterizationRateMap (the structure that supports foveation on visionOS) is provided through the get_vrs_texture() function, using the new XR_VRS_TEXTURE_FORMAT_RASTERIZATION_RATE_MAP texture type. It's passed through the renderer when creating passes/subpasses, to be ultimately set by the Metal driver.
  • Apple Vision Pro's minimum supported near plane is 0.1. The visionOS XR module's get_projection_for_view() function will fail with an error message if the camera has a lower zNear.
  • The Metal driver has a new dummy SurfaceCompositorServices, which replaces SurfaceLayer when running in immersive mode. The reason for this is that the Compositor Services API needs to do a cp_drawable_encode_present() step with the MTLCommandBuffer used by the renderer, and this seemed the most natural way of overriding the present() call normally done by the Metal driver.
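The depth-range detail above can be sketched numerically. Below is a hedged, hypothetical helper (not the engine's actual code) illustrating that remapping a projection matrix from the [0, 1] NDC z range (visionOS convention) to [-1, 1] (what Godot expects until the rendering step) amounts to rewriting the third row from the third and fourth:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Row-major 4x4 projection matrix: clip.z = row 2 · v, clip.w = row 3 · v.
using Mat4 = std::array<std::array<double, 4>, 4>;

// Hypothetical helper: remap a matrix producing NDC z in [0, 1] to one
// producing NDC z in [-1, 1]. Since z_new = 2 * z_old - w, after the
// perspective divide: z_new / w = 2 * (z_old / w) - 1.
Mat4 depth_zero_one_to_minus_one_one(const Mat4 &p) {
	Mat4 out = p;
	for (int c = 0; c < 4; c++) {
		out[2][c] = 2.0 * p[2][c] - p[3][c];
	}
	return out;
}
```

An old NDC depth of 0 maps to -1 and 1 maps to 1, matching the convention change described in the bullet above.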

Testing

We have been testing this PR in windowed mode with the Platformer demo project. We have verified the project continues to work on iOS and visionOS, both with the Mobile and Forward+ renderers using the Metal rendering driver.

I'm also including a minimal example of a project rendering an immersive scene on visionOS. Currently the example has no interactivity, but you can move your head and look around the scene.

It looks like this:

ScreenRecording_08-25-2025.16-14-23_1_out.mp4

Missing Functionality

  • High-quality video captures as detailed here are not currently supported. We will submit a follow-up PR to support this after this one is merged.
  • Pinch input, hand tracking and PSVR2 controller support are not part of this PR. Only classic Xbox/PS4/PS5 game controller input is supported.

Next steps

We have working pinch input, hand tracking, and PSVR2 controller tracking ready. We'll submit these as incremental PRs, to keep isolated functionality in each of the PRs for easier review. Once these areas are done, we believe porting any existing VR games to visionOS could be relatively easy.

Contributor

@dsnopek dsnopek left a comment


Thanks!

I skimmed the changes to XRInterface and provided some notes on how these could maybe be unified with the way other platforms handle the same features. I'd personally prefer not to have virtual methods that are only for a single platform, which is something we've managed to avoid up until now.

Comment thread servers/xr/xr_interface.h Outdated
Comment on lines +144 to +146
virtual Rect2i get_viewport_for_view(uint32_t p_view) {
return Rect2i();
} /* get each view viewport rect, used for visionOS VR */
Contributor

This seems like it may be like get_render_region() but needs to be different per view? If so, it might be better to add a p_view argument to get_render_region() (and provide compatibility methods in OpenXRInterfaceExtension), so that we have a single virtual method for this for all platforms, rather than get_viewport_for_view() which is just for VisionOS

Contributor Author

I'm going to work on this next. A question: conceptually, what's the difference between get_render_region() and get_render_target_size()?

Contributor

@BastiaanOlij BastiaanOlij Aug 27, 2025

get_render_target_size should return the size of the render target used. The assumption is that we use the same resolution for both eyes and we use a layered texture, one layer per eye.

By default Godot will create this render target and render to each layer using the multiview extension (or equivalent); the XRInterface can however supply its own render target texture if it requires additional setup (and since this is often part of a swapchain, updates on each frame to the destination currently in use).

We assume layers are always used due to relying on multiview; it seems stereo rendering where a single-layer texture is used alongside regions for a side-by-side layout was already going the way of the dodo at the time. Does Apple Vision Pro expect this capability?

get_render_region optionally returns a region within that texture that we're using so we can create dynamic resolution rendering without having to do expensive recreations of swapchains and such. This is generally only used if dynamic resolution is supported and enabled (through something like XR_META_recommended_layer_resolution).

If VisionPro just expects us to render to a fixed resolution texture, I would only use get_render_target_size.
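A minimal sketch of the distinction explained above, with hypothetical names (this is not engine API): the render target keeps its allocated size across frames, while the render region is a per-frame sub-rectangle of it used for dynamic resolution.

```cpp
#include <algorithm>
#include <cassert>

struct Rect { int x, y, w, h; };

// Hypothetical sketch: the render target is allocated once at a fixed size.
// A per-frame resolution scale (e.g. a recommendation from the runtime)
// selects a sub-rectangle to actually render to; the target itself is
// never reallocated, avoiding expensive swapchain recreation.
Rect render_region_for_scale(int target_w, int target_h, double scale) {
	scale = std::clamp(scale, 0.1, 1.0); // never exceed the allocated target
	return Rect{ 0, 0,
			std::max(1, (int)(target_w * scale)),
			std::max(1, (int)(target_h * scale)) };
}
```

With this shape, a resolution hint such as one delivered through XR_META_recommended_layer_resolution only changes the returned region, never the swapchain allocation.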

Contributor Author

@rsanchezsaez rsanchezsaez Aug 27, 2025

Thanks for the explanation, that makes sense.

Actually, I think the proper solution for visionOS would be moving what's currently in get_viewport_for_view() to get_render_region_for_view() as @dsnopek suggested, because the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).

Contributor

@dsnopek dsnopek Aug 27, 2025

the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).

Hm, so the render region is bigger than the render target size? Or the other way around?

The render target size is supposed to be the pixel dimensions of the texture we're rendering to, and, generally, the render region is smaller. This is to allow rendering to just a subset of the texture - it's basically what you'd pass to glViewport() (in OpenGL) or vkCmdSetViewport() (in Vulkan) - I don't know the Metal equivalent. I'm not a rendering expert, but I don't know that it can make sense for the viewport (as used in the aforementioned OpenGL and Vulkan functions) to be bigger than the underlying texture?

Anyway, it would help to get some more clarification on what the Rect2i will be used for in this case.

Contributor

@stuartcarnie stuartcarnie Aug 29, 2025

I think part of the reason is that the region is not only used to set the viewports, but it's also used in other parts of the pipeline as viewport_size is passed at the end of draw_list_begin() to draw_graph.add_draw_list_begin() and _draw_list_start()

That's correct. The render region that is passed to the draw list is distinct from setting viewports, as it specifies the area that may be affected by any initial attachment operations, such as CLEAR, which will only affect the specified region. After that, viewports can be set to limit the area the draw calls can affect.

In Metal, that controls the renderTargetWidth and renderTargetHeight of the MTLRenderPassDescriptor. The width is extended from x=0 to the total width, since it is not a region.

And it is similar for D3D12 and Vulkan. For example, in Vulkan, it affects the render area of the subpass:

render_pass_begin.renderArea.offset.x = p_rect.position.x;
render_pass_begin.renderArea.offset.y = p_rect.position.y;
render_pass_begin.renderArea.extent.width = p_rect.size.x;
render_pass_begin.renderArea.extent.height = p_rect.size.y;

Contributor

So, it's starting to sound like the get_viewport_for_view() is for something distinct from get_render_region().

Does anyone know if (and how) this concept exists on other platforms? I've skimmed the links provided above, and this seems related to VRS, but I don't think we need this data for VRS on other platforms.

In my opinion, if this is something totally unique to VisionOS, then this shouldn't be on XRInterface, and instead should be communicated directly to Metal via some back channel

Contributor Author

@dsnopek Yeah, if this is a concept on visionOS only, it makes sense not to add it to xr_interface. Do you have any suggestions for the cleanest way of back-channeling this? Move this method to visionos_xr_interface and access it directly from there?

Contributor

I'm not sure exactly - this probably needs some input from the rendering team.

But I will say that, if possible, you'd probably want code from the visionos platform reaching into the renderer, rather than renderer code reaching into the visionos platform. That's generally the preferred data flow, but I'm not sure if that'd be possible, or what the rendering team would say

Contributor

Is the p_view argument really needed? If I understand correctly, you have different viewports for each eye only if you configure the CompositorLayer layout to .shared.

In app_visionos.swift, you have configured the CompositorLayer layout to .layered. After some testing, I found .layered is the only option that supports foveated rendering and runs without crashing in the current implementation.

I logged the viewport returned by this method. It always returns the same value for both eyes.

@rsanchezsaez rsanchezsaez requested a review from a team as a code owner August 27, 2025 02:09
Comment thread servers/rendering/rendering_server.h Outdated
Comment thread servers/rendering/renderer_rd/forward_mobile/render_forward_mobile.cpp Outdated
@rsanchezsaez rsanchezsaez changed the title from visionOS VR module to visionOS XR module on Sep 17, 2025
@rsanchezsaez rsanchezsaez requested review from a team as code owners October 2, 2025 22:33
@rsanchezsaez rsanchezsaez force-pushed the apple/visionos-xr branch 2 times, most recently from a0451d2 to 94f2dc0 on October 4, 2025 01:59
void set_head_pose_from_arkit(bool p_use_drawable);

public:
static StringName name() { return "visionOS"; }
Contributor

Why have this name() function? It doesn't seem to be any different than get_name(), other than not being virtual. If the goal is to avoid calling the virtual function, the StringName could just be put in a static variable?

Contributor Author

The objective was to provide the find_interface helper function, but I agree this makes more sense as a static variable. I'll update it.

	static StringName name() { return "visionOS"; }
	static Ref<VisionOSXRInterface> find_interface() {
		return XRServer::get_singleton()->find_interface(name());
	}


Contributor

@BastiaanOlij BastiaanOlij left a comment

Hey Ricardo,

I finally had some proper time to look into this, though I lost a bunch of time getting everything set up and running, and ran afoul of an issue with my Apple developer account (sigh), so I haven't been able to do a full end-to-end test yet.

I did finally go through all the code. I started putting some remarks on individual lines but gave up on that, as the real issues are all joined together.
Also, this is a big brain dump after a long day of trying to figure things out, so there are probably plenty of things that need further feedback and throwing back and forth.

Overall I think we still have some ways to go; we need to find different ways to solve a few things.
As a ground rule, platform-dependent code should be kept to the platform. There is a lot of Vision Pro specific code ending up in the renderer, and that won't ever be acceptable to merge. We kinda knew that already, so I'm probably stating the obvious.
We can get away with some constructive solutions on the Metal driver side, and around the XRInterface implementation, as they are closely linked to the platform.

While we should first get things on the correct path and see if this is still the case, I also found that there are changes here that probably deserve to be in their own PR. Especially if we do need to make changes to certain base classes, they should be separated out so the relevant teams can review them more closely.

So I think there are three main areas we can focus on first: two relatively small ones, and one big one.

HDR output

I have to admit that I'm not up to date with the work being done in this area, but I do know that @allenwp is working on various things related to output to HDR-capable displays, so he may be able to weigh in here if he has time.

Bottom line for me is that users know they are targeting the Vision Pro and can turn HDR on for the main viewport; handling that (and possibly preventing conversion to the wrong color space) is a function of the display server.

This way we don't need an extra API on the XR interface nor pollute the rendering code with checks.

Sky depth changes

This one puzzled me. Is it really an issue if the depth is at zero? Again, similar thing: it should not have a Vision Pro overrule in this place.
We need to either expose it in some generic way, or maybe just set the depth for multiview, as it probably can't hurt for other devices.
I would also change the vertex shader to gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0); or something to that effect, instead of manipulating gl_FragDepth, or certain platforms will disable early-z checks.

This is also a good example of something that should be in its own PR and clearly documented why we're making the change.

VRS/rasterization map

So finally the biggy, this one we need to drastically change. We can't have such big changes to the rendering server just for VisionPro.

The problem here is that this is all new to us, and we're still wrapping our heads around it, and don't really have the time to dedicate to it.

If I understand correctly the main difference is that when we're dealing with OpenGL/Vulkan, we render at a larger resolution which then gets lens distorted by the VR compositor and output to the device. To speed this up, we use VRS so we don't render every pixel in that large image, but duplicate the output of fragments the further from the center we get. Most mobile hardware does something smart by having tiles on the periphery cover larger areas and just upscale when writing to the image.

On the Vision Pro, our stored image isn't at the large resolution; I'm guessing it's either at the output resolution, or something slightly enlarged to pack in the tiled results? I don't fully get whether it's just applying lens distortion early, or whether it's just smartly packing the tiles so it needs less space. Doesn't really matter, I guess.
Then we have a virtual resolution, which would be the actual texture as we'd render it before lens distortion without any VRS applied. Each tile is the same resolution, but tiles rendered towards the outside cover a larger area than tiles closer to the center, and they are written into our output image.

If this assumption is close enough to the truth, I think we can rewrite this logic to make the rendering server completely ignorant of the virtual resolution. We don't need any new functions on the XR Interface, we toss away most of the changes on the renderer classes (maybe all), we just pretend that all that exists is the output resolution, because that is the resolution at which our images are being allocated.

The trick will be to make sure the driver knows when and what virtual resolution to apply, and with which rasterization map.

This is where Godot's RID system becomes handy, because the RID can point to a struct that has both the rasterization map and the virtual resolution. Now we can use this RID as the VRS RID, and have the VRS RID being set be the trigger for the driver to know additional stuff needs to be done. We probably do need to wrap this in a texture RID to make the rendering server happy, but that's no biggie.

Note that there are two modes (because Vulkan has two ways of doing density-map VRS): one where you can set it per subpass, and one where it's set for the whole pass. We don't need a new field, we can just reuse the existing one.
Then in the driver, when the VRS RID is set, you can look up the rasterization map and virtual resolution and apply it. The whole rendering engine needs to be none the wiser that we're doing this extra logic, as long as the Metal driver understands it.

I think you don't even need the new VRS format, just use XR_VRS_TEXTURE_FORMAT_FRAGMENT_DENSITY_MAP and return the above mentioned RID with get_vrs_texture.

Anyway, it's way too late, I'm probably overlooking things, but in broad strokes I think this is a direction we should consider and talk more about, and the solution should have a lot less impact on existing code than it has now.
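As a very rough sketch of the direction proposed above (all names hypothetical, with an opaque integer handle standing in for Godot's RID): the rendering server only passes the handle around, and the driver resolves it to the rasterization map plus virtual resolution when the VRS RID is set.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical driver-side struct: bundles the platform rasterization map
// with the virtual (pre-distortion) resolution, hidden behind one handle.
struct VRSRateMapInfo {
	void *rasterization_rate_map = nullptr; // e.g. a MTLRasterizationRateMap
	uint32_t virtual_width = 0;             // resolution before foveation
	uint32_t virtual_height = 0;
};

// Hypothetical owner, standing in for a RID_Owner: the renderer treats the
// returned id as opaque; only the driver ever looks the struct up.
class RateMapOwner {
	std::unordered_map<uint64_t, VRSRateMapInfo> maps;
	uint64_t next_id = 1;

public:
	uint64_t make(const VRSRateMapInfo &info) {
		uint64_t id = next_id++;
		maps[id] = info;
		return id;
	}
	const VRSRateMapInfo *get(uint64_t id) const {
		auto it = maps.find(id);
		return it == maps.end() ? nullptr : &it->second;
	}
};
```

The point of this shape is exactly what the comment above argues: the rendering server stays ignorant of the virtual resolution, and setting the VRS RID is the only trigger the driver needs.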

Comment thread scene/3d/xr/xr_nodes.cpp Outdated
}

if (get_near() < 0.1) {
warnings.push_back(RTR("XRCamera3D doesn't support a Near value lower than 0.1 on the visionOS platform. The scene won't be displayed."));
Contributor

This seems arbitrary; people will hold things closer to their face than 10 cm and set this to a lower value on many platforms. Not sure if everyone should be bothered by a warning for this.

Does this break stuff on visionOS?

Contributor

In VisionOSXRInterface::get_projection_for_view, the supported minimum near plane is retrieved with cp_layer_renderer_capabilities_supported_minimum_near_plane_distance. But here the magic number 0.1 occurs.

I haven't found any documentation about the actual value returned by the layer capabilities.

The value 0.1 itself is fine, given that the thickness of the Vision Pro headset is roughly 5 cm. But will the actual value change if a thinner model is released in the future?

Contributor Author

Yeah, @huisedenanhai is right, we cannot have a zNear value lower than cp_layer_renderer_capabilities_supported_minimum_near_plane_distance on visionOS; the API refuses to render anything if you set it so. For now, I have removed the editor warning and added a descriptive error message explaining the issue if it happens at runtime here

Comment thread scene/3d/xr/xr_nodes.cpp Outdated
warnings.push_back(RTR("XROrigin3D requires an XRCamera3D child node."));
}

if (get_scale() != Vector3(1, 1, 1)) {
Contributor

This should possibly be in its own PR; this has trapped people before. I'd also add a reference to world_scale, which was introduced to allow scaling of this sort and properly scales the projection data.

PipelineCacheRD *pipeline = &shader_data->pipelines[sky_scene_state.view_count > 1 ? SKY_VERSION_BACKGROUND_MULTIVIEW : SKY_VERSION_BACKGROUND];
SkyVersion version;
if (sky_scene_state.view_count > 1) {
#if defined(VISIONOS_ENABLED)
Contributor

We need a different way to solve this; having platform-dependent code in the renderer is a big no-no.

Contributor

Adding a tiny depth is the solution to the blocky artifact I mentioned previously.
But I have to say, requiring the application to mark the full screen with a tiny depth is a very bad design in CompositorServices. Why do we have to care about the final frame depth even in full immersion mode? In full immersion mode, the compositor already knows every pixel except upper-limb passthrough will be overwritten by application output. Why can't the compositor work correctly when depth is zero?

Contributor Author

Yeah, it's the way the system is designed, as an optimization to avoid doing reprojection on parts of the screen where nothing is being rendered. I have removed this from the PR, and we can figure out solutions for the skybox and/or semi-transparent content separately.

Comment thread drivers/metal/metal_objects.mm Outdated
}
}

#ifdef VISIONOS_ENABLED
Contributor

Things like this I would remove the #ifdef from if we can; instead just use:

	if (subpass.rasterization_rate_map != nil) {
		desc.rasterizationRateMap = subpass.rasterization_rate_map;
	} else {
		desc.renderTargetWidth = MAX((NSUInteger)MIN(render.render_area.position.x + render.render_area.size.width, fb.size.width), 1u);
		desc.renderTargetHeight = MAX((NSUInteger)MIN(render.render_area.position.y + render.render_area.size.height, fb.size.height), 1u);
	}

And just make sure subpass.rasterization_rate_map can't ever be set unless it's supported by the platform.
This way we make the code much cleaner: we just implement the features as part of the Metal implementation, and leave it up to the platforms supporting them to use these features.

Contributor

Agree. MTLRasterizationRateMap is supported on iOS 13+. It can be used for some interesting optimizations on mobile devices, for example reducing the scene rendering rate under a HUD. No need to mark it visionOS-only.

Contributor Author

Thanks both. I ended up adopting @huisedenanhai's proposed solution of storing the rasterizationRateMap in MDFrameBuffer.

@allenwp
Contributor

allenwp commented Oct 14, 2025

Thanks for the ping, @BastiaanOlij!

HDR Output

@rsanchezsaez HDR output in Godot will be using Apple's Extended Dynamic Range (EDR) paradigm and adapted to work well across all platforms. Main development is happening in #94496 and will likely be merged soon, as it's a priority for 4.6. @stuartcarnie put together a rough draft prototype of how this might work on Mac OS and I have done a rebase or two that has resulted in the draft PR getting even more "hacky": #106814 Once we've merged the Windows HDR output, we will shift to making sure Mac OS is implemented correctly.

We will be having a meeting on Sunday to firm up the public-facing part of the HDR output API. We're coordinating in RocketChat in the rendering chat channel and you're welcome to join if you'd like, but the focus will be specifically on the user-facing HDR API and ensuring it will be reasonable for all platforms.

Does VisionOS use the same EDR and colour space paradigms as Mac OS and iOS with the same APIs?

@huisedenanhai
Contributor

Hi @rsanchezsaez. I tried the code and it works great! I haven't looked into the implementation, but I do have some suggestions from a user's perspective.

Need some documentation/warning on transparent objects and background

When in mixed immersion mode, CompositorServices requires a zero alpha channel if depth is zero, otherwise blocky artifacts occur.

This poses some issues when setting up the background and transparent objects, as they don't write depth, but provide a non-zero alpha value.

This is what it looks like when opening an existing scene directly in mixed immersion mode, before adjustment.

IMG_0078

Notice the blocky grey artifacts around the character, and that the transparent halo object is not displayed. That's because the background has a non-zero alpha value, and the halo object does not write depth. The following image shows how the character should look in a normal window.

Screenshot 2025-10-18 at 16 25 14

Recently I got some experience with CompositorServices by implementing a custom AR renderer with Metal and Objective-C, so I'm not surprised by the artifact. The solution for me is quite simple: adjust the background alpha to zero, force the halo object as opaque, done. But this needs more clarification for normal users.

Here is how my scene looks after the fix.

IMG_0079

I suspect correctly handling transparent object rendering for visionOS might be worth another PR. I only suggest adding some notes to the documentation after this PR gets merged.

Warning for near plane

The warning for the near plane does not correctly update in the editor after the param is adjusted. This is important: in the current implementation, NOTHING is shown in the view if the near plane is less than 0.1 m. That's quite awkward, because the default value is 0.05 m.

Screenshot 2025-10-18 at 16 40 59

Also, is it possible to clamp the near plane to 0.1 m automatically? That might be more intuitive than showing an empty space and spamming the log with warnings.
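The clamping suggestion could look like the following hedged sketch; min_near is assumed to be queried at runtime via cp_layer_renderer_capabilities_supported_minimum_near_plane_distance rather than hard-coded, and the helper name is hypothetical:

```cpp
#include <cassert>

// Hypothetical helper: clamp the camera's requested near plane to the
// platform minimum instead of refusing to render. On visionOS the minimum
// would come from the layer renderer capabilities, not a constant.
double clamped_near(double requested_near, double min_near) {
	return requested_near < min_near ? min_near : requested_near;
}
```

A one-time warning when clamping actually kicks in would keep the behavior discoverable without spamming the log.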

@CCranney

The solution for me is quite simple. Adjust the background alpha to zero, force the halo object as opaque, done. But this needs more clarification for normal users.

In case it helps someone else, these were the settings I changed, based on the original demo used at the top of this PR, to fix the same problem. It's not exactly @huisedenanhai's solution and likely has unimportant additions, but I'm just confirming that changes like this would be needed to remove the blocky halo effect from Mixed immersion experiences. I'm not as familiar with the inner workings of CompositorServices, but if something like this does make it into the documentation, direct instructions akin to the following could be helpful:

world environment settings:
Background -> Mode: Clear Color
Ambient Light -> Source: Color (make it grey?)
Glow: Off (?)
Fog: On (?)
Fog -> Density: 0.0015
Fog -> Sky Affect: 0.0

I've been playing around with visionOS development and identified a couple of potential errors, but they seem obscure and situational enough that I'm planning to wait until after this PR is merged to make issues for them. All in all, great work @rsanchezsaez! It's been amazing experimenting with visionOS experiences from this PR.

Comment thread servers/rendering/rendering_server.h Outdated
VIEWPORT_VRS_DISABLED,
VIEWPORT_VRS_TEXTURE,
VIEWPORT_VRS_XR,
VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP,
Contributor

The enum has been unified to VIEWPORT_VRS_XR, but this enum value itself has not been deleted.

Comment thread doc/classes/RenderingServer.xml Outdated
Variable rate shading texture is supplied by the primary [XRInterface]. Note that this may override the update mode.
</constant>
<constant name="VIEWPORT_VRS_MAX" value="3" enum="ViewportVRSMode">
<constant name="VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP" value="3" enum="ViewportVRSMode">
Contributor

This enum value no longer seems to be used.

Comment thread servers/rendering/rendering_server.cpp Outdated
BIND_ENUM_CONSTANT(VIEWPORT_VRS_DISABLED);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_TEXTURE);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP);
Contributor

The functionality has been unified under VIEWPORT_VRS_XR, but the VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP value itself has not been deleted.

Comment thread platform/visionos/app_visionos.swift Outdated

let options: LayerRenderer.Capabilities.SupportedLayoutsOptions = foveationEnabled ? [.foveationEnabled] : []
let supportedLayouts = capabilities.supportedLayouts(options: options)
let layout: LayerRenderer.Layout = supportedLayouts.contains(.layered) ? .layered : .dedicated
Contributor

Will there be devices that don't support layered mode?

I don't see the need to support layouts other than .layered.

Also, the current implementation crashes when I force the layout to .dedicated.

Contributor Author

@rsanchezsaez rsanchezsaez Dec 8, 2025

Good point. .layered is the recommended mode, and new visionOS 26 features need this mode. Hence, I have changed this file to hardcode .layered.

Comment thread servers/xr/xr_interface.h Outdated
Comment on lines +144 to +146
virtual Rect2i get_viewport_for_view(uint32_t p_view) {
return Rect2i();
} /* get each view viewport rect, used for visionOS VR */
Contributor

Is the p_view argument really needed? If I understand correctly, you have different viewports for each eye only if you configure the CompositorLayer layout to .shared.

In app_visionos.swift, you have configured the CompositorLayer layout to .layered. After some testing, I found .layered is the only option that supports foveated rendering and runs without crashing in the current implementation.

I logged the viewport returned by this method. It always returns the same value for both eyes.

Comment thread servers/rendering/renderer_compositor.h Outdated
virtual void initialize() = 0;
virtual void begin_frame(double frame_step) = 0;

virtual Error prepare_screen_for_drawing(DisplayServer::WindowID p_screen) { return OK; }
Contributor

This method is a simple wrapper for RD::get_singleton()->screen_prepare_for_drawing(p_screen). It seems redundant to me; just call the inner method when needed. There's no need to add a virtual method to the base class.

Contributor Author

You're right. Furthermore, I've moved this call to VisionOSXRInterface::post_draw_viewport(), to avoid platform-specific code in the main rendering path.

PipelineCacheRD *pipeline = &shader_data->pipelines[sky_scene_state.view_count > 1 ? SKY_VERSION_BACKGROUND_MULTIVIEW : SKY_VERSION_BACKGROUND];
SkyVersion version;
if (sky_scene_state.view_count > 1) {
#if defined(VISIONOS_ENABLED)
Contributor

Adding a tiny depth is the solution to the blocky artifact I mentioned previously.
But I have to say, requiring the application to mark the full screen with a tiny depth is very bad design in CompositorServices. Why do we have to care about the final frame depth even in full immersion mode? In full immersion mode, the compositor already knows that every pixel except the upper-limb passthrough will be overwritten by the application's output. Why can't the compositor work correctly when the depth is zero?

frag_color.rgb += interleaved_gradient_noise(gl_FragCoord.xy) * params.luminance_multiplier;
#endif
#ifdef WRITE_DEPTH
gl_FragDepth = SKY_DEPTH;
Contributor

No, don't adjust depth in the fragment shader. That disables the early-Z test, which is very inefficient when the user has a heavy sky shader.

If you want to apply a global depth offset, you can adjust the z coordinate in the vertex shader instead.
In the vertex shader, change the line

	gl_Position = vec4(uv_interp, 0.0, 1.0);

to

	gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0);

Contributor Author

I have reverted this for now for simplicity, and we can discuss other solutions separately.

Contributor

@huisedenanhai huisedenanhai left a comment

I have reviewed all the code. The implementation is good, IMO. Many thanks to @rsanchezsaez. This PR uses some uncommon Apple device features, which has sparked some discussion.

MTLRasterizationRateMap

Rendering XR on the Vision Pro is pretty similar to rendering on other devices. The Vision Pro gives you a texture array for color and a texture array for depth. That's all you need if you're not using foveated rendering.

The main difference is how Vision Pro handles foveated rendering using VRS (Variable Rate Shading). It uses MTLRasterizationRateMap, which is Apple's specific way of implementing VRS.

VRS comes in various forms on different platforms. Some devices provide a texture map to control the shading rate, but that only adjusts the shading rate; you still need to allocate a full-resolution buffer for pixel storage.

Apple takes this a step further. Why allocate extra pixels if they won't be rendered? Apple lets you have a distorted framebuffer, with the distortion controlled by MTLRasterizationRateMap. You can allocate a smaller, regular physical texture, where each row and column of this physical texture can have varying logical sizes on the screen. This is why a viewport can be bigger than the physical framebuffer size.

For example, let's say you want to render to a 5x5 screen, but you only expect high density in the center. You can allocate only a 3x3 texture, then stretch the border texels to cover the whole screen.

(figure: a 3x3 physical texture stretched to cover a 5x5 logical screen)

This form of VRS reduces both computation cost and memory consumption.

This solution has its own limitation: your physical texture is still a regular grid, and the MTLRasterizationRateMap only controls the logical sizes of the columns and rows. Therefore, the shading-rate configuration is not as flexible as a full shading-rate map.
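
The 5x5-screen / 3x3-texture example above can be sketched in plain Python. This is only an illustrative model of the per-row/per-column mapping, not the Metal API; the helper names are made up for the sketch.

```python
# Sketch of the rasterization-rate-map idea: a small physical texture
# whose rows/columns each cover a varying span of logical pixels.
import bisect

def make_rate_map(col_spans, row_spans):
    """col_spans[i] / row_spans[i] = logical pixels covered by physical texel i."""
    def prefix(spans):
        acc, out = 0, [0]
        for s in spans:
            acc += s
            out.append(acc)
        return out
    return prefix(col_spans), prefix(row_spans)

def logical_to_physical(logical, edges):
    """Map a logical pixel coordinate to the index of the physical texel covering it."""
    return bisect.bisect_right(edges, logical) - 1

# 3x3 physical texture covering a 5x5 logical screen: the center
# column/row is full rate (1 logical px per texel), the borders are
# stretched (2 logical px per texel).
col_edges, row_edges = make_rate_map([2, 1, 2], [2, 1, 2])

assert col_edges[-1] == 5                        # logical width
assert logical_to_physical(2, col_edges) == 1    # center px -> center texel
assert logical_to_physical(4, col_edges) == 2    # border px -> stretched texel
print("physical texels:", (len(col_edges) - 1) * (len(row_edges) - 1),
      "vs logical pixels:", col_edges[-1] * row_edges[-1])
```

The 9-texel physical allocation covering 25 logical pixels is what saves both shading cost and memory, while the regular-grid constraint (only per-row and per-column spans) is the flexibility limit described above.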

This is NOT a new feature; it has been available since iOS 13, which I think is below Godot's current minimum supported iOS version. It is available on nearly all Apple devices and can actually have more uses than foveated rendering.

The current usage of MTLRasterizationRateMap seems hacky. It has no analog on other platforms, and RenderingDevice does not have an abstraction for it. @BastiaanOlij suggests combining the physical texture and the MTLRasterizationRateMap into a single texture and hiding all these details behind the rendering device. I don't think this is a good idea: you WILL need explicit access to the MTLRasterizationRateMap to decode the distortion in shaders, especially when you want to do screen-space effects.

If RenderingDevice does not provide a special construct for MTLRasterizationRateMap, I prefer the current implementation, which wraps MTLRasterizationRateMap as a special kind of texture. MTLRasterizationRateMap itself is not a texture; it has its own type and APIs for accessing it in Metal shaders. If you want first-class support for it in shaders, you will need to modify the whole shader translation stack to emit correct code, which is way too much work. Instead, you can pre-decode the MTLRasterizationRateMap into a LUT of size 2 x max(logical_width, logical_height) using a tiny Metal compute shader. The LUT stores the physical texture coordinate mapped from each logical coordinate, one row for each of the x and y axes. Accessing it in a shader is then the same as reading a texture. I think this is a good enough abstraction given limited time and resources.
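
The proposed LUT decode can be sketched in Python as a stand-in for the tiny Metal compute shader. All names here are illustrative, and the per-texel spans are assumed inputs; the real implementation would query the rate map itself.

```python
# Pre-decode one axis of a rate map into a LUT mapping a logical sample
# index to a normalized physical texture coordinate. A shader would then
# read this LUT like any texture instead of needing first-class
# rasterization-rate-map support.

def build_axis_lut(spans, samples):
    """spans[i] = logical pixels covered by physical texel i along this axis."""
    logical_size = sum(spans)
    physical_size = len(spans)
    lut = []
    for s in range(samples):
        logical = (s + 0.5) / samples * logical_size  # sample at pixel centers
        # Walk the spans to find the covering texel and the offset within it.
        texel, start = 0, 0.0
        for i, span in enumerate(spans):
            if logical < start + span:
                texel, offset = i, (logical - start) / span
                break
            start += span
        else:
            texel, offset = physical_size - 1, 1.0
        lut.append((texel + offset) / physical_size)  # normalized physical u
    return lut

# Same 5-logical-pixel axis backed by 3 physical texels as above.
lut_x = build_axis_lut([2, 1, 2], samples=5)
assert len(lut_x) == 5
assert lut_x == sorted(lut_x)  # the mapping is monotonic
```

A full 2 x max(logical_width, logical_height) LUT would simply be one such row per axis; the walk over spans is what the compute shader would do once per LUT entry.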

Depth Output

CompositorService requires non-zero depth for non-zero alpha pixels. The simplest way to address this is to add an extra full-screen pass that globally sets the depth to a non-zero value. However, this approach requires further discussion.

If you perform a Metal frame capture in Xcode, you'll find some extra compute dispatches at the end of the frame. I haven't reverse-engineered what it's actually doing, but judging by the intermediate texture outputs, it looks like some kind of pixel binning.

My guess: CompositorServices groups zero-depth pixels together to accelerate passthrough. A zero-depth tile means full passthrough with no blending. This would explain the blocky artifacts when depth and alpha are not correctly configured.

If my guess is right, we can't simply mark the entire depth buffer as non-zero. We might need a more sophisticated approach that marks only the non-empty pixels to achieve the best performance.

That said, maybe we should fix the artifact first with an additional full-screen draw, and improve performance later?

Conclusion

This PR provides a solid foundation for visionOS XR rendering support. While it does introduce some new design challenges, these problems are inherent to the device. Therefore, I suggest we merge this PR first and address the remaining problems in subsequent work. These problems are much easier to solve than I thought; we may not even need a separate follow-up PR to address them. I have proposed some solutions to these problems; see my comments below.

do not rely on the user providing most of these settings (though enhancing this with auto detection features
based on the device we're running on would be cool). I'm mostly adding this as an example or base plate for
more advanced interfaces.
*/
Contributor

This comment is duplicated from modules/mobile_vr/mobile_vr_interface.h.

Clancey added a commit to Clancey/godot that referenced this pull request Mar 7, 2026
… PR godotengine#109975

- Add recursive class hierarchy check for protocol conformance in
  framebuffer_create(), fixing potential foveation rate map detection
  failures on visionOS
- Support MTLCAPTURE_DESTINATION_DEVELOPER_TOOLS_ENABLE env var for
  Metal frame capture
Contributor

@stuartcarnie stuartcarnie left a comment

I have reviewed the code in iOS, visionOS, Apple Embedded and the renderer. Code looks good and changes make sense to me. I plan to test the changes on AVP hardware next to give the PR a final approval.

I have not reviewed the XR changes, as that is the XR team's purview.

Contributor

Note

This is mostly code copied from godot_view_renderer.mm

Comment on lines +76 to +86
static bool class_conforms_to_protocol_recursive(Class p_class, Protocol *p_protocol) {
Class current = p_class;
while (current != nil) {
if (class_conformsToProtocol(current, p_protocol)) {
return true;
}
current = class_getSuperclass(current);
}
return false;
}

Contributor

This is fine for now; I may end up moving it into NSObject with my patch later, if we need to use it elsewhere.

Comment on lines +547 to +552
// p_texture can contain a wrapped MTLRasterizationRateMap as returned by VisionOSXRInterface
// so we must check it responds to the allocatedSize selector first
Class cls = object_getClass((id)(void *)p_texture.id);
if (!class_respondsToSelector(cls, sel_registerName("allocatedSize"))) {
return 0;
}
Contributor

Is this still required? sendMessageSafe checks whether the instance responds to the selector:

template <typename _Ret, typename... _Args>
_NS_INLINE _Ret NS::Object::sendMessageSafe(const void* pObj, SEL selector, _Args... args)
{
    if ((respondsToSelector(pObj, selector)) || (nullptr != methodSignatureForSelector(pObj, selector)))
    {
        return sendMessage<_Ret>(pObj, selector, args...);
    }
    if constexpr (!std::is_void<_Ret>::value)
    {
        return _Ret(0);
    }
}

Member

@bruvzg bruvzg left a comment

Platform changes look good. I have not checked XR specific part or renderer internals.

Needs rebase.

Contributor

@BastiaanOlij BastiaanOlij left a comment

Sorry that it took a while for me to get to this.

As before, I defer to Stuart and Bruvzg for changes on any of the Apple OS platforms and the Metal driver.

Very glad to see the minimal changes in the rendering engine itself; kudos on moving all the logic into the Metal drivers.

XRInterface-wise, this gets a huge thumbs up from me; it looks like a very clean implementation.

Comment thread doc/classes/RenderingDevice.xml Outdated
Support for high dynamic range (HDR) output.
</constant>
<constant name="SUPPORTS_RASTERIZATION_RATE_MAP" value="14" enum="Features">
Support for rasterization rate maps on Apple platforms. This allows for foveated rendering, which is used by the visionOS XR module.
Contributor

Just a nitpick, but should we specifically call Apple out here? I'm assuming other vendors may implement their own versions at some point, so we could end up in a situation where other drivers also implement this functionality.

Comment thread modules/visionos_xr/visionos_xr_interface.h

// Only safe to be called from the render thread
uint32_t get_view_count();
Transform3D get_camera_transform();
Contributor

I'm currently working on #116424, which will result in camera tracking and projection matrices being accessed during process on the main thread. This is for accurate and stable positioning of nodes and the ability to create additional render targets for effects (though that will probably only be used in PCVR scenarios).

Just checking if that will end up problematic for VisionOS?

Contributor Author

Once we try calling get_transform_for_view() and get_projection_for_view() from process(), it could become a problem in the separate threading model, but if you are OK with this, I'll address it separately, given that this PR is already in the merge queue.

Comment thread servers/rendering/rendering_device.cpp Outdated
@AThousandShips AThousandShips self-requested a review March 20, 2026 09:13
Comment thread modules/visionos_xr/doc_classes/VisionOSXRInterface.xml Outdated
Comment thread modules/visionos_xr/doc_classes/VisionOSXRInterface.xml Outdated
Comment thread modules/visionos_xr/doc_classes/VisionOSXRInterface.xml Outdated
Comment thread modules/visionos_xr/doc_classes/VisionOSXRInterface.xml Outdated
Comment thread modules/visionos_xr/visionos_xr_interface.mm Outdated
Comment thread modules/visionos_xr/visionos_xr_interface.mm Outdated
Comment thread modules/visionos_xr/visionos_xr_interface.mm Outdated
Comment thread modules/visionos_xr/visionos_xr_interface.mm Outdated
Comment thread modules/visionos_xr/visionos_xr_interface.mm Outdated
Comment thread modules/visionos_xr/visionos_xr_interface.mm Outdated
Contributor

@stuartcarnie stuartcarnie left a comment

I have run this branch on various devices and confirm XR and immersive mode works great!

I have opened the following PR to resolve merge conflicts and address all of @AThousandShips's feedback:

@rsanchezsaez if you prefer, I could do a complete rebase and squash the commits for you?

@akien-mga akien-mga requested a review from clayjohn March 23, 2026 15:24
@rsanchezsaez rsanchezsaez requested a review from a team as a code owner March 25, 2026 01:24
@rsanchezsaez
Contributor Author

@stuartcarnie @bruvzg @BastiaanOlij @AThousandShips Thanks for the reviews and approvals! Stuart addressed @AThousandShips's feedback and rebased on top of master, and I addressed Bastiaan's comments.

I added @stuartcarnie and @huisedenanhai as co-authors to the commit. Thanks for all the great work!

@stuartcarnie stuartcarnie requested a review from bruvzg March 25, 2026 01:41
Comment thread .github/CODEOWNERS
@stuartcarnie
Contributor

stuartcarnie commented Apr 11, 2026

Comment thread .github/CODEOWNERS
Comment thread modules/visionos_xr/doc_classes/VisionOSXRInterface.xml Outdated
@stuartcarnie
Contributor

Updated PRs to build visionOS (and all other platforms):

@stuartcarnie
Contributor

@rsanchezsaez could you apply this patch to platform_methods.py, as it is necessary for the visionOS build:

Index: platform_methods.py
===================================================================
diff --git a/platform_methods.py b/platform_methods.py
--- a/platform_methods.py	(revision 0e95347de3ed549642bec0571599defb1232b1fc)
+++ b/platform_methods.py	(date 1776222353751)
@@ -364,6 +364,7 @@
             SWIFTCFLAGS=[
                 "-resource-dir",
                 "/root/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift",
+                "-enable-cross-import-overlays",
             ]
         )
 

Incidentally, xtool.sh does the same thing.

The visionOS XR module only supports the Mobile renderer for now, the Forward+ renderer is not supported.

To use the visionOS XR module you must set the new 'application/app_role' export setting to Immersive. You can choose whether you want passthrough via the new 'application/immersion_style' export setting.

Then, initialize the visionOS XR module in a script:

```
    var interface = XRServer.find_interface("visionOS")
    if interface and interface.initialize():
        var viewport : Viewport = get_viewport()
        viewport.use_xr = true
        viewport.vrs_mode = Viewport.VRS_XR
        viewport.use_hdr_2d = true
```

Implementation details:
- The visionOS platform now has two different execution paths implemented by the `GodotWindowScene` and `CompositorServicesImmersiveSpace` scenes in `app_visionos.swift`. The `application/app_role` export setting controls which scene is used.
- The visionOS XR interface tries to be as close to the OpenXR interface as possible, to keep main renderer code changes to a minimum. It adopts Compositor Services and ARKit APIs, which is how you render Metal content on visionOS.
- We obtain the head pose twice: once in `process()` on the game thread so scripts can use it if needed, and again from the render thread in `set_frame()`, so the rendered pose is accurate.
- The projection matrices returned by visionOS have an inverse depth correction applied (visionOS uses the [0, 1] z space, but Godot expects the [-1, 1] z space until the rendering step).
- The `rasterizationRateMap` (the structure that supports foveation on visionOS) is provided through the `get_vrs_texture()` function, using the new `XR_VRS_TEXTURE_FORMAT_RASTERIZATION_RATE_MAP` texture type. It's passed through the renderer when creating passes/subpasses, to be ultimately set by the Metal driver.
- Apple Vision Pro's minimum supported near plane is `0.1`. A runtime error message is shown if you try to use a lower near plane.
- The Metal driver has a new dummy `SurfaceCompositorServices`, which replaces `SurfaceLayer` when running in immersive mode. The reason for this is that the Compositor Services API needs to do a `cp_drawable_encode_present()` step with the `MTLCommandBuffer` used by the renderer, and this seemed the most natural way of overriding the `present()` call normally done by the Metal driver.
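
The [0, 1] to [-1, 1] depth correction mentioned above can be sketched with a small worked example. This is an illustrative Python model assuming a row-major projection matrix that produces clip z in [0, 1]; it is not the engine's actual code path.

```python
# Remap a projection matrix from [0, 1] clip z (Metal/visionOS style)
# to [-1, 1] clip z (what Godot expects before the rendering step):
# replace row 3 with 2*row3 - row4, so z_ndc -> 2*z_ndc - 1 after the
# perspective divide.

def remap_projection_z01_to_m11(m):
    """m is a row-major 4x4 projection producing z_ndc in [0, 1]."""
    out = [row[:] for row in m]
    out[2] = [2.0 * a - b for a, b in zip(m[2], m[3])]
    return out

def project_z(m, p):
    """Return z_ndc for point p = [x, y, z, w] after the perspective divide."""
    x, y, z, w = (sum(m[r][c] * p[c] for c in range(4)) for r in range(4))
    return z / w

near, far = 0.1, 100.0
# Simple perspective matrix with z_clip in [0, 1] (focal terms omitted).
m01 = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, far / (near - far), near * far / (near - far)],
    [0, 0, -1, 0],
]
m11 = remap_projection_z01_to_m11(m01)

assert abs(project_z(m01, [0, 0, -near, 1]) - 0.0) < 1e-6  # near -> 0
assert abs(project_z(m01, [0, 0, -far, 1]) - 1.0) < 1e-6   # far  -> 1
assert abs(project_z(m11, [0, 0, -near, 1]) + 1.0) < 1e-6  # near -> -1
assert abs(project_z(m11, [0, 0, -far, 1]) - 1.0) < 1e-6   # far  -> 1
```

The row substitution is equivalent to left-multiplying by a fixed correction matrix, which is why it can be applied once to the matrices the platform returns.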

Co-Authored-By: huisedenanhai <winser@pku.edu.cn>
Co-Authored-By: Stuart Carnie <stuart.carnie@gmail.com>
@rsanchezsaez
Contributor Author

@stuartcarnie Done, thank you!

