visionOS XR module #109975
Conversation
dsnopek
left a comment
Thanks!
I skimmed the changes to XRInterface and left some notes on how these could perhaps be unified with the way other platforms handle the same features. I'd personally prefer not to have virtual methods that exist for only a single platform, which is something we've managed to avoid until now.
virtual Rect2i get_viewport_for_view(uint32_t p_view) {
	return Rect2i();
} /* get each view viewport rect, used for visionOS VR */
This seems like it may be like get_render_region() but needs to be different per view? If so, it might be better to add a p_view argument to get_render_region() (and provide compatibility methods in OpenXRInterfaceExtension), so that we have a single virtual method for this for all platforms, rather than get_viewport_for_view() which is just for VisionOS
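The suggested unification could look roughly like the following sketch (all signatures hypothetical, not Godot's actual API): a per-view method that falls back to the existing view-agnostic region, so interfaces that don't override it keep working unchanged.

```cpp
#include <cstdint>

// Stand-in for Godot's Rect2i.
struct Rect2i {
	int x = 0, y = 0, w = 0, h = 0;
};

struct XRInterfaceSketch {
	virtual ~XRInterfaceSketch() = default;

	// Existing view-agnostic region, kept for compatibility.
	virtual Rect2i get_render_region() { return Rect2i(); }

	// Hypothetical per-view variant; defaults to the shared region so
	// existing interfaces need no changes.
	virtual Rect2i get_render_region_for_view(uint32_t p_view) {
		(void)p_view;
		return get_render_region();
	}
};
```

An interface that genuinely needs per-view regions (like the visionOS one) would override only the per-view variant.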
I'm going to work on this next. A question: conceptually, what's the difference between get_render_region() and get_render_target_size()?
get_render_target_size should return the size of the render target used. The assumption is that we use the same resolution for both eyes and we use a layered texture, one layer per eye.
By default Godot will create this render target and render to each layer using the multiview extension (or equivalent). The XRInterface can, however, supply its own render target texture if it requires additional setup (and since this is often part of a swapchain, update the destination currently in use on each frame).
We assume layers are always used because we rely on multiview; stereo rendering to a single-layer texture with regions for a side-by-side layout was already going the way of the dodo at the time. Does Apple Vision Pro expect this capability?
get_render_region optionally returns a region within that texture that we're using so we can create dynamic resolution rendering without having to do expensive recreations of swapchains and such. This is generally only used if dynamic resolution is supported and enabled (through something like XR_META_recommended_layer_resolution).
If VisionPro just expects us to render to a fixed resolution texture, I would only use get_render_target_size.
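To restate the distinction in code, here is a minimal sketch (hypothetical names, not Godot's actual API): the target size describes the allocated texture, while the optional region is a sub-rectangle within it, defaulting to the full target when dynamic resolution is off.

```cpp
#include <cstdint>

// Stand-ins for Godot's Size2i / Rect2i.
struct Size2i { int width = 0, height = 0; };
struct Rect2i { int x = 0, y = 0, width = 0, height = 0; };

struct InterfaceSketch {
	// Pixel dimensions of the (layered) render target texture,
	// the same resolution for both eyes.
	Size2i get_render_target_size() const { return { 2048, 2048 }; }

	// Optional sub-rectangle actually rendered to; with dynamic
	// resolution disabled this is simply the full target.
	Rect2i get_render_region() const {
		Size2i s = get_render_target_size();
		return { 0, 0, s.width, s.height };
	}
};
```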
Thanks for the explanation, that makes sense.
Actually, I think the proper solution for visionOS would be moving what's currently in get_viewport_for_view() to get_render_region_for_view() as @dsnopek suggested, because the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).
the viewport must be set to the logical resolution, which is bigger than the actual texture size as returned by get_render_target_size() (to account for foveation).
Hm, so the render region is bigger than the render target size? Or the other way around?
The render target size is supposed to be the pixel dimensions of the texture we're rendering to, and, generally, the render region is smaller. This allows rendering to just a subset of the texture - it's basically what you'd pass to glViewport() (in OpenGL) or vkCmdSetViewport() (in Vulkan) - I don't know the Metal equivalent. I'm not a rendering expert, but I don't see how it can make sense for the viewport (as used in the aforementioned OpenGL and Vulkan functions) to be bigger than the underlying texture?
Anyway, it would help to get some more clarification on what the Rect2i will be used for in this case.
I think part of the reason is that the region is not only used to set the viewports, but it's also used in other parts of the pipeline as viewport_size is passed at the end of draw_list_begin() to draw_graph.add_draw_list_begin() and _draw_list_start()
That's correct. The render region that is passed to the draw list is distinct from setting viewports, as it specifies the area that may be affected by any initial attachment operations, such as CLEAR, which will only affect the specified region. After that, viewports can be set to limit the area the draw calls can affect.
In Metal, that controls the renderTargetWidth and renderTargetHeight of the MTLRenderPassDescriptor. The width is extended from x=0 to the total width, since it is not a region.
And it is similar for D3D12 and Vulkan. For example, in Vulkan, it affects the render area of the subpass:
godot/drivers/vulkan/rendering_device_driver_vulkan.cpp
Lines 4689 to 4692 in b962b38
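The "extended from x=0" behavior described for Metal can be sketched as follows (a standalone illustration, not the actual driver code): the render-target extent runs from the origin to the far edge of the region, clamped to the framebuffer and kept at least one pixel.

```cpp
#include <algorithm>
#include <cstdint>

struct Rect { uint32_t x, y, w, h; };
struct RenderTargetExtent { uint32_t width, height; };

// Hypothetical helper: derive a Metal-style render target extent from a
// 2D render region, since the extent is not an offset rectangle.
RenderTargetExtent extent_for_region(Rect region, uint32_t fb_w, uint32_t fb_h) {
	// Extend from (0, 0) to the far edge of the region, clamped to the
	// framebuffer size and never smaller than one pixel.
	uint32_t w = std::max(std::min(region.x + region.w, fb_w), 1u);
	uint32_t h = std::max(std::min(region.y + region.h, fb_h), 1u);
	return { w, h };
}
```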
So, it's starting to sound like the get_viewport_for_view() is for something distinct from get_render_region().
Does anyone know if (and how) this concept exists on other platforms? I've skimmed the links provided above, and this seems related to VRS, but I don't think we need this data for VRS on other platforms.
In my opinion, if this is something totally unique to VisionOS, then this shouldn't be on XRInterface, and instead should be communicated directly to Metal via some back channel
@dsnopek Yeah, if this is a concept on visionOS only, it makes sense not to add it to xr_interface. Do you have any suggestions for the cleanest way to backchannel this? Move this method to visionos_xr_interface and access it directly from there?
I'm not sure exactly - this probably needs some input from the rendering team.
But I will say that, if possible, you'd probably want code from the visionos platform reaching into the renderer, rather than renderer code reaching into the visionos platform. That's generally the preferred data flow, but I'm not sure if that'd be possible, or what the rendering team would say
Is the p_view argument really needed? If I understand correctly, you have different viewports for each eye only if you configure the CompositorLayer layout to .shared.
In app_visionos.swift, you have configured the CompositorLayer layout to .layered. After some testing, I found .layered is the only option that supports foveated rendering and runs without crashing in the current implementation.
I logged the viewport returned by this method. It always returns the same value for both eyes.
void set_head_pose_from_arkit(bool p_use_drawable);

public:
	static StringName name() { return "visionOS"; }
Why have this name() function? It doesn't seem to be any different than get_name(), other than not being virtual. If the goal is to avoid calling the virtual function, the StringName could just be put in a static variable?
The objective was to provide the find_interface helper function, but I agree this makes more sense as a static variable. I'll update it.
static StringName name() { return "visionOS"; }
static Ref<VisionOSXRInterface> find_interface() {
return XRServer::get_singleton()->find_interface(name());
}
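The static-variable alternative discussed above could look like this sketch (names hypothetical, using std::string as a stand-in for StringName): the name is stored once as a shared constant, and the virtual getter simply returns it.

```cpp
#include <string>

// Stand-in for Godot's StringName.
using StringName = std::string;

class VisionOSXRInterfaceSketch {
public:
	// Single shared constant instead of a non-virtual name() accessor.
	static const StringName interface_name;

	// The virtual getter returns the shared constant, so callers that
	// already hold the concrete type can read it without a virtual call.
	virtual StringName get_name() const { return interface_name; }
	virtual ~VisionOSXRInterfaceSketch() = default;
};

const StringName VisionOSXRInterfaceSketch::interface_name = "visionOS";
```

A find_interface() helper would then pass interface_name to XRServer's lookup directly.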
BastiaanOlij
left a comment
Hey Ricardo,
I finally had some proper time to look into this, though I lost a bunch of time getting everything set up and running, and ran afoul of an issue with my Apple developer account (sigh), so I haven't been able to do a full end-to-end test yet.
Did go through all the code finally, started just putting some remarks on code but gave up on that as the real stuff is all joined together.
Also this is a big brain dump after a long day of trying to figure things out so probably plenty of things that need further feedback and throwing back and forth.
Overall I think we still have some ways to go. We need to find different ways to solve a few things.
As a ground rule, platform dependent code should be kept to the platform. There is a lot of vision pro specific code that is ending up in the renderer and that won't ever be acceptable to merge. We kinda knew that already so I'm probably stating the obvious.
We can get away with some constructive solutions on the Metal driver side, and around the XRInterface implementation as they are closely linked to the platform.
While we should first get things on the correct path and see if this is still the case, I also found that there are changes here that probably deserve to be in their own PR. Especially if we do need to make changes to certain base classes, they should be separated out so the relevant teams can review them more closely.
So I think there are 3 main areas that we can focus on first. 2 relatively small ones, one big one.
HDR output
I have to admit that I'm not up to date with the work being done in this area, but I do know that @allenwp is working on various things related to output to HDR-capable displays, so he may be able to weigh in here if he has time.
Bottom line for me: letting users who know they are targeting Vision Pro turn HDR on for the main viewport (and possibly preventing conversion to the wrong color space) is a function of the display server.
This way we don't need an extra API on the XR interface, nor do we pollute the rendering code with checks.
Sky depth changes
This one puzzled me. Is there really an issue if the depth is at zero? Again, similar thing: it should not have a Vision Pro override in this place.
We need to either expose it in some generic way, or maybe just set the depth for multiview, as it probably can't hurt for other devices.
I would also change the vertex shader to gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0); or something to that effect, instead of manipulating gl_FragDepth, or certain platforms will disable early-Z checks.
This is also a good example of something that should be in its own PR and clearly documented why we're making the change.
VRS/rasterization map
So finally the biggy, this one we need to drastically change. We can't have such big changes to the rendering server just for VisionPro.
The problem here is that this is all new to us, and we're still wrapping our heads around it, and don't really have the time to dedicate to it.
If I understand correctly, the main difference is that when we're dealing with OpenGL/Vulkan, we render at a larger resolution which then gets lens-distorted by the VR compositor and output to the device. To speed this up, we use VRS so we don't render every pixel in that large image, but duplicate the output of fragments the further from the center we get. Most mobile hardware does something smart by having tiles on the periphery cover larger areas and just upscaling when writing to the image.
On Vision Pro, our stored image isn't at the large resolution; I'm guessing it's either at the output resolution, or something slightly enlarged to pack in the tiled results? I don't fully get whether it's just applying lens distortion early, or whether it's just smartly packing the tiles so it needs less space. Doesn't really matter, I guess.
Then we have a virtual resolution, which would be the actual texture as we'd render it before lens distortion without any VRS applied. Each tile is the same resolution, but tiles rendered towards the outside cover a larger area than tiles closer to the center, and they are written into our output image.
If this assumption is close enough to the truth, I think we can rewrite this logic to make the rendering server completely ignorant of the virtual resolution. We don't need any new functions on the XR interface, we toss away most of the changes on the renderer classes (maybe all), and we just pretend that all that exists is the output resolution, because that is the resolution at which our images are being allocated.
The trick will be to make sure the driver knows when and what virtual resolution to apply, and with which rasterization map.
This is where Godot's RID system becomes handy, because the RID can point to a struct that has both the rasterization map and the virtual resolution. Now we can use this RID as the VRS RID, and have the VRS RID being set be the trigger for the driver to know additional stuff needs to be done. We probably do need to wrap this in a texture RID to make the rendering server happy, but that's no biggy.
Note that there are two modes (because Vulkan has two ways of doing density-map VRS): one where you can set it per subpass, and one where it's set for the whole pass. We don't need a new field, we can just reuse the existing one.
Then in the driver, when the VRS RID is set, you can look up the rasterization map and virtual resolution and apply them. The whole rendering engine needs to be none the wiser that we're doing this extra logic, as long as the Metal driver understands it.
I think you don't even need the new VRS format; just use XR_VRS_TEXTURE_FORMAT_FRAGMENT_DENSITY_MAP and return the above-mentioned RID with get_vrs_texture.
Anyway, it's way too late, and I'm probably overlooking things, but in broad strokes I think this is a direction we should consider and talk more about, and the solution should have a lot less impact on existing code than it has now.
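The RID-wrapping idea could be sketched like this (all names hypothetical, with a plain integer standing in for Godot's RID): the driver keeps the rasterization map and virtual resolution behind an opaque handle, and a successful lookup on the bound VRS RID is the trigger for the extra setup.

```cpp
#include <cstdint>
#include <unordered_map>

using RID = uint64_t; // stand-in for Godot's RID

struct RasterizationMapInfo {
	void *platform_map = nullptr; // e.g. an MTLRasterizationRateMap on Apple platforms
	uint32_t virtual_width = 0;   // pre-distortion ("logical") resolution
	uint32_t virtual_height = 0;
};

class DriverSide {
	std::unordered_map<RID, RasterizationMapInfo> maps;
	RID next = 1;

public:
	// Wrap a map + virtual resolution behind an opaque handle the
	// rendering server can treat as an ordinary VRS texture RID.
	RID register_map(const RasterizationMapInfo &info) {
		RID rid = next++;
		maps[rid] = info;
		return rid;
	}

	// Called when the VRS RID is bound: a hit tells the driver that
	// additional rasterization-rate setup is needed for this pass.
	const RasterizationMapInfo *lookup(RID rid) const {
		auto it = maps.find(rid);
		return it == maps.end() ? nullptr : &it->second;
	}
};
```

The rendering server only ever passes the RID around; everything platform-specific stays on the driver side of the lookup.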
}

if (get_near() < 0.1) {
	warnings.push_back(RTR("XRCamera3D doesn't support a Near value lower than 0.1 on the visionOS platform. The scene won't be displayed."));
This seems arbitrary; people will hold things closer to their face than 10 cm and set this to a lower value on many platforms. I'm not sure everyone should be bothered by a warning for this.
Does this break stuff on visionOS?
In VisionOSXRInterface::get_projection_for_view, the supported minimum near plane is retrieved with cp_layer_renderer_capabilities_supported_minimum_near_plane_distance, but here the magic number 0.1 occurs.
I haven't found any documentation that specifies the actual value returned by the layer capabilities.
The value 0.1 itself is fine, given that the thickness of the Vision Pro headset is roughly 5 cm. But will the actual value change if a thinner model is released in the future?
Yeah, @huisedenanhai is right, we cannot have a zNear value lower than cp_layer_renderer_capabilities_supported_minimum_near_plane_distance on visionOS, the API refuses to render anything if you set it so. For now, I have removed the editor warning, and added a descriptive error message explaining the issue if it happens at runtime here
warnings.push_back(RTR("XROrigin3D requires an XRCamera3D child node."));
}

if (get_scale() != Vector3(1, 1, 1)) {
This should possibly be in its own PR; this has trapped people before. I'd also add a reference to world_scale, which was introduced to allow scaling of this sort and properly scales the projection data.
PipelineCacheRD *pipeline = &shader_data->pipelines[sky_scene_state.view_count > 1 ? SKY_VERSION_BACKGROUND_MULTIVIEW : SKY_VERSION_BACKGROUND];
SkyVersion version;
if (sky_scene_state.view_count > 1) {
#if defined(VISIONOS_ENABLED)
We need a different way to solve this, having platform dependent code in the renderer is a big nono.
Adding a tiny depth is the solution to the blocky artifact I mentioned previously.
But I have to say, requiring the application to mark the full screen with a tiny depth is a very bad design in CompositorService. Why do we have to care about the final frame depth even in full immersion mode? In full immersion mode, the compositor already knows that every pixel except upper-limb passthrough will be overwritten by application output. Why can't the compositor work correctly when the depth is zero?
Yeah, it's the way the system is designed, as an optimization to avoid doing reprojection on parts of the screen when nothing is being rendered. I have removed this from the PR, and we can figure out solutions for the skybox and/or semi-transparent content separately.
}
}

#ifdef VISIONOS_ENABLED
Things like this I would remove the #ifdef from if we can; instead just use:
if (subpass.rasterization_rate_map != nil) {
desc.rasterizationRateMap = subpass.rasterization_rate_map;
} else {
desc.renderTargetWidth = MAX((NSUInteger)MIN(render.render_area.position.x + render.render_area.size.width, fb.size.width), 1u);
desc.renderTargetHeight = MAX((NSUInteger)MIN(render.render_area.position.y + render.render_area.size.height, fb.size.height), 1u);
}
And just make sure subpass.rasterization_rate_map can't ever be set unless it's supported by the platform.
This way we make the code much cleaner: we implement the features as part of the Metal implementation and leave it up to the platforms that support them to use them.
Agree. MTLRasterizationRateMap is supported on iOS 13+. It can be used for some interesting optimizations on mobile devices, for example reducing the scene rendering rate under a HUD. There's no need to mark it visionOS-only.
Thanks both. I ended up adopting @huisedenanhai's proposed solution of storing the rasterizationRateMap in MDFrameBuffer.
Thanks for the ping, @BastiaanOlij!

HDR Output

@rsanchezsaez HDR output in Godot will be using Apple's Extended Dynamic Range (EDR) paradigm, adapted to work well across all platforms. Main development is happening in #94496 and will likely be merged soon, as it's a priority for 4.6. @stuartcarnie put together a rough draft prototype of how this might work on macOS, and I have done a rebase or two that has resulted in the draft PR getting even more "hacky": #106814. Once we've merged the Windows HDR output, we will shift to making sure macOS is implemented correctly.

We will be having a meeting on Sunday to firm up the public-facing part of the HDR output API. We're coordinating in the rendering channel on RocketChat, and you're welcome to join if you'd like, but the focus will be specifically on the user-facing HDR API and ensuring it will be reasonable for all platforms.

Does visionOS use the same EDR and colour space paradigms as macOS and iOS, with the same APIs?
Hi @rsanchezsaez. I tried the code and it works great! I haven't looked into the implementation, but I do have some suggestions from a user's perspective.

Need some documentation/warning on transparent objects and background

When in mixed immersion mode, CompositorService requires a zero alpha channel where the depth is zero; otherwise blocky artifacts occur. This poses some issues when setting up the background and transparent objects, as they don't write depth but provide a non-zero alpha value. This is what it looks like when opening an existing scene directly in mixed immersion mode before adjustment. Notice the blocky grey artifacts around the character, and that the transparent halo object is not displayed. That's because the background has a non-zero alpha value, and the halo object does not write depth. The following image shows how the character should look in a normal window.

Recently I got some experience with CompositorService by implementing a custom AR renderer with Metal and Objective-C, so I'm not surprised by the artifact. The solution for me is quite simple: adjust the background alpha to zero, force the halo object to be opaque, done. But this needs more clarification for normal users. Here is how my scene looks after the fix. I suspect correctly handling transparent object rendering for visionOS might be worth another PR; I only suggest adding some notes to the documentation after this PR gets merged.

Warning for near plane

The warning for the near plane does not correctly update in the editor after the parameter is adjusted. This is important: in the current implementation, NOTHING is shown in the view if the near plane is less than 0.1 m. That's quite awkward, because the default value is 0.05 m.

Also, is it possible to clamp the near plane to 0.1 m automatically? That might be more intuitive than showing empty space and spamming the log with warnings.
In case it helps someone else, these were the settings I changed, based on the original demo used at the top of this PR, to fix the same problem. It's not exactly @huisedenanhai's solution and likely has unimportant additions, but I'm confirming that changes like these would be needed to remove the blocky halo effect from mixed immersion experiences. I'm not as familiar with the inner workings of CompositorService, but if something like this does make it into the documentation, direct instructions akin to the following could be helpful: world environment settings:

I've been playing around with visionOS development and identified a couple of potential errors, but they seem obscure and situational enough that I'm planning on waiting until after this PR is merged to file issues for them. All in all, great work @rsanchezsaez! It's been amazing experimenting with visionOS experiences from this PR.
VIEWPORT_VRS_DISABLED,
VIEWPORT_VRS_TEXTURE,
VIEWPORT_VRS_XR,
VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP,
The enum has been unified to VIEWPORT_VRS_XR. But the enum itself has not been deleted.
Variable rate shading texture is supplied by the primary [XRInterface]. Note that this may override the update mode.
</constant>
<constant name="VIEWPORT_VRS_MAX" value="3" enum="ViewportVRSMode">
<constant name="VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP" value="3" enum="ViewportVRSMode">
The enum seems no longer used
BIND_ENUM_CONSTANT(VIEWPORT_VRS_DISABLED);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_TEXTURE);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR);
BIND_ENUM_CONSTANT(VIEWPORT_VRS_XR_RASTERIZATION_RATE_MAP);
The enum has been unified to VIEWPORT_VRS_XR. But the enum itself has not been deleted.
let options: LayerRenderer.Capabilities.SupportedLayoutsOptions = foveationEnabled ? [.foveationEnabled] : []
let supportedLayouts = capabilities.supportedLayouts(options: options)
let layout: LayerRenderer.Layout = supportedLayouts.contains(.layered) ? .layered : .dedicated
Will there be a device that does not support layered mode?
I don't see the need to support a layout other than .layered.
Also, the current implementation crashes when I force the layout to .dedicated.
Good point. .layered is the recommended mode, and new visionOS 26 features need this mode. Hence, I have changed this file to hardcode .layered.
virtual void initialize() = 0;
virtual void begin_frame(double frame_step) = 0;

virtual Error prepare_screen_for_drawing(DisplayServer::WindowID p_screen) { return OK; }
This method is a simple wrapper around RD::get_singleton()->screen_prepare_for_drawing(p_screen). It seems redundant to me; just call the inner method when needed. There's no need to add a virtual method to the base class.
You're right. Furthermore, I've moved this call to VisionOSXRInterface::post_draw_viewport(), to avoid platform-specific code in the main rendering path.
frag_color.rgb += interleaved_gradient_noise(gl_FragCoord.xy) * params.luminance_multiplier;
#endif
#ifdef WRITE_DEPTH
gl_FragDepth = SKY_DEPTH;
No, don't adjust depth in the fragment shader: that disables the early-Z test, which is very inefficient when the user has a heavy sky shader.
If you want to apply a global depth offset, you can adjust the z coordinate in the vertex shader instead. Change the line

gl_Position = vec4(uv_interp, 0.0, 1.0);

to

gl_Position = vec4(uv_interp, SKY_DEPTH, 1.0);

I have reverted this for now for simplicity, and we can discuss other solutions separately.
I have reviewed all the code. The implementation is good, IMO. Great thanks to @rsanchezsaez. This PR uses some uncommon Apple device features, which has sparked some discussion.
MTLRasterizationRateMap
Rendering XR on Vision Pro is pretty similar to other devices. The Vision Pro gives you a texture array for color and a texture array for depth. That's all you need if you're not using foveated rendering.
The main difference is how Vision Pro handles foveated rendering using VRS (Variable Rate Shading). It uses MTLRasterizationRateMap, which is Apple's specific way of implementing VRS.
VRS comes in various forms on different platforms. Some devices provide a texture map to control the shading rate, but that only adjusts the shading rate; you still need to allocate a full-resolution buffer for pixel storage.
Apple takes this a step further. Why allocate extra pixels if they won't be rendered? Apple lets you have a distorted framebuffer, with the distortion controlled by MTLRasterizationRateMap. You can allocate a smaller, regular physical texture, where each row and column of this physical texture can have varying logical sizes on the screen. This is why a viewport can be bigger than the physical framebuffer size.
For example, let's say you want to render to a 5x5 screen, but you only expect high density in the center. You can only allocate a 3x3 texture, then stretch the border pixels to cover the whole screen.
This way of VRS reduces both computation cost and memory consumption.
This solution has its own limitation. Your physical texture is still a regular grid. The MTLRasterizationRateMap only controls the logical sizes of the columns and rows. Therefore, the shading rate configuration is not as flexible as a shading rate map.
This is NOT a new feature; it has been available since iOS 13, which I think is below Godot's current minimum supported iOS version. It is available on nearly all Apple devices and can actually have more uses than foveated rendering.
The current usage of MTLRasterizationRateMap seems hacky. It has no analog on other platforms, and RenderingDevice does not have an abstraction for it. @BastiaanOlij suggests combining the physical texture and MTLRasterizationRateMap into a single texture and hiding all these details behind the rendering device. I don't think this is a good idea: you WILL need explicit access to the MTLRasterizationRateMap to decode the distortion in shaders, especially when you want to do some screen-space effect.
If RenderingDevice does not provide a special construct for MTLRasterizationRateMap, I prefer the current implementation, which wraps MTLRasterizationRateMap as a special kind of texture. MTLRasterizationRateMap itself is not a texture; it has its own type and APIs for accessing it in Metal shaders. If you want first-class support for it in shaders, you will need to modify the whole shader translation stack to emit correct code. That is way too much work. Instead, you can pre-decode the MTLRasterizationRateMap into a LUT of size 2 x max(logical_width, logical_height), using a tiny Metal compute shader. The LUT stores the mapped physical texture coordinate for each logical coordinate, one row for each of the x and y axes. Accessing it in a shader is the same as reading a texture. I think this is a good enough abstraction given limited time and resources.
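The proposed LUT decode can be illustrated with a toy model (plain C++ rather than a Metal compute shader; the per-column widths are made up): each physical column covers a variable number of logical pixels, and a precomputed table maps logical coordinates back to physical ones, like the 5x5-screen / 3x3-texture example above.

```cpp
#include <vector>

// widths[i] = number of logical pixels covered by physical column (or row) i.
// Returns a LUT indexed by logical coordinate, yielding the physical coordinate;
// a real implementation would build one such row per axis on the GPU and
// sample it in the shader like a texture.
std::vector<int> build_axis_lut(const std::vector<int> &widths) {
	std::vector<int> lut;
	for (int i = 0; i < (int)widths.size(); ++i) {
		for (int j = 0; j < widths[i]; ++j) {
			lut.push_back(i); // logical pixel maps into physical column i
		}
	}
	return lut;
}
```

For a 5-wide screen packed into 3 physical columns with a dense center, widths of {2, 1, 2} yield the LUT {0, 0, 1, 2, 2}: the stretched border columns each cover two logical pixels, and the center column covers one.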
Depth Output
CompositorService requires non-zero depth for non-zero alpha pixels. The simplest way to address this is to add an extra full-screen pass that globally sets the depth to a non-zero value. However, this approach requires further discussion.
If you perform a Metal frame capture in Xcode, you'll find some extra compute dispatches at the end of the frame. I haven't reverse-engineered what it's actually doing, but judging by the intermediate texture outputs, it looks like some kind of pixel binning.
My guess: CompositorService groups zero depth pixels together to accelerate passthrough. Zero depth tile => full passthrough with no blending. This can explain the blocky artifacts when depth and alpha are not correctly configured.
If my guess is true, we can't simply mark the entire depth buffer as non-zero. We might need a more sophisticated approach that only marks the non-empty pixels to achieve the best performance.
That said, maybe we should fix the artifact first with an additional full-screen draw, and improve performance later?
Conclusion
This PR provides a solid foundation for visionOS XR rendering support. While it does introduce some new design challenges, these problems are inherent to the device, so I suggest we merge this PR first. These problems are much easier to solve than I initially thought; we do not need separate follow-up PRs to address them. I have proposed some solutions to these problems; see my comments below.
do not rely on the user providing most of these settings (though enhancing this with auto detection features
based on the device we're running on would be cool). I'm mostly adding this as an example or base plate for
more advanced interfaces.
*/
Duplicated comment in modules/mobile_vr/mobile_vr_interface.h
… PR godotengine#109975 - Add recursive class hierarchy check for protocol conformance in framebuffer_create(), fixing potential foveation rate map detection failures on visionOS - Support MTLCAPTURE_DESTINATION_DEVELOPER_TOOLS_ENABLE env var for Metal frame capture
stuartcarnie
left a comment
I have reviewed the code in iOS, visionOS, Apple Embedded and the renderer. Code looks good and changes make sense to me. I plan to test the changes on AVP hardware next to give the PR a final approval.
I have not reviewed the XR changes, as that is the XR team's purview.
Note
This is mostly code copied from godot_view_renderer.mm
```
static bool class_conforms_to_protocol_recursive(Class p_class, Protocol *p_protocol) {
	Class current = p_class;
	while (current != nil) {
		if (class_conformsToProtocol(current, p_protocol)) {
			return true;
		}
		current = class_getSuperclass(current);
	}
	return false;
}
```
This is fine for now; I may end up moving it into NSObject with my patch later, if we need to use it elsewhere.
```
// p_texture can contain a wrapped MTLRasterizationRateMap as returned by VisionOSXRInterface
// so we must check it responds to the allocatedSize selector first
Class cls = object_getClass((id)(void *)p_texture.id);
if (!class_respondsToSelector(cls, sel_registerName("allocatedSize"))) {
	return 0;
}
```
Is this still required? `sendMessageSafe` checks if the instance responds to the selector: see `godot/thirdparty/metal-cpp/Foundation/NSObject.hpp`, lines 238 to 250 (at 03d4c49).
bruvzg
left a comment
Platform changes look good. I have not checked XR specific part or renderer internals.
Needs rebase.
BastiaanOlij
left a comment
Sorry that it took a while for me to get to this.
As before, I defer to Stuart and Bruvzg for changes on any of the Apple OS platform and metal driver changes.
Very glad to see the minimal changes on the rendering engine itself, kudos on moving all the logic into the metal drivers.
XRInterface-wise this gets a huge thumbs-up from me; it looks like a very clean implementation.
```
Support for high dynamic range (HDR) output.
</constant>
<constant name="SUPPORTS_RASTERIZATION_RATE_MAP" value="14" enum="Features">
	Support for rasterization rate maps on Apple platforms. This allows for foveated rendering, which is used by the visionOS XR module.
```
Just a nitpick, but should we specifically call Apple out here? I'm assuming other vendors may implement their own versions at some point, so we may eventually be in a situation where other drivers also implement this functionality.
```
// Only safe to be called from the render thread
uint32_t get_view_count();
Transform3D get_camera_transform();
```
I'm currently working on #116424, which will result in camera tracking and projection matrices being accessed during process on the main thread. This is for accurate and stable positioning of nodes, and for the ability to create additional render targets for effects (though that will probably only be used in PCVR scenarios).
Just checking whether that will end up problematic for visionOS?
Once we try calling get_transform_for_view() and get_projection_for_view() from process(), it could become a problem with the separate threading model, but if you are OK with this, I'll address it separately, given this PR is already in the merge queue.
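For context, here is a rough script-side sketch of what main-thread pose access could look like. This is only an illustration, not code from this PR; the `"head"` tracker and `"default"` pose names are assumptions based on Godot's usual XR defaults:

```
# Sketch, not from this PR: reading the tracked head pose during _process().
func _process(_delta: float) -> void:
	var head := XRServer.get_tracker("head")
	if head is XRPositionalTracker:
		var pose: XRPose = head.get_pose("default")
		if pose and pose.has_tracking_data:
			var head_transform := pose.transform
			# Use head_transform to position companion nodes, effects, etc.
```

Whether this pattern is safe on visionOS's separate render thread is exactly the open question above.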
stuartcarnie
left a comment
I have run this branch on various devices and confirm XR and immersive mode works great!
I have opened the following PR to resolve merge conflicts and address all of @AThousandShips's feedback:
@rsanchezsaez if you prefer, I could do a complete rebase and squash the commits for you?
@stuartcarnie @bruvzg @BastiaanOlij @AThousandShips Thanks for the reviews and approvals! Stuart addressed @AThousandShips's feedback and rebased on top of master, and I addressed Bastiaan's comments. I added @stuartcarnie and @huisedenanhai as co-authors to the commit. Thanks for all the great work!
Important: visionOS templates can now be built using the official Linux build containers with the following PRs:
Updated PRs to build visionOS (and all other platforms):
@rsanchezsaez could you apply this patch to platform_methods.py, as it is necessary for the visionOS build:

```
diff --git a/platform_methods.py b/platform_methods.py
--- a/platform_methods.py	(revision 0e95347de3ed549642bec0571599defb1232b1fc)
+++ b/platform_methods.py	(date 1776222353751)
@@ -364,6 +364,7 @@
     SWIFTCFLAGS=[
         "-resource-dir",
         "/root/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift",
+        "-enable-cross-import-overlays",
     ]
 )
```

Incidentally, xtool.sh does the same thing.
The visionOS XR module only supports the Mobile renderer for now, the Forward+ renderer is not supported.
To use the visionOS XR module you must set the new 'application/app_role' export setting to Immersive. You can choose if you want passthrough or not by the new 'application/immersion_style' export option.
Then, initialize the visionOS XR module in a script:
```
var interface = XRServer.find_interface("visionOS")
if interface and interface.initialize():
var viewport : Viewport = get_viewport()
viewport.use_xr = true
viewport.vrs_mode = Viewport.VRS_XR
viewport.use_hdr_2d = true
```
Implementation details:
- The visionOS platform now has two different execution paths implemented by the `GodotWindowScene` and `CompositorServicesImmersiveSpace` scenes in `app_visionos.swift`. The `application/app_role` export setting controls which scene is used.
- The visionOS XR interface tries to be as close to the OpenXR interface as possible, to keep main renderer code changes to a minimum. It adopts Compositor Services and ARKit APIs, which is how you render Metal content on visionOS.
- We obtain the head pose twice, once in `process()` in the game thread so scripts can use it if needed, and another from the render thread in set_frame(), so the rendered pose is accurate.
- The projection matrices returned by visionOS have an inverse depth correction applied (visionOS uses the [0, 1] z space, but Godot expects the [-1, 1] z space until the rendering step).
- The `rasterizationRateMap` (the structure that supports foveation on visionOS) is provided through the `get_vrs_texture()` function, using the new `XR_VRS_TEXTURE_FORMAT_RASTERIZATION_RATE_MAP` texture type. It's passed through the renderer when creating passes/subpasses, to be ultimately set by the Metal driver.
- Apple Vision Pro's minimum supported near plane is `0.1`. A runtime error message is shown if you try to use a lower near plane.
- The Metal driver has a new dummy `SurfaceCompositorServices`, which replaces `SurfaceLayer` when running in immersive mode. The reason for this is that the Compositor Services API needs to do a `cp_drawable_encode_present()` step with the `MTLCommandBuffer` used by the renderer, and this seemed the most natural way of overriding the `present()` call normally done by the Metal driver.
Co-Authored-By: huisedenanhai <winser@pku.edu.cn>
Co-Authored-By: Stuart Carnie <stuart.carnie@gmail.com>
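As a usage note on the near-plane restriction mentioned above, a project can defensively clamp its camera in script. This is an illustrative sketch only; the `$XRCamera3D` node path is an assumption, not something this PR defines:

```
# Sketch: guard against visionOS's minimum supported near plane of 0.1.
func _ready() -> void:
	var camera := $XRCamera3D as Camera3D
	if camera and camera.near < 0.1:
		push_warning("visionOS requires a near plane >= 0.1; clamping.")
		camera.near = 0.1
```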
@stuartcarnie Done, thank you!


PR co-authored by @huisedenanhai and @stuartcarnie
Dear Godot community,
This PR is a followup to #105628 and #109974 (I encourage you to read the descriptions of those PR to get the full context of this contribution).
This change adds a visionOS XR module to Godot, which allows building immersive experiences on visionOS.
This PR builds on top of #109974, so that one should be merged first.
As always, we have tried to align with Godot's quality, style, and engine direction, to minimize any disruption coming from these changes. We'll be more than happy to work with the community on iterating this patch.
cc @BastiaanOlij @stuartcarnie @bruvzg @clayjohn
Technical Discussion
We have developed and tested these changes using Xcode 26.
The visionOS XR module only supports the Mobile renderer for now, the Forward+ renderer is not supported.
To use the visionOS XR module you must set the new `application/app_role` export setting to `Immersive`. You can choose if you want passthrough or not with the new `application/immersion_style` export setting. Then, initialize the visionOS XR module in a script:
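The initialization code referred to here is the same snippet shown earlier in the thread:

```
var interface = XRServer.find_interface("visionOS")
if interface and interface.initialize():
	var viewport : Viewport = get_viewport()
	viewport.use_xr = true
	viewport.vrs_mode = Viewport.VRS_XR
	viewport.use_hdr_2d = true
```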
Some implementation details:
- The visionOS platform now has two different execution paths, implemented by the `GodotWindowScene` and `CompositorServicesImmersiveSpace` scenes in `app_visionos.swift`. The `application/app_role` export setting controls which scene is used.
- We obtain the head pose twice: once in `process()` so the camera and scripts can use it if needed, and again from the render thread in `pre_render()`, which is needed for rendering. The object cannot easily be shared between threads.
- The `rasterizationRateMap` (the structure that supports foveation on visionOS) is provided through the `get_vrs_texture()` function, using the new `XR_VRS_TEXTURE_FORMAT_RASTERIZATION_RATE_MAP` texture type. It's passed through the renderer when creating passes/subpasses, to be ultimately set by the Metal driver.
- Apple Vision Pro's minimum supported near plane is `0.1`. The visionOS XR module's `get_projection_for_view()` function will fail with an error message if the camera has a lower `zNear`.
- The Metal driver has a new dummy `SurfaceCompositorServices`, which replaces `SurfaceLayer` when running in immersive mode. The reason for this is that the Compositor Services API needs to do a `cp_drawable_encode_present()` step with the `MTLCommandBuffer` used by the renderer, and this seemed the most natural way of overriding the `present()` call normally done by the Metal driver.
Testing
We have been testing this PR in windowed mode with the Platformer demo project. We have verified the project continues to work on iOS and visionOS, both with the Mobile and Forward+ renderers using the Metal rendering driver.
I'm also including a minimal example of a project rendering an immersive scene on visionOS. Currently the example has no interactivity, but you can move your head and look around the scene.
It looks like this:
ScreenRecording_08-25-2025.16-14-23_1_out.mp4
Missing Functionality
Next steps
We have working pinch input, hand tracking, and PSVR2 controller tracking ready. We'll submit these as incremental PRs, to keep isolated functionality in each of the PRs for easier review. Once these areas are done, we believe porting any existing VR games to visionOS could be relatively easy.