[10.x] Metal support updates [Hold] #1795


Open — wants to merge 14 commits into base: HDRP/staging

Conversation

@aet (Contributor) commented Sep 8, 2020

Purpose of this PR

Improve Metal support on macOS Intel/Apple Silicon.

  • Apple GPUs currently don't support 8x MSAA, so add a validation pass that probes the supported range and clamps MSAASamples to something supported (historically, some GPUs out there support only 4x and not 2x; that shouldn't be a problem for this pipeline, but the probe covers it)
  • GraphicsFormat.R32_SFloat with MSAA is not supported on Apple GPUs; use GraphicsFormat.R16_SFloat as a fallback
  • With Metal, use cmd.ResolveAntiAliasedSurface when the textures match. Fixes graphics issues with some GPU drivers
  • Modify the TextureXR abstraction layer to improve VFX compatibility. Tested against the Spaceship demo, which now behaves correctly (the main menu glitches as expected, sparkles bounce off the surfaces)
  • TextureXR is currently built on the assumption that XR SPI is always present; it is now modified to work without XR SPI on certain targets
  • Modify TextureXR usage so it can be used even when the GfxDevice doesn't support texture arrays with MSAA (or the feature isn't enabled due to OS/device fragmentation)
  • Enable precision variants for TEXTURE2D_ARRAY_FLOAT/TEXTURECUBE_ARRAY_FLOAT/TEXTURE2D_ARRAY_HALF/TEXTURECUBE_ARRAY_HALF; the Unity shader compiler has supported them for a long time now
  • Add a DeviceInfo helper class that enables potentially GPU/target-specific runtime changes that don't make sense as a pipeline settings option. Allow picking compute kernels based on GPU preference
  • Refactor some older Nintendo Switch/iOS assumptions to use DeviceInfo. The Apple-GPU-optimized variant of FPTL is in use on iOS at the moment, but it isn't yet enabled at runtime for Apple GPUs on macOS
  • Because of the above, PLATFORM_LANE_COUNT_32 variants of FPTL are now multi-compiled. If necessary, they should be easy to strip on platforms that don't need them
  • Add 3D texture max-dimension clamping to VolumetricLighting.cs to avoid Metal API validation errors in the editor while dynamically changing the resolution
  • Fix a rendering issue with FPTL when running in the 32-thread configuration on Apple GPUs. After speaking with Morten, it sounds like we should do a cleanup pass over our GroupMemoryBarrierWithGroupSync() usage, but that can wait for another PR
  • Update OSXEditor/OSXPlayer template images. Will do a separate PR later to add AMD/Apple GPU-specific templates running on macOS 10.15 or later; the current CI OS/GPU situation doesn't do justice to how small the reference-image differences are compared to DX11 on either GPU
  • Remove _DebugDisplayPad0 from the ShaderVariablesDebugDisplay cbuffer; it was causing an alignment issue and doesn't seem to be needed
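The MSAA validation described in the first bullet could look roughly like the following. This is a hedged sketch, not the PR's actual code: `MSAAValidation.ClampSampleCount` is a hypothetical helper, built on Unity's real `SystemInfo.GetGraphicsFormat` and `SystemInfo.IsFormatSupported` APIs.

```csharp
using UnityEngine;
using UnityEngine.Experimental.Rendering;
using UnityEngine.Rendering;

static class MSAAValidation
{
    // Probe which MSAA sample counts the current GfxDevice supports for a
    // "generic" LDR format, then clamp the requested count into that range.
    public static MSAASamples ClampSampleCount(MSAASamples requested)
    {
        if (requested == MSAASamples.None)
            return requested;

        GraphicsFormat ldrFormat = SystemInfo.GetGraphicsFormat(DefaultFormat.LDR);

        var candidates = new[]
        {
            (MSAASamples.MSAA2x, FormatUsage.MSAA2x),
            (MSAASamples.MSAA4x, FormatUsage.MSAA4x),
            (MSAASamples.MSAA8x, FormatUsage.MSAA8x),
        };

        MSAASamples best = MSAASamples.None;
        MSAASamples lowest = MSAASamples.None;
        foreach (var (samples, usage) in candidates)
        {
            if (!SystemInfo.IsFormatSupported(ldrFormat, usage))
                continue;
            if (lowest == MSAASamples.None)
                lowest = samples;          // minimum supported count
            if (samples <= requested)
                best = samples;            // maximum supported count <= request
        }

        // If nothing at or below the request is supported (e.g. a GPU with
        // 4x but no 2x), fall back to the lowest supported count.
        return best != MSAASamples.None ? best : lowest;
    }
}
```

The probe deliberately uses one generic format; per-format support (e.g. R32_SFloat) still needs its own checks, which is what the review discussion below is about.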

Testing status

Manual Tests: What did you do?

  • Opened test project + Run graphic tests locally
  • Built a player
  • Checked new UI names with UX convention
  • Tested UI multi-edition + Undo/Redo + Prefab overrides + Alignment in Preset
  • C# and shader warnings (suppress the shader cache to see them)
  • Checked new resources path for the reloader (in developer mode, there is a button at the end of the resources that checks the paths)
  • Other:

Automated Tests: What did you setup? (Add a screenshot or the reference image of the test please)

Yamato: (Select your branch):
https://yamato.prd.cds.internal.unity3d.com/jobs/902-Graphics

Any test projects to go with this to help reviewers?


Comments to reviewers

Notes for the reviewers you have assigned.

@aet aet requested a review from sebastienlagarde September 8, 2020 23:42
@aet aet changed the title Hdrp/metal [10.x] Metal support updates Sep 9, 2020
@@ -199,13 +199,6 @@
},
"url": "https://artifactory.prd.cds.internal.unity3d.com/artifactory/api/npm/upm-candidates"
},
"com.unity.purchasing": {

@@ -191,13 +191,6 @@
},
"url": "https://artifactory.prd.cds.internal.unity3d.com/artifactory/api/npm/upm-candidates"
},
"com.unity.purchasing": {

@@ -3048,6 +3026,7 @@ static void ClearLightList(in BuildGPULightListParameters parameters, CommandBuf
cmd.SetComputeBufferParam(parameters.clearLightListCS, parameters.clearLightListKernel, HDShaderIDs._LightListToClear, bufferToClear);
cmd.SetComputeIntParam(parameters.clearLightListCS, HDShaderIDs._LightListEntries, bufferToClear.count);

// TODO: Round on DeviceInfo.optimalThreadGroupSize so we have optimal thread for ClearList kernel
Contributor

As things currently stand, won't this dispatch the wrong number of groups? We still set NR_THREADS to 32 in the shader if that's the optimal lane count.

Contributor Author

It's a comment noting that this needs to be updated once the PLATFORM_LANE_COUNT_32 path is enabled in ClearLightLists.compute; it's currently commented out.

@FrancescoC-unity (Contributor) commented Sep 22, 2020

What I mean is that if PLATFORM_LANE_COUNT is defined as anything other than 64 anywhere, this will fail. Unless we are sure it will never be defined anywhere.

(This is in the shader)

#ifdef PLATFORM_LANE_COUNT
#define NR_THREADS PLATFORM_LANE_COUNT
#else
#define NR_THREADS 64 // default to 64 threads per group
#endif
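The concern above can be made concrete with the group-count arithmetic on the C# dispatch side. This is a hypothetical illustration; `DispatchMath` is not an HDRP type.

```csharp
static class DispatchMath
{
    // Round-up division: how many thread groups to dispatch to cover `items`.
    public static int GroupCount(int items, int threadsPerGroup)
        => (items + threadsPerGroup - 1) / threadsPerGroup;
}

// If the C# side assumes 64 threads per group, GroupCount(100, 64) == 2,
// i.e. 128 threads cover the 100 entries. But if the shader was actually
// compiled with NR_THREADS == 32, those 2 groups provide only 64 threads
// and the remaining entries are never touched.
```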

if (samples == MSAASamples.None)
return samples;

GraphicsFormat ldrFormat = SystemInfo.GetGraphicsFormat(DefaultFormat.LDR);
Contributor

Why do you use SystemInfo.GetGraphicsFormat(DefaultFormat.LDR) to check this? We need to check every format that will use MSAA, i.e. ColorBufferFormat (which can be selected to be RGBA8 or RGBA16F), and other RTs like the R32 we use for depth in the prepass when MSAA is enabled, and other stuff. We don't have a single format here.

Is the intent of this function only to check whether the RGBA8 MSAA sample count is supported, or am I missing something?

Contributor Author

The idea is to use any texture format to figure out the generic supported range of MSAA samples on your GfxDevice; it's an improvement over the current assumption.

I don't think our C# APIs have any better way to query a supported MSAA range (e.g. if your GfxDevice supports only 2x/4x but not 8x, or if some really old GPU doesn't support 2x). Shaders could pick up an 8x variant for multisampled-texture use, so this isn't something we can silently fall back from underneath SRP inside the GfxDevice implementation.

return samples;

GraphicsFormat ldrFormat = SystemInfo.GetGraphicsFormat(DefaultFormat.LDR);
MSAASamples[] probe = {MSAASamples.MSAA2x, MSAASamples.MSAA4x, MSAASamples.MSAA8x};
Contributor

This is dangerous: if we add MSAA16x in the future, it will not be caught. We should generate the array from the enum, e.g. var valuesAsArray = Enum.GetValues(typeof(MSAASamples));

Contributor Author

How are you suggesting we automatically handle the mapping of MSAASamples.MSAAx to FormatUsage.MSAAx? That would still be a problem.
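One way the MSAASamples → FormatUsage mapping could be automated is by matching enum member names, so a future MSAA16x would be picked up automatically as long as a same-named FormatUsage entry exists. A hedged sketch (`MSAAProbe` is hypothetical; `Enum.TryParse` and `SystemInfo.IsFormatSupported` are real APIs):

```csharp
using System;
using UnityEngine;
using UnityEngine.Experimental.Rendering;
using UnityEngine.Rendering;

static class MSAAProbe
{
    public static void ProbeAll(GraphicsFormat format)
    {
        foreach (MSAASamples s in Enum.GetValues(typeof(MSAASamples)))
        {
            if (s == MSAASamples.None)
                continue;

            // MSAASamples.MSAA2x maps to FormatUsage.MSAA2x by name, etc.
            // A new enum member without a FormatUsage counterpart is skipped.
            if (Enum.TryParse<FormatUsage>(s.ToString(), out var usage))
            {
                bool supported = SystemInfo.IsFormatSupported(format, usage);
                Debug.Log($"{s}: {(supported ? "supported" : "unsupported")}");
            }
        }
    }
}
```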

int minSampleCount = 0;
int maxSampleCount = 0;

// Some GPUs might not support MSAA2x, so probe the supported range for clamping
Contributor

Maybe comment that this is for the Metal API; other platforms don't have this problem.

Contributor

I believe some APIs allow for the implementation to give you a different sample count than you asked for, but I'm not sure we can always probe for that. Maybe Antti can clarify what this is defending against.

Contributor Author

Apple GPUs don't support 8x MSAA, but if we're checking the highest MSAA count supported for a "generic format", we might as well check the lowest too, because historically there have been GPUs outside the Metal domain that supported 4x but not 2x.

@@ -40,7 +40,6 @@ unsafe struct ShaderVariablesDebugDisplay
public int _DebugSingleShadowIndex;

public int _DebugProbeVolumeMode;
public Vector3 _DebugDisplayPad0;
Contributor

Question: why are we not currently catching this issue with our tests? We have debug-mode tests.
I would like to add a graphics test to catch this in the future; what kind of test should I add? Thanks.

Contributor

This is harmless, but it takes up extra space in the constant buffer. We need a generic mechanism to validate/enforce something like Khronos's "std140" layout that works across all APIs (and we don't have that at the moment).
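A minimal sketch of the kind of generic validation mentioned here: checking on the C# side that a constant-buffer mirror struct keeps a 16-byte-multiple size, std140-style. `ExampleCB` is hypothetical, and this shows only the total-size check, not full per-member offset validation.

```csharp
using System.Runtime.InteropServices;
using UnityEngine;

static class CBufferValidation
{
    [StructLayout(LayoutKind.Sequential)]
    struct ExampleCB                // hypothetical cbuffer mirror
    {
        public int someMode;        // 4 bytes
        public Vector3 pad0;        // 12 bytes of hand-written padding
    }

    public static void Validate()
    {
        // Hand-written padding like pad0 is fragile: adding or removing any
        // preceding field silently breaks the layout, which is why a generic
        // std140-style validator would be safer than manual pads.
        int size = Marshal.SizeOf<ExampleCB>();
        Debug.Assert(size % 16 == 0,
            $"ExampleCB size {size} is not a multiple of 16 bytes");
    }
}
```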

@@ -1,5 +1,8 @@
#pragma kernel ClearDispatchIndirect
#pragma kernel ClearDrawProceduralIndirect

#pragma multi_compile _ PLATFORM_LANE_COUNT_32
Contributor

Given that this is static per platform, why a multi compile vs defining PLATFORM_LANE_COUNT in relevant include files?

Contributor

Using multi compiles will slow down build times so I think we should avoid it if possible.

Contributor Author

For optimizing against specific GPU models, we need to be able to pick up PLATFORM_LANE_COUNT_32 at runtime based on the actual GPU model in use (AMD vs. Apple GPU).

passData.depthAsColorBuffer = builder.WriteTexture(renderGraph.CreateTexture(new TextureDesc(Vector2.one, true, true)
{ colorFormat = GraphicsFormat.R32_SFloat, clearBuffer = true, clearColor = Color.black, bindTextureMS = true, enableMSAA = true, name = "DepthAsColorMSAA" }));
{ colorFormat = depthAsColorMSAAFormat, clearBuffer = true, clearColor = Color.black, bindTextureMS = true, enableMSAA = true, name = "DepthAsColorMSAA" }));
@JulienIgnace-Unity (Contributor) commented Sep 21, 2020

dimension must also be set to dimensionMSAA here.
This is currently done implicitly in the TextureDesc constructor, where slices and dimension are set based on the xrReady parameter, but now that we have a specific one for MSAA we need to make it explicit.

@@ -10,6 +10,18 @@
#pragma kernel TileLightListGen_DepthRT_SrcBigTile_Oblique LIGHTLISTGEN=TileLightListGen_DepthRT_SrcBigTile_Oblique ENABLE_DEPTH_TEXTURE_BACKPLANE USE_TWO_PASS_TILED_LIGHTING USE_OBLIQUE_MODE
#pragma kernel TileLightListGen_DepthRT_MSAA_SrcBigTile_Oblique LIGHTLISTGEN=TileLightListGen_DepthRT_MSAA_SrcBigTile_Oblique ENABLE_DEPTH_TEXTURE_BACKPLANE MSAA_ENABLED USE_TWO_PASS_TILED_LIGHTING USE_OBLIQUE_MODE

#pragma kernel TileLightListGen_NoDepthRT_LE LIGHTLISTGEN=TileLightListGen_NoDepthRT_LE PLATFORM_LANE_COUNT_32
Contributor

We should simply use a multi compile here, like for the other one. I don't see the need for a "LE" thing here.

Contributor

I guess I have a general misunderstanding about what "LE" means. Is it just to have 32 vs 64? Or does it mean Lightweight?

Contributor Author

LE was originally short for "LowEnd", so DeviceInfo can prefer a specific suffix when one exists. I thought about renaming it to _32 as well.

ComputeGgxIblSampleData.compute has nothing to do with PLATFORM_LANE_COUNT_32, so it made sense to use the _LE naming convention there to differentiate the MAX_IBL_SAMPLE_CNT=89 vs. MAX_IBL_SAMPLE_CNT=34 kernel pickup at runtime.
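The suffix-based kernel pickup described here could be sketched as follows. `KernelSelection` and the `preferLowEnd` flag are hypothetical stand-ins; `ComputeShader.HasKernel` and `ComputeShader.FindKernel` are real Unity APIs.

```csharp
using UnityEngine;

static class KernelSelection
{
    // DeviceInfo-style preference: try the "_LE" (low-end) kernel first,
    // fall back to the base kernel when the variant doesn't exist.
    public static int Select(ComputeShader cs, string baseName, bool preferLowEnd)
    {
        string preferred = preferLowEnd ? baseName + "_LE" : baseName;
        return cs.HasKernel(preferred) ? cs.FindKernel(preferred)
                                       : cs.FindKernel(baseName);
    }
}

// Usage sketch: an Apple GPU on a future Mac might prefer the low-end
// variant, while an AMD GPU on the same OS keeps the base kernel.
// int kernel = KernelSelection.Select(ggxIblCS, "ComputeGgxIblSampleData", isAppleGPU);
```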

int w = Mathf.RoundToInt(viewportWidth * screenFraction);
int h = Mathf.RoundToInt(viewportHeight * screenFraction);
int d = sliceCount;
const int kMax3DTextureSize = 2048; // TODO: There's no SystemInfo.maxTextureSize equivalent for 3D textures
Contributor

So it's bad, but we need to solve it by adding a SystemInfo.maxTextureSize equivalent for 3D textures, as you said. In the meantime we should at least add a Debug.LogWarning() here to inform the user that there is an issue with the size (and that it is a limit on the Metal platform).

Contributor Author

This has nothing to do with Metal; it's a generic limitation on DX11 as well. When using the volumetrics dynamic resolution slider, you can easily go over the supported limit on any target. We've talked about this with @EvgeniiG before; besides checking the maximum resolution, maybe there are improvements to be made to the dynamic resolution scaler itself? I was able to hit the resolution limits at around 70% of the slider.
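The clamping under discussion amounts to capping each dimension of the volumetric 3D texture at a hardcoded limit. A hedged sketch (`VolumetricBufferSize` is a hypothetical helper mirroring the quoted code above):

```csharp
using UnityEngine;

static class VolumetricBufferSize
{
    // There is no SystemInfo.maxTextureSize equivalent for 3D textures,
    // so use a hardcoded limit that holds on current targets.
    const int kMax3DTextureSize = 2048;

    public static Vector3Int Clamped(int viewportWidth, int viewportHeight,
                                     float screenFraction, int sliceCount)
    {
        int w = Mathf.Min(Mathf.RoundToInt(viewportWidth * screenFraction), kMax3DTextureSize);
        int h = Mathf.Min(Mathf.RoundToInt(viewportHeight * screenFraction), kMax3DTextureSize);
        int d = Mathf.Min(sliceCount, kMax3DTextureSize);
        return new Vector3Int(w, h, d);
    }
}
```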

@JulienIgnace-Unity (Contributor) left a comment

Changes need to be made to the Render Graph code path.
Unfortunately, this code path is not enabled by default yet and, given the design, most render texture creation code is duplicated there, so the MSAA changes need to be replicated.
Here's the exhaustive (I hope :p) list of the places that need to be updated:
HDRenderPipeline.RenderGraph.cs

  • CreateColorBuffer
  • ResolveMSAAColor (device resolve logic)
  • ResolveMotionVector (device resolve logic)

HDRenderPipeline.Debug.cs

  • RenderDebugViewMaterial

HDRenderPipeline.LightLoop.cs

  • CreateDiffuseLightingBuffer

HDRenderPipeline.Prepass.cs

  • CreateDepthBuffer (already done)
  • CreateNormalBuffer
  • CreateDecalPrepassBuffer
  • CreateMotionVectorBuffer
  • RenderDepthPrepass (in msaa condition where we create a R32F buffer)
  • ResolvePrepassBuffers (device resolve logic)

HDRenderPipeline.SubsurfaceScattering.cs

  • CreateSSSBuffer

Also, it's better if you test this with the render graph path as well. You can run with the -rendergraph-tests command line argument to enable render graph by default (or hardcode enableRenderGraphTests to true in HDRenderPipeline.cs).

RWTexture2D<float4> outputResult; // [MAX_SAMPLE_CNT x UNITY_SPECCUBE_LOD_STEPS]

#pragma kernel ComputeGgxIblSampleData
#pragma kernel ComputeGgxIblSampleData COMPUTEGGGXIBLSAMPLEDATA=ComputeGgxIblSampleData MAX_IBL_SAMPLE_CNT=89
Contributor

Re: "I guess I have a general misunderstanding about what 'LE' means" — here it is not a matter of 32 vs 64 thread size; it is about performance. Mobile and Switch don't have the power to process the 89 samples of PC/console, so we use a lower sample count.

We shouldn't change this.

Contributor Author

For future Macs, we need both of these to be available and chosen at runtime based on the GPU in use.

// Set the motion vector input texture
Shader.SetGlobalTexture(HDShaderIDs._MotionVectorTextureMS, m_MotionVectorsMSAART);
if (m_MotionVectorsSupport)
cmd.ResolveAntiAliasedSurface(m_MotionVectorsMSAART.rt, m_MotionVectorsRT.rt);
Contributor

I don't see how this code can work. We can't rely on "auto" resolve for our buffers. If we use a shader for the resolve, it is because we need to do a particular software operation for the resolve (like taking the closest depth sample); we don't blindly average motion vector values, for example.

See
D:\newSRP\Graphics\com.unity.render-pipelines.high-definition\Runtime\RenderPipeline\RenderPass\MSAA\DepthValues.shader

So does this mean we are blocked on Metal? Auto resolve will give us poor results and we can't use our software resolve? Or was it just an optimization?

if (DeviceInfo.requiresExplicitMSAAResolve == false &&
CompareRTHandle(msaaTarget, simpleTarget))
{
cmd.ResolveAntiAliasedSurface(msaaTarget.rt, simpleTarget.rt);
Contributor

Same issue as above: our color resolve relies on a tonemap operation around the average; it is not just an average. We can't rely on auto resolve.
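A hedged sketch of the decision this thread is discussing: only use the driver's built-in resolve when the resolve operator really is a plain average, and keep the shader path for anything with a custom operator. `MSAAResolve`, the boolean flags, and `ResolveWithShaderPass` are stand-ins, not actual HDRP methods; `CommandBuffer.ResolveAntiAliasedSurface` is the real Unity API.

```csharp
using UnityEngine.Rendering;

static class MSAAResolve
{
    public static void Resolve(CommandBuffer cmd,
                               RTHandle msaaTarget, RTHandle simpleTarget,
                               bool resolveIsPlainAverage,
                               bool deviceRequiresExplicitResolve)
    {
        if (!deviceRequiresExplicitResolve && resolveIsPlainAverage)
        {
            // Driver resolve: a straight average of the samples.
            cmd.ResolveAntiAliasedSurface(msaaTarget.rt, simpleTarget.rt);
        }
        else
        {
            // Custom operator (closest-depth, tonemapped color average, ...)
            // must stay on the software path, e.g. DepthValues.shader.
            ResolveWithShaderPass(cmd, msaaTarget, simpleTarget);
        }
    }

    static void ResolveWithShaderPass(CommandBuffer cmd, RTHandle src, RTHandle dst)
    {
        /* blit with a resolve shader; omitted */
    }
}
```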

@@ -5,9 +5,14 @@

public class SetToCameraNearPlane : MonoBehaviour
{
#if UNITY_EDITOR
Contributor

I will report this in another PR, thanks for the fix

Contributor

You can remove it from this PR.

@sebastienlagarde (Contributor) left a comment

Overall I would like everyone's thoughts regarding the 32 vs 64 thread change in this PR.

As far as I can read, it is static (mobile/Switch use 32 and other platforms use 64 for now), so I don't see the need for all the changes and the extra PLATFORM_LANE_COUNT_32, in particular as it is not handled correctly everywhere.

So, two solutions here.

Either:

  • We assume PLATFORM_LANE_COUNT is defined statically and we remove the various changes here, including the "LE".
  • We assume PLATFORM_LANE_COUNT is dynamic (which is true for PC; I don't know for mobile), and in this case we should handle it correctly, i.e. everywhere we use PLATFORM_LANE_COUNT we add a multi_compile and select one variant based on the current platform or GPU (we could, for example, check for an NVIDIA string in the GPU name and select the right variant based on that, in addition to doing it per platform). If we choose this path, we then add an option to strip the unused variants in the stripper (for example, when building for PS4, we reject the various 32-thread variants).
    I have looked at all the shaders relying on PLATFORM_LANE_COUNT and they are all compute shaders with low compile times (i.e. not a forward or deferred shader), so it should be fairly easily doable and shouldn't add any extra build time except on platforms like PC (and maybe mobile, if there is a case for non-32?).
    What are people's thoughts? It would allow more optimized code for PC (but it then implies problems on the test side). It all depends on how many platforms require a variant, and as of today we have done this optimization mainly for PC.

@sebastienlagarde (Contributor) commented Sep 21, 2020

Depending on people's thoughts, we may simply simplify this PR so it works for Metal, and do the other architecture change (32 vs 64) in another PR.

Note: I am also concerned about the auto resolve for MSAA. That can't work with our code, as we need an explicit software resolve. What are our alternatives here?

Lastly, this PR breaks our render graph, as the code is not done for it in the PR; we can handle that for you, but we first need to solve the other problems.

@@ -14,4 +18,52 @@ public enum MSAASamples
/// <summary>MSAA 8X.</summary>
MSAA8x = 8
}

public static class MSAASamplesExtensions
@JulienIgnace-Unity (Contributor) commented Sep 21, 2020

There is currently no point in making this public so I think we should make it internal.


namespace UnityEngine.Rendering.HighDefinition
{
public class DeviceInfo
Contributor

Same as above, there is currently no point in making this public so I think we should make it internal.

@fabien-unity (Collaborator) left a comment

@julienf-unity, @aet: what is not working with VFXGraph + non-array RTs? Let's fix that instead of adding all these changes to TextureXR and the extra API complexity around it.

/// <summary>
/// Dimension of XR MSAA textures.
/// </summary>
public static TextureDimension dimensionMSAA
Collaborator

Please do not add this extra API just for Metal.
Let's fix the VFX Graph instead to deal correctly with non-array textures.

Contributor Author

Ok.. so we've had these known issues in the Spaceship demo's main menu (an intentional screen glitch is gray instead of colored) and VFX sparkles not bouncing off surfaces as expected, and we've had people working on this project and talking about this limitation/difference against the Metal backend for a year now? :)

All discussions so far that I've seen suggest that there's something in Metal that should be fixed. But we really can't move on to enabling SPI/XR stuff anytime soon, and on some platforms you shouldn't expect XR/SPI at all.

It makes sense for VFX to share TextureXR, but it's exposing the problem that VFX clearly depends on an API that locks you into XR/SPI assumptions; VFXGraph should work on targets without SPI too, where you still have texture arrays available.

Contributor

Yes, the problem with this is that the "correct" fix is tricky:

  • We can add a type for CameraBuffers in VFX Graph that abstracts the actual TextureXR. The problem with that is that we lose some interoperability with Shader Graph (as it doesn't support TextureXR)... so we have a UX problem here.

Or we can also solve it by not having TextureXR leak outside of shader code, which means the SRP API to access camera buffers always returns Texture2D (this was the case before, but it was changed for RTX; I don't exactly remember why). We could also fix the interop issues by allowing a Texture2D view on the first slice of a Texture2DArray.

But I agree with Fabien that fixing it just by changing the type on Metal is not an actual fix. The issues still remain for other platforms that don't support XR, like Xbox for instance.

Contributor Author

And it's also a matter of when things can get properly fixed vs. having something working in the meantime :) My changes within TextureXR are not very intrusive, though, and they will allow fixing the same issue on other platforms without XR/SPI as needed; of course I tried to restrict the actual changes to the Metal runtime where possible.

Contributor Author

The MSAA texture array support situation is more unfortunate, though. We always have features that "not everyone" depends on automatically, but even in the current desktop Mac GPU feature tables, availability varies too much to make it a hard requirement. On iOS it's simply not available at all.

Collaborator

Sure, but it will add a new public API that we'll have to maintain for many years... let's please avoid that and fix the root issue.
@julienf-unity: is the Xbox issue tracked anywhere? I'd be happy to help out with fixing VFX/Shader Graph to properly support texture arrays (or their absence).

Contributor Author

@fabien-unity how do we fix the root issue of having a graphics device without MSAA texture array support, other than not using the feature when it's not available? :) It's a less common problem than limited XR/SPI availability, but it could equally happen elsewhere than just Metal. The main interest for HDRP users is deferred rendering in any case.

Contributor

@fabien-unity the Xbox issue is tracked via a single ticket for some failing graphics tests, specifically the depth collision one: https://fogbugz.unity3d.com/f/cases/1256635/

We can have a meeting to talk about how to fix the TextureXR issue. @JulienIgnace-Unity, you might want to be there.

{
// TODO: Could conditionally enable threadExecutionWidth32/disable preferComputeKernels based on GPU
// SystemInfo.hasHiddenSurfaceRemovalOnGPU == true on Apple GPUs
requiresExplicitMSAAResolve = false;
Contributor Author

So this PR does not yet enable the PLATFORM_LANE_COUNT_32 path when running on Mac/Apple GPUs, but that's the ultimate goal here, along with cleaning up the existing mobile/Switch situation.

…abled, always treat real as float. Use explicit half precision optimizations instead.
@sebastienlagarde sebastienlagarde changed the title [10.x] Metal support updates [10.x] Metal support updates [Hold] Feb 4, 2022
@EvgeniiG EvgeniiG removed their request for review May 6, 2022 16:38
@jessebarker jessebarker removed their request for review October 25, 2024 13:52
8 participants