Introduction
Anyone with a background in competitive programming (OI/ICPC, for example) knows that adding a long list of optimization headers to your code can make it run significantly faster. One of the most effective of these is fastmath.
According to the GCC documentation on Floating-Point Math, fastmath essentially lets the compiler disregard strict IEEE 754 rules when it optimizes floating-point code. That gives it the freedom to reorder and simplify calculations and to pick whichever instructions run fastest on the target hardware. This can affect precision to some degree, but code freed from the rigid IEEE rules is usually faster.
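To make the precision trade-off concrete, here is a small illustration in Python. Python always evaluates floating-point expressions exactly as written, so the regrouping is done by hand; the point is only that floating-point addition is not associative, which is why a compiler that is allowed to regroup operations can change the result.

```python
# Floating-point addition is not associative: regrouping changes the result.
a, b, c = 1e16, -1e16, 1.0

strict = (a + b) + c      # evaluated in the order written: 0.0 + 1.0
regrouped = a + (b + c)   # b + c rounds back to -1e16, so the 1.0 is lost

print(strict, regrouped)  # prints: 1.0 0.0
```

Strict IEEE 754 evaluation has to keep the first grouping. Relaxed rules allow either, trading reproducibility for speed.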
If enabling fastmath can speed up CPU code, can it do the same for GPUs? Nihui gave me the answer:
SPV_KHR_float_controls2
This extension allows us to enable fastmath for SPIR-V, unlocking faster performance.
According to the Khronos Group, to use this extension, the following OpExtension must be present in the SPIR-V module:
OpExtension "SPV_KHR_float_controls2"
So, our first step is to insert this extension.
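In the binary, that line becomes a short run of 32-bit words. The sketch below shows one way to encode it in Python, assuming the usual SPIR-V conventions: the first word packs the word count and the opcode, and string literals are nul-terminated UTF-8 packed little-endian into words. The helper names are hypothetical, and the opcode number is my reading of the specification.

```python
import struct

OP_EXTENSION = 10  # opcode number of OpExtension, as listed in the SPIR-V specification


def encode_string(text: str) -> list[int]:
    """Pack a nul-terminated UTF-8 string into little-endian 32-bit words."""
    raw = text.encode("utf-8") + b"\x00"
    raw += b"\x00" * (-len(raw) % 4)  # pad to a whole number of words
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


def make_instruction(opcode: int, operands: list[int]) -> list[int]:
    """First word: total word count in the high 16 bits, opcode in the low 16 bits."""
    return [((len(operands) + 1) << 16) | opcode, *operands]


ext_words = make_instruction(OP_EXTENSION, encode_string("SPV_KHR_float_controls2"))
print([hex(w) for w in ext_words])
```

The extension name is 23 characters long, so with its terminating nul it fills exactly six words, and the whole instruction is seven words.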
Next, we look at FPFastMathDefault:
FPFastMathDefault
Set the default fast math flags for instructions not themselves decorated with FPFastMathMode. This only affects instructions operating on or resulting in a type that is Target Type or an OpTypeMatrix or OpTypeVector derived from it. Target Type must be a scalar, floating-point type. Fast-Math Mode must be the <id> of a constant instruction of 32-bit integer type containing a valid FP Fast Math Mode bitmask. Fast-Math Mode must not be a specialization-constant instruction. May be applied at most once per Target Type to any execution mode.
Then, we need to set the execution mode for a specific data type. The spec requires that Fast-Math Mode be the <id> of a constant instruction of 32-bit integer type containing a valid FP Fast Math Mode bitmask, so we also need to create a constant that holds this bitmask. It's also important to note that the FloatControls2 capability must be enabled for the execution mode to be accepted.
This means we need to modify the SPIR-V binary directly to insert these features.
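The value stored in that constant is a plain bitmask. Here is a minimal Python sketch of composing one. The bit values are as I read them in the SPIR-V specification and should be checked against spirv.hpp; the Python names and the check_mask helper are hypothetical. The deprecated Fast bit is left out, and the check reflects my understanding that AllowTransform is only valid together with AllowContract and AllowReassoc.

```python
from enum import IntFlag


class FPFastMathMode(IntFlag):
    # Bit values as I read them in the SPIR-V specification; verify before use.
    NOT_NAN = 0x1
    NOT_INF = 0x2
    NSZ = 0x4
    ALLOW_RECIP = 0x8
    ALLOW_CONTRACT = 0x10000
    ALLOW_REASSOC = 0x20000
    ALLOW_TRANSFORM = 0x40000


def check_mask(mask: FPFastMathMode) -> int:
    """Return the mask as an int, rejecting AllowTransform without its companions."""
    required = FPFastMathMode.ALLOW_CONTRACT | FPFastMathMode.ALLOW_REASSOC
    if mask & FPFastMathMode.ALLOW_TRANSFORM and (mask & required) != required:
        raise ValueError("AllowTransform needs AllowContract and AllowReassoc")
    return int(mask)


aggressive = (
    FPFastMathMode.NOT_NAN | FPFastMathMode.NOT_INF | FPFastMathMode.NSZ
    | FPFastMathMode.ALLOW_RECIP | FPFastMathMode.ALLOW_CONTRACT
    | FPFastMathMode.ALLOW_REASSOC | FPFastMathMode.ALLOW_TRANSFORM
)
print(hex(check_mask(aggressive)))  # prints: 0x7000f
```

A more conservative mask, for example only AllowContract, would keep NaN and infinity handling intact while still permitting fused multiply-add.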
To summarize, we will need the following opcodes:
OpCapability
OpExtension
OpExecutionMode
OpConstant
By consulting the SPIR-V Unified Specification, I found the specific parameters and IDs for each of these opcodes. With this, we can outline the pseudo-code for our injection:
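A minimal sketch of such an outline in Python, assuming the entry point ID, the type IDs and the ID bound are already known. The opcode and enum numbers are my reading of the SPIR-V specification and the extension text, and all function and parameter names are hypothetical. One assumption deserves a loud label: because FPFastMathDefault takes <id> operands, the sketch emits OpExecutionModeId, which is how I understand the specification to require such modes to be declared.

```python
import struct

# Numbers as I read them in the SPIR-V specification; verify against spirv.hpp.
OP_EXTENSION = 10
OP_CAPABILITY = 17
OP_CONSTANT = 43
OP_EXECUTION_MODE_ID = 331  # assumption: needed because the mode takes <id> operands
CAPABILITY_FLOAT_CONTROLS2 = 6029
EXECUTION_MODE_FP_FAST_MATH_DEFAULT = 6028


def inst(opcode, *operands):
    """Build one instruction: word count in the high 16 bits, opcode in the low 16."""
    return [((len(operands) + 1) << 16) | opcode, *operands]


def string_words(text):
    """Nul-terminate, pad to four bytes, and split into little-endian words."""
    raw = text.encode("utf-8") + b"\x00"
    raw += b"\x00" * (-len(raw) % 4)
    return list(struct.unpack(f"<{len(raw) // 4}I", raw))


def build_fast_math_instructions(entry_point_id, uint_type_id, float_type_id, bound, mask):
    """Return the instructions to inject plus the new ID bound.

    float_type_id is the Target Type of FPFastMathDefault (hypothetical parameter).
    """
    const_id = bound  # every existing ID is below the bound, so this one is free
    return {
        "capability": inst(OP_CAPABILITY, CAPABILITY_FLOAT_CONTROLS2),
        "extension": inst(OP_EXTENSION, *string_words("SPV_KHR_float_controls2")),
        "constant": inst(OP_CONSTANT, uint_type_id, const_id, mask),
        "execution_mode": inst(
            OP_EXECUTION_MODE_ID, entry_point_id,
            EXECUTION_MODE_FP_FAST_MATH_DEFAULT, float_type_id, const_id,
        ),
        "new_bound": bound + 1,
    }


injected = build_fast_math_instructions(
    entry_point_id=4, uint_type_id=6, float_type_id=7, bound=50, mask=0x7000F
)
for name, words in injected.items():
    print(name, words)
```

Each piece has to land in the right section of the module: the capability goes with the other OpCapability instructions at the start, the extension follows the capabilities, the execution mode follows the OpEntryPoint instructions, and the constant sits after the integer type it uses and before the first function. The header's bound word must also be updated to the new value.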
This means we need to obtain the following information from the SPIR-V binary:
The Entry Point ID
The original maximum variable ID
The ID for the UINT type
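The entry point ID, the ID bound and the 32-bit unsigned integer type ID can all be collected in one pass over the module's 32-bit words. The Python sketch below assumes the standard layout: a five-word header whose fourth word is the bound (one more than the largest ID in use), followed by instructions whose first word packs word count and opcode. The scan_module name and the tiny hand-built module are hypothetical.

```python
SPIRV_MAGIC = 0x07230203
OP_ENTRY_POINT = 15
OP_TYPE_INT = 21


def scan_module(words):
    """Collect the ID bound, the first entry point ID and the 32-bit unsigned int type ID."""
    if words[0] != SPIRV_MAGIC:
        raise ValueError("not a SPIR-V module (or wrong endianness)")
    info = {"bound": words[3], "entry_point_id": None, "uint_type_id": None}
    i = 5  # skip the header: magic, version, generator, bound, schema
    while i < len(words):
        word_count, opcode = words[i] >> 16, words[i] & 0xFFFF
        if word_count == 0:
            raise ValueError("malformed instruction")
        if opcode == OP_ENTRY_POINT and info["entry_point_id"] is None:
            # Operands: execution model, entry point <id>, name, interface IDs
            info["entry_point_id"] = words[i + 2]
        elif opcode == OP_TYPE_INT and words[i + 2] == 32 and words[i + 3] == 0:
            # Operands: result <id>, width, signedness (0 means unsigned)
            info["uint_type_id"] = words[i + 1]
        i += word_count
    return info


demo_module = [
    SPIRV_MAGIC, 0x00010300, 0, 10, 0,   # header with bound = 10
    (2 << 16) | 17, 1,                   # OpCapability Shader
    (3 << 16) | 14, 0, 1,                # OpMemoryModel Logical GLSL450
    (5 << 16) | 15, 5, 4, 0x6E69616D, 0, # OpEntryPoint GLCompute %4 "main"
    (4 << 16) | 21, 6, 32, 0,            # %6 = OpTypeInt 32 0
]
found = scan_module(demo_module)
print(found)
```

Because every existing ID is smaller than the bound, the bound itself can serve as the ID of the newly injected constant, after which the header is rewritten with bound + 1. A module that declares no 32-bit unsigned integer type would leave uint_type_id as None, in which case the type would have to be injected too.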
Let's examine a SPIR-V binary to see what it looks like. To keep things simple, let's start with the absval operator, which I've disassembled into a human-readable text format using spirv-dis from SPIRV-Tools.
From observing the code, we can find:
The abs calculation is here:
This is the program's entry point:
And the original maximum variable ID is:
Drawing inspiration from ncnn's inject_local_xyz, we can implement the injection logic:
Let's look at the result after injection:
And with that, we have successfully injected fast_math into the SPIR-V module.
Feel free to check out my PR: #6223