futz12 (Contributor) commented Aug 1, 2025

Taking the absval op as an example:
Before

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 54
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main LocalSize 32 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_8bit_storage"
               OpSourceExtension "GL_EXT_shader_explicit_arithmetic_types_int64"
               OpName %main "main"
               OpName %gi "gi"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %n "n"
               OpName %parameter "parameter"
               OpMemberName %parameter 0 "n"
               OpName %p "p"
               OpName %v "v"
               OpName %bottom_top_blob "bottom_top_blob"
               OpMemberName %bottom_top_blob 0 "bottom_top_blob_data"
               OpName %_ ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %n SpecId 0
               OpDecorate %parameter Block
               OpMemberDecorate %parameter 0 Offset 0
               OpDecorate %_runtimearr_v4float ArrayStride 16
               OpDecorate %bottom_top_blob Block
               OpMemberDecorate %bottom_top_blob 0 Offset 0
               OpDecorate %_ Binding 0
               OpDecorate %_ DescriptorSet 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
          %n = OpSpecConstant %uint 0
       %bool = OpTypeBool
         %19 = OpSpecConstantOp %bool IEqual %n %uint_0
  %parameter = OpTypeStruct %uint
%_ptr_PushConstant_parameter = OpTypePointer PushConstant %parameter
          %p = OpVariable %_ptr_PushConstant_parameter PushConstant
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_runtimearr_v4float = OpTypeRuntimeArray %v4float
%bottom_top_blob = OpTypeStruct %_runtimearr_v4float
%_ptr_StorageBuffer_bottom_top_blob = OpTypePointer StorageBuffer %bottom_top_blob
          %_ = OpVariable %_ptr_StorageBuffer_bottom_top_blob StorageBuffer
%_ptr_StorageBuffer_v4float = OpTypePointer StorageBuffer %v4float
       %main = OpFunction %void None %3
          %5 = OpLabel
         %gi = OpVariable %_ptr_Function_uint Function
         %20 = OpVariable %_ptr_Function_uint Function
          %v = OpVariable %_ptr_Function_v4float Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %gi %15
         %16 = OpLoad %uint %gi
               OpSelectionMerge %22 None
               OpBranchConditional %19 %21 %31
         %21 = OpLabel
         %29 = OpAccessChain %_ptr_PushConstant_uint %p %int_0
         %30 = OpLoad %uint %29
               OpStore %20 %30
               OpBranch %22
         %31 = OpLabel
               OpStore %20 %n
               OpBranch %22
         %22 = OpLabel
         %32 = OpLoad %uint %20
         %33 = OpUGreaterThanEqual %bool %16 %32
               OpSelectionMerge %35 None
               OpBranchConditional %33 %34 %35
         %34 = OpLabel
               OpReturn
         %35 = OpLabel
         %45 = OpLoad %uint %gi
         %47 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %45
         %48 = OpLoad %v4float %47
               OpStore %v %48
         %49 = OpLoad %v4float %v
         %50 = OpExtInst %v4float %1 FAbs %49
               OpStore %v %50
         %51 = OpLoad %uint %gi
         %52 = OpLoad %v4float %v
         %53 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %51
               OpStore %53 %52
               OpReturn
               OpFunctionEnd

After

; SPIR-V
; Version: 1.3
; Generator: Khronos Glslang Reference Front End; 11
; Bound: 55
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpCapability FloatControls2
               OpExtension "SPV_KHR_float_controls2"
               OpEntryPoint GLCompute %main "main" %gl_GlobalInvocationID
               OpExecutionMode %main FPFastMathDefault %float %uint_458752
               OpExecutionMode %main LocalSize 32 1 1
               OpSource GLSL 450
               OpSourceExtension "GL_EXT_shader_8bit_storage"
               OpSourceExtension "GL_EXT_shader_explicit_arithmetic_types_int64"
               OpName %main "main"
               OpName %gi "gi"
               OpName %gl_GlobalInvocationID "gl_GlobalInvocationID"
               OpName %n "n"
               OpName %parameter "parameter"
               OpMemberName %parameter 0 "n"
               OpName %p "p"
               OpName %v "v"
               OpName %bottom_top_blob "bottom_top_blob"
               OpMemberName %bottom_top_blob 0 "bottom_top_blob_data"
               OpName %_ ""
               OpDecorate %gl_GlobalInvocationID BuiltIn GlobalInvocationId
               OpDecorate %n SpecId 0
               OpDecorate %parameter Block
               OpMemberDecorate %parameter 0 Offset 0
               OpDecorate %_runtimearr_v4float ArrayStride 16
               OpDecorate %bottom_top_blob Block
               OpMemberDecorate %bottom_top_blob 0 Offset 0
               OpDecorate %_ Binding 0
               OpDecorate %_ DescriptorSet 0
       %void = OpTypeVoid
          %3 = OpTypeFunction %void
       %uint = OpTypeInt 32 0
%_ptr_Function_uint = OpTypePointer Function %uint
     %v3uint = OpTypeVector %uint 3
%_ptr_Input_v3uint = OpTypePointer Input %v3uint
%gl_GlobalInvocationID = OpVariable %_ptr_Input_v3uint Input
     %uint_0 = OpConstant %uint 0
%_ptr_Input_uint = OpTypePointer Input %uint
          %n = OpSpecConstant %uint 0
       %bool = OpTypeBool
         %19 = OpSpecConstantOp %bool IEqual %n %uint_0
  %parameter = OpTypeStruct %uint
%_ptr_PushConstant_parameter = OpTypePointer PushConstant %parameter
          %p = OpVariable %_ptr_PushConstant_parameter PushConstant
        %int = OpTypeInt 32 1
      %int_0 = OpConstant %int 0
%_ptr_PushConstant_uint = OpTypePointer PushConstant %uint
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
%_ptr_Function_v4float = OpTypePointer Function %v4float
%_runtimearr_v4float = OpTypeRuntimeArray %v4float
%bottom_top_blob = OpTypeStruct %_runtimearr_v4float
%_ptr_StorageBuffer_bottom_top_blob = OpTypePointer StorageBuffer %bottom_top_blob
          %_ = OpVariable %_ptr_StorageBuffer_bottom_top_blob StorageBuffer
%_ptr_StorageBuffer_v4float = OpTypePointer StorageBuffer %v4float
%uint_458752 = OpConstant %uint 458752
       %main = OpFunction %void None %3
          %5 = OpLabel
         %gi = OpVariable %_ptr_Function_uint Function
         %20 = OpVariable %_ptr_Function_uint Function
          %v = OpVariable %_ptr_Function_v4float Function
         %14 = OpAccessChain %_ptr_Input_uint %gl_GlobalInvocationID %uint_0
         %15 = OpLoad %uint %14
               OpStore %gi %15
         %16 = OpLoad %uint %gi
               OpSelectionMerge %22 None
               OpBranchConditional %19 %21 %31
         %21 = OpLabel
         %29 = OpAccessChain %_ptr_PushConstant_uint %p %int_0
         %30 = OpLoad %uint %29
               OpStore %20 %30
               OpBranch %22
         %31 = OpLabel
               OpStore %20 %n
               OpBranch %22
         %22 = OpLabel
         %32 = OpLoad %uint %20
         %33 = OpUGreaterThanEqual %bool %16 %32
               OpSelectionMerge %35 None
               OpBranchConditional %33 %34 %35
         %34 = OpLabel
               OpReturn
         %35 = OpLabel
         %45 = OpLoad %uint %gi
         %47 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %45
         %48 = OpLoad %v4float %47
               OpStore %v %48
         %49 = OpLoad %v4float %v
         %50 = OpExtInst %v4float %1 FAbs %49
               OpStore %v %50
         %51 = OpLoad %uint %gi
         %52 = OpLoad %v4float %v
         %53 = OpAccessChain %_ptr_StorageBuffer_v4float %_ %int_0 %51
               OpStore %53 %52
               OpReturn
               OpFunctionEnd
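
The After listing differs from the Before listing by the FloatControls2 capability, the SPV_KHR_float_controls2 extension, and an FPFastMathDefault execution mode for %float whose mask is the constant 458752 (0x70000). As a minimal sketch for reading that mask, the snippet below decodes it using the FPFastMathMode bit values from the SPIR-V specification. The helper name `decode_fp_fast_math_mode` is hypothetical and is not part of ncnn or this PR.

```python
# FPFastMathMode bit values as listed in the SPIR-V specification.
# The last three entries were introduced with SPV_KHR_float_controls2.
FP_FAST_MATH_MODE_BITS = {
    0x00001: "NotNaN",
    0x00002: "NotInf",
    0x00004: "NSZ",
    0x00008: "AllowRecip",
    0x00010: "Fast",
    0x10000: "AllowContract",
    0x20000: "AllowReassoc",
    0x40000: "AllowTransform",
}


def decode_fp_fast_math_mode(mask):
    """Return the names of the flags set in an FPFastMathMode mask.

    Hypothetical helper for illustration only. Bits that are not in the
    table above are reported as a single hex string so they are not lost.
    """
    names = [name for bit, name in FP_FAST_MATH_MODE_BITS.items() if mask & bit]

    known_bits = 0
    for bit in FP_FAST_MATH_MODE_BITS:
        known_bits |= bit

    unknown = mask & ~known_bits
    if unknown:
        names.append("unknown(0x%x)" % unknown)
    return names


if __name__ == "__main__":
    # %uint_458752 from the After listing
    print(decode_fp_fast_math_mode(458752))
```

For 458752 this yields AllowContract, AllowReassoc and AllowTransform, so under this reading the default for 32-bit floats permits contraction, reassociation and other algebraic transformations, while the NotNaN, NotInf and NSZ assumptions stay off.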

tencent-adm (Member) commented Aug 1, 2025

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ futz12
❌ nihui

codecov-commenter commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 81.90476% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.59%. Comparing base (a514cf5) to head (157ff17).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|---------------|
| src/pipelinecache.cpp    | 33.33%  | 10 Missing ⚠️ |
| src/gpu.cpp              | 91.95%  | 7 Missing ⚠️  |
| src/pipeline.cpp         | 0.00%   | 2 Missing ⚠️  |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6223      +/-   ##
==========================================
- Coverage   95.89%   95.59%   -0.30%     
==========================================
  Files         837      838       +1     
  Lines      264994   265097     +103     
==========================================
- Hits       254105   253424     -681     
- Misses      10889    11673     +784     

github-actions bot commented Aug 1, 2025

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size  | difference |
|--------------|-----------|----------|------------|
| x86_64       | 15124728  | 15133360 | +8632 ⚠️   |
| armhf        | 6155744   | 6160304  | +4560 ⚠️   |
| aarch64      | 9453192   | 9453856  | +664 ⚠️    |

futz12 changed the title from "[WIP] spir-v fastmath mode" to "spir-v fastmath mode" on Aug 1, 2025
nihui closed this on Aug 1, 2025
nihui reopened this on Aug 1, 2025

nihui (Member) commented Aug 21, 2025

Thank you for your work. Please write up your implementation notes and takeaways, along with the difficulties you ran into and how you solved them, as an article and post it in the Discussions section, where it will serve as a knowledge summary: https://github.com/Tencent/ncnn/discussions

4 participants