Releases: intel/intel-graphics-compiler
igc-1.0.6646
Fixed Issues / Improvements
- Added a key to dump out WIA information into a unique file per each function invocation,
- Added comments for stack callee function prolog to assist debugging,
- Added indirect regioning restrictions,
- Added ld cases for texture folding,
- Added legalization checks for VxH regions for int-to-fp moves,
- Added localization of live ranges to reduce accumulator usages,
- Added shader dump for spec constants,
- Added support for a new pattern in PushAnalysis::IsStatelessCBLoad() to detect,
- Added VISA option -dumpintf to dump RA interference graph,
- Added more passes to igc_opt,
- Added reporting warnings inside IGC passes,
- Added handles to plane coefficients,
- Added reg keys for scheduled BB range in local scheduler,
- Added additional DP emulation mode,
- Allow coalescing of spill/fill in presence of stack calls,
- Allow remat even for operations using NoMaskWA,
- Applying renaming to linear scan RA spill/fill,
- Change memory semantics to relaxed for OpenCL 1.x atomics,
- Decouple VC debug options to allow emission of debug information without debuggable kernels,
- Emit warning about an unsupported debuggability if ZeBin is requested,
- Enabled Wa16012061344 for read suppression issue caused by predictor,
- Enhancements in compiler output,
- Extended FCL dumps with CMFE options and inputs,
- Favoring the llvm::BasicBlock name for vISA labels,
- Fixed build break in Fedora,
- Fixed emission of debug information for implicit variable locations,
- Fixed erroneous size calculation for DW_OP_bit_piece,
- Fixes for media height support in Cisa Builder,
- Fixes for PosDep MatchMad condition,
- Fixed logic when LLVM name is the empty string,
- Fixed missing barrier when inline ASM is used in a kernel,
- Fixed non-deterministic Function->VisaModule lookup,
- Fixed PushAnalysis to not create unaligned 64bit runtime value arguments,
- Fixed the hybrid RA with spill,
- Fixed the linear scan RA time status,
- Fixed issue for multiple thread compilation of shaders,
- Fixed GEP scalarized indexes calculation in CG_LowerGEPForPrivMem pass,
- For optnone builtins, allow IGC to determine inline/noinline and stackcall/subroutine calls,
- GenISA ibfe/ubfe constant literal offset may exceed 31,
- Implemented by value argument linearization,
- Implemented IGC_ASSERT in IGC/OCLFE,
- Improved DebugInfo robustness by implementing naive error-handling,
- Lifted 4K predicate variable restriction on vISA assembly,
- Made LinearScan default in ForceFastestSIMD,
- Misc. initial edits to the file parsing code in the global scope,
- Moving BiF parsing tools to a separate file,
- Renamed some VC options to have "-vc" prefix instead of "-genx",
- Reworked setPredicateForDiscard() to not use a temporary register for flag storage,
- Select phi input in non-overlapping region,
- Support for function pointer builtins/intrinsics,
- Support for Function pointer SIMD Variants,
- Support for uniformly typed read,
- Unify conditions for llvm::JumpThreading usage,
- Updated copyright headers,
- Updated DPEmu,
- Updated the indirect call info check in SWSB,
- VC: backend can lower lzd64,
- VC: debug info fixes for non-standalone kernels,
- VC: legacy messages legalization to vc-codegen,
- vISA: add helper function for calla check,
- vISA: add HWConformity::fixCalla for HW restriction,
- Simplifying ldrawvector to ldrawindex when we have a case where only one element is being used and we know the offset is a constant integer value,
- Other minor fixes and improvements.
Dependencies revisions
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@7ee152a
- KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.6410
Fixed Issues / Improvements
- Consider a WA table entry before inserting a flush sampler instruction
- Location expressions improvements
- Do not split arithmetic instructions in IGC as vISA will handle it
- Backing out Simple push algorithm Optimization
- Fix reg number issue in translate math
- Changes for -O2. Optimizing non-user functions to save compiling time.
- Fix the SWSB when there is no send in kernel
- Add support to generate thread IDs in 2x2 blocks.
- Separate global and local variables to reduce compilation time.
- Don't replace OpDecorate with OpGroupDecorate.
- Add InferAddressSpacesPass only if needed.
- Fix crash in SIMD32 mode caused by pseudo_ret instruction's source operand right bound computation.
- Update DispatchGPGPUWalkerAlongYFirst lookup
- Changed ldms and ldmcs conversion to 16bit. Fixed ldmcs usage in users other than 16bit ldms
- Cleanup unnecessary dynamic allocations.
- Avoid warning of implicit i64->i32 by forcing explicit conversion.
- Optimization for signed remainder for constant power of 2 int32.
- Switch TPM to SVM entirely.
- Do not modify wrregion input in non-overlapping region optimization.
- Simplify usage of IGC_BUILD__VC_ENABLED cmake option. Change IGC_VC_DISABLED macro to the more consistent IGC_VC_ENABLED
- Removed external dependency on llvm_patches and improved llvm setup in project
- Produce truncate instead of __builtin_spirv_OpUConvert for not rounded/saturated converts
- Fix missing barrier when inline ASM is used in a kernel
- Split uniform into thread uniform, work group uniform, and global uniform, which gives more detailed information that could be used to enable better optimization.
- Set InlineAsm usage per function group, to create correct builder for multiple FGs.
- Support for stackcalls with InlineAsm by parsing multiple functions in single text stream.
- Broadcast uniform variables if 'rw' constraint was specified (Inline ASM)
- Optimize generic pointer load for kernels not using local memory.
- Bug fix for SWSB when comparing the footprint.
- Extend GAS phi resolution to all loops, not only top level ones.
- Remove the dependence between dummy csel instructions.
- Adds custom iterator class for Function Group. Can iterate through the FunctionGroup class, which uses a 2D vector storage.
- Change OpenCL builtin mad implementation to use fma instruction instead of multiply add.
- Cast Base and Insert parameters to unsigned to avoid sign extension while shifting
- Add check for compute shaders that may need XYZ walk of thread IDs.
- ZEBinary: Fix scratch memory buffer creation.
- If unmasked regions were nested, the innermost intrinsic llvm.genx.GenISA.UnmaskedRegionEnd switched off unmasked code generation, resulting in the enclosing regions being generated as masked code.
- Extra flag has been added to WIAnalysis Runner to not mark some uniform instructions as random.
- Added a field to implicit argument structure for stack calls. Modified layout of local ids based on SIMD size.
- IGA: add disassembler option "--output-on-fail"
- Fix discovery of inlined DISubprogram nodes
- Implement support for both SPV-IR forms for BitFieldInsert builtins
- Introduction of new entry in IGC constant folder for bfrev.
- Update TracePointerSource() function to detect cases where two different resource pointer values describe the same resource.
- Vector backend does not support creation of L0 module with external functions. Insert assert in GenXCisaBuilder, explaining that.
- Take SpillMemOffset into consideration when reporting spill size.
- Split send has argument no 4, and it can be addr register. Make sure to check dependence on src3 as well.
- Add case when propagating non-generic pointer to store.
- Disable certain transformations when compiling code for debug.
- Add -vc-promote-array-alloca-limit knob to control array promotion total size (2nd edition). Force array promotion for CMRT binary.
- Replace strcat by compound assignment operator
- Now appropriately handling shl instructions with unsupported types.
- More fixes to get local RA to honor declare even-alignment.
- Print SLMsize in compiler output file
- IGA SWSB refactoring: Unify InstType getter function
- Extract vc input handling into another function
- Fix an assertion due to unexpected RAUW with a constant
- Extend supported subtargets in VC
- Solve the memory leak issue of SWSB
- Add control to route some resources to LSC/HDC
- Fix scratch surface allocation for VC
- Remove addrspacecast only if there are no other uses.
- Set alwaysinline on invoke kernels. Don't add stack call or indirect call attributes.
- Add interface target for vc intrinsics headers
- Move stepping into Options instead of a global variable.
- Add DoNotSpill attribute for vISA variables.
- ZEBinary: Support buffer_offset implicit argument
- If all its operands are region invariant, an inst is region invariant.
- Commit base data structures for implicit argument handling for bindless offsets. Changes in StatelessToBindless promotion will come later.
- For optnone builtins, allow -O0 flag to determine if we should call them as subroutines or stackcalls.
- Allow EnableA64WA env variable in Linux release mode.
- BinaryEncodingIGA: fix math pipe instruction check
- Upgraded error messages with source file locations and names of the kernel causing the error.
- Implement support for both SPV-IR forms for conversion builtins
- Prevent redundant lowering attempt during SIMD CF Conformance
- Make sure trivial RA honors even-alignment.
- ZEBinary: add regkey to enable .bss section for zero-initialized global variables
- add -vc-promote-array-alloca-limit knob to control array promotion total size
- Add simplify CFG pass to pass manager to simplify work of LICM
- Add an option for GenXPromoteArray threshold
- Debug location expression improvements
- Reduce memory footprint in GraphColor
- Fix binary encoding for simd2 align16 instructions
- Filter out "endif" and "else" when inserting dummy mov.
- Avoid localization of large data for oclbin, use relocation instead.
- Add option for TPM memory placement.
- Correct localization costs for global vectors.
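The signed-remainder item in the list above refers to a well-known strength reduction. As a rough illustration only (this is not IGC's code; the helper name `srem_pow2` and its parameter range are assumptions made for this sketch), a signed 32-bit remainder by a constant power of two can be computed without a division instruction:

```cpp
#include <cstdint>

// Hypothetical helper, not taken from IGC: computes x % (1 << k) for a signed
// 32-bit x and 1 <= k <= 30, matching C++ truncating remainder semantics.
int32_t srem_pow2(int32_t x, unsigned k) {
    const uint32_t mask = (1u << k) - 1u;
    // bias == mask when x is negative, 0 otherwise. This relies on >> being an
    // arithmetic shift for negative values (guaranteed since C++20, and the
    // behaviour of mainstream compilers before that).
    const uint32_t bias = static_cast<uint32_t>(x >> 31) & mask;
    const uint32_t ux = static_cast<uint32_t>(x);
    // Unsigned arithmetic wraps, so the intermediate steps cannot overflow.
    return static_cast<int32_t>(((ux + bias) & mask) - bias);
}
```

For example, `srem_pow2(-7, 2)` yields `-3`, the same as `-7 % 4`. The bias keeps the result's sign equal to the dividend's sign, which a plain `x & mask` would not.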
Dependencies revisions
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@7ee152a
- KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.6087
Fixed Issues / Improvements
- Fix a bug when comparing two source regions as type was not considered.
- Other minor fixes and improvements.
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@5032643
- KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.6083
Fixed Issues / Improvements
- Process legalization of 64-bit moves on VC,
- Added the support for sending the textureID and dimensions value to the driver,
- Support for SPV_EXT_shader_atomic_float_min_max,
- Update the read suppression WA to use less dummy instructions,
- Move CMABI::doInitialization code in a separate helper analysis,
- Do not hash label operands, CreateLabel() should always return a new label,
- Modernize GraphColor code,
- Optimize to generate mad by promoting src2 from :b to :w,
- Fix for creating a dump directory for Linux,
- Update configuration_flags.md,
- Fix Phi handling in SIMD CF Conformance,
- Introduction of new entry in IGC constant folder for bfi,
- Extend ValueTracker to be able to track inside user functions,
- Remove omitting zero/undef sample params for cube maps,
- Bug fixes for O0 inlining heuristic,
- Corrects a defect where the vISA asm parser erroneously used,
- Move ocl runtime info to headers,
- Added missing code for ADL_S and RKL,
- Fix builtin mangling for OpReadClockKHR,
- Make sure structurizer uses correct mask offset,
- Setting FunctionControl to force indirect call now applies to all user functions,
- Fixed logic error when there exists a reg key which is a substring of another reg key (i.e. ShaderDumpEnable and ShaderDumpEnableAll),
- Add UMD control to disable higher Simds,
- Report private memory usage in assembly dump,
- Other minor fixes and improvements.
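The reg key item above (one key being a substring of another, such as ShaderDumpEnable and ShaderDumpEnableAll) describes a common lookup pitfall. The sketch below is not IGC's parser: the `Key=Value` one-entry-per-line text format and the function name `lookup_key` are assumptions made purely for illustration. It shows how comparing the whole key, rather than searching for it as a substring, avoids the ambiguity:

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical lookup over "Key=Value" lines. The full key to the left of '='
// must match exactly, so "ShaderDumpEnable" never matches
// "ShaderDumpEnableAll=1" the way a plain text.find(key) would.
std::optional<std::string> lookup_key(std::string_view text, std::string_view key) {
    std::size_t pos = 0;
    while (pos <= text.size()) {
        std::size_t eol = text.find('\n', pos);
        if (eol == std::string_view::npos) {
            eol = text.size();
        }
        const std::string_view line = text.substr(pos, eol - pos);
        const std::size_t eq = line.find('=');
        if (eq != std::string_view::npos && line.substr(0, eq) == key) {
            return std::string(line.substr(eq + 1));
        }
        pos = eol + 1;  // continue with the next line
    }
    return std::nullopt;
}
```

With the text `"ShaderDumpEnableAll=1\nShaderDumpEnable=0\n"`, looking up `ShaderDumpEnable` returns `"0"` even though the longer key appears first.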
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@5032643
- KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.5964
Fixed Issues / Improvements
- Process legalization of 64-bit moves on VC backend side.
- Renumber subroutines after removing unreachable code.
- Add environment variable for OCL debugging options.
- As urem needs positive operands, srcMod could generate negative operands. To be safe, disable srcMod for urem.
- The existing I64 shift emulation produced a wrong result if the shift amount was not in [31, 0], because the inner condition was generated incorrectly.
- Passing context for code patching.
- Search for genx.output.1 intrinsics not only in return blocks (some execution paths may have callable instead).
- Fix for DumpToCustomDir flag and logic of ShaderDumpPidDisable flag on Linux.
- Fix: Wrong pattern is matched in GenSpecificPattern.
- Add support for SPV_INTEL_long_constant_composite.
- Use simple token allocation algorithm in debug mode.
- Disable stateless to stateful promotion after 32 promotion.
- Adding /Ob3 inline expansion option to 64-bit release config for aggressive inlining within IGC.
- Fix coalescing of output arguments after migration to genx.output.1 intrinsic.
- Support for SPV_EXT_shader_atomic_float_add extension.
- Guard the FC patch SWSB info generation to save compilation time
- When a byte is promoted to word, its signedness should remain unchanged.
- Temp WA to limit kernel name length
- Initialize address register for indirect addressing if shader has indirect resources accesses i.e. a0 is used in send descriptor.
- Add support for SPV_INTEL_fp_fast_math_mode in SPIRVReader.
- Address register initial support.
- Fix initialization of GenXTidyControlFlow
- Fix push constant threshold for CFL GT3.
- Fixed performance issues with subroutine inlining heuristic.
- Move block push constants threshold setting from being the default IGC flag value to CPlatform.
- Remove -hasRNEAndRenorm and its associated code.
- int64 mul does not support srcMod
- Refactored some conditions to make the source code more idiomatic and easier to read.
- Change of stateless indirect access reporting mechanism.
- Prevent unnecessary copies generation on GenXCoalescing.
- Extract code that adds Compute Shader CodeGen passes to a separate function.
- Change return type of createWrRegion and small refactoring in GenXBaling
- Hybrid RA with spill
- Maintain physical pred/succ during CFG BB insert/delete.
- RA compilation time: remove unnecessary operations when building interference with local RA.
- Workaround for sampler feedback bug.
- Optimization for signed scalar division for constant power of 2 int as divisor.
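The I64 shift emulation item above does not say which condition was wrong, so the following is only a generic sketch of how a 64-bit left shift can be built from 32-bit operations, and why amounts of 32 and above need their own branch. The `U64Parts` type and `shl64` function are hypothetical names for this example and do not come from IGC:

```cpp
#include <cstdint>

// Hypothetical representation of a 64-bit value as two 32-bit halves.
struct U64Parts {
    uint32_t lo;
    uint32_t hi;
};

// Emulates (v << amt) with 32-bit operations; amt is reduced modulo 64 here,
// which is a choice made for this sketch.
U64Parts shl64(U64Parts v, unsigned amt) {
    amt &= 63u;
    if (amt == 0) {
        return v;  // avoids the out-of-range shift by 32 in the branch below
    }
    if (amt < 32) {
        // Bits shifted out of the low half carry into the high half.
        return {v.lo << amt, (v.hi << amt) | (v.lo >> (32 - amt))};
    }
    // amt in [32, 63]: the low half becomes zero and the old low half,
    // shifted by (amt - 32), becomes the high half.
    return {0u, v.lo << (amt - 32)};
}
```

Getting the boundary between the two non-trivial branches wrong gives correct results for small shift amounts and wrong ones for large amounts, which is the kind of symptom the release note describes.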
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@5032643
- KhronosGroup/SPIRV-LLVM-Translator@ab5e12a (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.5884
Fixed Issues / Improvements
- Avoid read-modify-write when spilling scalar variables.
- Open source ROCKETLAKE and ALDERLAKE_S
- Refactor spill/fill intrinsic to not rely on the execution size passed in.
- Replace a hot function with templated version for better compile time.
- Move private memory allocations to SLM.
- Add preserve CFG and WIA to AdvMemOpt to save time
- Enable ForceInlineStackCallWithImplArg by default, and -O0 no longer force inlines all function calls.
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@c8cd72e
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@5032643
- KhronosGroup/SPIRV-LLVM-Translator@e8a52ab (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.5819
Fixed Issues / Improvements
- Moved FP64 math to separate bc, a second attempt,
- Added debug printouts for DebugInfo,
- Renamed createSrc to make it less verbose and aligned with createDst,
- Added processing of new masked gather intrinsics (gather4_masked_scaled2 and gather_masked_scaled2)
- Moved ModuleAllocaInfo to the header for reuse,
- Unified inlining heuristic for stackcalls and subroutines,
- Cleaned up IR debug dumps,
- Fix for argument indirection with already indirected call,
- Fixed a bug where spilled dst's size was incorrectly computed in debug mode,
- VC: Support FP64 BiF after it was supported in scalar backend,
- Added optimizations for signed division for constant power of 2,
- Cleaned up in GenXCategory,
- Fixed subregister offset for spilled destination,
- IMF LA open-sourcing: Switch back to previous FP32 atan2 implementation,
- Changed reduce implementation to remove extra barriers,
- Corrected wrappers in llvm::DIBuilder,
- Provide the ability to call one kernel from another,
- Reduced time on LiveVar update,
- Don't insert branches in loops and on a big amount of samples,
- Enabled ForceInlineStackCallWithImplArg by default, and -O0 no longer force inlines all function calls,
- Reduced the RA compilation time-use: replace push_back with emplace_back,
- Handle optnone builtins with subroutines instead of stackcalls,
- Other minor fixes and improvements.
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@4e83bbf
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@5032643
- KhronosGroup/SPIRV-LLVM-Translator@e8a52ab (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.5761
Fixed Issues / Improvements
- Added padding between globals when encoding,
- Added SPIRVDLL_SRC variable which takes prepared spirvdll sources,
- Added support to emit relocations in debug info,
- Improved LiveVar time by changing data-structure,
- Improvements in VC debug info,
- Increased per-thread stack size for SVM case,
- Made GenXTidyControlFlow actually preserve liveness,
- Moved splitStructPhis implementation to the proper place,
- Optimized generic pointer load for kernels not using local memory,
- Reduced the RA compilation time,
- Reduced the redundant interferences caused by function call,
- Specified type of pointer arithmetic to avoid tagging,
- Updated patch token version,
- Utilized genx.gaddr intrinsic for const/global tables,
- Other minor fixes and improvements.
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@4e83bbf
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@a08fe5b
- KhronosGroup/SPIRV-LLVM-Translator@e8a52ab (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.5723
Fixed Issues / Improvements
- Considering uniformness during register pressure estimate,
- Eliminated name length field restriction,
- Enabled spill cleanup for fp based spill/fill,
- Fixed extra option processing for CM online compilation,
- Fixed image tracking for GetBufferPtr scenario,
- Fixed spill code generation for spilled dest with non-zero subregister,
- Fixed the assignment of BTI values in the case of multiple uses,
- IMF LA open-sourcing,
- Implemented SPV_INTEL_unstructured_loop_controls extension,
- Other minor fixes and improvements.
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@4e83bbf
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@a08fe5b
- KhronosGroup/SPIRV-LLVM-Translator@e8a52ab (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.
igc-1.0.5699
Fixed Issues / Improvements
- Added IMF LA math function for FP32
- Avoid OCL kernel recompilation if there is less than 2% spill/fill
- Implement support for implicit arguments in stack call functions
- Fix bug in SPIRV reader to correctly propagate flags
- Local variables no longer optimized out in off-loaded functions
- Use ValueTracker to track width and height media block read/write parameter
- Add support for reading implicit arguments from stack call functions
- Fix vISA parser error for fcall/fret
- Optimize to generate mad by promoting src2 from :b to :w
Dependencies revisions
- intel/llvm-patches@9cbc7cf
- intel/opencl-clang@4e83bbf
- KhronosGroup/SPIRV-LLVM-Translator@424e375 (for opencl-clang)
- intel/vc-intrinsics@a08fe5b
- KhronosGroup/SPIRV-LLVM-Translator@e8a52ab (for VectorCompiler)
- llvm/[email protected]
Ubuntu 18.04 binary packages for LLVM10/Clang10 are included.