@danhoeflinger danhoeflinger commented Nov 18, 2025

Align __get_sycl_range with SYCL runtime behavior for write access mode

Fixes #1272

Summary

This PR aligns __get_sycl_range's handling of the write access mode with SYCL runtime semantics by:

  1. Adding support for the no_init property to control copy-in behavior
  2. Making write mode perform copy-in by default (SYCL-compliant)
  3. Fixing transform_if patterns to use proper access modes instead of workarounds
  4. Removing vestigial _Iterator template parameter from __get_sycl_range

Changes

Core Implementation

  • Added bool _NoInit = false template parameter to __get_sycl_range
  • Updated __is_copy_direct_v to make write mode copy-in by default unless no_init is specified
  • This matches the SYCL specification, where write implies copy-in unless it is suppressed with no_init
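
The rule above can be sketched as a small compile-time trait. This is an illustrative model only: `access_mode` and `needs_copy_in_v` are hypothetical stand-ins introduced here, not the SYCL enum or oneDPL's actual `__is_copy_direct_v` implementation, and treating read as always copy-in regardless of `NoInit` is an assumption of the sketch.

```cpp
// Hypothetical stand-in for sycl::access_mode; not the real SYCL type.
enum class access_mode { read, write, read_write };

// Copy-in (host -> device) is required unless no_init suppresses it.
// Assumption of this sketch: read mode always copies in, so NoInit is ignored there.
template <access_mode Mode, bool NoInit = false>
inline constexpr bool needs_copy_in_v = (Mode == access_mode::read) || !NoInit;

// Copy-out (device -> host) is required whenever the kernel may write.
template <access_mode Mode>
inline constexpr bool needs_copy_out_v = (Mode != access_mode::read);

// write now copies in by default, as described in the list above ...
static_assert(needs_copy_in_v<access_mode::write>);
// ... and passing NoInit = true restores the previous no-copy-in behavior.
static_assert(!needs_copy_in_v<access_mode::write, /*NoInit=*/true>);
// read_write + no_init: the kernel may read and write, but nothing is copied in.
static_assert(!needs_copy_in_v<access_mode::read_write, /*NoInit=*/true>);
```

Under this model, the `/*_NoInit=*/true` argument added at existing write-mode callsites corresponds to the second assertion.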

Preserved Existing Behavior (No Functional Changes)

Updated all existing write mode callsites to use /*_NoInit=*/true, preserving their current no-copy-in behavior:

  • algorithm_impl_hetero.h: 8 callsites (copy_if, partition_copy, unique_copy, merge, reverse_copy, rotate_copy, set
    operations)
  • numeric_impl_hetero.h: 3 callsites (transform_scan variants, adjacent_difference)
  • async_impl_hetero.h: 2 callsites (async operations)
  • parallel_backend_sycl.h: 3 callsites (set operation temporary buffers)
  • single_pass_scan.h: 1 callsite (kernel template)

Fixed transform_if Workarounds

Changed __pattern_walk2_transform_if and __pattern_walk3_transform_if from read_write (used as a workaround) to write
without no_init:

  • Uses proper semantics: copy-in to preserve non-transformed elements
  • Eliminates confusing access mode misuse that could cause issues with vectorized paths
  • Updated comments to reflect the proper solution
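
To see why copy-in matters for transform_if, here is a minimal host-only simulation of a buffer round trip. It involves no SYCL: `simulate_transform_if`, the even-number predicate, the times-ten transform, and the `-1` sentinel for unspecified device memory are all hypothetical choices made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a host -> device -> host round trip for an output range.
// copy_in == true models write access without no_init; false models no_init.
inline std::vector<int>
simulate_transform_if(const std::vector<int>& input, const std::vector<int>& host_output, bool copy_in)
{
    constexpr int uninitialized = -1; // stand-in for unspecified device memory
    std::vector<int> device(host_output.size(), uninitialized);
    if (copy_in)
        device = host_output; // host -> device copy-in

    for (std::size_t i = 0; i < input.size() && i < device.size(); ++i)
    {
        if (input[i] % 2 == 0)         // predicate: only even inputs are transformed
            device[i] = input[i] * 10; // transform writes just these elements
    }
    return device; // device -> host copy-out overwrites the whole output range
}
```

In this model, calling it with input `{1, 2, 3, 4}` and output `{7, 7, 7, 7}` keeps the two untouched `7`s only when `copy_in` is true; with `copy_in` false they come back as the sentinel, which is the data loss that copy-in prevents.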

Fixed Histogram Pattern

Changed histogram implementation from write workaround to proper read_write + no_init:

  • Resolves TODO comment requesting this functionality
  • Correctly expresses intent: kernel needs to read bins for atomic updates, but doesn't need initial copy-in
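
The histogram case can be modeled the same way. The sketch below is a single-threaded host simulation, not the oneDPL kernel: `simulate_histogram` is a hypothetical name, plain increments stand in for atomic updates, and it assumes the bins are zeroed on the device before counting.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of read_write + no_init for histogram bins.
// The host bin contents are never copied in; only the bin count is used.
inline std::vector<int>
simulate_histogram(const std::vector<int>& samples, const std::vector<int>& host_bins)
{
    constexpr int uninitialized = -1; // stand-in for unspecified device memory
    std::vector<int> device_bins(host_bins.size(), uninitialized); // no_init: no copy-in

    for (int& bin : device_bins)
        bin = 0; // assumed device-side zeroing before counting

    for (int sample : samples)
    {
        // Each update is a read-modify-write, which is why read access is still needed.
        if (sample >= 0 && static_cast<std::size_t>(sample) < device_bins.size())
            ++device_bins[static_cast<std::size_t>(sample)];
    }
    return device_bins; // device -> host copy-out
}
```

Because the bins are overwritten before any increment, the result does not depend on what the host bins held beforehand, so skipping the copy-in loses nothing.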

Cleanup

  • Removed unused _Iterator template parameter from __get_sycl_range (noted in TODO)

Bonus Bug Fix

  • Fixed an issue in esimd_radix_sort where the wrong __get_sycl_range result was used for the output keys and values (caught in review).

 * separated no_init from write
 * remove unnecessary type specification

Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger danhoeflinger marked this pull request as draft November 18, 2025 17:25
Copilot AI left a comment

Pull Request Overview

This PR refactors __get_sycl_range to align with SYCL runtime semantics for the write access mode. The primary change introduces a _NoInit template parameter to control copy-in behavior, making write mode perform copy-in by default (SYCL-compliant) unless explicitly suppressed.

Key Changes:

  • Added _NoInit template parameter to __get_sycl_range to control copy-in behavior for write access mode
  • Updated transform_if patterns to use proper write access mode instead of read_write workaround
  • Fixed histogram pattern to use read_write + no_init instead of write workaround

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| utils_ranges_sycl.h | Core implementation: added _NoInit parameter, removed unused _Iterator parameter, updated __is_copy_direct_v logic |
| algorithm_impl_hetero.h | Updated all callsites to remove _Iterator parameter; added /*_NoInit=*/true to preserve existing behavior for write mode; fixed transform_if patterns |
| numeric_impl_hetero.h | Updated callsites to remove _Iterator parameter and add /*_NoInit=*/true for write mode |
| histogram_impl_hetero.h | Fixed histogram to use read_write + no_init instead of write workaround; removed _Iterator parameter from callsites |
| parallel_backend_sycl.h | Updated set operation temporary buffers with /*_NoInit=*/true; removed _Iterator parameter |
| binary_search_impl.h | Removed unused _Iterator template parameter from all __get_sycl_range calls |
| async_impl_hetero.h | Updated async operations with /*_NoInit=*/true for write mode |
| glue_async_impl.h | Removed _Iterator parameter from sort_async |
| single_pass_scan.h | Updated scan kernel template with /*_NoInit=*/true |
| esimd_radix_sort_dispatchers.h | Removed _Iterator parameter from radix sort dispatcher |
| esimd_radix_sort.h | Removed _Iterator parameter from all radix sort variants |

Development

Successfully merging this pull request may close these issues.

Align __get_sycl_range with SYCL runtime in its treatment of write access mode and no_init{}
