Skip to content

[Repo Assist] perf: word-at-a-time backward copy in l_memmove#122

Draft
github-actions[bot] wants to merge 1 commit intomainfrom
repo-assist/perf-memmove-backward-waat-2026-04-09-f4453f9fbe69eafc
Draft

[Repo Assist] perf: word-at-a-time backward copy in l_memmove#122
github-actions[bot] wants to merge 1 commit intomainfrom
repo-assist/perf-memmove-backward-waat-2026-04-09-f4453f9fbe69eafc

Conversation

@github-actions
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot commented Apr 9, 2026

🤖 This PR was created by Repo Assist, an automated AI assistant.

Summary

The backward-copy branch of l_memmove (triggered when dst > src and the ranges overlap) previously used a plain byte-at-a-time loop. This PR applies the same word-at-a-time technique already present in the forward branch.

Root cause

The forward path was already optimised (PR #107 era), but the backward path was left as:

while (len--)
    *--d = *--s;

For large overlapping copies (e.g. memmove(p+1, p, 64KB)) this is 8× slower than necessary on 64-bit targets.

Fix

Three-phase backward copy (mirrors the forward path):

  1. Tail alignment — byte-copy trailing bytes until d is word-aligned.
  2. Word-at-a-time — if s is also word-aligned, copy sizeof(uintptr_t) bytes per step backward using may_alias word pointers.
  3. Head tail-out — byte-copy remaining leading bytes.

Read-before-write safety is preserved: in the backward pass d > s throughout, so each word write always lands at a higher address than the corresponding word read. No source data is overwritten before it is consumed.

Tests added (test_memmove)

Test Purpose
Large backward overlap 64B (dst = src+1) Exercises the new word-at-a-time path
Large forward overlap 64B (dst = src-1) Confirms existing forward path handles large moves
Large non-overlapping copy 64B Baseline sanity check

Test Status

Target Build Test
Linux gcc ✅ PASS ✅ PASS (1585 assertions)
Linux clang ✅ PASS ✅ PASS (1583 assertions)
ARM/AArch64 ⚠️ SKIP (cross-compilers unavailable in CI env)

Generated by 🌈 Repo Assist at {run-started}. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@1f672aef974f4246124860fc532f82fe8a93a57e

The backward branch (dst > src, overlapping regions) used a plain
byte-at-a-time loop.  This commit applies the same word-at-a-time
technique already used in the forward branch:

1. Byte-copy trailing bytes until d is word-aligned.
2. If s is also word-aligned, copy sizeof(uintptr_t) bytes per
   iteration backward using may_alias word pointers.
3. Byte-copy the remaining head bytes.

The read-before-write ordering is preserved: in the backward pass
d > s throughout, so each word write lands ahead of (higher address
than) the corresponding word read, making the copy safe for all
overlapping configurations.

Tests added to test_memmove():
- large backward overlap (64 bytes, dst = src+1) — exercises the
  new word-at-a-time path
- large forward overlap (64 bytes, dst = src-1) — confirms the
  existing forward path also handles large moves correctly
- large non-overlapping copy (64 bytes) — baseline sanity check

CI: Linux gcc + clang, 1585/1583 assertions PASS

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants