docs: retract 43 tok/s projection post-D1b failure; 32 tok/s is the current ANE decode ceiling #80

Merged
john-rocky merged 1 commit into main from docs/retract-43-tok-s-projection on Apr 15, 2026

@john-rocky
Owner

Summary

Files touched (one line each)

  • docs/MOBILE_2K_COMPETITIVE_PLAN.md — retraction callout at top; value prop one-liner swapped to ~1 W + ~1 s TTFT (projected, item 27) + 32 tok/s (measured, current); competitive table honest about the 42 % decode gap; §"Projection basis" 43 tok/s subsection rewritten as 32 tok/s ceiling (measured) with root-cause analysis; execution table collapses from A+B to B (item 27) only.
  • docs/PHASE_B_DECISION.md §"What this means for the go-forward target" — D1b item flipped to REGRESSED with structural cause (c3→c4 data dep); item 27 called out as sole tractable decode-adjacent lever; the 43 tok/s claim explicitly retracted inline.
  • docs/PRIORITY_ROADMAP.md — item 27 footnote added marking it as the single critical-path decode-adjacent item on the roadmap after D1b invalidation.
  • docs/HANDOFF.md — read-order now includes the D1b failure evidence (PR #79 + PHASE_D_PIPELINING_IMPL.md on branch); opening prompt retracts 43 tok/s; next-session start options are item 27 OR one of PR #79's three conversion/-side options (decoupled c4 / speculative h3 / model re-chunking).

Net: +182 / −83 across 4 docs (~99 net lines, under the 150 cap).

Scope

Test plan

…urrent ANE decode ceiling

PR #78 reframed the value prop around a triad of ~1 W power, ~1 s TTFT,
and ~43 tok/s decode (projected via PR #77's compute-unit-split spike).
PR #79 (open) implemented the full 2-stage pipeline that projection
required and measured a 24 % regression across all 4 prompt categories,
plus a bit-exactness failure on the summary prompt at token 50, caused
by fp16 rounding differences between the ANE and GPU backends of chunk 3.

Root cause: the Gemma-4 chunk graph has a strict c3 → c4 data dep
(c4 consumes c3's hidden_states_out). The only within-step overlap
window is a ~1 µs Swift dict-build against ~16 ms GPU c3; the
cross-step pipeline is blocked by the symmetric token-feedback edge.
No non-speculative decode overlap is available on the current graph;
PR #79's three future options all require conversion/-side work.
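
As a rough illustration of why the within-step window buys nothing, the
sketch below computes the critical path of one decode step. It is not the
project's code: `critical_path_ms` is an illustrative helper, the c4
duration is a hypothetical placeholder, and treating the dict-build as an
input to c4 is an assumption. Only the ~16 ms c3 time, the ~1 µs dict-build
time, and the 32 tok/s figure come from the text above; using 32 tok/s as
the baseline step time is also an assumption.

```python
from typing import Dict, List


def critical_path_ms(durations_ms: Dict[str, float], deps: Dict[str, List[str]]) -> float:
    """Finish time of the slowest node when each node starts as soon as its inputs are ready."""
    finish: Dict[str, float] = {}

    def finish_time(node: str) -> float:
        if node not in finish:
            # A node can start only after all of its inputs have finished.
            start = max((finish_time(d) for d in deps.get(node, [])), default=0.0)
            finish[node] = start + durations_ms[node]
        return finish[node]

    return max(finish_time(node) for node in durations_ms)


durations_ms = {
    "c3_gpu": 16.0,       # ~16 ms GPU c3, from the commit message
    "dict_build": 0.001,  # ~1 us Swift dict-build, from the commit message
    "c4": 5.0,            # HYPOTHETICAL placeholder, not a measured value
}
# Assumption: c4 needs both c3's hidden_states_out and the prepared input dict.
deps = {"c4": ["c3_gpu", "dict_build"]}

serial_ms = sum(durations_ms.values())                # run everything back to back
overlapped_ms = critical_path_ms(durations_ms, deps)  # dict-build hidden behind c3
saved_ms = serial_ms - overlapped_ms                  # the whole overlap win

baseline_step_ms = 1000.0 / 32.0                      # 32 tok/s -> 31.25 ms per token
best_case_tok_s = 1000.0 / (baseline_step_ms - saved_ms)

print(f"saved per step: {saved_ms * 1000:.1f} us, best case: {best_case_tok_s:.3f} tok/s")
```

Whatever value is substituted for the c4 placeholder, the saving is capped
at the dict-build time, because c4 cannot start before c3 finishes.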

This commit retracts the ~43 tok/s projection on main and propagates
the consequence: 32 tok/s is the measured ANE decode ceiling, item 27
(GPU prefill / TTFT) is now the single critical-path decode-adjacent
lever, and the gap vs LiteRT-LM on decode widens from 20 % to 42 %.
The UX argument (~1 W, ~1 s TTFT, GPU-free host envelope) carries
the pitch, not decode parity.

Touched:
- MOBILE_2K_COMPETITIVE_PLAN.md: retraction callout, triad update,
  competitive table honesty, Projection basis rewrite, D1b removed
  from execution table (B is now the only item).
- PHASE_B_DECISION.md §"What this means for the go-forward target":
  D1b status flipped to REGRESSED with structural cause, item 27
  elevated to sole decode-adjacent lever.
- PRIORITY_ROADMAP.md item 27: footnote marking it as the single
  critical-path decode-adjacent item after D1b invalidation.
- HANDOFF.md: read-order includes the D1b failure doc; opening
  prompt retracts 43 tok/s, next-session starts are item 27 OR one
  of PR #79's three conversion/-side options.

Preserves history — callouts cite PR #79 / commit 7c21c7b rather
than rewriting the prior reasoning chain.

Total net-added prose ≈ 99 lines across 4 docs. Docs only.
@john-rocky john-rocky merged commit fe646c1 into main Apr 15, 2026
@john-rocky john-rocky deleted the docs/retract-43-tok-s-projection branch April 15, 2026 09:40