feat(mu): save_mu_method storage policy + 6-grid prior benchmark + 0.16.1 compat #12
Open
al4225 wants to merge 18 commits into
Conversation
Two session notes under inst/notes/sessions/.

The plan memo records the 2026-05-03 Slack and meeting synthesis:
- the six-cell prior_variance_scope x wavelet_qnorm x mixture_null_weight grid;
- the three-mode save_mu_method design (complete / alpha_collapsed / lead) with the mf_thin() helper;
- the susieR 0.16.1 trace-format compat finding surfaced when P0 upgraded susieR;
- the vignette / test parameter audit, which confirmed only one prose disambiguation was needed (ash::nullweight vs mfsusie's mixture_null_weight).

The results memo summarises the SLURM job 31970954 baseline run (36 m 22 s wall-clock, 454 MB peak): per_scale + mixture_null_weight = 0.05 hits the nominal FDR 0.05 in 11 s; mixture_null_weight = 0 collapses to FDR 0.77 in 191 s with 4 of 10 reps non-converged; and per_scale_normal sits at FDR 0.41-0.48 on the Gaussian fixture. Implications and follow-up scenarios are captured in the memo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
susieR 0.16.1 added is_compact_track_snapshots(fit$trace) inside ibss_finalize -> make_susie_track_history. The validator requires each tracking list element to be list(alpha = data.frame, effect = data.frame, iteration = data.frame). The old track_ibss_fit.mf_individual wrote list(alpha = matrix, sigma2 = list, pi_V = list, elbo = scalar), which the new validator rejects: mfsusie(..., track_fit = TRUE) errored with "fit$trace is not a compact SuSiE track".

Fix: delegate to susieR's own make_track_snapshot helper (cached as a package-level binding via R/zzz.R::.onLoad alongside the other susieR internals). Pre-sanitise model$sigma2 to NA_real_ for the snapshot copy, because mfsusieR's sigma2 is a list[M] whereas susieR's track_scalar assumes a length-1 numeric. The real per-iteration sigma2 stays on the fit; only the trace records NA. fit$elbo is unaffected.

Verified against susieR 0.16.1: pre-fix, test_ibss_methods_branches.R:35:3 fails on both main and the fix-mu-storage branch (the failure pre-dates this PR; it surfaced when P0 upgraded susieR). Post-fix the test passes, and the full suite is green (0 fail / 0 regression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
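The delegation described above could look roughly like the following sketch. This is a hypothetical illustration, not the package's actual code: the cached-binding name and the method signature are assumptions.

```r
# Hypothetical sketch of the fix; names are assumptions, not the real API.
track_ibss_fit.mf_individual <- function(model, ...) {
  snap <- model
  # susieR's track_scalar assumes a length-1 numeric sigma2, but mfsusieR
  # stores a list of M per-outcome values: pre-sanitise the snapshot copy
  # only -- the real per-iteration sigma2 stays on the fit.
  snap$sigma2 <- NA_real_
  # .susie_make_track_snapshot stands in for the binding that
  # R/zzz.R::.onLoad caches from susieR's internal make_track_snapshot.
  .susie_make_track_snapshot(snap, ...)
}
```

The point of routing through susieR's own helper is that any future change to the compact-snapshot schema is absorbed upstream rather than re-implemented here.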
Add save_mu_method = c("complete", "alpha_collapsed", "lead") to
mfsusie() and the fsusie() wrapper. Default "complete" is unchanged
(p x T_basis[m] per-(effect, outcome) posterior moments, the only
mode supporting model_init warm-starts, predict.mfsusie(newx), and
the per-variant clfsr toggle in plots).
"alpha_collapsed" replaces each p x T matrix by the alpha-weighted
1 x T summary alpha %*% mu_full (and analogously mu2). To keep
coef.mfsusie lossless on the raw-X scale, the fit also carries
fit$coef_wavelet[[l]][[m]] = alpha %*% (mu_full / csd_X) as a
separate 1 x T summary -- per-j csd_X scaling cannot be recovered
after alpha-collapse. Under this mode coef.mfsusie and
mf_post_smooth are numerically equivalent to "complete" at
tolerance 1e-12.
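The alpha-collapse trim described above can be sketched as follows, under assumed shapes (alpha_l of length p, mu_full / mu2_full of shape p x T, csd_X of length p); this is an illustration, not the package's mf_apply_save_mu_method() code.

```r
# Minimal sketch of the alpha-collapse trim (assumed shapes, not the
# package's actual implementation).
collapse_effect <- function(alpha_l, mu_full, mu2_full, csd_X) {
  list(
    mu  = alpha_l %*% mu_full,    # 1 x T alpha-weighted first moment
    mu2 = alpha_l %*% mu2_full,   # 1 x T alpha-weighted second moment
    # per-j csd_X scaling cannot be recovered after the collapse, so the
    # raw-X coefficient summary is computed now and carried on the fit;
    # mu_full / csd_X divides row j by csd_X[j] via column-major recycling
    coef_wavelet = alpha_l %*% (mu_full / csd_X)
  )
}
```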
"lead" keeps only the lead variable per effect, where
j* = which.max(alpha[l, ]). The fit also carries fit$top_index[l].
Cheap single-variable summary biased toward j*.
Both 1D modes drop the per-variant axis the SER step needs, so
predict.mfsusie(newx), the plot per-variant clfsr toggle, and
mfsusie(model_init = thinned_fit) error with a save_mu_method
message pointing the user to "complete" or mf_thin().
Add mf_thin(fit, method = c("alpha_collapsed", "lead")) for
post-fit thinning so callers can keep a complete checkpoint for
warm-starts and a thinned copy for distribution.
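The intended workflow, per the text above, looks roughly like this (X, Y, and the file names are placeholders):

```r
# Fit once in "complete" mode and keep that object as the warm-start
# checkpoint; thin a copy for distribution.
fit <- mfsusie(X, Y, L = 10, save_mu_method = "complete")
saveRDS(fit, "fit_complete.rds")     # warm-start checkpoint

small <- mf_thin(fit, method = "alpha_collapsed")
saveRDS(small, "fit_thinned.rds")    # smaller copy for distribution

# Warm-starting must come from the complete checkpoint; the model_init
# guard errors on a thinned fit:
# mfsusie(X, Y, model_init = small)  # -> save_mu_method error
mfsusie(X, Y, model_init = fit)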
Implementation:
- R/save_mu_method.R (new): trim (mf_apply_save_mu_method) and
helpers (get_effect_wavelet_moments, get_coef_wavelet_curve,
mf_save_mu_method, mf_has_per_variant_mu) and mf_thin export.
- R/mfsusie.R: new arg + match.arg + model_init guard +
post-finalize step 7 invoking the trim.
- R/mfsusie_methods.R: coef / predict / .post_smooth_scalewise /
.other_effects_pos / clfsr_curves loop refactored to go through
the helpers; predict.mfsusie(newx) errors on 1D modes.
- R/mfsusie_plot.R: compute_clfsr_matrix errors on 1D modes.
Tests: tests/testthat/test_save_mu_method.R locks 12 invariants
including coef numerical equivalence with complete (1e-12), object
size reduction, and the four guard error paths. Each test_that
block prints a [RUNTIME] line.
Docs: vignettes/post_processing.Rmd gains a "Storage modes"
section with a comparison table and the mf_thin() workflow, plus
an ash::nullweight disambiguation note.
NEWS.md entries for save_mu_method and mf_thin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two R drivers covering the qnorm x prior_variance_scope x
mixture_null_weight grid plus the heavy-tailed and null follow-up
scenarios.
inst/bench/profiling/benchmark_per_scale_normal_6grid.R is the
Gaussian baseline: 6 cells (per_scale x {qnorm F, T} x {mnw 0.05, 0}
plus per_scale_normal x {qnorm F, T}), 5 reps each, on an
n = 84 / p = 500 / M = 2 / T = 64 / L = 10 fixture with three
causal SNPs. Records FDR / power / CS count / CS purity / runtime
/ fit_size / peak memory / niter / converged per cell.
inst/bench/profiling/benchmark_heavy_tailed_null_6grid.R is the
follow-up: same 6-cell prior grid, two new scenarios.
heavy_tailed_signal adds 18% per-cell outlier contamination
(sd = 4 vs the sd = 1 noise floor) so wavelet_qnorm has a path to
matter. null_no_signal removes the causal SNPs so the per-cell
type-I rate at the 0.05 PIP threshold can be measured.
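The contamination mechanics could be sketched as below; this is an assumed reading of "18% per-cell outlier contamination", not the driver's exact code.

```r
# Sketch: 18% of noise cells swap their sd = 1 noise for sd = 4 outliers
# (n = 84, T = 64 as in the fixture above).
set.seed(1)
n <- 84; T_len <- 64
noise   <- matrix(rnorm(n * T_len, sd = 1), n, T_len)
outlier <- matrix(runif(n * T_len) < 0.18, n, T_len)
noise[outlier] <- rnorm(sum(outlier), sd = 4)
# Y <- signal + noise   (signal from the causal SNPs, defined elsewhere)
```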
Both drivers write per-replicate .rds and per-cell summary .csv
to inst/bench/profiling/results/ at run time. Those output files
are untracked (machine-specific artifacts); the inst/notes/sessions/
results memo carries the aggregated tables that future readers
need.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Append the heavy_tailed_signal and null_no_signal follow-up sections to inst/notes/sessions/2026-05-03-2314-per-scale-normal-baseline-results.md with the SLURM 31977248 data (1 h 1 m wall-clock, 0.43 GB peak, 60 fits across 6 cells x 5 reps x 2 scenarios).

Result snapshot in the SNP-level FDR view:
- per_scale_normal moves from FDR 0.41-0.48 on the Gaussian baseline to 0.00-0.08 on heavy-tailed, and to 0 false discoveries across 10 null reps;
- per_scale + mnw = 0.05 stays the only per_scale cell that is calibrated end-to-end across the three scenarios;
- per_scale + mnw = 0 produces 4.2 false credible sets on pure null and FDR 0.71-0.82 on heavy-tailed, motivating the hint planned for PR-2.

Implications section recorded in the memo: keep the package defaults (per_outcome scope, mixture_null_weight = NULL = 0.05) for PR-1, and document per_scale_normal as the recommended choice for non-Gaussian / real-data fits in the vignette / ?mfsusie. A default switch is deferred to a later PR, after the real-data perm grid finishes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop overstated framings in inst/notes/sessions/2026-05-03-2314...md:
- "per_scale_normal is the strongest mode" -> "per_scale_normal is well-calibrated under heavy-tailed and null Y in this benchmark"; the Gaussian baseline shows the relative ordering shifts with scenario, so "strongest mode" overreaches what 5 reps per cell can support.
- "the cell never fires (zero ... across all 10 reps)" -> "no false discovery fires across the 10 reps; the per-cell rate is bounded above by roughly 10-20% but cannot be distinguished from zero". Observation, not absolute claim.
- "sharp reversal from the Gaussian baseline" -> "the relative ordering shifts with scenario". Same observation, lower temperature.
- "Recommend `per_scale_normal` as the default for non-Gaussian" -> "Consider `per_scale_normal` for non-Gaussian fits". The benchmark argues for documentation-level guidance, not a recommendation strong enough to drive the default switch.
- "wavelet_qnorm matters slightly more on heavy-tailed" -> "adds modest improvement"; drop the "magic fix" rhetorical contrast.

Findings 2 and 3 keep their substance (mnw = 0.05 well-calibrated across all three scenarios; mnw = 0 broken in all three) because the data supports those claims at the chosen replicate count.

Caveats section: explicit that 10 null reps cannot distinguish "exactly zero false CS" from "rare false CS"; a 20-rep rerun would close the gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add an "Inspecting the structure" subsection under "Storage modes:
save_mu_method" in vignettes/post_processing.Rmd. The new chunk
fits the same X / Y under each of the three save_mu_method modes
and calls str() on fit$mu / fit$coef_wavelet / fit$top_index so a
reader can see the shape differences side by side without
re-deriving them from the comparison table:
- "complete": fit$mu[[l]][[m]] is p x T_basis[m]
- "alpha_collapsed": fit$mu[[l]][[m]] is 1 x T_basis[m];
fit$coef_wavelet[[l]][[m]] is 1 x T_basis[m]
(raw-X coef summary)
- "lead": fit$mu[[l]][[m]] is mu_full[j*, ] (1 x T_basis[m]);
fit$top_index[l] = which.max(alpha[l, ])
Keep eval = FALSE so the vignette does not re-fit at build time;
the output dimensions are commented inline.
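The inspection chunk described above could look like this sketch (eval = FALSE in the vignette; X, Y are placeholders and the commented dimensions follow the shape list above, not a real run):

```r
# Side-by-side shapes under the three save_mu_method modes (sketch).
fit_c <- mfsusie(X, Y, save_mu_method = "complete")
str(fit_c$mu[[1]][[1]])             # num [1:p, 1:T_basis]

fit_a <- mfsusie(X, Y, save_mu_method = "alpha_collapsed")
str(fit_a$mu[[1]][[1]])             # num [1, 1:T_basis]
str(fit_a$coef_wavelet[[1]][[1]])   # num [1, 1:T_basis], raw-X scale

fit_l <- mfsusie(X, Y, save_mu_method = "lead")
str(fit_l$mu[[1]][[1]])             # num [1, 1:T_basis], row j*
fit_l$top_index                     # which.max(alpha[l, ]) per effect
```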
Also note the attr(fit, "save_mu_method") tag so the reader knows
how downstream code dispatches between modes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cut off-scope material from inst/notes/sessions/2026-05-03-2048-mu-storage-and-benchmark-plan.md:
- Drop the "Source documents" pointer to a private path that is not scientifically reproducible.
- Drop the "Issue StatFunGen#8 SSC" subsection (off-topic for the mu-storage plan; SSC math verification belongs in its own follow-up note when the work happens).
- Drop the "Issues not yet planned" subsection covering StatFunGen#3, StatFunGen#5, and StatFunGen#11. The GitHub issue tracker is the source of truth for cross-issue triage; replicating it in private notes adds noise without traceable updates.
- Replace the multi-resolution PR-1 / PR-2 / PR-3 narrative with the design as it landed: three-mode save_mu_method + mf_thin, benchmark grid as a code block, default-value table.
- Tighten the susieR 0.16.1 compat note to a single section that matches what the fix(track) commit body says.
- Tighten the parameter-audit findings list.

Drop sbatch wrapper references from inst/notes/sessions/2026-05-03-2314-per-scale-normal-baseline-results.md ("Files / references" + the two SLURM intro paragraphs). Only the R driver scripts and the saved .rds / .csv are scientifically load-bearing; the sbatch files are job-submission scaffolding private to a single cluster and have been removed from the chore(bench) commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reconcile the apparent mismatch between the vignette's CS-level assessment of per_scale_normal and the per-cell FDR 0.41-0.48 in the baseline memo. They are consistent; they measure different things.

The vignette (vignettes/fsusie_intro.Rmd:299-303) and the older inst/bench/profiling/per_scale_normal_vignette_sweep.R use a CS-level classifier: one CS = one judgment, TP if the CS lead variable is a causal SNP or LD-correlates >= 0.5 with one. Under that metric, per_scale_normal and per_scale agree on the Gaussian example.

The 6-grid baseline memo uses a SNP-level FDR at the 0.05 PIP threshold: every PIP > 0.05 is one judgment. Under that metric, per_scale_normal lets a few extra non-causal SNPs cross 0.05 even though they sit outside every credible set, so FDR rises to 0.41-0.48 while cs_count stays at 3 with all CS leads on causals.

Update Finding 3 in 2026-05-03-2314...md to surface the metric distinction and to drop the "fast but not as well-calibrated" framing, which read like a contradiction. The same fact is now stated as: same CS recovery, looser raw PIP curve.

Implication for PR-1 unchanged: do not switch the package default to per_scale_normal on the Gaussian fixture alone, and document the metric distinction in the vignette so users on the PIP-threshold path know what to expect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend both 6-grid benchmark drivers with two new discovery metrics
in addition to the original SNP-loose (PIP > 0.05) view:
CS-level (fdr_cs / power_cs)
Each fit$sets$cs is one judgment; lead = SNP within the CS
with max PIP. TP if lead is causal OR
abs(cor(X[, lead], X[, causal])) >= 0.5 for some causal.
Power = unique causals covered by some TP CS / |true_idx|.
Matches inst/bench/profiling/per_scale_normal_vignette_sweep.R
and vignettes/fsusie_intro.Rmd:299-303.
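A hypothetical implementation of the CS-level classifier defined above; fit$sets$cs (a list of index vectors), fit$pip, X, and true_idx are assumed shapes, and "covered" is read here as CS membership of a causal.

```r
# Sketch of fdr_cs / power_cs, not the drivers' exact eval_fit code.
cs_metrics <- function(fit, X, true_idx, r = 0.5) {
  cs <- fit$sets$cs
  if (length(cs) == 0) return(c(fdr_cs = 0, power_cs = 0))
  # lead = the member with max PIP within each CS
  leads <- sapply(cs, function(s) s[which.max(fit$pip[s])])
  tp <- vapply(leads, function(j) {
    j %in% true_idx ||
      any(abs(cor(X[, j], X[, true_idx, drop = FALSE])) >= r)
  }, logical(1))
  covered <- unique(unlist(lapply(cs[tp], intersect, true_idx)))
  c(fdr_cs = mean(!tp), power_cs = length(covered) / length(true_idx))
}
```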
SNP-level hybrid (fdr_hyb / power_hyb)
A SNP j is one discovery iff
(j in some CS with min.abs.corr >= 0.8)
OR (j is in no CS AND pip[j] > 0.5).
A SuSiE-conventional fine-mapping output: trust high-purity
CSes (every member counts), and outside CSes only credit
standalone SNPs at high PIP.
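The hybrid rule above can be sketched directly; pip, cs_list, and purity stand in for the saved per-replicate list-columns (assumed shapes).

```r
# Sketch of the hybrid discovery rule: CS members at purity >= 0.8,
# plus standalone SNPs outside any CS at PIP > 0.5.
hybrid_discoveries <- function(pip, cs_list, purity, min_abs_corr = 0.8) {
  p <- length(pip)
  in_pure_cs <- logical(p)
  for (i in seq_along(cs_list))
    if (purity[i] >= min_abs_corr)
      in_pure_cs[cs_list[[i]]] <- TRUE   # every member counts
  in_any_cs <- logical(p)
  in_any_cs[unlist(cs_list)] <- TRUE
  which(in_pure_cs | (!in_any_cs & pip > 0.5))
}
```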
Each per-replicate rds row also carries fit_pip / fit_cs /
fit_purity / fit_true_idx as list-columns plus rep_seed, so any
future metric can be computed retroactively without re-fitting.
The aggregate output drops the list-columns from the formula so
per-cell summaries remain numeric.
Update inst/notes/sessions/2026-05-03-2314...md with the v3 numbers
(SLURM 31993466 / 31993467; 33 m + 1 h 0 m wall-clock; both exit
0; 0.43 GB peak each). Replace the single-metric per-cell tables
with a three-metric table per scenario, and add a "Metric
definitions" section that states what each metric counts and what
question it answers. Findings restated:
- per_scale_normal is FDR_cs = FDR_hyb = 0 across all three
scenarios (Gaussian baseline / heavy-tailed / null);
cs_count matches the truth in every signal rep and is exactly
zero in every null rep. SNP-loose 0.41-0.48 on Gaussian
reflects sub-CS PIP leakage that conventional fine-mapping
output does not act on.
- per_scale + mnw = 0.05 stays well-calibrated under all three
metrics across all three scenarios with small slippage on
heavy-tailed (FDR loose 0.10-0.15) and null (FDR cs / hyb 0.20-0.40
on the 20-40% of reps that fire any spurious CS).
- per_scale + mnw = 0 is broken under every metric in every
scenario: SNP-loose 0.71-0.82, cs / hyb 0.41-0.65, 4.2 false
CSes on pure null, runtime 17-30x that of mnw = 0.05, and 1-2 of
5 reps non-converged.
- wavelet_qnorm = TRUE adds modest improvement on heavy-tailed
Y under both SNP metrics; effect on Gaussian and null is within
rep-to-rep variance.
Implications for PR-1 unchanged: keep the package defaults
(mixture_null_weight = NULL, prior_variance_scope = "per_outcome").
Any switch to per_scale_normal is deferred to a later PR after the
real-data perm grid finishes.
The .rds and .csv outputs under inst/bench/profiling/results/ are
not committed; the rds path is referenced in the memo for local
reproducibility. The list-column shape lets later metric additions
reuse the existing 5 reps without rerunning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop eval = FALSE from the "Storage modes: save_mu_method" chunks in vignettes/post_processing.Rmd so the vignette actually runs the MWE at build time. The chunks now:
- Build three fits (one per save_mu_method mode) on the same N3finemapping subset used by the rest of the vignette.
- str() each fit$mu[[1]][[1]] and show coef_wavelet / top_index.
- saveRDS each to tempfile() and report a per-mode object.size and on-disk .rds size table.
- Report the max abs diff of coef() between modes (zero for alpha_collapsed by construction; floating-point only on this small fixture for lead, because every CS happens to have purity ~1, but a footnote points to the p = 500 baseline result memo where lead diverges by ~0.108).

The saveRDS targets are tempfile() so the vignette does not write files into the user's working directory.

Add a "save_mu_method storage MWE" subsection to inst/notes/sessions/2026-05-03-2314-per-scale-normal-baseline-results.md with the actual numbers from a separate p = 500 / T = 64 / M = 2 fixture: complete = 7.8 MB / 2950 KB; alpha_collapsed = 0.50 MB / 230 KB (15.7x in-memory, 12.9x on-disk shrink); lead = 0.49 MB / 226 KB (16.1x / 13.1x). Cross-checked fit$top_index against apply(fit$alpha, 1, which.max). coef() max abs diff: complete vs alpha_collapsed = 0; complete vs lead = 0.108 on this larger fixture.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
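The size-report chunk described above could be sketched as follows; `fits` is an assumed named list of the three per-mode fits, and the tempfile() target keeps the working directory clean.

```r
# Sketch of the per-mode size table (assumed fixture, not the
# vignette's exact chunk).
size_row <- function(f) {
  path <- tempfile(fileext = ".rds")   # nothing lands in the workdir
  saveRDS(f, path)
  c(in_memory_kb = as.numeric(object.size(f)) / 1024,
    on_disk_kb   = file.size(path) / 1024)
}
do.call(rbind, lapply(fits, size_row))  # one row per save_mu_method mode
```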
Vignette: rewrite alpha-trajectory chunk to read from the new susie_track
object (single list of stacked tables, alpha is long-format thresholded
at 1/p) instead of the pre-0.16.1 per-iteration L x p matrix.
test_post_smooth_smash.R: add skip_if_not_installed("smashr") guard to
the second test_that block; smashr is r44-linux-only per pixi config.
…g all sources

Rewrote 2026-05-11-mfsusier-vs-mvf-engineering-comparison.md from scratch after reading every source file in both packages (21 R files total). Covers:
- Architecture (S3-delegated susie_workhorse vs custom while-loop IBSS)
- Y input format, wavelet preprocessing, Bhat/Shat divergence (D2)
- Prior structure, M-step, inner EM loop, warm vs cold start
- ER2 bias correction bug in mvf (D1), KL sign error (D3), NA bug (D4)
- LFSR (both packages implement it, via different paths)
- Posthoc trait config probabilities (mvf only, Yuan Nat Gen 2024)
- Output structure table with correct alpha access patterns
- save_mu_method, model_init, greedy/backfit, convergence, nullweight

Fixed test_mfsusier_vs_mvf_mwe.R:
- mvf alpha is a list of L plain vectors; access via do.call(rbind, fit_mvf$alpha), not lapply(alpha, function(a) a$alpha_f) (that field does not exist)
- mvf purity is fit$purity (top-level), not nested in cs[[i]]$purity
- the alpha row-sum test uses vapply(fit_mvf$alpha, sum, numeric(1))

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…st_outcome_configuration + lbf_variable_outcome

mfsusieR posthoc is NOT absent: it is implemented via susieR::susie_post_outcome_configuration(), which reads the L x p x M lbf_variable_outcome array stored during IBSS. Rewrites §12 with exact function signatures, the two by= modes, the two method= options, and a key-differences table against mvf.

Adds Tests 7-8 to the MWE: Test 7 verifies the lbf_variable_outcome shape and that the susiex output inherits the correct class; Test 8 documents mvf's automatic posthoc storage at fit$posthoc[[l]].

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…S, LFSR, posthoc

Rewrites the MWE from scratch with a single lazy-cache fixture that runs both packages once. Eight tests cover: pipeline smoke, alpha/PIP shape, sigma2 D1 divergence, CS structure, HMM smooth output, LFSR range and indexing, and the posthoc API (explicit susie_post_outcome_configuration for mfsusieR vs automatic fit$posthoc for mvf). All 47 assertions pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
§2 Y input format: mfsusieR does support scalar outcomes via T_m = 1 (DWT skipped, the single column used directly); the error was saying it does not. Clarifies the one-channel vs two-channel (Y_f / Y_u) distinction.

§4 Bhat/Shat: notes that mvf's per-variable marginal SE is fixed: cal_Bhat_Shat_multfsusie always calls fsusieR:::cal_Bhat_Shat with no parameter to switch formula.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Three logically independent pieces shipped on this branch:
1. save_mu_method storage policy on mfsusie() / fsusie() (closes "Slim / trim mu, mu2 etc" #7) + the mf_thin() post-fit helper.
2. susieR 0.16.1 trace-format compat fix in track_ibss_fit.mf_individual (silent fix discovered while debugging the benchmark driver).
3. The 6-grid prior benchmark (drivers + results memos).

Pre-work (already merged before this branch): remotes::install_github('stephenslab/susieR') to pick up the per-iteration S3 generics and the new make_track_snapshot delegate.

1. save_mu_method (PR-1, closes #7)

mfsusie() / fsusie() gain save_mu_method = c('complete', 'alpha_collapsed', 'lead'). Default is 'complete' — bit-for-bit identical to prior behavior. The other two trim mu, mu2, and coef_wavelet at fit finalize, so downstream consumers get factor-p storage savings on long fits.

| mode | mu[[l]][[m]] shape | notes |
| --- | --- | --- |
| complete | p × T_m | default; unchanged behavior |
| alpha_collapsed | 1 × T_m | α[l, ] %*% μ_full — PIP-weighted curve, lossless for any α-aware downstream |
| lead | 1 × T_m | μ_full[j*, ] where j* = which.max(α[l, ]) — biased but cheap; fit$top_index[l] is added so callers can recover the lead variable |

coef_wavelet follows the same trim (α %*% (μ / csd_X) for collapsed; the lead row for lead). mf_thin() is a post-fit equivalent — apply the same trim to a complete fit without re-running IBSS.

A model_init guard refuses to warm-start IBSS from a non-complete fit (we don't have the full per-variable mu to seed with). Tested.

Files

- R/save_mu_method.R (new, 193 LOC) — mf_apply_save_mu_method() (the trim itself), get_effect_wavelet_moments() / get_coef_wavelet_curve() helpers (so coef.mfsusie / predict.mfsusie / post-smoothing can read mu uniformly across modes), mf_thin() (exported), the model_init guard.
- R/mfsusie.R — adds the save_mu_method arg with match.arg validation; finalize step 7 calls the trim.
- R/mfsusie_methods.R — coef.mfsusie / predict.mfsusie / .post_smooth_scalewise / .other_effects_pos refactored to read mu via the helpers (no behavior change in complete mode).
- R/mfsusie_plot.R — compute_clfsr_matrix errors with a clear message on 1D modes (lfsr requires per-variable mu).
- man/mf_thin.Rd, man/mfsusie.Rd — roxygen regen.
- tests/testthat/test_save_mu_method.R (new, 152 LOC) — 12 test_that blocks covering shape, equality of α %*% complete.mu to alpha_collapsed.mu, predict() agreement across modes, mf_thin() round-trip, and the model_init guard.
- vignettes/post_processing.Rmd — new section runs all three modes on the same fit, prints str() of mu / mu2 / coef_wavelet and a saveRDS() size table. Chunks are eval=TRUE so the rendered vignette carries real numbers.
- NEWS.md — entry under 0.0.2.

2. susieR 0.16.1 trace-format compat (silent fix)
susieR master moved the per-iteration trace from a list of full models to compact snapshots emitted by make_track_snapshot(). The old track_ibss_fit.mf_individual returned the full model object, which started failing the new is_compact_track_snapshots validator after the install_github bump.

Files

- R/ibss_methods.R — track_ibss_fit.mf_individual delegates to make_track_snapshot() and sanitizes the list-shaped sigma2 (per-outcome) before handing it to the validator, which expects a scalar.
- R/zzz.R — caches make_track_snapshot in .onLoad so the method can call it without :::.

No public API change.
3. 6-grid prior benchmark
Goal: figure out whether prior_variance_scope = 'per_scale_normal' is a viable replacement for 'per_scale' on Gaussian, heavy-tailed, and null data, and whether the wavelet_qnorm / mixture_null_weight knobs interact.

Grid

6 cells = per_scale × {qnorm, noqnorm} × {mixture_null_weight ∈ {0, 0.05}} (4 cells) + per_scale_normal × {qnorm, noqnorm} (2 cells — per_scale_normal has no mixsqp weight to nudge).

Three scenarios × 6 cells × 5 reps = 30 fits per scenario, 90 total.
Discovery metrics
Three are reported because a single PIP-threshold metric is misleading on functional fits:
- SNP-loose — every pip > 0.05 counts (high-recall, noisy); matches the mvfsusie paper figures.
- CS-level — one CS is one judgment; TP if the CS lead is a causal SNP or LD-correlates >= 0.5 with one.
- Hybrid — every member of a high-purity CS counts, plus standalone SNPs outside any CS at pip > 0.5. Recovers the cases per_scale_normal is good at (highly-pure singleton CS) without the per_scale_normal-specific noise the loose metric picks up.

Power = 1.000 in both signal scenarios across all cells (the sims are SNR-saturated by design); the metric story is entirely on FDR.
Files
- inst/bench/profiling/benchmark_per_scale_normal_6grid.R (Gaussian baseline, 285 LOC) — driver, fixture spec, eval_fit returning all three metrics, list-column save so metrics can be recomputed retroactively from saved cs / pip without re-fitting.
- inst/bench/profiling/benchmark_heavy_tailed_null_6grid.R (heavy-tailed + null follow-up, 298 LOC).
- inst/notes/sessions/2026-05-03-2048-mu-storage-and-benchmark-plan.md — plan memo, mu-storage scope only, no off-topic material.
- inst/notes/sessions/2026-05-03-2314-per-scale-normal-baseline-results.md — results memo with 18-cell three-metric tables (Gaussian / heavy-tailed / null) and the MWE storage section. Tone is calibrated: under the loose SNP metric per_scale_normal looks bad; under CS-level and hybrid it is FDR = 0 across all three scenarios. The memo flags the metric dependence rather than picking a winner.

Bench results files (results/*.rds, slurm wrappers) are intentionally not committed.

Test plan

- pixi run --environment r44 devtools_test — full suite passes
- tests/testthat/test_save_mu_method.R — 32 PASS / 0 FAIL
- pixi run --environment r44 devtools_document — man/ regenerated
- vignettes/post_processing.Rmd builds with eval=TRUE chunks
- pixi run --environment r44 rcmdcheck (sanity check before merge)

🤖 Generated with Claude Code