feat: translate remaining SAS analysis pipeline to R #8
…tives, and mammogram extraction functions

- Add predict_survival_unadjusted(), predict_survival_baseline_adjusted(), predict_survival_ipw() with g-computation via pooled logistic regression and RCS splines (splines::ns)
- Add compute_ipw_weights() for stabilised IPW weights truncated at p99
- Add fit_outcome_hr() for weighted pooled logistic regression outcome model
- Add bootstrap_ci() for percentile bootstrap confidence intervals
- Add false_positives() for false-positive rate by arm and screening round
- Add extract_screening_mammograms(), extract_any_mammograms(), extract_diagnostic_mammograms() as Medicare claims extraction templates
- Add tests for predict_survival, compute_ipw_weights, fit_outcome_model
- Bump version to 0.0.0.9015; add splines to Imports
- Update NEWS.md with new function entries

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
Pull request overview
Completes most of the remaining SAS-to-R translation for the package’s target-trial-emulation workflow by adding exported analysis helpers for weighting, survival estimation, bootstrap CIs, false-positive summaries, and mammogram extraction templates.
Changes:
- Added new core analysis functions for IPW construction, survival prediction, hazard-ratio estimation, bootstrap confidence intervals, false-positive summaries, and Medicare claims extraction.
- Exported and documented the new functions, and updated package metadata/description to reflect the broader analysis pipeline.
- Added initial unit tests for survival prediction, IPW weights, and outcome-model structure.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/testthat/test-predict_survival.R | Adds basic tests for unadjusted survival output and empty input handling. |
| tests/testthat/test-fit_outcome_model.R | Adds basic structure/error tests for fit_outcome_hr(). |
| tests/testthat/test-compute_ipw_weights.R | Adds basic tests for returned weight columns and empty input handling. |
| R/predict_survival.R | Introduces unadjusted, baseline-adjusted, and IPW survival prediction helpers plus shared modeling utilities. |
| R/fit_outcome_model.R | Adds pooled-logistic outcome model fitting and OR/CI extraction. |
| R/false_positives.R | Adds false-positive rate summarisation by arm and screening round. |
| R/extract_mammograms.R | Adds screening/diagnostic/combined mammogram extraction helpers for claims data. |
| R/compute_ipw_weights.R | Adds cumulative IPW calculation and 99th-percentile truncation. |
| R/bootstrap_ci.R | Adds bootstrap percentile CI computation for survival differences. |
| NEWS.md | Records the new exported analysis functionality. |
| NAMESPACE | Exports the newly added public functions. |
| man/predict_survival_unadjusted.Rd | Generated documentation for unadjusted survival prediction. |
| man/predict_survival_ipw.Rd | Generated documentation for IPW survival prediction. |
| man/predict_survival_baseline_adjusted.Rd | Generated documentation for baseline-adjusted survival prediction. |
| man/fit_outcome_hr.Rd | Generated documentation for outcome HR fitting. |
| man/false_positives.Rd | Generated documentation for false-positive summaries. |
| man/extract_screening_mammograms.Rd | Generated documentation for screening mammogram extraction. |
| man/extract_diagnostic_mammograms.Rd | Generated documentation for diagnostic mammogram extraction. |
| man/extract_any_mammograms.Rd | Generated documentation for combined mammogram extraction. |
| man/ettbc-package.Rd | Updates package-level description to include the new analysis scope. |
| man/compute_ipw_weights.Rd | Generated documentation for IPW weight computation. |
| man/bootstrap_ci.Rd | Generated documentation for bootstrap CI estimation. |
| DESCRIPTION | Bumps development version, broadens package description, and adds splines to Imports. |
@copilot apply changes based on the comments in this thread and fix spellcheck
…check Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/c45b07db-8114-4722-bd90-2561239310c7 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
Done in 84f1910. Here's a summary of what was addressed from the review thread:
Done in b472312. Three issues from review 4269647539 addressed:
Pull request overview
Copilot reviewed 17 out of 28 changed files in this pull request and generated 3 comments.
Files not reviewed (11)
- man/bootstrap_ci.Rd: Language not supported
- man/compute_ipw_weights.Rd: Language not supported
- man/ettbc-package.Rd: Language not supported
- man/extract_any_mammograms.Rd: Language not supported
- man/extract_diagnostic_mammograms.Rd: Language not supported
- man/extract_screening_mammograms.Rd: Language not supported
- man/false_positives.Rd: Language not supported
- man/fit_outcome_hr.Rd: Language not supported
- man/predict_survival_baseline_adjusted.Rd: Language not supported
- man/predict_survival_ipw.Rd: Language not supported
- man/predict_survival_unadjusted.Rd: Language not supported
@copilot apply changes based on the comments in this thread and iterate with copilot review before sending this PR back to me
…ence Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/dd7e51ed-6843-440b-88b7-70c96d94cf77 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
…ming note Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/dd7e51ed-6843-440b-88b7-70c96d94cf77 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
… with @nord pattern) Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/dd7e51ed-6843-440b-88b7-70c96d94cf77 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
…return Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/dd7e51ed-6843-440b-88b7-70c96d94cf77 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
…_basis() Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/dd7e51ed-6843-440b-88b7-70c96d94cf77 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
Done in f352817. Three issues from review 4270685928 addressed, iterated through code review until no substantive issues remained:
Pull request overview
Copilot reviewed 17 out of 28 changed files in this pull request and generated 5 comments.
@copilot apply changes based on the comments in this thread
…copilot-instructions clarification Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/1704c56c-dd99-4d2f-bd9a-3e1d91281935 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
Done in cbed3ac. Four issues from review 4274879388 addressed:
Pull request overview
Copilot reviewed 17 out of 28 changed files in this pull request and generated 4 comments.
@copilot apply changes based on the comments in this thread
…generalization, NEWS cli import Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/da8a9b08-d9f9-45b3-addb-6c48acef64f3 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
Done in 8efa3fc. Four documentation issues from review 4278124228 addressed:
Pull request overview
Copilot reviewed 17 out of 28 changed files in this pull request and generated 3 comments.
@copilot apply changes based on the comments in this thread
…g_data for bootstrap Agent-Logs-Url: https://github.com/ucdavis/ettbc/sessions/81178e8c-a33e-457b-839e-f1568fab67c1 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
Done in 7e5a072 (with a491163 staging the pre-existing uncommitted doc changes). Three issues from review 4278478682 addressed:
These commands must be run in this order and all must pass without errors before pushing changes or requesting code review. This ensures that CI/CD workflows will pass and prevents wasting reviewer time on fixable issues.

**POLICY: After completing all code changes, call the `parallel_validation` tool and iterate, applying any feedback raised by the code review or CodeQL scan, until `parallel_validation` reports no new issues. Only then consider the task done.** Note: `parallel_validation` is a Copilot coding agent tool (invokable from within a cloud agent session); it is not a shell command or script in this repository.
`parallel_validation` doesn't seem to be real; instead, just iteratively request Copilot reviews from inside the cloud agent session until the reviews come back clean.
Pull request overview
Copilot reviewed 17 out of 28 changed files in this pull request and generated 2 comments.
build_model_formula <- function(ns_col_names, covariate_cols = NULL) {
  ns_interaction_terms <- paste0("STOPBASE:", ns_col_names)
  base_terms <- c(
    "STOPBASE", "STOPBASE:month3",
    ns_interaction_terms,
    "month3", ns_col_names
  )
  all_terms <- c(base_terms, covariate_cols)
  stats::as.formula(paste("dead_t1 ~", paste(all_terms, collapse = " + ")))
}
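As a quick illustration of what this helper builds, the sketch below repeats the function from the excerpt and calls it with two spline columns and one covariate. The covariate name `age` is hypothetical and chosen only for the example; it does not come from the PR.

```r
# Copy of the helper shown in the excerpt above, repeated so the example runs on its own.
build_model_formula <- function(ns_col_names, covariate_cols = NULL) {
  ns_interaction_terms <- paste0("STOPBASE:", ns_col_names)
  base_terms <- c(
    "STOPBASE", "STOPBASE:month3",
    ns_interaction_terms,
    "month3", ns_col_names
  )
  all_terms <- c(base_terms, covariate_cols)
  stats::as.formula(paste("dead_t1 ~", paste(all_terms, collapse = " + ")))
}

# Hypothetical inputs: two spline basis columns and one baseline covariate.
f <- build_model_formula(c("ns1", "ns2"), covariate_cols = "age")
print(f)

# With covariate_cols = NULL, c() drops the NULL, so only the base terms remain.
f_base <- build_model_formula("ns1")
print(f_base)
```

Because `c(base_terms, NULL)` is just `base_terms`, the default `covariate_cols = NULL` needs no special-casing.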
# assigning new integer IDs to handle duplicates
boot_groups <- lapply(seq_along(boot_ids), function(i) {
  grp <- long_data_split[[as.character(boot_ids[i])]]
  grp[[id_col]] <- i
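The excerpt above is cut off in the review view. As a rough, self-contained sketch of the same idea (resample whole participants from a list that was split once, then relabel each draw with a fresh integer ID so duplicated participants stay distinct), something like the following could be used. The toy `long_data` and its `month` column are assumptions for illustration, not the package's data or its actual implementation.

```r
set.seed(1)

# Toy person-month data: 3 participants, 2 rows each (hypothetical).
long_data <- data.frame(id = rep(1:3, each = 2), month = rep(0:1, times = 3))
id_col <- "id"

# Split once, outside the bootstrap loop, so each iteration is a cheap list lookup.
long_data_split <- split(long_data, long_data[[id_col]])
ids <- names(long_data_split)

# Draw participants with replacement.
boot_ids <- sample(ids, size = length(ids), replace = TRUE)

# Relabel each drawn participant with a new integer ID to handle duplicates.
boot_groups <- lapply(seq_along(boot_ids), function(i) {
  grp <- long_data_split[[boot_ids[i]]]
  grp[[id_col]] <- i
  grp
})
boot_data <- do.call(rbind, boot_groups)
```

Relabelling matters because any later step that groups by ID would otherwise merge two draws of the same participant into one cluster.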
Translates the remaining SAS files from the García-Albéniz et al. (2020) analysis pipeline into exported R functions, completing the core statistical methodology for target trial emulation. Includes correctness fixes applied in response to code review feedback.
New functions

| Function | SAS source | Notes |
|---|---|---|
| `compute_ipw_weights()` | `cann17b` + `cann18` | |
| `predict_survival_unadjusted()` | `cann15` | |
| `predict_survival_baseline_adjusted()` | `cann16` | |
| `predict_survival_ipw()` | `cann21` | |
| `fit_outcome_hr()` | `cann20` | |
| `bootstrap_ci()` | `cann23` + `cann24` | `cli::cli_warn()`; RNG state restored on exit; `n_boot` validated to be ≥ 1 |
| `false_positives()` | `cann30` | |
| `extract_screening_mammograms()` / `extract_any_mammograms()` / `extract_diagnostic_mammograms()` | `b02_*` | |

Time in all survival models is modeled with restricted cubic splines via `splines::ns()` (knots at months 6, 48, 72, matching the SAS `%RCSPLINE` macro). `splines` and `cli` added to `Imports`.

Correctness fixes (from code review)
`predict_survival_*`: The `max_month` filter is now applied before fitting the pooled logistic model (not only at prediction time), matching the SAS `cann15/16/21` cutoff behaviour. Added `id_col` parameter; fixed hardcoded `"id"` column in `standardize_survival()`. `check_both_arms()` is now called on both the raw input and the filtered `fit_data` (after `max_month`/`NA` outcome filtering) to catch cases where filtering removes all rows for one arm. `check_both_arms()` now also rejects `NA` arm values and any arm values other than `"STOPBASE"` or `"CONTINUE"`, erroring instead of silently misclassifying unexpected inputs. `standardize_survival()` now errors with a clear message when no baseline (`month_col == 0`) rows are found in `fit_data`, preventing silent return of a spurious perfect-survival curve. `weight_col` is validated to exist in `fit_data` and to be numeric before being passed to `glm()`, preventing silent unweighted fitting when the column is missing. `id_col` is validated to exist in `fit_data` before use in `standardize_survival()`, replacing a low-level error with a clear `cli_abort()` message.
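The two-stage arm check described above could look roughly like the sketch below. It is not the package's `check_both_arms()`: the function name, the `arm` and `month` column names, and the `max_month` value of 96 are all assumptions, and base `stop()` stands in for `cli::cli_abort()` so the example has no dependencies.

```r
# Hypothetical stand-in for check_both_arms(); errors on NA, unexpected, or missing arms.
check_both_arms_sketch <- function(data, arm_col = "arm") {
  arm <- as.character(data[[arm_col]])
  allowed <- c("STOPBASE", "CONTINUE")
  if (anyNA(arm)) {
    stop("Column '", arm_col, "' contains NA values.", call. = FALSE)
  }
  unexpected <- setdiff(unique(arm), allowed)
  if (length(unexpected) > 0) {
    stop("Unexpected arm values: ", paste(unexpected, collapse = ", "), call. = FALSE)
  }
  absent <- setdiff(allowed, unique(arm))
  if (length(absent) > 0) {
    stop("No rows for arm(s): ", paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(data)
}

# Toy person-month data (hypothetical column names and values).
long_data <- data.frame(
  id = c(1, 1, 2, 2),
  arm = c("STOPBASE", "STOPBASE", "CONTINUE", "CONTINUE"),
  month = c(0, 1, 0, 97),
  dead_t1 = c(0, 0, 0, NA)
)
max_month <- 96

# First check the raw input, then filter, then check the data actually used for fitting.
check_both_arms_sketch(long_data)
keep <- long_data$month <= max_month & !is.na(long_data$dead_t1)
fit_data <- long_data[keep, ]
check_both_arms_sketch(fit_data)
```

Checking twice is the point: an input can contain both arms and still lose one of them entirely once the `max_month` and missing-outcome filters are applied.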
The spline basis is now computed dynamically via a `compute_ns_basis()` helper that renames its output columns to `ns1`, `ns2`, ..., `nsK`: all columns are generated from the full `rcs_knots` vector (not just the first two), so no spline terms are silently dropped when more than one interior knot is specified. `rcs_knots` is validated to have at least three entries (two boundary + at least one interior knot) and its documentation now correctly states it accepts a vector of at least 3 elements; `compute_ns_basis()` `@return` docs corrected to state the basis has `length(rcs_knots) - 1` columns (not `length(rcs_knots) - 2`). `standardize_survival()` reuses `compute_ns_basis()` at prediction time, keeping fit and prediction bases consistent. `covariate_cols` in `predict_survival_ipw()` is now documented to require baseline-measured covariates, matching the standardization logic that operates over baseline rows only.

`compute_ipw_weights`: 99th-percentile truncation is now computed separately within each arm (STOPBASE and CONTINUE), reproducing the two-step `cann18` approach. Arm-specific truncation guards against single-arm input. The `anymammo_col` parameter has been removed; `tslm_lag` already measures months since the last any mammogram (screening or diagnostic) and implicitly captures compliance-window resets. CONTINUE-arm weight logic corrected: (a) weight now updates at every month in the compliance window (`tslm_lag` 11–13), not only when `scrmammo == 1`; (b) numerator now uses conditional probabilities under the discrete uniform distribution (1/3 at month 11, 1/2 at month 12, 1 at month 13), matching SAS `cann17b`; (c) weight updates now stop after a breast-cancer diagnosis (`bc_month_col` and `month2_col` passed to internal helper); (d) `NA` predicted probabilities in the CONTINUE-arm helper are now treated as `0.0` (consistent with the STOPBASE branch) rather than the previous arbitrary `0.5`.
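Two pieces of the weight logic above are easy to check by hand: the conditional probabilities under a discrete uniform screening month, and the per-arm truncation. The sketch below illustrates both. The helper names and the toy weights are hypothetical, and this is not the package's internal code.

```r
# If the screening month is uniform over lags 11, 12, 13, the probability of
# screening at a given lag, conditional on not having screened earlier in the
# window, is 1 / (number of window months remaining, including this one).
uniform_conditional_prob <- function(tslm_lag) {
  stopifnot(all(tslm_lag %in% 11:13))
  1 / (14 - tslm_lag)
}
probs <- uniform_conditional_prob(11:13) # 1/3, 1/2, 1

# Truncate weights at the p-th quantile computed separately within each arm.
truncate_by_arm <- function(w, arm, p = 0.99) {
  stats::ave(w, arm, FUN = function(x) {
    pmin(x, stats::quantile(x, probs = p, names = FALSE))
  })
}

# Toy weights: each arm has one extreme value on a different scale.
arm <- rep(c("STOPBASE", "CONTINUE"), each = 100)
w <- c(rep(1, 99), 50, rep(2, 99), 500)
w_trunc <- truncate_by_arm(w, arm)
```

Truncating within arm means the cap for one arm is not driven by extreme weights in the other, which a pooled 99th percentile would allow.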
Arm dispatch now guards against `NA` arm values explicitly (via `is.na(arm)` check before `%in%`) to avoid a base-R `missing value where TRUE/FALSE needed` error, then aborts for any value other than `"STOPBASE"` or `"CONTINUE"`. The `@examples` section no longer references the removed `anymammo` column. `pred_prob_col` is now validated to exist in `long_data`, be numeric, and fall within [0, 1] before any row-level processing begins. The internal temporary row-index column now uses a collision-resistant name (`..ettbc_row_idx..`) to avoid silently overwriting any existing user column with the same name.

`false_positives`: Evaluations are filtered to within each arm's observed follow-up before counting. Period classification now uses the 0-indexed `month2` follow-up column (parameter renamed `hist_month_col` → `hist_month2_col`). Duplicate evaluations within `window_months` per participant-arm are deduplicated. An empty-result data frame is returned safely when no IDs match the cohort or when `long_data` has 0 rows (even if `hist_data` is non-empty). `@return` docs clarified: arm-period combinations with no histological evaluations are omitted from the result. `arm_col` values are now validated up front: `NA` or any value other than `"STOPBASE"`/`"CONTINUE"` raises a `cli_abort()` error before any processing occurs. `NA` values in `hist_month2_col` now abort with a clear error message rather than causing a base-R `missing value where TRUE/FALSE needed` error in the deduplication loop.

`fit_outcome_hr`: Added `cluster_id_col` parameter; confidence intervals are now computed as Wald CIs in both the `sandwich` and fallback branches (consistent named numeric vector output). When `sandwich` is available, cluster-robust variance via `sandwich::vcovCL()` is used, matching the SAS `PROC SURVEYLOGISTIC` approach. `sandwich` added to `Suggests`. `@details` formula generalized to use `ns1 + … + nsK` (instead of hard-coded `ns1`/`ns2`) to reflect dynamic spline column generation from `rcs_knots`. The STOPBASE main-effect OR is documented as the baseline-time ratio from a model with arm-by-time interactions.
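To make the spline-column naming and the Wald interval concrete, here is a minimal sketch under several assumptions: the data are simulated, the model is simplified to `STOPBASE` plus the spline columns (no arm-by-time interactions or weights), and the first and last entries of `rcs_knots` are treated as boundary knots. It is not the package's `fit_outcome_hr()`.

```r
set.seed(42)

# Simulated person-month data: 100 participants, 12 time points each (hypothetical).
n_id <- 100
dat <- data.frame(
  id = rep(seq_len(n_id), each = 12),
  month = rep(seq(0, 88, by = 8), times = n_id),
  STOPBASE = rep(stats::rbinom(n_id, 1, 0.5), each = 12)
)
dat$dead_t1 <- stats::rbinom(nrow(dat), 1, 0.05)

# Natural spline basis: interior knots plus two boundary knots give
# length(rcs_knots) - 1 columns, renamed ns1, ..., nsK.
rcs_knots <- c(6, 48, 72)
k <- sort(rcs_knots)
basis <- splines::ns(
  dat$month,
  knots = k[-c(1, length(k))],
  Boundary.knots = k[c(1, length(k))]
)
ns_df <- as.data.frame(matrix(as.numeric(basis), nrow = nrow(basis)))
names(ns_df) <- paste0("ns", seq_len(ncol(ns_df)))
dat <- cbind(dat, ns_df)

# Simplified pooled logistic model using the dynamically named spline columns.
fml <- stats::reformulate(c("STOPBASE", names(ns_df)), response = "dead_t1")
fit <- stats::glm(fml, family = stats::binomial(), data = dat)

# Cluster-robust variance when sandwich is installed, model-based otherwise.
vc <- if (requireNamespace("sandwich", quietly = TRUE)) {
  sandwich::vcovCL(fit, cluster = dat$id)
} else {
  stats::vcov(fit)
}

# Wald interval on the log-odds scale, exponentiated to the OR scale.
est <- unname(stats::coef(fit)["STOPBASE"])
se <- unname(sqrt(diag(vc))["STOPBASE"])
z <- stats::qnorm(0.975)
or_ci <- c(OR = exp(est), lower = exp(est - z * se), upper = exp(est + z * se))
```

Computing the interval the same way in both branches keeps the output shape identical whether or not `sandwich` is installed; only the standard error changes.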
Added single-arm guard on both input and filtered model data. `weight_col` is validated to exist in `fit_data` and to be numeric before use; `cluster_id_col` is validated to exist in `fit_data` and to contain no missing values before being passed to `sandwich::vcovCL()`, replacing a low-level NULL-access error with a clear `cli_abort()` message. Now uses `compute_ns_basis()` and dynamic `ns_col_names` for consistency with the survival prediction functions. `rcs_knots` documentation updated to reflect that the parameter accepts a vector of at least 3 elements (two boundary + at least one interior knot).

`bootstrap_ci`: Returns an NA-filled data frame for empty input instead of erroring with mismatched column lengths. Failed iterations are now counted; `cli::cli_warn()` is issued when the failure rate exceeds the new `fail_threshold` argument (default 10%); warning message formatting corrected (removed extraneous backslash). `set.seed()` now saves and restores the caller's RNG state on exit so seeding is confined to the function call. `col_quantile()` passes `names = FALSE` to `stats::quantile()` so `apply()` always returns a plain numeric vector (not a named 1×N matrix). `id_col` is now forwarded to both the point-estimate and bootstrap-iteration calls to `predict_survival_ipw()`, so non-default participant ID column names are handled correctly. `n_boot` is now validated to be ≥ 1, with a clear `cli_abort()` error for zero or negative values. `rcs_knots` documentation updated to reflect that vectors with at least 3 elements are supported. `long_data` is now pre-split by `id_col` once before the bootstrap loop (instead of calling `merge()` on every iteration), substantially reducing overhead on large datasets.

`extract_mammograms_impl`: Empty-result `id` column now preserves the type of `claims[[id_col]]` instead of defaulting to `integer(0)`.

`NEWS.md`: Updated dependency entry to record that both `splines` and `cli` were added to `Imports`.

Tests added
- `test-bootstrap_ci.R`: output structure, empty-input guard, seed reproducibility
- `test-extract_mammograms.R`: HCPCS code filtering, month conversion, empty result, diagnostic/any variants
- `test-false_positives.R`: structure, empty hist data, empty long data (with non-empty hist data), no-match cohort, censoring filter, deduplication, period classification
- `test-predict_survival.R`: added coverage for `predict_survival_baseline_adjusted()` and `predict_survival_ipw()` (output structure, empty-data handling, weight handling)
- `test-compute_ipw_weights.R`: added deterministic fixtures for STOPBASE grace-period behaviour, CONTINUE-arm compliance-window weight updates (`tslm_lag` 11–13), and per-arm 99th-percentile truncation

Checklist
- Fixes #issue-number (if relevant).