
Add datasight integration and torc-server export command#299

Merged
daniel-thom merged 5 commits into main from feat/integrations
May 3, 2026

Conversation


@daniel-thom daniel-thom commented May 3, 2026

Summary

  • Adds a datasight integration guide and reference files (schema, schema_description, queries) under examples/datasight/ for natural-language SQL exploration of a torc database.
  • Adds a new torc-server export subcommand that produces a standalone, optionally filtered SQLite copy of the live database with original workflow and job IDs preserved verbatim. This closes a longstanding gap: torc workflows export/import renumbers IDs on import, which breaks log-line debugging for end users handed a workflow copy.
  • Updates admin/server-deployment.md with operator-facing documentation for the new command and the rationale for its access-control sanitization defaults.

How the export works

  1. VACUUM INTO '<output>' — transactionally consistent, defragmented snapshot. Does not block source writers/readers. The export connection participates in SQLite's normal WAL coherency, so the snapshot reflects every committed transaction the running server can see.
  2. Open the snapshot with foreign_keys=ON and (if a filter was requested) run a single DELETE FROM workflow WHERE id NOT IN (<filter>). The schema's existing ON DELETE CASCADE chain on every per-workflow table cleans up jobs, files, results, events, ro_crate entities, etc. for the rows the filter actually deleted.
  3. Sweep orphans (always). Cascade only fires for parent rows the filter deleted, so any pre-existing orphans in the source DB survive VACUUM INTO. (These are rows whose workflow_id already pointed at a missing workflow before the export ran, typically left behind by a delete_workflow code path that bypassed cascade by toggling PRAGMA foreign_keys = OFF, or by a bare sqlite3 CLI session, which defaults to FKs off.) The export iteratively runs PRAGMA foreign_key_check and deletes every reported violation until none remain. workflow_status is pruned separately because its back-reference column has no FK declared and is therefore invisible to foreign_key_check. This step runs in unfiltered mode too: FK violations are data corruption, not fidelity to the source.
  4. If a filter was applied and --preserve-access-groups is not set, wipe user_group_membership (no per-workflow scoping → leaks unrelated users' affiliations) and access_group (which cascades workflow_access_group). Rationale: there's no clean per-workflow filter for these tables that wouldn't risk leaking entries about other users/groups, so the conservative default is to drop them and let admins opt back in for vetted recipients.
  5. Final VACUUM (skip with --no-vacuum) reclaims space freed by the deletes.

If anything in steps 2–5 fails after step 1 has written the snapshot, the partial output file is removed before the error is reported — a failed export never leaves a half-finished database on disk.
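The steps above can be sketched in a few lines of Python against the sqlite3 stdlib. This is an illustrative outline only, not the real implementation (which is the Rust code in src/server/export.rs); the name export_snapshot and the keep_ids parameter are hypothetical, and the ACL sanitization of step 4 plus the cleanup-on-failure path are omitted for brevity.

```python
import sqlite3

def export_snapshot(source_path, output_path, keep_ids=None):
    # Step 1: VACUUM INTO writes a transactionally consistent,
    # defragmented snapshot without blocking writers on the source.
    src = sqlite3.connect(source_path)
    src.execute("VACUUM INTO ?", (output_path,))
    src.close()

    out = sqlite3.connect(output_path)
    out.execute("PRAGMA foreign_keys = ON")

    # Step 2: one DELETE on the parent table; the schema's
    # ON DELETE CASCADE chain removes per-workflow child rows.
    if keep_ids is not None:
        placeholders = ",".join("?" for _ in keep_ids)
        out.execute(
            f"DELETE FROM workflow WHERE id NOT IN ({placeholders})",
            list(keep_ids),
        )

    # Step 3: cascade never fires for orphans whose parent was already
    # gone, so sweep foreign_key_check violations until none remain.
    while True:
        violations = out.execute("PRAGMA foreign_key_check").fetchall()
        if not violations:
            break
        for table, rowid, _parent, _fk_index in violations:
            out.execute(f"DELETE FROM {table} WHERE rowid = ?", (rowid,))

    out.commit()
    out.execute("VACUUM")  # Step 5: reclaim space freed by the deletes
    out.close()
```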

Filter modes (mutually exclusive)

| Form | Behavior |
| --- | --- |
| (no filter) | Full copy; keeps ACL tables intact regardless of `--preserve-access-groups`. |
| `--user alice [--user bob …]` | `workflow.user IN (…)` |
| `--access-group 7 [--access-group 9 …]` | `workflow.id IN (SELECT workflow_id FROM workflow_access_group WHERE group_id IN (…))` |
| Positional IDs `42 99 …` | `workflow.id IN (…)` |

An empty filter result is an error, and the partially written output file is removed. If the output path already exists, the command refuses to run unless --overwrite is given.
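For illustration, here is how the three mutually exclusive filter shapes might translate into SQL predicates. The function name filter_predicate is hypothetical; the real placeholder-generation and id-list helpers are the ones unit-tested in src/server/export.rs.

```python
def filter_predicate(users=None, access_groups=None, workflow_ids=None):
    """Return (sql_fragment, bind_params), or None for a full copy."""
    given = [x for x in (users, access_groups, workflow_ids) if x]
    if len(given) > 1:
        raise ValueError("filter modes are mutually exclusive")
    if users:
        ph = ",".join("?" * len(users))
        return f"workflow.user IN ({ph})", list(users)
    if access_groups:
        ph = ",".join("?" * len(access_groups))
        return (
            "workflow.id IN (SELECT workflow_id FROM workflow_access_group "
            f"WHERE group_id IN ({ph}))",
            list(access_groups),
        )
    if workflow_ids:
        ph = ",".join("?" * len(workflow_ids))
        return f"workflow.id IN ({ph})", list(workflow_ids)
    return None  # no filter: full copy
```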

Test plan

  • 3 unit tests for helpers in src/server/export.rs (placeholder generation, id-list formatting, URL normalization).
  • 15 integration tests in tests/test_export.rs driving run_export directly against a freshly-migrated SQLite DB seeded with two access groups and three workflows:
    • All four filter shapes (None, single user, single workflow ID, single access group).
    • All three multi-value variants (multi user, multi workflow ID, multi access group — exercising the multi-bind / comma-joined-id SQL paths).
    • Nonexistent workflow ID is silently ignored when mixed with valid IDs.
    • Default ACL-table sanitization wipes the three tables; --preserve-access-groups keeps them.
    • Full copy preserves ACL tables verbatim (operator already has full access).
    • Pre-existing orphans (injected with foreign_keys = OFF) are pruned in both filtered and unfiltered modes; pragma_foreign_key_check reports zero violations on the output.
    • --no-vacuum (run_final_vacuum=false) still produces a valid filtered DB with no FK violations.
    • Empty filter result errors out and removes the partial file.
    • Refuse-to-overwrite behavior, then succeeds with --overwrite.
  • Cross-era cascade coverage: tests seed event, failure_handler, ro_crate_entity, and slurm_stats rows so a future migration that loses ON DELETE CASCADE on workflow_id would fail the suite.
  • cargo build, cargo clippy --all --all-targets --all-features -- -D warnings, cargo fmt --check, dprint check — all clean.
  • Verified end-to-end against a real production DB: a filtered export that previously produced 166 orphan rows across 12 tables now reports 0 FK violations after the orphan-pruning fix.

Follow-up

The workflow_status table is a 1:1 indirection from workflow whose missing FK is the proximate cause of one of the orphan classes addressed here. Filed as #300 — out of scope for this PR.

🤖 Generated with Claude Code

Adds a new "Analyzing Workflows with datasight" guide plus reference files
(schema.yaml, schema_description.md, queries.yaml) under examples/datasight/
for pointing the datasight SQL-exploration tool at a torc database.

End users typically can't read the production server's SQLite file, and the
existing JSON export/import flow renumbers IDs on import — breaking log-line
debugging. To address this, adds `torc-server export`, which produces a
standalone SQLite copy of the live database with original workflow and job
IDs preserved verbatim. Filters: --user (repeatable), --access-group
(repeatable), or positional workflow IDs. Without a filter, produces a full
copy. Uses VACUUM INTO for a transactionally consistent snapshot, then a
single DELETE driven by the schema's existing ON DELETE CASCADE chain to
prune non-matching workflows. Filtered exports strip access_group and
user_group_membership by default to avoid leaking entries about other users
and groups; --preserve-access-groups overrides for full copies or vetted
recipients.

Documents the new path in datasight.md (admin-mediated SQLite export
becomes the recommended default for end users) and in
admin/server-deployment.md (operator-facing flag reference + rationale for
the sanitization defaults).

Tests: 3 unit tests for the helper functions and 12 integration tests
covering all four filter shapes (None / single user / single ID / single
access group), all three multi-value variants (multi user / multi ID /
multi access group), nonexistent-ID handling, both sanitization values,
the empty-result error, and overwrite-refusal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

This PR adds a new torc-server export workflow for producing filtered SQLite snapshots of a live Torc database, and documents a new datasight-based analysis path that uses either direct DB access or those exported snapshots. It fits into the codebase as an operator/end-user debugging and analysis improvement around the server’s SQLite storage model.

Changes:

  • Adds torc-server export CLI support and the underlying export/filter/sanitization implementation.
  • Adds integration/unit tests for export behavior across filter modes, overwrite handling, and ACL sanitization.
  • Adds datasight reference files and documentation for analyzing Torc workflow databases.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/test_export.rs | Adds export integration tests and helper seeding utilities. |
| src/server/export.rs | Implements SQLite snapshot/export logic, filtering, and ACL stripping. |
| src/server.rs | Exposes the new export module behind server-bin. |
| src/bin/torc-server.rs | Adds the export CLI subcommand and argument handling. |
| examples/datasight/schema_description.md | Provides schema/domain guidance for datasight over Torc DBs. |
| examples/datasight/schema.yaml | Adds datasight table/column filtering config. |
| examples/datasight/queries.yaml | Adds seed NL/SQL examples for datasight. |
| docs/src/specialized/tools/index.md | Links the new datasight guide from the tools index. |
| docs/src/specialized/tools/datasight.md | Adds end-user/admin setup guidance for datasight and export-based workflows. |
| docs/src/specialized/admin/server-deployment.md | Documents the new export command for operators. |
| docs/src/SUMMARY.md | Adds the datasight page to the docs summary/nav. |


  • Comment thread: examples/datasight/schema.yaml (outdated)
  • Comment thread: docs/src/specialized/tools/datasight.md (outdated)
  • Comment thread: docs/src/specialized/tools/datasight.md (outdated)
  • Comment thread: tests/test_export.rs
  • Comment thread: examples/datasight/queries.yaml (outdated)
daniel-thom and others added 2 commits May 3, 2026 12:13
…example bugs

- Remove the --checkpoint flag and "may be slightly stale" wording. The
  export connection participates in SQLite's normal WAL coherency, so the
  snapshot already reflects every committed transaction the running server
  can see; the flag was solving a problem that doesn't exist in the
  documented same-host scenario, and PRAGMA wal_checkpoint(TRUNCATE) would
  briefly take an exclusive lock antagonistic to the live server.

- Prune orphan workflow_status rows during filtered exports. workflow_status
  has no ON DELETE CASCADE from workflow because the back-reference column
  was added via ALTER TABLE ADD COLUMN (which SQLite cannot extend with FK
  constraints), so DELETE FROM workflow leaves orphans whose count alone
  would leak the total workflow count of the source DB.

- Extend the cascade tests to seed event, failure_handler, ro_crate_entity,
  and slurm_stats for bob's workflow, then assert each is empty after a
  filter that drops bob. Gives cross-era coverage so a future missing ON
  DELETE CASCADE on workflow_id would fail this suite. Also asserts the
  workflow_status pruning behavior end-to-end.

Documentation/example fixes from review:

- examples/datasight/schema.yaml: drop jobs_sort_method from excluded_columns
  (the column was removed in migration 20260318).
- docs/.../datasight.md: NREL → NatLabRockies in the reference-files table
  to match the rest of the page.
- docs/.../datasight.md: fix DATABASE_URL example from sqlite:/// to
  sqlite:////; absolute paths in sqlalchemy-style URLs need four slashes.
- examples/datasight/queries.yaml: fix the header comment about
  :workflow_id placeholders to match the actual hard-coded `123` examples.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
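For reference, the slash rule the sqlite:/// fix above is correcting can be shown with illustrative paths (url_path is a hypothetical helper, not part of torc):

```python
# SQLAlchemy-style SQLite URLs: the scheme contributes "sqlite://",
# and the path follows after one more slash. A relative path therefore
# uses three slashes total; an absolute path starts with "/" itself,
# giving four.
relative = "sqlite:///torc.db"        # resolves to ./torc.db
absolute = "sqlite:////data/torc.db"  # resolves to /data/torc.db

def url_path(url):
    # everything after "sqlite:///" is the filesystem path
    return url[len("sqlite:///"):]
```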
The export's filter relies on cascade-from-DELETE: dropping rows in the
workflow table fires ON DELETE CASCADE for everything pointing at those
workflow IDs. But cascade only fires for parent rows that the filter
actually deletes. The torc server has a delete_workflow code path that
toggles PRAGMA foreign_keys = OFF and does cleanup manually; if that
cleanup is incomplete, the source DB carries pre-existing orphans whose
parent workflow is already gone. VACUUM INTO copies the orphans verbatim,
the filter's cascade has no parent to fire on, and the orphans survive.

Reproduced in the wild: torc-server export 26 27 28 25 against a real DB
produced an output with foreign_key_check reporting 166 violations across
12 tables (failure_handler rows for workflows 33/34, plus orphan rows in
event, compute_node, job, result, ro_crate_entity, etc.).

Fix: after the workflow filter, iteratively run PRAGMA foreign_key_check
and DELETE every row it reports until none remain. Bounded loop so a
pathological schema can't spin forever. Filtered mode only — full copies
intentionally preserve source state. Reduces 166 violations to 0 in one
pass on the user's actual DB.

Adds an integration test that injects pre-existing orphans (with
foreign_keys briefly off) and verifies they're pruned. Updates the admin
doc with a new "Sweep orphans" step describing the cause and the fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.



  • Comment thread: src/bin/torc-server.rs
  • Comment thread: src/server/export.rs
  • Comment thread: src/server/export.rs (outdated)
  • Comment thread: src/server/export.rs
…ure, test --no-vacuum

Three fixes from Copilot's review of PR #299:

- prune_orphans and the workflow_status orphan cleanup now run for every
  export, not just filtered ones. Pre-existing FK violations in the source
  DB are data corruption, not "fidelity to the source," and a full-copy
  export with foreign_key_check reporting violations is not a useful
  artifact.

- Refactored run_export so any error after VACUUM INTO has written the
  snapshot removes the partial output file before propagating. Previously
  only the "filter matched no workflows" path cleaned up; failures in
  prune_orphans, ACL sanitization, or the final VACUUM left a half-finished
  database on disk that callers could mistake for a valid export.

- Added two tests: full_copy_also_prunes_pre_existing_orphans confirms
  orphan cleanup runs in unfiltered mode (with the same orphan-injection
  pattern as the filtered test); no_vacuum_flag_skips_final_vacuum...
  exercises run_final_vacuum=false (the --no-vacuum CLI surface) and
  asserts the output is still a valid filtered DB with no FK violations.

Documentation updated to reflect the always-on orphan sweep and the
cleanup-on-failure guarantee.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.



  • Comment thread: src/server/export.rs
  • Comment thread: tests/test_export.rs (outdated)
Two follow-up fixes from Copilot's review of PR #299:

- Reject `--output` paths that resolve to the same filesystem location as
  the source database. Without this, a slip like
  `torc-server export --database prod.db --output prod.db --overwrite`
  would call `remove_file(prod.db)` in `prepare_output_path` before the
  export even opens the source — turning a typo into permanent data loss.
  Comparison is by canonical path so symlinks, `./`, and the various
  `sqlite:` URL prefixes all collapse to the same target. Two new tests
  cover the direct case and a textually-aliased path that resolves to the
  same file.

- Update a stale comment in `full_copy_keeps_acl_tables` that claimed
  orphan pruning doesn't run in unfiltered mode. After the recent change
  to always run prune_orphans, this is no longer true; the assertions
  remain correct because the seed has no orphans, so the sweep finds
  nothing to remove. Comment now points at the dedicated full-copy orphan
  test for actual coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
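The canonical-path guard described in the first bullet can be sketched as follows. This is an assumption-laden illustration: reject_same_path is a hypothetical name, and the exact set of sqlite: URL prefixes the real prepare_output_path handles is not spelled out in this PR.

```python
import os

def reject_same_path(database, output):
    """Refuse an export whose --output resolves to the source database.

    realpath collapses symlinks and ./ aliases; longest URL prefix is
    stripped first so "sqlite:///x", "sqlite://x", and "sqlite:x" all
    reduce to the same filesystem target.
    """
    for prefix in ("sqlite:///", "sqlite://", "sqlite:"):
        if database.startswith(prefix):
            database = database[len(prefix):]
            break
    if os.path.realpath(database) == os.path.realpath(output):
        raise ValueError("--output must not be the source database")
```

Without such a guard, `--overwrite` would delete the source file before the export ever opens it, turning a typo into permanent data loss.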
@daniel-thom daniel-thom merged commit 5c369c7 into main May 3, 2026
9 checks passed
@daniel-thom daniel-thom deleted the feat/integrations branch May 3, 2026 20:39
daniel-thom added a commit that referenced this pull request May 4, 2026
* Merge workflow_status into workflow (#300)

Collapse the 1:1 workflow_status satellite table into workflow itself:
- Add run_id, is_canceled to workflow (is_archived already existed).
- Backfill from workflow_status using the existing status_id join.
- Strip the FOREIGN KEY (status_id) constraint via PRAGMA writable_schema
  so DROP COLUMN status_id and DROP TABLE workflow_status work without
  rename-recreating workflow (which would cascade-delete every child
  table per the CLAUDE.md landmine).
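The writable_schema maneuver that bullet describes can be illustrated against a toy schema. This is a sketch of the general technique under stated assumptions, not the migration itself; strip_fk is a hypothetical name, and the real migration also backfills data and drops the satellite table.

```python
import sqlite3

def strip_fk(db_path, table, fk_snippet):
    """Rewrite the stored CREATE TABLE text to drop a FOREIGN KEY clause
    without rename-recreating the table (which would fire the table's
    ON DELETE CASCADE chain)."""
    con = sqlite3.connect(db_path)
    con.execute("PRAGMA writable_schema = ON")
    (sql,) = con.execute(
        "SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table,)
    ).fetchone()
    con.execute(
        "UPDATE sqlite_master SET sql = ? WHERE type='table' AND name = ?",
        (sql.replace(fk_snippet, ""), table),
    )
    con.execute("PRAGMA writable_schema = OFF")
    con.commit()
    con.close()  # reopen afterwards so the modified schema is re-parsed
```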

Also drop has_detected_need_to_run_completion_script: no code path ever
set it to true, so the read paths (TUI badge, IsCompleteResponse) were
dead. Completion-script intent now lives in on_workflow_complete
workflow actions.

Eliminates the orphan class that PR #299's export workaround was
papering over: with the data in workflow itself, there is nowhere for a
status row to be stranded. Also removes the noisy JOIN from list
queries and the status_id field from the API contract. Regenerate Rust
OpenAPI spec, Python client, and Julia client.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Bump HTTP_API_VERSION to 0.15.0 for #300 breaking changes

Removed fields from the API contract: WorkflowModel.status_id,
WorkflowStatusModel.has_detected_need_to_run_completion_script, and
IsCompleteResponse.needs_to_run_completion_script. Bump the version
constant and regenerate Rust, Python, and Julia clients so their
doc-comment version strings track.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Collapse WorkflowStatusModel into WorkflowModel at the API layer (#300)

The earlier commit on this branch merged workflow_status into workflow at
the storage layer but left the satellite shape on the API surface — the
server constructed a synthetic WorkflowStatusModel from one row of
workflow, exposed it via /workflows/{id}/status (GET/PUT), and clients
still treated it as a separate object. That kept the indirection issue
#300 was meant to remove, just hidden in a different place.

This change finishes the collapse:

- Add run_id, is_canceled, is_archived to WorkflowModel; populate them
  in get_workflow / list_workflows / create_workflow; accept them in
  update_workflow via COALESCE.
- Delete WorkflowStatusModel from models.rs and the Rust-owned OpenAPI
  spec.
- Delete GET /workflows/{id}/status, PUT /workflows/{id}/status; their
  data is reachable via /workflows/{id} now.
- Keep POST /workflows/{id}/reset_status as a meaningful domain
  operation, but its response is now a WorkflowModel-shaped object.
  cancel_workflow likewise returns post-cancel WorkflowModel.
- Migrate all internal callers (cancel/is_complete/reset, claim_jobs,
  workflow_manager, run_jobs_cmd, torc-slurm-job-runner, archive
  command, orphan detection) to read from WorkflowModel.
- Update integration tests across test_workflows, test_workflow_manager,
  test_results, test_jobs, test_orphaned_jobs, test_slurm_commands,
  test_workflow_metadata_project to use the new shape.
- Regenerate Rust, Python, and Julia clients.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
