Add datasight integration and torc-server export command (#299)

daniel-thom merged 5 commits into main
Conversation
Adds a new "Analyzing Workflows with datasight" guide plus reference files (`schema.yaml`, `schema_description.md`, `queries.yaml`) under `examples/datasight/` for pointing the datasight SQL-exploration tool at a torc database.

End users typically can't read the production server's SQLite file, and the existing JSON export/import flow renumbers IDs on import, breaking log-line debugging. To address this, adds `torc-server export`, which produces a standalone SQLite copy of the live database with original workflow and job IDs preserved verbatim. Filters: `--user` (repeatable), `--access-group` (repeatable), or positional workflow IDs. Without a filter, it produces a full copy.

Uses `VACUUM INTO` for a transactionally consistent snapshot, then a single `DELETE` driven by the schema's existing `ON DELETE CASCADE` chain to prune non-matching workflows. Filtered exports strip `access_group` and `user_group_membership` by default to avoid leaking entries about other users and groups; `--preserve-access-groups` overrides this for full copies or vetted recipients.

Documents the new path in `datasight.md` (admin-mediated SQLite export becomes the recommended default for end users) and in `admin/server-deployment.md` (operator-facing flag reference plus the rationale for the sanitization defaults).

Tests: 3 unit tests for the helper functions and 12 integration tests covering all four filter shapes (none / single user / single ID / single access group), all three multi-value variants (multi user / multi ID / multi access group), nonexistent-ID handling, both sanitization values, the empty-result error, and overwrite refusal.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
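The snapshot-then-cascade mechanism described above can be sketched in a few lines of stdlib Python. This is a toy schema (two tables, simplified column set), not the real torc schema; it only illustrates how `VACUUM INTO` plus one `DELETE` under `foreign_keys = ON` yields a filtered copy with original IDs intact.

```python
import os
import sqlite3
import tempfile

# Toy schema mirroring the cascade pattern the export relies on.
src_path = os.path.join(tempfile.mkdtemp(), "source.db")
out_path = src_path + ".export"

src = sqlite3.connect(src_path)
src.executescript("""
    CREATE TABLE workflow (id INTEGER PRIMARY KEY, user TEXT);
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        workflow_id INTEGER REFERENCES workflow(id) ON DELETE CASCADE
    );
    INSERT INTO workflow VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO job VALUES (10, 1), (11, 2);
""")

# Step 1: transactionally consistent snapshot; no IDs are renumbered.
src.execute("VACUUM INTO ?", (out_path,))
src.close()

# Step 2: one DELETE on workflow; ON DELETE CASCADE prunes child rows.
out = sqlite3.connect(out_path)
out.execute("PRAGMA foreign_keys = ON")  # off by default per connection
out.execute("DELETE FROM workflow WHERE user NOT IN ('alice')")
out.commit()

print(out.execute("SELECT id, workflow_id FROM job").fetchall())  # [(10, 1)]
```

Job 11 disappears without ever being named in the `DELETE`: the cascade on its `workflow_id` reference does the work, which is why the real export needs only a single statement per filter.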
Pull request overview
This PR adds a new `torc-server export` workflow for producing filtered SQLite snapshots of a live Torc database, and documents a new datasight-based analysis path that uses either direct DB access or those exported snapshots. It fits into the codebase as an operator/end-user debugging and analysis improvement around the server's SQLite storage model.
Changes:
- Adds `torc-server export` CLI support and the underlying export/filter/sanitization implementation.
- Adds integration/unit tests for export behavior across filter modes, overwrite handling, and ACL sanitization.
- Adds datasight reference files and documentation for analyzing Torc workflow databases.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tests/test_export.rs` | Adds export integration tests and helper seeding utilities. |
| `src/server/export.rs` | Implements SQLite snapshot/export logic, filtering, and ACL stripping. |
| `src/server.rs` | Exposes the new export module behind `server-bin`. |
| `src/bin/torc-server.rs` | Adds the export CLI subcommand and argument handling. |
| `examples/datasight/schema_description.md` | Provides schema/domain guidance for datasight over Torc DBs. |
| `examples/datasight/schema.yaml` | Adds datasight table/column filtering config. |
| `examples/datasight/queries.yaml` | Adds seed NL/SQL examples for datasight. |
| `docs/src/specialized/tools/index.md` | Links the new datasight guide from the tools index. |
| `docs/src/specialized/tools/datasight.md` | Adds end-user/admin setup guidance for datasight and export-based workflows. |
| `docs/src/specialized/admin/server-deployment.md` | Documents the new export command for operators. |
| `docs/src/SUMMARY.md` | Adds the datasight page to the docs summary/nav. |
…example bugs

- Remove the `--checkpoint` flag and "may be slightly stale" wording. The export connection participates in SQLite's normal WAL coherency, so the snapshot already reflects every committed transaction the running server can see; the flag was solving a problem that doesn't exist in the documented same-host scenario, and `PRAGMA wal_checkpoint(TRUNCATE)` would briefly take an exclusive lock antagonistic to the live server.
- Prune orphan `workflow_status` rows during filtered exports. `workflow_status` has no `ON DELETE CASCADE` from `workflow` because the back-reference column was added via `ALTER TABLE ADD COLUMN` (which SQLite cannot extend with FK constraints), so `DELETE FROM workflow` leaves orphans whose count alone would leak the total workflow count of the source DB.
- Extend the cascade tests to seed `event`, `failure_handler`, `ro_crate_entity`, and `slurm_stats` for bob's workflow, then assert each is empty after a filter that drops bob. This gives cross-era coverage, so a future missing `ON DELETE CASCADE` on `workflow_id` would fail this suite. Also asserts the `workflow_status` pruning behavior end-to-end.

Documentation/example fixes from review:

- `examples/datasight/schema.yaml`: drop `jobs_sort_method` from `excluded_columns` (the column was removed in migration 20260318).
- `docs/.../datasight.md`: NREL → NatLabRockies in the reference-files table to match the rest of the page.
- `docs/.../datasight.md`: fix the `DATABASE_URL` example from `sqlite:///` to `sqlite:////`; absolute paths in sqlalchemy-style URLs need four slashes.
- `examples/datasight/queries.yaml`: fix the header comment about `:workflow_id` placeholders to match the actual hard-coded `123` examples.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
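Because the `workflow_status` back-reference carries no FK declaration, neither cascade nor `PRAGMA foreign_key_check` can see its orphans, so the prune has to be an explicit anti-join. A minimal sketch with a toy two-table schema (column names simplified, not the real torc schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE workflow (id INTEGER PRIMARY KEY);
    -- Satellite table whose back-reference has no FK, mirroring the
    -- ALTER TABLE ADD COLUMN limitation described above.
    CREATE TABLE workflow_status (workflow_id INTEGER, status TEXT);
    INSERT INTO workflow VALUES (1);
    INSERT INTO workflow_status VALUES (1, 'done'), (2, 'orphaned');
""")

# No FK means no cascade and no foreign_key_check report:
# the orphan must be pruned explicitly.
con.execute(
    "DELETE FROM workflow_status "
    "WHERE workflow_id NOT IN (SELECT id FROM workflow)"
)
print(con.execute("SELECT workflow_id FROM workflow_status").fetchall())
```

Without this sweep, the surviving row count of `workflow_status` in a filtered export would equal the source DB's total workflow count, which is exactly the leak the commit describes.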
The export's filter relies on cascade-from-DELETE: dropping rows in the `workflow` table fires `ON DELETE CASCADE` for everything pointing at those workflow IDs. But cascade only fires for parent rows that the filter actually deletes. The torc server has a `delete_workflow` code path that toggles `PRAGMA foreign_keys = OFF` and does cleanup manually; if that cleanup is incomplete, the source DB carries pre-existing orphans whose parent workflow is already gone. `VACUUM INTO` copies the orphans verbatim, the filter's cascade has no parent to fire on, and the orphans survive.

Reproduced in the wild: `torc-server export 26 27 28 25` against a real DB produced an output with `foreign_key_check` reporting 166 violations across 12 tables (`failure_handler` rows for workflows 33/34, plus orphan rows in `event`, `compute_node`, `job`, `result`, `ro_crate_entity`, etc.).

Fix: after the workflow filter, iteratively run `PRAGMA foreign_key_check` and `DELETE` every row it reports until none remain. The loop is bounded so a pathological schema can't spin forever. Filtered mode only; full copies intentionally preserve source state. Reduces 166 violations to 0 in one pass on the user's actual DB.

Adds an integration test that injects pre-existing orphans (with `foreign_keys` briefly off) and verifies they're pruned. Updates the admin doc with a new "Sweep orphans" step describing the cause and the fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
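The bounded `foreign_key_check` sweep can be sketched in stdlib Python. The schema and the `prune_orphans` helper below are illustrative, not the actual Rust code; the mechanism is the same, though: each `PRAGMA foreign_key_check` row names a table and a rowid to delete, and the loop repeats (bounded) because deleting one orphan can expose another.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE workflow (id INTEGER PRIMARY KEY);
    CREATE TABLE job (
        id INTEGER PRIMARY KEY,
        workflow_id INTEGER REFERENCES workflow(id) ON DELETE CASCADE
    );
""")

# Inject a pre-existing orphan the way a source DB acquires one: with
# FK enforcement off, a row pointing at a missing parent is accepted.
con.execute("PRAGMA foreign_keys = OFF")
con.execute("INSERT INTO job VALUES (10, 999)")  # workflow 999 never existed

def prune_orphans(con, max_rounds=10):
    """Delete every row foreign_key_check reports; bounded so a
    pathological schema cannot loop forever. Sketch of the idea only."""
    for _ in range(max_rounds):
        # Each violation row: (table, rowid, parent_table, fk_index).
        violations = con.execute("PRAGMA foreign_key_check").fetchall()
        if not violations:
            return True
        for table, rowid, _parent, _fk in violations:
            con.execute(f'DELETE FROM "{table}" WHERE rowid = ?', (rowid,))
    return False

assert prune_orphans(con)
print(con.execute("SELECT count(*) FROM job").fetchone()[0])  # 0
```

After the sweep, `PRAGMA foreign_key_check` on the output reports nothing, which is the invariant the new integration test asserts.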
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
…ure, test --no-vacuum

Three fixes from Copilot's review of PR #299:

- `prune_orphans` and the `workflow_status` orphan cleanup now run for every export, not just filtered ones. Pre-existing FK violations in the source DB are data corruption, not "fidelity to the source," and a full-copy export with `foreign_key_check` reporting violations is not a useful artifact.
- Refactored `run_export` so any error after `VACUUM INTO` has written the snapshot removes the partial output file before propagating. Previously only the "filter matched no workflows" path cleaned up; failures in `prune_orphans`, ACL sanitization, or the final `VACUUM` left a half-finished database on disk that callers could mistake for a valid export.
- Added two tests: `full_copy_also_prunes_pre_existing_orphans` confirms orphan cleanup runs in unfiltered mode (with the same orphan-injection pattern as the filtered test); `no_vacuum_flag_skips_final_vacuum...` exercises `run_final_vacuum=false` (the `--no-vacuum` CLI surface) and asserts the output is still a valid filtered DB with no FK violations.

Documentation updated to reflect the always-on orphan sweep and the cleanup-on-failure guarantee.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
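The cleanup-on-failure guarantee is a simple shape: snapshot first, then wrap everything after the snapshot so any error removes the partial file before propagating. A Python sketch with hypothetical names (`run_export`, `transform`), not the actual Rust implementation:

```python
import os
import sqlite3
import tempfile

def run_export(src_path, out_path, transform):
    # Step 1: write the snapshot.
    src = sqlite3.connect(src_path)
    try:
        src.execute("VACUUM INTO ?", (out_path,))
    finally:
        src.close()
    # Steps 2+: filtering / orphan sweep / sanitization. Any failure here
    # must not leave a half-finished database a caller could mistake for
    # a valid export.
    try:
        out = sqlite3.connect(out_path)
        try:
            transform(out)
            out.commit()
        finally:
            out.close()
    except Exception:
        if os.path.exists(out_path):
            os.remove(out_path)  # cleanup-on-failure guarantee
        raise

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "src.db")
con = sqlite3.connect(src)
con.execute("CREATE TABLE t (x INTEGER)")
con.commit()
con.close()

out = os.path.join(workdir, "src.db.export")

def failing_transform(con):
    raise RuntimeError("simulated post-snapshot failure")

try:
    run_export(src, out, failing_transform)
except RuntimeError:
    pass
print(os.path.exists(out))  # False: the partial output was removed
```

The source database is never touched by the cleanup path, only the output file it just created.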
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Two follow-up fixes from Copilot's review of PR #299:

- Reject `--output` paths that resolve to the same filesystem location as the source database. Without this, a slip like `torc-server export --database prod.db --output prod.db --overwrite` would call `remove_file(prod.db)` in `prepare_output_path` before the export even opens the source, turning a typo into permanent data loss. Comparison is by canonical path, so symlinks, `./`, and the various `sqlite:` URL prefixes all collapse to the same target. Two new tests cover the direct case and a textually-aliased path that resolves to the same file.
- Update a stale comment in `full_copy_keeps_acl_tables` that claimed orphan pruning doesn't run in unfiltered mode. After the recent change to always run `prune_orphans`, this is no longer true; the assertions remain correct because the seed has no orphans, so the sweep finds nothing to remove. The comment now points at the dedicated full-copy orphan test for actual coverage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
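The same-file guard amounts to canonicalizing both paths before comparing them. A sketch with a hypothetical helper name (`reject_same_file`); the real command also normalizes `sqlite:` URL prefixes first, which is not shown here:

```python
import os
import tempfile

def reject_same_file(database, output):
    # realpath collapses symlinks and './' segments, so textual aliases
    # of the same file compare equal.
    if os.path.realpath(database) == os.path.realpath(output):
        raise ValueError(f"--output {output!r} is the source database itself")

d = tempfile.mkdtemp()
db = os.path.join(d, "prod.db")
open(db, "w").close()

alias = os.path.join(d, ".", "prod.db")  # textual alias of the same file
try:
    reject_same_file(db, alias)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True: the aliased path was caught

# A genuinely different output path passes the check.
reject_same_file(db, os.path.join(d, "export.db"))
```

Doing the check before any `remove_file` call is what turns the `--overwrite` typo from data loss into a clean error.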
* Merge workflow_status into workflow (#300)

  Collapse the 1:1 `workflow_status` satellite table into `workflow` itself:

  - Add `run_id`, `is_canceled` to `workflow` (`is_archived` already existed).
  - Backfill from `workflow_status` using the existing `status_id` join.
  - Strip the `FOREIGN KEY (status_id)` constraint via `PRAGMA writable_schema` so `DROP COLUMN status_id` and `DROP TABLE workflow_status` work without rename-recreating `workflow` (which would cascade-delete every child table per the CLAUDE.md landmine).

  Also drop `has_detected_need_to_run_completion_script`: no code path ever set it to true, so the read paths (TUI badge, `IsCompleteResponse`) were dead. Completion-script intent now lives in `on_workflow_complete` workflow actions.

  Eliminates the orphan class that PR #299's export workaround was papering over: with the data in `workflow` itself, there is nowhere for a status row to be stranded. Also removes the noisy JOIN from list queries and the `status_id` field from the API contract. Regenerate Rust OpenAPI spec, Python client, and Julia client.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Bump HTTP_API_VERSION to 0.15.0 for #300 breaking changes

  Removed fields from the API contract: `WorkflowModel.status_id`, `WorkflowStatusModel.has_detected_need_to_run_completion_script`, and `IsCompleteResponse.needs_to_run_completion_script`. Bump the version constant and regenerate Rust, Python, and Julia clients so their doc-comment version strings track.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Collapse WorkflowStatusModel into WorkflowModel at the API layer (#300)

  The earlier commit on this branch merged `workflow_status` into `workflow` at the storage layer but left the satellite shape on the API surface: the server constructed a synthetic `WorkflowStatusModel` from one row of `workflow`, exposed it via `/workflows/{id}/status` (GET/PUT), and clients still treated it as a separate object. That kept the indirection issue #300 was meant to remove, just hidden in a different place. This change finishes the collapse:

  - Add `run_id`, `is_canceled`, `is_archived` to `WorkflowModel`; populate them in `get_workflow` / `list_workflows` / `create_workflow`; accept them in `update_workflow` via `COALESCE`.
  - Delete `WorkflowStatusModel` from `models.rs` and the Rust-owned OpenAPI spec.
  - Delete `GET /workflows/{id}/status` and `PUT /workflows/{id}/status`; their data is reachable via `/workflows/{id}` now.
  - Keep `POST /workflows/{id}/reset_status` as a meaningful domain operation, but its response is now a `WorkflowModel`-shaped object. `cancel_workflow` likewise returns the post-cancel `WorkflowModel`.
  - Migrate all internal callers (cancel/is_complete/reset, `claim_jobs`, `workflow_manager`, `run_jobs_cmd`, `torc-slurm-job-runner`, archive command, orphan detection) to read from `WorkflowModel`.
  - Update integration tests across `test_workflows`, `test_workflow_manager`, `test_results`, `test_jobs`, `test_orphaned_jobs`, `test_slurm_commands`, `test_workflow_metadata_project` to use the new shape.
  - Regenerate Rust, Python, and Julia clients.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary

- Adds reference files under `examples/datasight/` for natural-language SQL exploration of a torc database.
- Adds a `torc-server export` subcommand that produces a standalone, optionally-filtered SQLite copy of the live database with original workflow and job IDs preserved verbatim. This addresses a longstanding gap where `torc workflows export`/`import` renumbers IDs on import, breaking log-line debugging for end users handed a workflow copy.
- Updates `admin/server-deployment.md` with operator-facing documentation for the new command and the rationale for its access-control sanitization defaults.

How the export works

1. `VACUUM INTO '<output>'`: a transactionally consistent, defragmented snapshot that does not block source writers or readers. The export connection participates in SQLite's normal WAL coherency, so the snapshot reflects every committed transaction the running server can see.
2. Open the snapshot with `foreign_keys=ON` and (if a filter was requested) run a single `DELETE FROM workflow WHERE id NOT IN (<filter>)`. The schema's existing `ON DELETE CASCADE` chain on every per-workflow table cleans up jobs, files, results, events, ro_crate entities, etc. for the rows the filter actually deleted.
3. Sweep orphans. Pre-existing orphans (rows whose `workflow_id` already pointed at a missing workflow before the export ran, typically from a `delete_workflow` code path that bypassed cascade by toggling `PRAGMA foreign_keys = OFF`, or a bare `sqlite3` CLI session, which defaults to FKs off) survive `VACUUM INTO`. The export iteratively runs `PRAGMA foreign_key_check` and deletes every reported violation until none remain. `workflow_status` is pruned separately (its back-reference column has no FK declared and so is invisible to `foreign_key_check`). This step runs in unfiltered mode too: FK violations are data corruption, not fidelity to the source.
4. Sanitize ACL tables. Unless `--preserve-access-groups` is set, wipe `user_group_membership` (no per-workflow scoping, so it leaks unrelated users' affiliations) and `access_group` (which cascades `workflow_access_group`). Rationale: there's no clean per-workflow filter for these tables that wouldn't risk leaking entries about other users/groups, so the conservative default is to drop them and let admins opt back in for vetted recipients.
5. A final `VACUUM` (skip with `--no-vacuum`) reclaims space freed by the deletes.

If anything in steps 2–5 fails after step 1 has written the snapshot, the partial output file is removed before the error is reported; a failed export never leaves a half-finished database on disk.
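The filtered `DELETE` described above is the only place the filter values enter SQL, so building it safely reduces to emitting one placeholder per value. A sketch of that helper (name, signature, and the exact `WHERE` phrasing are illustrative; the real code lives in `src/server/export.rs`):

```python
def build_workflow_filter(users=(), access_groups=(), workflow_ids=()):
    """Return (sql, params) for the workflow-pruning DELETE, or
    (None, []) when no filter was requested (full copy)."""
    def marks(values):
        # One '?' placeholder per value: "?, ?, ?"
        return ", ".join("?" for _ in values)

    if users:
        keep = f"user IN ({marks(users)})"
        params = list(users)
    elif access_groups:
        keep = (
            "id IN (SELECT workflow_id FROM workflow_access_group "
            f"WHERE group_id IN ({marks(access_groups)}))"
        )
        params = list(access_groups)
    elif workflow_ids:
        keep = f"id IN ({marks(workflow_ids)})"
        params = list(workflow_ids)
    else:
        return None, []  # no filter: full copy, nothing to delete

    return f"DELETE FROM workflow WHERE NOT ({keep})", params

sql, params = build_workflow_filter(users=["alice", "bob"])
print(sql)     # DELETE FROM workflow WHERE NOT (user IN (?, ?))
print(params)  # ['alice', 'bob']
```

Keeping the three modes mutually exclusive in one helper is also what makes the "placeholder generation, id-list formatting" unit tests mentioned in the test plan cheap to write.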
Filter modes (mutually exclusive)

- No filter: full copy of every workflow; combine with `--preserve-access-groups` to keep the ACL tables.
- `--user alice [--user bob …]`: keeps workflows matching `workflow.user IN (…)`.
- `--access-group 7 [--access-group 9 …]`: keeps `workflow.id IN (SELECT workflow_id FROM workflow_access_group WHERE group_id IN (…))`.
- Positional workflow IDs (`42 99 …`): keeps `workflow.id IN (…)`.

An empty filter result is an error, and the partially-written output file is removed. If the output path already exists, the export refuses to proceed unless `--overwrite` is given.

Test plan
- Unit tests in `src/server/export.rs` (placeholder generation, id-list formatting, URL normalization).
- Integration tests in `tests/test_export.rs` driving `run_export` directly against a freshly-migrated SQLite DB seeded with two access groups and three workflows:
  - ACL tables are stripped by default; `--preserve-access-groups` keeps them.
  - Injected pre-existing orphans (inserted with `foreign_keys = OFF`) are pruned in both filtered and unfiltered modes; `pragma_foreign_key_check` reports zero violations on the output.
  - `--no-vacuum` (`run_final_vacuum=false`) still produces a valid filtered DB with no FK violations.
  - An existing output file is refused unless `--overwrite`.
  - Cascade coverage seeds `event`, `failure_handler`, `ro_crate_entity`, and `slurm_stats` rows so a future migration that loses `ON DELETE CASCADE` on `workflow_id` would fail the suite.
- `cargo build`, `cargo clippy --all --all-targets --all-features -- -D warnings`, `cargo fmt --check`, `dprint check`: all clean.

Follow-up
The `workflow_status` table is a 1:1 indirection from `workflow` whose missing FK is the proximate cause of one of the orphan classes addressed here. Filed as #300; out of scope for this PR.

🤖 Generated with Claude Code