1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
@@ -110,6 +110,7 @@
- [Dashboard Deployment](./specialized/tools/dashboard-deployment.md)
- [Configuring AI Assistants](./specialized/tools/ai-assistants.md)
- [AI-Assisted Workflow Management](./specialized/tools/ai-assistant.md)
- [Analyzing Workflows with datasight](./specialized/tools/datasight.md)
- [Map Python Functions Across Workers](./specialized/tools/map_python_function_across_workers.md)
- [Filtering CLI Output with Nushell](./specialized/tools/filtering-with-nushell.md)
- [Shell Completions](./specialized/tools/shell-completions.md)
114 changes: 114 additions & 0 deletions docs/src/specialized/admin/server-deployment.md
@@ -341,6 +341,120 @@ Poor fits:
- **Process death loses unsaved data.** Always snapshot before stopping the server if you care about
the current state. `SIGTERM`/`SIGINT` (graceful shutdown) does **not** automatically snapshot.

## Exporting Filtered Database Copies

`torc-server export` produces a standalone SQLite copy of the live database, optionally filtered to
a subset of workflows. The original workflow and job IDs are preserved verbatim, so log files,
ticket references, and screenshots referring to the production IDs remain interpretable in the
exported copy. The most common use case is **handing a debugging copy to an end user** who does not
have direct access to the production database — for example, so they can analyze their workflows
with [datasight](../tools/datasight.md), `sqlite3`, or another SQL tool without touching production.

```bash
# Hand a single user their workflows
torc-server export --user alice --output alice.db

# Export everything in a project's access group
torc-server export --access-group 7 --output proj-energy.db

# Pull a specific list of workflows (positional)
torc-server export 42 99 314 --output requested.db

# Full unfiltered copy (useful as a hot-backup)
torc-server export --output snapshot.db
```

The filters are mutually exclusive — pick one of `--user` (repeatable), `--access-group`
(repeatable), or positional workflow IDs. Without any filter, the command produces a full copy.
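Repeatable flags accumulate, presumably as a union of matches. A hedged example, with `bob` as a
second hypothetical user:

```bash
# Assumption: repeating --user keeps workflows owned by either user
torc-server export --user alice --user bob --output team.db
```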

### How it works

1. **Snapshot.** SQLite's [`VACUUM INTO`](https://sqlite.org/lang_vacuum.html#vacuuminto) writes a
transactionally consistent, defragmented copy of the live database to the output path. This does
_not_ require quiescing the running server — readers and writers continue normally during the
snapshot.
2. **Filter.** The output database is reopened with foreign keys enabled, and a single
`DELETE FROM workflow WHERE id NOT IN (<filter>)` runs. Every per-workflow table has
`ON DELETE CASCADE` on `workflow_id`, so jobs, files, results, events, ro_crate entities, compute
nodes, etc. are removed automatically by the cascade chain — for the workflows the filter
actually deleted.
3. **Sweep orphans (always).** Cascade only fires when the parent row _is_ deleted, so pre-existing
orphans in the source DB survive the snapshot. Common sources: a `delete_workflow` code path that
toggled `PRAGMA foreign_keys = OFF`, or a bare `sqlite3` CLI session (the CLI defaults to
`foreign_keys = OFF`). The export iteratively runs `PRAGMA foreign_key_check` and deletes every
reported violation until none remain. `workflow_status` is pruned separately (its back-reference
column has no FK declared and so is invisible to `foreign_key_check`). This step runs for
unfiltered exports too — FK violations are data corruption, not fidelity to the source.
4. **Sanitize.** If a filter was applied and `--preserve-access-groups` is not set, the exported
database has its `user_group_membership` and `access_group` tables emptied. See
[Access-control sanitization](#access-control-sanitization) below.
5. **Compact.** A final `VACUUM` reclaims the space freed by the deletes (skip with `--no-vacuum`);
a shell sketch of the whole pipeline follows this list.
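This sketch is a rough `sqlite3` rendering of steps 1–3 and 5, not the actual implementation: the
real command runs in-process with transactional error handling, and the ID list `(42, 99, 314)` is
a stand-in for whatever the filter resolved to.

```bash
# 1. Snapshot: transactionally consistent copy, no need to quiesce the server
sqlite3 live.db "VACUUM INTO 'out.db'"

# 2. Filter: delete non-matching workflows; ON DELETE CASCADE removes their rows
sqlite3 out.db "PRAGMA foreign_keys = ON; DELETE FROM workflow WHERE id NOT IN (42, 99, 314);"

# 3. Sweep orphans: repeat foreign_key_check until no violations remain
#    (assumes ordinary rowid tables; each violation row is table|rowid|parent|fkid)
while [ -n "$(sqlite3 out.db 'PRAGMA foreign_key_check;')" ]; do
  sqlite3 out.db 'PRAGMA foreign_key_check;' |
    while IFS='|' read -r tbl rowid parent fk; do
      sqlite3 out.db "DELETE FROM \"$tbl\" WHERE rowid = $rowid;"
    done
done

# 5. Compact: reclaim the space freed by the deletes
sqlite3 out.db "VACUUM;"
```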

If anything in steps 2–5 fails after step 1 has written the snapshot, the partial output file is
removed before the error is reported — a failed export never leaves a half-finished database on
disk.

### Flags

| Flag | Effect |
| -------------------------- | -------------------------------------------------------------------------------------------------- |
| `-o`, `--output <PATH>` | Output SQLite file path (required). |
| `-d`, `--database <PATH>` | Source database path. Defaults to `DATABASE_URL`. |
| `--user <NAME>` | Keep only workflows owned by this user. Repeatable. |
| `--access-group <ID>` | Keep only workflows linked to this access-group ID. Repeatable. |
| (positional) | Keep only these workflow IDs. |
| `--overwrite` | Replace the output file if it already exists. |
| `--preserve-access-groups` | Keep `access_group` / `user_group_membership` / `workflow_access_group` instead of stripping them. |
| `--no-vacuum` | Skip the final `VACUUM`. Faster, but the output file retains the source database's allocated size. |

If a filter is specified and matches zero workflows, the command errors out and removes the
partially-written output file rather than producing an empty database.
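A combined-flags example: refresh a previously delivered file in place, skipping the final
compaction for speed:

```bash
torc-server export --user alice --output alice.db --overwrite --no-vacuum
```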

### Access-control sanitization

By default, `torc-server export` strips three tables from any **filtered** export:

- `user_group_membership` — has no per-workflow scoping, so leaving it intact would leak unrelated
users' group affiliations.
- `access_group` — group names and descriptions for groups across the whole server.
- `workflow_access_group` — cascades away when `access_group` is emptied.

This is conservative on purpose: there is no straightforward per-workflow filter that wouldn't risk
accidentally leaking entries about other users or groups. If the recipient _is_ authorized to see
the entire access-control state (for example, when handing a full copy to another admin), pass
`--preserve-access-groups` to keep the tables intact.

For unfiltered (full-copy) exports, the access tables are kept as-is regardless — the operator
running the command already has access to everything in the database.
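To confirm a filtered export was sanitized, one quick check (both counts should be zero unless
`--preserve-access-groups` was passed):

```bash
sqlite3 alice.db "SELECT count(*) FROM access_group; SELECT count(*) FROM user_group_membership;"
```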

### Recommended workflow for end-user requests

The expected interaction pattern is admin-mediated:

1. End user asks for a copy of workflow `42` (or all workflows under user `alice`, etc.).
2. Admin runs `torc-server export` with the appropriate filter on the server host.
3. Admin reviews the output (`sqlite3 alice.db "SELECT id, user, name FROM workflow"`) and confirms
it contains only the intended scope (a slightly fuller review sketch follows this list).
4. Admin transfers the file to the user.
5. User analyzes the copy locally — IDs match production, so anything that was in their logs or
tickets continues to make sense.
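The fuller review mentioned in step 3, as hedged example queries (table names follow the schema
described above; adjust to your schema version):

```bash
# Confirm every workflow in the copy belongs to the requested owner
sqlite3 alice.db "SELECT user, count(*) FROM workflow GROUP BY user;"
# Sanity-check that per-workflow rows came along via the cascades
sqlite3 alice.db "SELECT count(*) FROM job;"
```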

This avoids needing to grant the user direct filesystem access to the production database, while
still giving them a faithful debugging artifact.

### Notes

- **Live server safe.** `VACUUM INTO` does not block the source server's writers or readers, and the
export connection participates in SQLite's normal WAL coherency, so the snapshot reflects every
committed transaction the running server can see.
- **Same-IDs guarantee.** The export preserves all primary keys. By contrast,
[`torc workflows export`](../../core/workflows/export-import-workflows.md) emits portable JSON
that loses ID identity on import — use that flow when the recipient cannot get a SQLite file from
an admin.
- **External files are not bundled.** Only database rows are exported. Files referenced by `path` in
the `file` table (job inputs, outputs, logs on shared filesystems) are not copied; the recipient
analyzes metadata only unless those paths are independently shared (see the query below).
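One possible query to see which external paths a recipient would need shared separately:

```bash
# Lists paths recorded in the file table; the files themselves are not in the export
sqlite3 alice.db "SELECT path FROM file LIMIT 20;"
```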

## Complete Example: Production Deployment

```bash
# …
```
231 changes: 231 additions & 0 deletions docs/src/specialized/tools/datasight.md
@@ -0,0 +1,231 @@
# Analyzing Torc Workflows with datasight

[**datasight**](https://github.com/dsgrid/datasight) is an AI-powered data exploration tool that
connects an AI agent to a SQL database (DuckDB, PostgreSQL, SQLite, Flight SQL) and provides a web
UI for asking natural-language questions. The agent writes SQL, runs it, and renders interactive
Plotly visualizations.

A torc server stores its state in **SQLite**, which makes it a natural fit for datasight. This guide
shows how to point datasight at a torc database to answer questions like:

- _Which jobs in workflow 123 exceeded their memory allocation?_
- _Show failures grouped by return code._
- _What's the average exec time per resource group?_
- _Which compute nodes ran the longest jobs?_

> **Read-only tool.** datasight queries the database — it does not mutate it. For changes (rerun,
> recover, update resources) use the regular `torc` CLI commands.

---

## Three Audiences

Setup depends on whether you have direct read access to the torc server's SQLite file.

| Audience                                 | Path                                                                                                                      |
| ---------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| **Admin / shared-server operator**       | Point datasight directly at `dev.db` (or your prod DB path).                                                                |
| **End user — admin will run an export**  | Ask the admin to run `torc-server export` to produce a filtered SQLite copy and hand it back. Same IDs preserved.           |
| **End user without admin cooperation**   | Use `torc workflows export` to get a JSON export, import it into a local torc-server, then point datasight at _that_ DB.    |

Path A (direct access) is a single step. The admin-mediated export (Path B) is the right default
when you can get cooperation from whoever runs the server — it preserves the original workflow and
job IDs, so log files and ticket references continue to make sense. Path C is the fallback when no
admin help is available; the JSON round-trip assigns new IDs.

---

## Path A — Admin / Direct DB Access

### 1. Install datasight

```bash
uv tool install datasight
```

Set up an Anthropic API key (or another supported LLM provider — see the datasight README).

### 2. Bootstrap a project from the torc reference files

This repo ships a ready-to-use config under
[`examples/datasight/`](https://github.com/NatLabRockies/torc/tree/main/examples/datasight):

```bash
mkdir ~/torc-analysis
cd ~/torc-analysis
datasight init
cp /path/to/torc/examples/datasight/schema.yaml .
cp /path/to/torc/examples/datasight/schema_description.md .
cp /path/to/torc/examples/datasight/queries.yaml .
```

Edit `.env` to point at the torc SQLite database:

```bash
DATABASE_URL=sqlite:////absolute/path/to/torc/server/db/sqlite/dev.db
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=...
```

### 3. Run

```bash
datasight run
```

Open <http://127.0.0.1:8084> and start asking questions. By default datasight binds to localhost;
use `--unix-socket /path/to/datasight.sock` for SSH-forwarded socket access on HPC login nodes.
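A hedged sketch of that setup (hostname and socket path are placeholders; OpenSSH 6.7+ can forward
a local TCP port to a remote Unix socket):

```bash
# On the HPC login node: serve over a Unix socket instead of a TCP port
datasight run --unix-socket /scratch/$USER/datasight.sock

# On your laptop: forward a local port to that socket, then browse locally
ssh -N -L 8084:/scratch/$USER/datasight.sock hpc-login.example.org
# now open http://127.0.0.1:8084
```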

---

## Path B — Admin-Mediated SQLite Export (recommended for end users)

The admin runs `torc-server export` on the server host to produce a filtered SQLite file containing
only the requested workflows, then sends the file to you. You point datasight at it like a normal
SQLite database.

This preserves all original workflow and job IDs, so the database in your hands is _the same
database_ the production server has — just trimmed to your subset. Log lines like
`workflow_id=42 job_id=917` keep matching, which gives end users the same debugging experience as
Path A.

### 1. Admin runs the export

```bash
# Filter by user (most common)
torc-server export --user alice --output alice.db

# Or by access group
torc-server export --access-group 7 --output proj-energy.db

# Or by specific workflow IDs
torc-server export 42 99 --output requested.db

# Full copy (no filter)
torc-server export --output snapshot.db
```

By default, `access_group`, `workflow_access_group`, and `user_group_membership` are stripped from
filtered exports because those tables span the whole server and would leak entries for other users
and groups. Pass `--preserve-access-groups` only when the recipient is authorized to see the entire
access-control state; unfiltered full copies keep these tables regardless.

The admin should review the output (`sqlite3 alice.db "SELECT id, user, name FROM workflow"`) before
handing it over.

Useful flags:

| Flag | Effect |
| -------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `--overwrite` | Replace an existing output file. |
| `--preserve-access-groups` | Keep ACL tables instead of stripping them. Only safe for full copies or vetted recipients. |
| `--no-vacuum` | Skip the final `VACUUM`; faster, but the file keeps the source's original size even after rows are deleted. |

The export uses SQLite's `VACUUM INTO` for a transactionally consistent snapshot, then deletes
non-matching workflows; cascading foreign keys clean up the per-workflow rows automatically.

### 2. Point datasight at the file

Follow the same setup as Path A, with `DATABASE_URL` pointing at the SQLite file the admin sent you.
No local torc-server is required.
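For example, a `.env` along these lines, where the path is a placeholder for wherever you saved the
admin's file:

```bash
DATABASE_URL=sqlite:////home/alice/exports/alice.db
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=...
```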

> **Refresh.** The export is a snapshot in time. For new results, ask for a fresh export — there is
> no in-place update.

---

## Path C — User Without Any DB Access

If you can't get an admin to run `torc-server export` for you, fall back to the JSON export/import
flow. This trades ID preservation for not needing any server-side cooperation.

### 1. Get an export with results included

Either run this yourself (if you have CLI access to the shared server) or ask the admin to run it
for you:

```bash
torc workflows export <workflow_id> --include-results --include-events \
--output workflow_<id>.json
```

The `--include-results` flag is **essential** — without it the `result` table is empty and most of
the useful queries (memory overruns, slow jobs, failure causes) won't work. `--include-events` is
optional but useful for timeline analysis.

See [Exporting and Importing Workflows](../../core/workflows/export-import-workflows.md) for the
full export/import reference.

### 2. Run a personal local torc-server

You only need this for storage; you don't have to actually execute jobs through it. Install torc
locally, then start a server with its own SQLite database:

```bash
torc-server run --host localhost -p 8080
```

In a second shell, point your CLI at it:

```bash
export TORC_API_URL="http://localhost:8080/torc-service/v1"
torc workflows import workflow_<id>.json
```

The import creates a fresh workflow with a new ID in your local DB. Take note of the local DB path
(default `db/sqlite/dev.db`).
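A quick sanity check that the import landed (assumes the default path above; the local workflow ID
will differ from production):

```bash
sqlite3 db/sqlite/dev.db "SELECT id, user, name FROM workflow;"
```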

### 3. Run datasight against your local DB

Follow the same setup as Path A, with `DATABASE_URL` pointing at your local torc-server's SQLite
file.

> **Note on freshness.** The export is a snapshot. If you want to analyze new results from the
> shared server you need a fresh export. For ongoing monitoring, ask your admin about adding
> read-only access to the production DB or running datasight against it on the server side.

---

## The Reference Files

| File | Purpose |
| ------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| [`schema.yaml`](https://github.com/NatLabRockies/torc/blob/main/examples/datasight/schema.yaml) | Restricts AI exploration to the analytically useful tables; hides internal scheduling columns. |
| [`schema_description.md`](https://github.com/NatLabRockies/torc/blob/main/examples/datasight/schema_description.md) | Domain context for the AI: status integer enum, return code conventions, key joins, JSON metadata extraction. |
| [`queries.yaml`](https://github.com/NatLabRockies/torc/blob/main/examples/datasight/queries.yaml) | Seeded NL/SQL pairs the AI uses as few-shot examples. |

The `schema_description.md` is the highest-leverage file — without it, the AI sees raw integer
status codes (0–10) on `job.status` with no way to decode them, doesn't know that a `137` return
code means OOM, and doesn't know to use `memory_bytes` instead of the human-readable `memory` string
for math. **Customize it** with anything specific to how your team uses `workflow.metadata` (project
tags, ticket IDs, dataset versions, etc.) — that's where datasight unlocks the most value.
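As one example of the JSON-extraction guidance worth spelling out there, a hypothetical query over a
`project` tag in `workflow.metadata` (assumes metadata is stored as JSON text; `json_extract` is
built into modern SQLite):

```bash
sqlite3 dev.db "SELECT id, name, json_extract(metadata, '$.project') AS project
                FROM workflow
                WHERE json_extract(metadata, '$.project') = 'climate-2026';"
```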

---

## Example Questions to Try

Once datasight is running, try these to verify the integration is working:

- _"How many workflows are in the database, grouped by user?"_
- _"For workflow 123, show jobs whose peak memory exceeded their allocation, sorted by overage."_
- _"Plot exec time distribution for workflow 123, faceted by resource group."_
- _"Which compute nodes had the most failed jobs last week?"_
- _"For my workflows tagged with project='climate-2026' in metadata, summarize total CPU-hours."_

Pin useful results to the dashboard, or export the session as a self-contained HTML page or a
runnable Python script — see the datasight docs for more.

---

## Troubleshooting

**The AI keeps writing `WHERE status = 'failed'`.** Your `schema_description.md` isn't being loaded.
Confirm it sits in the project directory next to `.env` and restart `datasight run`.

**Queries return nothing for `result`-table questions.** Either the workflow hasn't run any jobs
yet, or (Path C) your export was missing `--include-results`.

**`datasight run` complains it can't open the DB.** SQLite requires an absolute path in
`DATABASE_URL` (`sqlite:////absolute/path/...` — note the four slashes).

**Schema introspection is slow.** The torc DB has many internal tables; the included `schema.yaml`
already filters to the analytical subset, which keeps startup fast.
2 changes: 2 additions & 0 deletions docs/src/specialized/tools/index.md
@@ -7,6 +7,8 @@ Additional tools and third-party integrations.
- [Dashboard Deployment](./dashboard-deployment.md) - Deploying the web dashboard
- [Configuring AI Assistants](./ai-assistants.md) - Setting up AI integration
- [AI-Assisted Workflow Management](./ai-assistant.md) - Using AI for workflow management
- [Analyzing Workflows with datasight](./datasight.md) - Natural-language SQL exploration of the
torc database
- [Map Python Functions Across Workers](./map_python_function_across_workers.md) - Python
integration
- [Filtering CLI Output with Nushell](./filtering-with-nushell.md) - Advanced CLI usage