Skip to content

Merge workflow_status into workflow (eliminate 1:1 indirection and orphan class) #300

@daniel-thom

Description

@daniel-thom

Background

workflow_status is a separate table carrying a strict 1:1 relationship to workflow:

CREATE TABLE workflow_status (
  id   INTEGER PRIMARY KEY AUTOINCREMENT,
  run_id INTEGER NOT NULL DEFAULT 1,
  has_detected_need_to_run_completion_script INTEGER NOT NULL DEFAULT 0,
  is_canceled  INTEGER NOT NULL DEFAULT 0,
  is_archived  INTEGER NOT NULL DEFAULT 0,
  workflow_id  INTEGER NULL  -- added later via ALTER, no FK
);

workflow.status_id references workflow_status.id. The reverse pointer (workflow_status.workflow_id) was added in migration 20260222000001_add_workflow_id_to_workflow_status to clean up orphans, but ALTER TABLE ADD COLUMN cannot attach a foreign-key constraint, so the back-reference is unenforced.

Problems this design causes

  1. Orphan rows. Any code path that deletes a workflow with PRAGMA foreign_keys = OFF (the server's own delete_workflow does this on purpose, per the migration comment) leaves the corresponding workflow_status row stranded. A user manually running DELETE FROM workflow WHERE id = N from sqlite3 triggers the same problem, since the CLI defaults to foreign_keys = OFF. Without the FK, PRAGMA foreign_key_check cannot detect these orphans — so they accumulate silently.

  2. Information leak in exports. torc-server export --user alice strips workflows owned by other users, but workflow_status row count alone leaks how many workflows existed in the source DB. PR Add datasight integration and torc-server export command #299 works around this with an explicit DELETE FROM workflow_status WHERE id NOT IN (SELECT status_id FROM workflow) step — a workaround that wouldn't be needed if the relationship were properly modeled.

  3. Circular FK trap if we add the missing FK. The naive fix — adding FOREIGN KEY (workflow_id) REFERENCES workflow(id) ON DELETE CASCADE — creates a circular NOT NULL dependency between workflow.status_id and workflow_status.workflow_id: each row needs the other to exist before it can be inserted. To make it work you'd have to make one side nullable (defeats much of the safety) or use DEFERRABLE INITIALLY DEFERRED FKs. And the migration to add the FK would require recreating workflow_status, which is a parent table — risky inside a migration transaction per the rename-recreate landmine in CLAUDE.md.

  4. No analytical value to the indirection. Every workflow_status row corresponds to exactly one workflow row. Joining across the indirection adds noise to queries (datasight schema_description has to spell out the join explicitly) and to the API surface (status_id field on the workflow contract).

Proposed change

Collapse workflow_status into workflow and drop the table:

ALTER TABLE workflow ADD COLUMN run_id INTEGER NOT NULL DEFAULT 1;
ALTER TABLE workflow ADD COLUMN has_detected_need_to_run_completion_script INTEGER NOT NULL DEFAULT 0;
ALTER TABLE workflow ADD COLUMN is_canceled INTEGER NOT NULL DEFAULT 0;
-- workflow already has is_archived; merge or drop one.
-- Backfill the new columns from workflow_status via the existing status_id join.
-- Then drop workflow.status_id and the workflow_status table.

This eliminates the orphan class, the circular-FK problem, and the indirection in one move.

Migration considerations

  • workflow.is_archived already exists in the parent table. Confirm whether the workflow_status.is_archived column is authoritative and reconcile.
  • Endpoints and serializers that reference status_id or expose status fields under a nested object will need updates. Worth auditing the OpenAPI spec and src/server/api/workflows.rs first to size the API change.
  • Dropping workflow_status is safe (no FK references inbound once we remove workflow.status_id).
  • The migration should not require the rename-recreate pattern on either table — ALTER TABLE ADD COLUMN plus ALTER TABLE DROP COLUMN (SQLite 3.35+) handles it.

Non-goals

  • This issue is not about the export-side workaround. PR Add datasight integration and torc-server export command #299's prune_orphans step (using PRAGMA foreign_key_check) is correct defense-in-depth and stays even after this refactor: existing DBs in the wild will still carry orphans from past CLI sessions, and any future code path that runs with foreign_keys = OFF could still create new ones across other tables.
  • Not changing the user-facing notion of workflow status; the columns just live in a different place.

Acceptance criteria

  • workflow_status table is removed.
  • workflow.status_id is removed.
  • All status flags read/written directly on the workflow row.
  • API contract / OpenAPI spec updated.
  • Migration backfills existing rows correctly.
  • Existing tests still pass; add a regression test confirming a workflow's status is no longer reachable as an orphan after deletion.

Surfaced from review of PR #299.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions