Skip to content

Hardening: error handling and persistence ordering in web handlers #469

@nitrobass24

Description

@nitrobass24

Four related production-code findings surfaced during the CodeRabbit review of #467. None of them is causing user-facing bugs today, but each is a real correctness concern when persistence fails (disk full, permission error, process killed mid-write). Deferred from #468 to keep that PR test-only; tracking them here so each can land in its own PR with proper test coverage of the failure mode.


1. src/python/web/handler/auto_queue.py — wrap to_file in try/except

Around line 47-49. The handler mutates state via add_pattern / remove_pattern and then writes with self.__auto_queue_persist.to_file(self.__persist_path), but doesn't catch IO errors. A persistence failure bubbles up as an unhandled 500 with a stack trace.

Suggested change: wrap the persistence call in try/except, log the exception (including the error object), and return a controlled HTTP 5xx (e.g. HTTPResponse(status=500, body=\"failed to persist auto-queue\")). Apply to both endpoints that mutate state (add and remove pattern paths).

Risk if shipped without this: disk-full on the config volume leaves the in-memory state ahead of the on-disk state. A subsequent restart silently reverts the change.


2. src/python/web/handler/config.py — copy-then-write atomicity

Around line 76-79. set_property mutates self.__config before persistence. If to_file fails after the mutation, in-memory state diverges from disk — and __on_lftp_config_change() may have already fired with the half-applied change.

Suggested change: create a copy of the config, apply inner_config.set_property(key, value) on the copy, attempt to write to disk via self.__config.to_file(self.__config_path) (already atomic via temp file + rename per #433), and only on write success replace self.__config with the updated copy and call __on_lftp_config_change(). Exceptions during write must prevent the in-memory swap and the LFTP callback.

Risk if shipped without this: the LFTP hot-reload path (#433) can fire on a value that didn't make it to disk. After restart the LFTP runtime would reset to the on-disk value, contradicting what the UI showed.


3. src/python/web/handler/integrations.py — persistence ordering on detach

Around line 113-117. The delete-instance handler calls self.__path_pairs_config.detach_arr_target(instance_id), then writes self.__config.to_file (integrations) before self.__path_pairs_config.to_file. If the process crashes between the two writes, the on-disk integrations file no longer references the instance but the path-pairs file still has dangling arr_target_ids pointing at it.

Suggested change: save in the order path_pairs → integrations. Persist path-pairs (with the detach applied) first, then write the integrations config. Same methods, same paths — just the order of the two to_file calls.

Risk if shipped without this: unclean shutdown leaves dangling references. The cross-validation in #426 will reject those path pairs on next load.


4. src/python/web/handler/path_pairs.py — wrap to_file in try/except

Around line 117-118. Same shape as #1: create / update / delete handlers all return HTTPResponse(... status=201) (or 204) without catching exceptions from self.__config.to_file(self.__path_pairs_path).

Suggested change: wrap each to_file call in try/except. On exception, log the error and return:

HTTPResponse(
    status=500,
    body=json.dumps({\"error\": \"failed to persist path pairs\"}),
    headers={\"Content-Type\": \"application/json\"},
)

so the client gets a structured error instead of a stack trace.

Risk if shipped without this: same divergence pattern as #1 — UI thinks the change took, on-disk state says otherwise.


How to tackle

These are independent enough that they can land as separate PRs (one per file). Each needs:

  • The fix itself
  • A unit test for the failure mode (unittest.mock.patch on to_file to raise an exception, then assert the handler returns the right status and the in-memory state was not mutated)
  • For Bump requests from 2.25.1 to 2.32.4 in /src/python #2 specifically, also assert __on_lftp_config_change was not called

Suggested order: #2 (config) first, since it has the largest blast radius (touches LFTP hot-reload). Then #3 (ordering — small, mechanical). Then #1 and #4 (mirror each other; could be one PR with shared test scaffolding).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions