DiLoCo auth.py duplicates dataset_server token-file / loopback / peer-cert helpers #98

@jdinalt

Description

Found during the post-#90 review pass on PR #93. Reuse/maintainability, not a bug.

Problem

src/forgather/ml/diloco/auth.py re-implements, nearly byte-for-byte, helpers that already exist in tools/dataset_server/auth.py:

  • write_standalone_token / standalone_token_file / diloco_tokens_dir — the os.open+fchmod+os.replace atomic-write dance and the 0700 parent-dir tightening.
  • url_is_local / url_port / read_standalone_token and the _LOCAL_HOSTS set (which also re-implements forgather.tls.policy.host_is_loopback, less correctly — the literal set misses 127.0.0.0/8 addresses).
  • peer_cert_authenticated mirrors tools/forgather_server/auth.py:_request_has_client_cert.
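To illustrate the loopback gap mentioned in the second bullet, here is a minimal sketch. The contents of `_LOCAL_HOSTS` and the function bodies are assumptions for illustration; `host_is_loopback_sketch` is a stand-in and not the actual `forgather.tls.policy.host_is_loopback` implementation.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed contents; the real _LOCAL_HOSTS set may differ.
_LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def url_is_local_literal(url: str) -> bool:
    """Literal-set check: only matches hosts spelled exactly as in the set."""
    return urlsplit(url).hostname in _LOCAL_HOSTS


def host_is_loopback_sketch(host: str) -> bool:
    """Stand-in for a range-aware check covering all of 127.0.0.0/8 and ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name); treat as non-local.
        return False


def url_is_local(url: str) -> bool:
    """URL classification built on the range-aware host check."""
    host = urlsplit(url).hostname
    return host is not None and host_is_loopback_sketch(host)
```

With these definitions, `http://127.0.0.2:8080/` is rejected by the literal-set version but accepted by the range-aware one, which is the kind of divergence a single shared helper would avoid.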

The stdlib-vs-FastAPI split justifies a separate verifier (verify_bearer/_send_401), but the transport-agnostic token-file I/O, URL-loopback classification, and peer-cert chain check have no framework dependency and now live in two or three places. Any future hardening (e.g. fsync-before-replace, TOCTOU dir-mode tightening, SAN/CN allowlist on the cert check) has to be found and applied in each copy, or the copies drift.

Fix

Extract a shared module (e.g. forgather.security.token_store, parameterized by service subdir name) and reuse forgather.tls.policy.host_is_loopback for loopback classification. Have dataset_server, diloco, and inference_server all call the shared module.
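A rough sketch of what such a shared store could look like, for discussion only. The class name `TokenStore`, the `base_dir` parameter, the `tokens` subdirectory, and the `standalone-<port>.token` file naming are all hypothetical and not taken from the existing code. It uses `os.fchmod`, so it assumes a POSIX platform, and it includes the fsync-before-replace hardening to show that it would then only need to be applied in one place.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class TokenStore:
    service: str    # service subdir name, e.g. "diloco" or "dataset_server"
    base_dir: Path  # hypothetical root directory for all services

    @property
    def tokens_dir(self) -> Path:
        return self.base_dir / self.service / "tokens"

    def token_file(self, port: int) -> Path:
        # Hypothetical naming scheme: one token file per listening port.
        return self.tokens_dir / f"standalone-{port}.token"

    def write(self, port: int, token: str) -> Path:
        d = self.tokens_dir
        d.mkdir(mode=0o700, parents=True, exist_ok=True)
        os.chmod(d, 0o700)  # tighten a pre-existing directory as well
        final = self.token_file(port)
        tmp = d / f".{final.name}.{os.getpid()}.tmp"
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                os.fchmod(f.fileno(), 0o600)  # enforce mode regardless of umask
                f.write(token)
                f.flush()
                os.fsync(f.fileno())  # data on disk before the rename
            os.replace(tmp, final)  # atomic swap into place
        except BaseException:
            tmp.unlink(missing_ok=True)
            raise
        return final

    def read(self, port: int) -> Optional[str]:
        try:
            return self.token_file(port).read_text(encoding="utf-8").strip()
        except FileNotFoundError:
            return None
```

Each service would then hold its own instance, e.g. `TokenStore("diloco", base)`, and the existing per-service helpers could become thin wrappers around it.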

Minor adjacent hardening to fold in: the register audit record writes the worker-supplied hostname verbatim. json.dumps neutralizes injection, but a length cap would prevent audit-log bloat from a malicious worker.
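A minimal sketch of the length cap, assuming a JSON-lines audit log. The function name, the record fields, and the 253-character limit (the maximum length of a DNS name) are illustrative choices, not the current audit format.

```python
import json

# Assumed cap: 253 is the maximum length of a fully qualified DNS name.
MAX_HOSTNAME_LEN = 253


def audit_register_record(worker_id: str, hostname: str) -> str:
    """Build one audit-log line with the worker-supplied hostname capped."""
    record = {
        "event": "register",
        "worker_id": worker_id,
        "hostname": hostname[:MAX_HOSTNAME_LEN],
        # Flag truncation so an oversized value is still visible in the log.
        "hostname_truncated": len(hostname) > MAX_HOSTNAME_LEN,
    }
    # json.dumps escapes control characters, so the record stays on one line.
    return json.dumps(record)
```

Recording a truncation flag rather than silently cutting the value keeps the abuse attempt auditable without letting it inflate the log.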

Ref: #90.
