feat: Google Workspace + Microsoft 365 identity console + Claude device-task dispatch#1053
Conversation
|
Full review — security + correctness + reliability + test coverage (4 passes; traced cross-file data flows on Security: PASS. Both new tables ( BLOCKER 1 (CRITICAL) — the M365 direct-Graph backend is unreachable in production. BLOCKER 2 (CRITICAL) — tenantCascade is incomplete; fails the cascade contract test. In services/tenantCascade.ts, the new BLOCKER 3 (CRITICAL, reliability) — failed privileged mutations are recorded as successes. HIGH — error masking on the M365 path. HIGH — MEDIUM — direct-vs-Delegant precedence is silent. Test gaps (privileged identity surface — worth closing before merge): no end-to-end approval-gating assertion (only the static tier map is tested — a refactor could make a destructive tool auto-execute with all tests still green); destructive tools aren't proven un-invokable as reads; Positives: migrations are idempotent, no inner BEGIN/COMMIT, correct date-prefix ordering; routes export Status: Security PASS. 3 blockers (M365 tool-gate, tenantCascade completeness/order, silent-failure on privileged mutations) + 2 HIGH error-handling + test gaps. Given the size, the offered split (Google / M365-direct / device-dispatch) would isolate Blockers 1–2 to the M365 slice and make each piece reviewable on its own. |
ToddHebebrand
left a comment
There was a problem hiding this comment.
Deep review (two-reviewer pass: backend/RLS/secrets + AI-tools/dispatch/UI). The feature is in good shape — this is a one-line bounce on a GDPR-cascade contract gap.
The whole tenancy/secret surface verified clean. Both new tables carry RLS correctly: google_workspace_connections and m365_connections each have org_id NOT NULL, ENABLE+FORCE ROW LEVEL SECURITY, and all four breeze_org_isolation_* policies in the creating migration (shape-1, auto-discovered by rls-coverage.integration.test.ts — no allowlist edit needed). Service-account keys / client secrets are encrypted at rest, never returned by toConnectionResponse, never logged. Every /google/connection and /m365/connection route self-applies authMiddleware + requirePermission(ORGS_READ/WRITE), with requireMfa() on mutations. Migrations are idempotent, correctly dated, no inner BEGIN/COMMIT. SSRF-safe (fixed Graph base, GUID-validated tenant id, OData single-quote escaping). On the AI side, the #991-class tier-resolution bug is fixed and test-guarded — Google tools are session-aware (intentionally out of the exec aiTools Map), getToolTier falls through to googleToolTiers, and aiAgentSdkTools.googlegating.test.ts asserts every google_* name resolves to a defined tier. Device-task dispatch (aiAgent.ts:63-86) validates same-org and the #1047 canAccessSite axis before binding; destructive tools (google_wipe_mobile_device, google_offboard_user, suspend/reset) are tier-3 → checkGuardrails forces requiresApproval.
Must-fix — new m365_connections missing from the org-cascade contract. apps/api/migrations/2026-06-01-m365-connections.sql:13 creates m365_connections with an org_id column, but ORG_CASCADE_DELETE_ORDER in apps/api/src/services/tenantCascade.ts does not list it — the entry near line 106 is delegant_m365_connections, a different (older, #991) table. tenantCascade.integration.test.ts cross-checks the list against every org_id column in information_schema and will fail CI on the omission, and functionally an org delete would orphan its M365 identity connection. This is the exact gap class #991 hit. Add 'm365_connections' alongside the other connection tables (google_workspace_connections is already correctly listed at :109).
Nice-to-have (non-blocking). GoogleWorkspaceIntegration.tsx and M365Integration.tsx POST-save / DELETE-disconnect via raw fetchWithAuth with inline setSaveState feedback — that's the CLAUDE.md-permitted inline-handler exception, but they're recorded nowhere (not in runActionAllowlist.ts / migration backlog), so they silently skip runAction's HTTP-200 {success:false} detection. Add them to the backlog so they're tracked, not invisible.
Everything else is approve-ready — re-request once m365_connections is in the cascade list.
…tion + wiring google_workspace_connections table (org-scoped, RLS shape 1, FORCE RLS, 4 org policies), idempotent migration (validated twice on local test DB), schema barrel export, ORG_CASCADE_DELETE_ORDER registration, and GOOGLE_WORKSPACE_ENABLED feature flag (default off). No client/route/tools yet — those are Phase 1b/2. WIP checkpoint on a long integrated build; squashed before PR.
…nt tools
Phase 1b: @googleapis/admin + @googleapis/gmail + google-auth-library deps;
googleClient.ts (DWD JWT via each package's own auth namespace — admin
impersonates super-admin, gmail impersonates target user; least-privilege
scopes; typed error normalization); googleHelpers.ts (org-scoped connection
load + authorize + decrypt); routes/google.ts (GET/POST/DELETE /google/connection,
ORGS_WRITE+MFA gated, fail-closed live-Directory verify before storing the
encrypted SA key, never returns the key); mounted in index.ts.
Phase 2: aiToolsGoogle.ts (8 handlers — lookup[T1], reset_password, suspend,
restore, signout, set_forwarding, set_vacation, update_user[T3]); registered in
aiAgentSdkTools.ts via googleToolDefinitions gated on GOOGLE_WORKSPACE_ENABLED;
TOOL_TIERS + aiGuardrails RBAC ('google' resource) entries. All T3 writes ride
the existing tier-3 approval + audit path; require a reason.
Tests: 31 new (client, helpers, 8 handlers incl. error mapping + disposition,
gating). Existing aiGuardrails/m365 gating/session tests still green. API tsc
clean. No push (pre-deploy).
…th-AI UI
API: createSession now accepts an org-validated deviceId + approvalMode and
binds them to the ai_session (the 'task on this computer' scoping); route schema
extended. Cross-org device binding rejected ('Invalid device').
Web: aiStore.createSession takes an optional deviceId and sends approvalMode;
new startDeviceTask(deviceId, ctx) opens a device-scoped session. DeviceDetails
gets a 'Fix with AI' button (data-testid device-fix-with-ai) that launches it
with the device page-context.
Tests: aiAgent.deviceTask.test.ts (4, incl. cross-org rejection); existing
aiAgent.m365 binding unaffected. Web aiStore test green (node22); astro check 0
errors/0 warnings. Full touched-surface regression: 154/154 green. pnpm lockfile
updated for the @googleapis deps. No push (pre-deploy).
…he named Google list) Adds calendar sharing, the last item Billy explicitly named that wasn't already covered (aliases are handled in google_update_user). New getCalendarClient impersonates the calendar OWNER with a narrow calendar.acls scope; google_share_calendar (tier 3, reason+approval) inserts an ACL rule (role: freeBusyReader/reader/writer/owner, default reader) on the owner's primary or a named calendar. - @googleapis/calendar@^15 dep - CALENDAR_SCOPES added to the DWD scope CSV (operator must authorize) - 4 handler tests + gating test bumped to 9 tools + scope-union test - api tsc clean; 35/35 google unit tests green
… (cluster 1) google_offboard_user: best-effort sequence over a departing user — optional OOO + manager forwarding (no copy kept) run first while the account is active, then OAuth-token revoke, remove-from-all-groups, SELECTIVE mobile account wipe (admin_account_wipe, corporate data only — BYOD-safe, never a full device wipe), sign-out, suspend last. Each step independent + reported. google_wipe_mobile_device: STOLEN-DEVICE-ONLY full factory reset (admin_remote_wipe), explicitly separate from offboard, erases the whole device. Also fixes a latent fail-closed RBAC gap: google_share_calendar (Phase 4), google_offboard_user and google_wipe_mobile_device were missing from TOOL_PERMISSIONS, which would have blocked them at the guardrail. New DWD scopes (scope-minimized): admin.directory.group, .group.member, .device.mobile.action. Verified all API shapes against the installed type defs + Google docs (account_wipe vs remote_wipe enum). api tsc clean; 42/42 google unit tests + 80/80 guardrails green. Follow-up noted: add a TOOL_TIERS<->TOOL_PERMISSIONS coverage test.
…y-email (cluster 2) google_security_drift (tier 1, read): pages the directory and buckets users into no-2-step, super-admins, suspended, never-logged-in, and stale (>N days, default 90). Pure computeSecurityDrift() is unit-tested. No new scope (uses admin.directory.user). google_email_report (tier 1): runs the drift report, renders HTML, and emails it via the existing email service. Recipient is LOCKED to the connection's admin address (no arbitrary recipient) so the agent cannot exfiltrate directory data; errors cleanly when no email provider is configured. Registered in aiAgentSdkTools (TOOL_TIERS + definitions), aiGuardrails (TOOL_PERMISSIONS, resource google/read), gating test bumped to 13 tools. api tsc clean; 127/127 google+guardrails unit tests green.
…raph backend + UI Google (clusters 1-3 complete, 24 tools): group membership, OU move, rename, 2SV reset, mail delegation, license management (+@googleapis/licensing) + the Google Workspace connection UI (Identity integrations tab). M365: org-scoped m365_connections table + RLS migration, /m365/connection connect route (M365_ENABLED flag), m365DirectGraph.ts (client-credentials Graph backend reusing c2cM365 helpers), aiToolsM365 made backend-aware (direct Graph when an org connection exists, Delegant broker otherwise), and the Microsoft 365 connection UI sub-tab. Existing Delegant tools/tests unchanged. Verified: api 5455 passing, web 697 passing, astro check 0 errors, tsc clean. Checkpoint on feature branch; not pushed.
Cover invokeDirect's Graph mapping for the direct M365 backend: get_user GET, disable_user PATCH accountEnabled:false, reset_user_password PATCH passwordProfile (+ generated temp password), list_groups, signin-activity $filter, 403→forbidden, and the missing-userId bad_request guard (no Graph call). 7/7 green.
…ntime) The 24 google_* tools are session-aware (not in the aiTools execution registry) and were not in m365ToolTiers, so getToolTier() returned undefined for all of them. checkGuardrails then treated every Google tool as tier undefined → 'Unknown tool' → blocked, so the entire Google Workspace suite failed at runtime despite passing registration + handler unit tests (those never exercise the checkGuardrails→getToolTier path). Wire googleToolTiers into getToolTier's fallback chain. Add a regression test asserting getToolTier resolves every google_* tool to its declared tier. Verified: tsc clean; full api suite 5463 passing / 0 failures.
…-in-depth) m365DirectGraph get_user_signin_activity interpolated the user id into an OData $filter; encodeURIComponent covers URL transport but not OData literal escaping. Double single-quotes so a quote in the id can't alter the filter. Same-tenant read with the org's own token, so low severity. tsc + test green.
Covers the M365 connection route the credentials UI drives: GET never returns the secret; POST verifies via Graph (acquireClientCredentialsToken + testGraphAccess) and encrypts before storing, returning 201 with no secret in the body; Graph-verify failure and token-acquisition failure both 400 with a hint and do not encrypt/store; missing client secret is rejected 400. 7/7.
Symmetric to the m365 route tests: GET never returns the service-account key; POST verifies via a live Directory call (getDirectoryClient.users.get) and encrypts the key before storing, returning 201 with no key in the body; a failed DWD verify and a malformed key both 400 (with hint on the former) and do not encrypt/store; missing key rejected 400. 7/7, tsc clean.
index.ts applies no global auth to the /api/v1 group; each route file attaches authMiddleware itself. google.ts and m365.ts were the only routers missing it, so requirePermission saw no auth context and returned 401 for every caller even with a valid Bearer. Add authMiddleware before the feature-flag gate, plus a regression test in each suite that mounts the router without harness auth and asserts authMiddleware is invoked.
…y connection forms Collapsible step-by-step in the Google and M365 connection forms (service-account JSON + domain-wide delegation scopes for Google; Entra app registration tenant/client id + client secret + Graph app permissions for M365).
…s#1049 security contracts - M365 tenant id: acquireClientCredentialsToken now requires the M365TenantId brand (Entra GUID). Validate-and-narrow via isM365TenantId at both call sites (POST /m365/connection and m365DirectGraph.getToken), rejecting non-GUID ids before they reach the token URL; update fixtures/mocks + add a reject test. - aiAgent.createSession device binding: honor LanternOps#1047's auth.canAccessSite so a site-restricted caller cannot bind a session to an out-of-site device within an accessible org. Add site-restricted reject + allow tests.
Avoids Trivy's gcp-service-account secret rule false-positive on the example
'{ "type": "service_account", ... }' literal; use a descriptive placeholder.
…chable M365 gate, cascade Addresses Todd's review of the Google/M365 identity console: - M365 tool gate: m365ToolDefinitions advertises tools on M365_ENABLED OR DELEGANT_BASE_URL (was Delegant-only), so the direct app-only Graph path is reachable in production. - tenantCascade: add m365_connections, sort google_workspace_connections — both org_id tables now covered by GDPR org-erasure; cascade contract passes. - Privileged mutations no longer log failure as success: googleOffboardUserHandler returns a structured error envelope when any step fails; m365 errorTemplate returns errorString(code,...) instead of prose; resolveUserId propagates the underlying error (404 -> not-found, auth/forbidden/5xx -> the real failure) rather than collapsing every non-ok to phantom 'user not found'. - google_set_forwarding: verify the destination's verificationStatus and surface a real create failure (via a create-or-read probe) instead of reporting forwarding 'enabled' when it can't deliver; same fix in the offboard step. - Add google_disable_forwarding to turn forwarding back off (optionally remove the address). Wired into tiers + guardrails + SDK registry. - Tests for all the above (forwarding pending/failure/already-exists, disable, offboard envelope, m365 auth-vs-not-found).
cba9559 to
14d2581
Compare
|
Thanks for the thorough review — all 3 blockers + both HIGH addressed, plus a turn-it-off companion. Blocker 1 — M365 direct path unreachable. Blocker 2 — tenantCascade. Added Blocker 3 — failed mutations logged as success. Root cause was success inferred from string shape. Fixed at the source: HIGH — HIGH — New — Tests added for: forwarding pending-verification, real create failure, already-exists/read-status, disable, the offboard error envelope, and m365 auth-failure-vs-not-found. Note the Gmail/Graph mutation paths are covered by unit mocks (I don't have a live test tenant for them). Did not split into 3 PRs — happy to if you'd prefer, but since the blockers are fixed and you've already reviewed the whole thing, a delta re-review seemed lighter than carving out the shared tenantCascade/SDK-registry/session machinery. Your call. Re-requesting review. |
|
Status: verified merge-ready — holding for the release after the in-flight security release. This isn't a code concern; it's purely release sequencing, so nothing more is needed from you right now. Re-review of the fix-up ( The Trivy CRITICAL on the service-account field is a stale false positive — already remediated by A few non-blocking nits I'll hand off when we pick this back up (no action now): the unused Will merge once the security release is out. Leaving the existing review state as-is until then. |
…device-tasks # Conflicts: # pnpm-lock.yaml
…device-tasks # Conflicts: # pnpm-lock.yaml
… 500)
"Fix with AI" on a device dispatches POST /ai/sessions with only { deviceId }
and no orgId. createSession resolved the session org as
`options.orgId ?? auth.orgId ?? auth.accessibleOrgIds?.[0]`, so a partner /
multi-org caller (auth.orgId === null) bound the session to accessibleOrgIds[0]
— an org unrelated to the device. The SECURITY-CRITICAL cross-org check
(dev.orgId !== orgId) then rejected the dispatch, and the bare throw surfaced
as HTTP 500 "Invalid device". For an MSP admin every Fix-with-AI click failed.
Fix: resolve the device up front and anchor the session org to the device's
org when the caller can access it (auth.canAccessOrg), before falling back to
auth.orgId / accessibleOrgIds[0]. The opaque cross-org/site checks are
unchanged: a device the caller cannot reach still falls through to the same
"Invalid device" rejection (no existence leak), and an explicit orgId that
doesn't own the device is still rejected. Also map "Invalid device" to 400
instead of 500 in the route.
Tests (TDD): multi-org caller binds to the device's org; explicit-orgId
mismatch still rejected; route maps Invalid device -> 400. Verified live: the
repro now returns 201 bound to the device's org; bogus device returns 400.
Found via live E2E (FEATURE_TEST_LOG 2026-06-01); unit tests passed orgId
explicitly so they missed it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
(cherry picked from commit 7b026ad219c0b2ab933f8ca20e7665d32a246ac2)
#1182) ## Summary Addresses **L-5** of the #917 hardening cluster. Partner unsuspend (`apps/api/src/routes/admin/abuse.ts`) did: ```ts UPDATE users SET status='active' WHERE partner_id=? AND status='disabled' ``` — re-enabling **every** disabled user under the partner, regardless of why they were disabled. So a suspend → unsuspend cycle would silently resurrect users an operator had deliberately disabled for **compromise, off-boarding, or a manual admin action**. ## Fix — record why a user is disabled - **Migration** `2026-06-09-users-disabled-reason.sql`: adds a nullable `users.disabled_reason` (NULL = "disabled for some non-suspension reason") and backfills users currently disabled under a still-`suspended` partner as `'partner_suspended'`, so in-flight suspensions still unsuspend correctly. Idempotent; adding a column to the already-RLS-forced `users` table needs no policy change. - **Suspend** stamps `disabled_reason='partner_suspended'` on the users it disables, and now skips users already `disabled` so it never re-stamps an other-reason disable. Session/JWT/OAuth/remote-session revocation still covers **all** partner users (via the separate `affectedUserIds` set), so the suspension's security posture is unchanged. - **Unsuspend** re-enables only `status='disabled' AND disabled_reason='partner_suspended'`, clearing the marker. Users disabled for any other reason stay disabled. The two user-state queries are extracted into small exported helpers (`disablePartnerUsersForSuspension`, `reEnableSuspensionDisabledUsers`) so the handler and the integration test share the exact SQL. ## Tests - New `partnerUnsuspendScope.integration.test.ts` (real Postgres) proves the row-filtering semantics end-to-end: a user disabled for another reason is **not** re-stamped by suspend and is **left disabled** after unsuspend, while suspension-disabled users are restored. - Existing abuse unit suite (26) and the `autoMigrate` ordering guard (43) green; `tsc` + lint clean. > Note: a local `db:check-drift` against the shared dev DB flags `2026-06-01-google-workspace-connections` / `m365-connections` ledger entries with no files — those are from the still-open #1053 branch (not on `main`), unrelated to this PR. The new migration applies cleanly on a fresh DB (verified via the integration runner's `autoMigrate`). Addresses **L-5** of #917 (other sub-items tracked separately). 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts: # apps/api/package.json # apps/web/src/components/devices/DeviceDetails.tsx # pnpm-lock.yaml
Trivy blocking scan flagged HIGH CVE-2026-48068/48069 in @grpc/grpc-js 1.14.3 (malformed-request server crash), a transitive dep of firebase-admin. Freshly-published CVE; main's last scan passed only because it ran before the vuln DB updated. Pin to the patched 1.14.4 via the existing pnpm.overrides convention. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Adds a Google Workspace + Microsoft 365 identity console (enter credentials in-app, then run helpdesk identity actions through the AI assistant) plus the Phase 3 Claude device-task dispatch groundwork. Both providers are feature-flagged dark by default (
GOOGLE_WORKSPACE_ENABLED,M365_ENABLED).Google Workspace
google_workspace_connectionstable (RLS, shape-1 org_id) + migration; connection UI (primary domain, super-admin to impersonate, service-account JSON) that does a live Directory call to verify domain-wide delegation before storing the (encrypted) key, and never returns it.TOOL_PERMISSIONS+getToolTier.Microsoft 365
m365_connectionstable (RLS) + migration; connection UI (tenant id, app/client id, client secret) verified via a live Graph call before storing the encrypted secret.m365_*tools are made backend-aware: direct app-only Graph when the org has a connection, else the existing Delegant broker. Newm365DirectGraphmaps the operations to Graph v1.0.Claude device-task dispatch (Phase 3)
createSessioncan bind to a device; Fix-with-AI entry point.Conformance to the in-flight tenant-isolation work
createSessiondevice binding honors the site-axis (auth.canAccessSite) added in fix(security): site-scope the AI-agent tool layer (cross-site RBAC gap) #1047, not just org, so a site-restricted caller cannot bind to an out-of-site device.deviceArgsCoveragecontract: the identity tools are email-keyed (not device-id), and Phase 3 adds no new device-arg tool.M365TenantId(validated GUID) required byacquireClientCredentialsToken, rejecting non-GUID tenant ids before they reach the token URL.Tests
Route tests, tool gating/handler tests, schema + RLS, and the site-scope/deviceArgs/dead-perms contract tests are green locally (api vitest, web vitest, tsc, eslint).
Note
This is a large branch; happy to split into Google / M365 / device-dispatch PRs if that reviews better.