feat: Model-selection policy schema for agentic workflows #2309

@lpcox

Summary

Define a model-selection policy object that captures how agentic workflows choose, validate, and fall back between models at runtime. The schema lives in this repo (gh-aw-firewall) alongside other policy primitives (network ACLs, seccomp profiles), but is understood by the gh-aw compiler and enforced by AWF at execution time.

Companion to gh-aw#29191 (model fallback feature request).

Proposed Policy Schema

{
  "$schema": "https://github.com/github/gh-aw-firewall/schemas/model-policy.v1.json",
  "version": "1",

  // Primary model selection
  "model": {
    "id": "gpt-5.2",                    // requested model
    "reasoning_effort": "medium",        // optional engine-specific param
    "provider": "copilot"                // copilot | anthropic | openai | custom
  },

  // Fallback chain — tried in order when primary is unavailable
  "fallback": [
    { "id": "gpt-4.1", "provider": "copilot" },
    { "id": "claude-sonnet-4-20250514", "provider": "anthropic" },
    { "strategy": "auto" }              // sentinel: pick best available
  ],

  // Constraints applied to ALL model selections (primary + fallbacks)
  "constraints": {
    "capabilities": ["tool-use", "vision"],  // required capabilities
    "max_context_window": null,              // null = no limit
    "min_context_window": 128000,            // minimum tokens
    "cost_tier": "standard"                  // standard | premium | economy
  },

  // Behavior when no model satisfies constraints
  "on_unavailable": "fail",  // "fail" | "warn-and-use-best" | "queue"

  // Audit/observability
  "audit": {
    "log_selection": true,          // log which model was selected and why
    "log_fallback_reason": true     // log why primary was skipped
  }
}
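
As a rough illustration, the schema above could map onto TypeScript types along these lines. This is only a sketch: the issue does not say which fields are required, so the optional markers here are assumptions, and `isAutoFallback` and `describeChain` are hypothetical helper names.

```typescript
// Sketch of types mirroring the proposed policy object.
// Which fields are optional is an assumption, not part of the proposal.
type Provider = "copilot" | "anthropic" | "openai" | "custom";
type CostTier = "standard" | "premium" | "economy";
type OnUnavailable = "fail" | "warn-and-use-best" | "queue";

interface ModelRef {
  id: string;
  provider?: Provider;
  reasoning_effort?: string;
}

interface AutoFallback {
  strategy: "auto";
}

type FallbackEntry = ModelRef | AutoFallback;

interface ModelPolicy {
  version: "1";
  model: ModelRef;
  fallback?: FallbackEntry[];
  constraints?: {
    capabilities?: string[];
    max_context_window?: number | null; // null = no limit
    min_context_window?: number | null;
    cost_tier?: CostTier;
  };
  on_unavailable?: OnUnavailable;
  audit?: { log_selection?: boolean; log_fallback_reason?: boolean };
}

// Type guard distinguishing the "auto" sentinel from a concrete model entry.
function isAutoFallback(entry: FallbackEntry): entry is AutoFallback {
  return "strategy" in entry;
}

// Hypothetical helper: lists the resolution order as plain strings.
function describeChain(policy: ModelPolicy): string[] {
  const fallbacks = (policy.fallback ?? []).map((entry) =>
    isAutoFallback(entry) ? "auto" : entry.id
  );
  return [policy.model.id, ...fallbacks];
}

const examplePolicy: ModelPolicy = {
  version: "1",
  model: { id: "gpt-5.2", reasoning_effort: "medium", provider: "copilot" },
  fallback: [
    { id: "gpt-4.1", provider: "copilot" },
    { id: "claude-sonnet-4-20250514", provider: "anthropic" },
    { strategy: "auto" },
  ],
  constraints: {
    capabilities: ["tool-use", "vision"],
    max_context_window: null,
    min_context_window: 128000,
    cost_tier: "standard",
  },
  on_unavailable: "fail",
  audit: { log_selection: true, log_fallback_reason: true },
};
```

Modelling the `auto` sentinel as its own union member lets the type checker reject entries that mix `id` and `strategy` semantics by accident.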

Integration Points

1. Compiler (gh-aw compile)

The compiler reads the model-selection policy from workflow frontmatter and serializes it into the lock file:

# In .md workflow frontmatter
model: gpt-5.2
model-policy:
  fallback: [gpt-4.1, claude-sonnet-4-20250514, auto]
  constraints:
    capabilities: [tool-use]
    min_context_window: 128000
  on_unavailable: fail
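
One way the compiler might expand this shorthand into the full policy object is sketched below. It assumes the frontmatter has already been parsed from YAML into a plain object; `expandFrontmatter` and the `LockPolicy` shape are hypothetical. The default of `fail` follows the fail-safe principle stated in this issue.

```typescript
// Sketch: expand the frontmatter shorthand into a policy object for the lock file.
// Assumes YAML parsing has already happened; all names here are illustrative.
interface FrontmatterPolicy {
  fallback?: string[];
  constraints?: Record<string, unknown>;
  on_unavailable?: string;
}

interface Frontmatter {
  model: string;
  "model-policy"?: FrontmatterPolicy;
}

type FallbackEntry = { id: string } | { strategy: "auto" };

interface LockPolicy {
  version: "1";
  model: { id: string };
  fallback: FallbackEntry[];
  constraints: Record<string, unknown>;
  on_unavailable: string;
}

function expandFrontmatter(fm: Frontmatter): LockPolicy {
  const policy: FrontmatterPolicy = fm["model-policy"] ?? {};
  return {
    version: "1",
    model: { id: fm.model },
    // The bare string "auto" becomes the sentinel; anything else is a model id.
    fallback: (policy.fallback ?? []).map(
      (name): FallbackEntry =>
        name === "auto" ? { strategy: "auto" } : { id: name }
    ),
    constraints: policy.constraints ?? {},
    // Without an explicit setting, keep the current hard-fail behavior.
    on_unavailable: policy.on_unavailable ?? "fail",
  };
}
```

The shorthand carries no provider, so this sketch leaves it unset; how a provider would be inferred is not specified in the proposal.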

The compiler validates:

  • Model IDs against a known registry (warn on unknown, don't block)
  • Constraint fields are well-formed
  • Fallback chain doesn't exceed max depth (e.g., 5)
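
The three checks above could look roughly like the following. This is a sketch under assumptions: `KNOWN_MODELS` stands in for a registry that does not exist yet, the depth limit of 5 is the example value from the list, and `validatePolicy` is a hypothetical name.

```typescript
// Sketch of compile-time validation: unknown ids warn, malformed fields error.
interface Diagnostic {
  level: "warning" | "error";
  message: string;
}

// Hypothetical registry contents, for illustration only.
const KNOWN_MODELS = new Set(["gpt-5.2", "gpt-4.1", "claude-sonnet-4-20250514"]);
const MAX_FALLBACK_DEPTH = 5;

interface PolicyInput {
  model: string;
  fallback: string[];
  constraints: { capabilities?: unknown; min_context_window?: unknown };
}

function validatePolicy(input: PolicyInput): Diagnostic[] {
  const out: Diagnostic[] = [];

  // Unknown model ids produce a warning and never block compilation.
  for (const id of [input.model, ...input.fallback]) {
    if (id !== "auto" && !KNOWN_MODELS.has(id)) {
      out.push({ level: "warning", message: `unknown model id: ${id}` });
    }
  }

  // Constraint fields must be well-formed when present.
  const { capabilities, min_context_window } = input.constraints;
  const capsOk =
    capabilities === undefined ||
    (Array.isArray(capabilities) &&
      capabilities.every((c) => typeof c === "string"));
  if (!capsOk) {
    out.push({ level: "error", message: "capabilities must be a list of strings" });
  }
  const windowOk =
    min_context_window === undefined ||
    min_context_window === null ||
    (typeof min_context_window === "number" &&
      Number.isInteger(min_context_window) &&
      min_context_window > 0);
  if (!windowOk) {
    out.push({ level: "error", message: "min_context_window must be a positive integer" });
  }

  // The fallback chain has a bounded depth.
  if (input.fallback.length > MAX_FALLBACK_DEPTH) {
    out.push({
      level: "error",
      message: `fallback chain exceeds max depth of ${MAX_FALLBACK_DEPTH}`,
    });
  }
  return out;
}
```

Returning diagnostics rather than throwing keeps the "warn on unknown, don't block" behavior separate from hard errors, so the caller decides what stops the build.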

2. AWF Runtime Enforcement

At container startup, AWF:

  1. Reads the model policy from the workflow metadata (passed via env var AWF_MODEL_POLICY_B64 or similar)
  2. Queries available models via the API proxy sidecar (GET /models)
  3. Resolves the effective model by walking: primary → fallback[0] → fallback[1] → ... → auto
  4. Applies constraints to filter candidates (capabilities, context window, cost tier)
  5. Sets AWF_RESOLVED_MODEL env var in the agent container
  6. Emits audit log entries for observability
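
Steps 3, 4, and 6 might be sketched as below. Several things here are assumptions rather than part of the proposal: the shape of the `GET /models` response (`AvailableModel`), applying constraints during the walk rather than afterwards, treating `cost_tier` as an exact match, and having `auto` pick the first eligible model. `resolveModel` and `meetsConstraints` are hypothetical names.

```typescript
// Sketch of the resolution walk: primary, then fallbacks in order, then auto.
interface AvailableModel {
  id: string;
  capabilities: string[];
  context_window: number;
  cost_tier: string;
}

interface Constraints {
  capabilities?: string[];
  min_context_window?: number | null;
  max_context_window?: number | null;
  cost_tier?: string;
}

type Candidate = { id: string } | { strategy: "auto" };

interface Resolution {
  model: string | null; // null means no candidate was usable
  log: string[];        // audit entries explaining each decision
}

function meetsConstraints(m: AvailableModel, c: Constraints): boolean {
  const hasCapabilities = (c.capabilities ?? []).every((cap) =>
    m.capabilities.includes(cap)
  );
  const aboveMin =
    c.min_context_window == null || m.context_window >= c.min_context_window;
  const belowMax =
    c.max_context_window == null || m.context_window <= c.max_context_window;
  // Assumption: the tier must match exactly.
  const tierOk = c.cost_tier === undefined || m.cost_tier === c.cost_tier;
  return hasCapabilities && aboveMin && belowMax && tierOk;
}

function resolveModel(
  chain: Candidate[],
  available: AvailableModel[],
  constraints: Constraints
): Resolution {
  const log: string[] = [];
  for (const candidate of chain) {
    if ("strategy" in candidate) {
      // Assumption: "auto" takes the first available model that qualifies.
      const pick = available.find((m) => meetsConstraints(m, constraints));
      if (pick !== undefined) {
        log.push(`selected ${pick.id} via auto`);
        return { model: pick.id, log };
      }
      log.push("auto: no available model satisfies constraints");
      continue;
    }
    const match = available.find((m) => m.id === candidate.id);
    if (match === undefined) {
      log.push(`skipped ${candidate.id}: not available`);
      continue;
    }
    if (!meetsConstraints(match, constraints)) {
      log.push(`skipped ${candidate.id}: fails constraints`);
      continue;
    }
    log.push(`selected ${candidate.id}`);
    return { model: candidate.id, log };
  }
  return { model: null, log };
}
```

A caller would export a non-null result as `AWF_RESOLVED_MODEL` and apply the `on_unavailable` setting when the result is null.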

3. API Proxy Sidecar

The api-proxy can enforce the resolved model:

  • If the agent requests a model different from AWF_RESOLVED_MODEL, either:
    • Rewrite the request to use the resolved model (transparent enforcement)
    • Reject with 400 and guidance (strict enforcement)
  • Configuration: "enforcement": "rewrite" | "reject" | "passthrough"
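
The three enforcement modes could be expressed as a small pure function. This is a sketch only: `enforceModel` and the `ProxyDecision` shape are hypothetical, and it assumes `passthrough` forwards the request unchanged.

```typescript
// Sketch of the api-proxy decision for a single request.
type Enforcement = "rewrite" | "reject" | "passthrough";

type ProxyDecision =
  | { action: "forward"; model: string }
  | { action: "reject"; status: 400; message: string };

function enforceModel(
  requested: string,
  resolved: string,
  mode: Enforcement
): ProxyDecision {
  // Matching requests, or passthrough mode, are forwarded unchanged.
  if (requested === resolved || mode === "passthrough") {
    return { action: "forward", model: requested };
  }
  // Transparent enforcement: swap in the resolved model.
  if (mode === "rewrite") {
    return { action: "forward", model: resolved };
  }
  // Strict enforcement: reject with guidance on which model to use.
  return {
    action: "reject",
    status: 400,
    message: `model "${requested}" is not permitted; use "${resolved}"`,
  };
}
```

Keeping the decision separate from the HTTP handling means the same logic can be unit-tested without a running proxy.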

Schema Location

gh-aw-firewall/
├── schemas/
│   └── model-policy.v1.json          # JSON Schema definition
├── src/
│   ├── model-policy.ts               # Parser + validator
│   └── model-resolver.ts             # Resolution logic (primary → fallback → auto)
└── docs/
    └── model-selection-policy.md      # Specification document

Design Principles

  1. Declarative over imperative — Policy describes intent, not implementation
  2. Fail-safe defaults — Without explicit policy, current behavior (hard fail) is preserved
  3. Auditable — Every model selection decision is logged with reasoning
  4. Composable — Org-level policies can constrain repo-level policies (future: org policy inheritance)
  5. Engine-agnostic — Works across Copilot, Claude, Codex, and custom engines

Open Questions

  • Should the auto strategy use capability matching or just pick the "best" available model?
  • How does org-level model governance interact? (e.g., org bans certain models)
  • Should the policy support time-based selection? (e.g., use cheaper model for scheduled jobs)
  • Is AWF_MODEL_POLICY_B64 the right transport, or should it be a file mount?

Related

  • gh-aw#29191 — Model fallback feature request (upstream)
  • sweagentd#11264 — CCA jobs failing with model_not_supported (motivating incident)
  • Network policy precedent: --allow-domains → Squid ACL (same pattern: declarative policy → runtime enforcement)
