Harden Strands demo adapter and document live validation results by ShawnC7208 · Pull Request #3 · ShawnC7208/APD

ShawnC7208 · 2026-04-18T00:53:57Z

Summary

Fixes three runtime errors that surfaced during live validation of both launch APDs through Claude (Anthropic API) via the Strands adapter
Documents the completed live validation in `docs/public-launch-checklist.md`

What changed

`adapters/strands/demo/run_apd_with_aer.py`

Auto-detect `ANTHROPIC_API_KEY` and configure `AnthropicModel` so the demo runs without AWS credentials (falls back to Strands default Bedrock if not set)
Tightened runtime envelope prompt to specify exact enum values for `tool_invocations.outcome` — live run showed the model returning descriptive sentences instead of `"success" | "failure" | "canceled"`
Added robust `AgentResult` text extraction with markdown fence stripping to handle non-text content blocks in Strands responses
Added evidence shape guard and outcome normalizer for live model responses

`docs/public-launch-checklist.md`

Marked live runtime validation ✅ complete (2026-04-17)
Recorded run results: both APDs passed schema validation; `refund-escalation-synthesized` was 0-difference against its APD; `invoice-logging` had 1 `failed-check` diff (model correctly flagged it could not send an email in a demo context — adapter rough edge, not a spec defect)
Documented adapter rough edges for follow-up

`CHANGELOG.md` — changes noted under `[Unreleased]`

`.gitignore` — `launch-validation/` run artifacts excluded

Live validation verdict

The spec, schema, and conformance comparison logic all held up under real Claude-backed execution. The one `failed-check` difference on `invoice-logging` demonstrates the comparison machinery working correctly, not a defect.

Test plan

`npm test` passes (SDK, CLI, smoke)
Both APDs ran live through Strands + Anthropic API
`aer validate --strict` passes for both generated AERs
`aer compare` passes for `refund-escalation-synthesized` (0 differences)
`aer compare` on `invoice-logging` shows expected 1 `failed-check` diff

🤖 Generated with Claude Code

- Auto-detect ANTHROPIC_API_KEY and use AnthropicModel so the demo runs without AWS credentials; falls back to Strands default otherwise. - Tighten runtime envelope prompt to specify exact enum values for tool_invocations.outcome ("success", "failure", "canceled") after the live run showed the model returning descriptive sentences. - Add robust AgentResult text extraction with markdown fence stripping to handle non-text content blocks in the Strands response. - Add evidence shape guard (isinstance dict check) and outcome normalizer for live model responses. - Mark live runtime validation complete in docs/public-launch-checklist.md with full run results and adapter rough edges documented. - Exclude launch-validation/ run artifacts from git. - Add adapter changes to CHANGELOG [Unreleased]. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ShawnC7208 merged commit 8d50b98 into main Apr 18, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden Strands demo adapter and document live validation results#3

Harden Strands demo adapter and document live validation results#3
ShawnC7208 merged 1 commit into
mainfrom
launch-prep

ShawnC7208 commented Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ShawnC7208 commented Apr 18, 2026

Summary

What changed

Live validation verdict

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant