feat: Provisioners as Last Resort (closes row 16)#31
Open
barbosarwx wants to merge 1 commit into
Open
Conversation
Adds a new "Provisioners as Last Resort" section to references/code-patterns.md covering bootstrap (user_data + cloud-init), orchestration (terraform_data + triggers_replace), ongoing config (Ansible / SSM Run Command / SSM State Manager), and last-resort (provisioner) tiers with a systematic 5-cost refusal vocabulary. Routes via a new "Bootstrap / orchestration misuse" row in SKILL.md's "Diagnose Before You Generate" table and cross-links from the existing single-line provisioner mention in security-compliance.md. Flips tests/rationalization-table.md row 16 from x to checkmark; updates the Coverage Summary, Priority Gaps section, and bottom progress summary accordingly. Testing evidence: canonical fresh-session captures (claude -p) for baseline, compliance-with-skill-as-is, and compliance-with-draft. Run 2 of compliance-with-draft shows all four scenario-16 Expected signals present, zero Forbidden signals, Response Contract applied, and the response explicitly references the new "bootstrap / orchestration misuse" failure-mode category.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Type of change: New content (adds the dedicated guidance for the §16 scenario; Row 16 in
tests/rationalization-table.md).Summary: Adds a new "Provisioners as Last Resort" section to
references/code-patterns.mdcovering the four real intents that get conflated when someone asks "how do I run a setup script on an EC2 instance after it boots?":user_data+ cloud-init viatemplatefile()terraform_data+triggers_replaceterraform_data+provisioner(1.4+) ornull_resource(pre-1.4)Plus the systematic 5-cost refusal vocabulary (non-idempotent, create-only, network-coupled, drift-blind, secret-leak) and ❌ DON'T / ✅ DO HCL examples per the skill's CLAUDE.md rules.
Routes via a new "Bootstrap / orchestration misuse" row in SKILL.md's diagnose table; cross-links from the existing single-line provisioner reference in
security-compliance.md.Flips coverage matrix row 16 from ❌ to ✅; updates Coverage Summary (16/17 → 17/17 covered), removes Row 16 from "Priority Gaps", and updates the bottom progress summary.
Testing Evidence (REQUIRED)
Scenarios Tested
provisioner/null_resourcebootstrapMethodology
Three transcripts captured via
claude -p(Claude Code 2.1.143, non-interactive, fresh OAuth-authenticated subprocess per run, working directory/tmp/eval-neutralwith noCLAUDE.mdand no.tffiles, auto-memory unchanged across runs to isolate the skill-state variable):terraform-skillinstalled.terraform-skillv1.14.0 installed at~/.claude/skills/terraform-skill/.Baseline Behavior (WITHOUT this PR)
Prompt: How do I run a setup script on an EC2 instance after it boots?
Baseline (no skill) produces a strong AWS-generic answer:
user_dataas primary recommendationcloud-initmention,cloud-init-perand#cloud-boothookfor re-run/var/log/cloud-init-output.logfor debuggingtemplatefile()not mentioned (Terraform-specific helper)terraform_datanot mentioned at allprovisioner(provisioners not discussed at all)Modern Claude already gets the bootstrap tier right without the skill. What it misses are the Terraform-specific tier transitions, the orchestration tier (
terraform_data), and the systematic refusal vocabulary the rest of the skill uses elsewhere.Compliance with upstream skill, no draft (BEFORE)
Same prompt, same neutral directory, skill installed:
Surprising result: compliance-asis is weaker than baseline. The skill loads (verified with a separate sanity probe — Claude can read the diagnose table). But the bare §16 prompt has no Terraform cue, so the skill's "Diagnose Before You Generate" workflow doesn't engage. With the skill in scope but no routing match, Claude suppresses some general AWS depth — drops Ansible, drops Packer, drops
#cloud-boothook— and gives a thinner answer than pure baseline.This is the gap this PR closes: a new diagnose-table row for the bootstrap intent.
Compliance with this PR (AFTER) — Run 1 (bare §16 prompt)
Same prompt. Skill + PR drafts installed. Result: same problem as compliance-asis — the bare §16 prompt is too AWS-generic to trigger the skill's workflow. New routing exists but isn't reached because Claude doesn't see this as a Terraform question on the literal prompt.
This is a separate observation about the §16 prompt itself — see the "Side observation" section at the bottom of this PR. It's not a defect in the PR.
Compliance with this PR (AFTER) — Run 2 (realistic Terraform context)
Prompt: In Terraform, how do I run a setup script on an EC2 instance after it boots?
(Two-word addition; matches the pattern of 16 of the 17 other scenarios in
baseline-scenarios.mdwhich all carry a Terraform/HCL cue.)Verbatim response from
claude -p:Evidence of Improvement (Run 2 vs baseline)
user_data+ cloud-init +templatefile()as PRIMARYtemplatefile())terraform_data(1.4+) for orchestrationsensitivedoesn't redact provisioner output", "won't re-run on change", plus plaintext-secrets warning)Zero Forbidden signals appear in Run 2 (no
null_resource/remote-execas first-line; noaws ssm send-commandvialocal-exec; idempotency + re-run semantics explicit).Notable: the response's Response Contract explicitly names the failure category "bootstrap / orchestration misuse" — the exact label introduced by this PR's new SKILL.md diagnose row. This is direct evidence the routing works as designed.
Evidence of Improvement (checklist)
Standards Compliance Checklist
Frontmatter
Token Efficiency
references/code-patterns.md; only one row added to SKILL.md (301 → 302 lines;validate.yml's hard size gate is 500 with WARNING-only behavior).provisionermention insecurity-compliance.mdgets a cross-link, not a copy.Content Quality
terraform fmt -checkPASS,terraform validatereturns "Success! The configuration is valid."(1.4+),(pre-1.4))File Organization
SKILL.md(one diagnose-table row)references/code-patterns.md(new section)tests/rationalization-table.mdValidation
validate.ymlchecks expected outcomes:.codex-plugin/plugin.jsonnot touched; SKILL.mdmetadata.versionnot touched (release pipeline owns both); versions stay in sync>500with WARNING-onlyreferences/code-patterns.mdfile exists (the check verifies file existence, not anchor resolution)continue-on-error: truein the workflow; manual eyeball clean locallyRationalizations
New rationalizations discovered: None.
Meta-finding on the original Row 16 framing:
The original Row 16 trap ("LLMs default to
null_resource+local-exec") appears partially obsolete in current-generation Claude — baseline (no skill) already defaults touser_data+ cloud-init for bootstrap. What current models still miss (and this PR adds) is everything beyond tier 1:terraform_datafor orchestration, the systematic 5-cost refusal vocabulary, and the Ansible/SSM off-ramp framed as the correct tool for ongoing config (vs. the current skill's framing which only mentions Ansible as something to reject vialocal-exec).If you'd prefer to reframe Row 16's hallucination-surface description in
tests/rationalization-table.mdto reflect this nuance, happy to update.Related Issues
Closes the open
❌row 16 oftests/rationalization-table.md.Additional Context
External-LLM format/compression review (CLAUDE.md line 155)
CLAUDE.md says: "for substantive new sections, consult an external LLM expert (e.g. GPT via
mcp__codex__codex) for format/compression review before merge."This PR's new section is substantive. The
mcp__codex__codexserver requires paid OpenAI access (Codex is plan-included for ChatGPT Plus/Pro/Team subscriptions, or pay-per-use API key; no free tier). I do not currently have OpenAI access set up. Happy to run the codex review if it's required for merge — please advise. The draft as-is has been internally reviewed against every numbered rule in CLAUDE.md's "LLM Consumption Rules" section.Side observation: §16 prompt is the only Terraform-context-less prompt in the suite
While capturing canonical compliance evidence I noticed §16 is the only scenario whose prompt has zero Terraform cue. Quick survey of all 17 prompts:
terraform test/ "Terraform state" / "Terraform project" / "Terraform module"): §1, §2, §3, §5, §6, §7, §10, §11, §14, §15aws_instance.web,for_each, "module", "input variables", "plan", "moved"): §4, §8, §9, §12, §13, §17On current Claude, the bare §16 prompt doesn't reliably trigger the skill's workflow — Claude doesn't classify it as a Terraform question, so the diagnose-table routing doesn't engage even with this PR's new row added (Run 1 above). Adding a minimal cue ("In Terraform, ...") — matching the pattern of every other scenario — triggers the workflow correctly (Run 2 above).
This is an observation about the §16 prompt itself, independent of this PR's content. Suggested follow-up: revise the §16 prompt for consistency with the other 16 scenarios. Happy to open a separate PR for that if you'd like.
Tested local environment
Ubuntu 24.04 (WSL2), Claude Code 2.1.143, Terraform v1.9.8 (binary download to
~/bin, no apt), AWS provider 5.100.0, null provider 3.3.0.