Add experimental `gh aw experiments darwin` command by Copilot · Pull Request #42081 · github/gh-aw

Copilot · 2026-06-28T12:45:06Z

This adds Darwin Mode as an extension of the existing experiments feature: a new gh aw experiments darwin subcommand that evaluates an experiment’s current variants, archives the current generation, and promotes a winner into the next generation. It keeps the workflow rooted in experiments: rather than introducing a separate optimizer path.

CLI surface
- Adds gh aw experiments darwin <workflow> <experiment>
- Wires the subcommand into the hidden experiments command group
- Supports:
  - --winner to override the promoted variant
  - --variant to define the next generation explicitly
  - --archive-dir to control archive location
  - --apply to persist archive + frontmatter changes
  - --json for machine-readable output
Darwin evaluation and promotion
- Reuses existing experiment state from experiments/<workflow> branch history
- Ranks declared variants by observed counts, preserving declaration order for ties
- Selects a winner and promotes it to control by moving it to the front of the variant list
- Falls back cleanly when experiment state is absent by treating the current declaration as generation zero
Generation archive
- Archives each Darwin run as JSON under .github/experiments/archive/<workflow>/<experiment>/
- Records:
  - workflow/experiment identity
  - promoted winner
  - previous and next variant sets
  - ranking snapshot
  - computed experiment analysis
  - raw experiment state
Workflow mutation
- Updates the existing experiments: frontmatter in-place
- Supports both bare-array and object-form experiments
- Rewrites only the target experiment’s variant population, preserving other experiment metadata
Docs and coverage
- Documents Darwin Mode in the experimental experiments docs
- Adds focused tests for ranking, next-generation construction, archive writing, and frontmatter promotion

Example:

gh aw experiments darwin myworkflow style --winner concise --variant concise --variant detailed --variant step_by_step --apply

This promotes concise to the control slot, archives the current generation, and rewrites the style experiment’s next variant population in the workflow frontmatter.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-28T12:50:06Z

⚠️ PR Code Quality Reviewer failed during code quality review.

github-actions · 2026-06-28T12:50:08Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-28T12:50:08Z

✅ Test Quality Sentinel completed test quality analysis.

github-actions · 2026-06-28T12:50:12Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

Copilot

Pull request overview

Adds an experimental “Darwin mode” workflow-evolution command under the hidden gh aw experiments command group, allowing users to evaluate observed experiment counts, select a winner, archive a generation snapshot, and optionally rewrite the workflow’s experiments: frontmatter to promote the winner and define the next population.

Changes:

Introduces gh aw experiments darwin <workflow> <experiment> with --winner, repeatable --variant, --archive-dir, --apply, and --json output.
Implements Darwin planning: ranking variants by observed counts (tie-breaking by declaration order), winner selection, next-generation construction, optional archive writing, and frontmatter mutation.
Adds focused unit tests for ranking/next-generation/frontmatter update/archive writing, plus docs describing Darwin mode usage.

Show a summary per file

File	Description
pkg/cli/experiments_darwin_command.go	Implements the Darwin subcommand, ranking/promotion logic, archive writing, and workflow frontmatter mutation.
pkg/cli/experiments_darwin_command_test.go	Adds unit tests covering ranking, next-generation construction, archive writing, and frontmatter updates.
pkg/cli/experiments_command.go	Wires the new `darwin` subcommand into the existing hidden `experiments` command group and updates help text.
pkg/cli/experiments_command_test.go	Extends command wiring/flag tests to include the new `darwin` subcommand.
docs/src/content/docs/experimental/experiments.md	Documents Darwin mode usage and behavior within the experiments feature.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Files reviewed: 5/5 changed files
Comments generated: 5
Review effort level: Low

+	if strings.TrimSpace(config.Experiment) == "" {
+		return nil, nil, errors.New("experiment name is required")
+	}
+	if strings.TrimSpace(config.ArchiveDir) == "" {
+		config.ArchiveDir = defaultDarwinArchiveDir
+	}


+	nextVariants := buildDarwinNextVariants(winner, expCfg.Variants, ranking, config.NextVariants)
+	if len(nextVariants) < 2 {
+		return nil, nil, fmt.Errorf("darwin generation for %q must contain at least 2 variants", config.Experiment)
+	}
+	archivePath, archivedAt := darwinArchiveLocation(config.ArchiveDir, workflowID, config.Experiment)
+	rankedRows := darwinRankingRows(ranking, expCfg.Variants[0], winner)
+
+	plan := &DarwinPlan{
+		WorkflowID:      workflowID,
+		WorkflowPath:    workflowPath,
+		ExperimentName:  config.Experiment,
+		Branch:          branchName,
+		ArchivePath:     archivePath,
+		ArchivedAt:      archivedAt,
+		Apply:           config.Apply,
+		Winner:          winner,
+		CurrentVariants: slices.Clone(expCfg.Variants),
+		NextVariants:    nextVariants,
+		Ranking:         rankedRows,
+		Analysis:        analysis,
+	}
+	archive := &DarwinArchive{
+		WorkflowID:      workflowID,
+		WorkflowPath:    workflowPath,
+		ExperimentName:  config.Experiment,
+		Branch:          branchName,
+		ArchivedAt:      archivedAt,
+		Winner:          winner,
+		CurrentVariants: slices.Clone(expCfg.Variants),
+		NextVariants:    slices.Clone(nextVariants),
+		Ranking:         rankedRows,
+		Analysis:        analysis,
+		State:           state,
+	}


+	branchName := experimentsBranchPrefix + workflowID
+	details, err := fetchLocalExperimentDetails(branchName, workflowID)
+	if err != nil {
+		state := emptyExperimentState()
+		details = experimentDetailsFromState(workflowID, branchName, state)
+		if !strings.Contains(err.Error(), "not found locally") {
+			return nil, nil, err
+		}
+	}
+	stateRef := "origin/" + branchName
+	if !gitRefExists(stateRef) {
+		stateRef = branchName
+	}
+	state := emptyExperimentState()
+	if gitRefExists(stateRef) {
+		state = readLocalExperimentState(stateRef)
+	}
+


+		Example: `  ` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style
+  ` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style --winner concise
+  ` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style --variant concise --variant detailed --apply
+  ` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style --archive-dir .github/experiments/archive --json`,
+		Args: cobra.ExactArgs(2),


+	fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Darwin mode: "+plan.WorkflowID+"/"+plan.ExperimentName))
+	fmt.Fprintf(os.Stderr, "  Branch:        %s\n", plan.Branch)
+	fmt.Fprintf(os.Stderr, "  Winner:        %s\n", plan.Winner)
+	fmt.Fprintf(os.Stderr, "  Recommendation:%s\n", plan.Analysis.Recommendation)


github-actions · 2026-06-28T12:56:39Z

Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (621 new lines in pkg/cli/) but does not have a linked Architecture Decision Record (ADR).

Draft ADR committed: docs/adr/42081-add-darwin-mode-evolutionary-experiment-promotion.md — review and complete it before merging.

This PR cannot merge until an ADR is linked in the PR body.

What to do next

Review the draft ADR committed to your branch — it was generated from the PR diff
Complete the missing sections — add context the AI could not infer, refine the decision rationale, and list real alternatives you considered
Commit the finalized ADR to docs/adr/ on your branch
Reference the ADR in this PR body by adding a line such as:

ADR: ADR-42081: Add Darwin Mode for Evolutionary Experiment Generation Promotion

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

Context — What is the problem? What forces are at play?
Decision — What did you decide? Why?
Alternatives Considered — What else could have been done?
Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 42081-add-darwin-mode-evolutionary-experiment-promotion.md for PR #42081).

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · 61.9 AIC · ⌖ 12.6 AIC · ⊞ 8.4K · ◷

github-actions

Darwin Mode review

The overall design is solid — Darwin Mode fits naturally into the existing experiments: feature, the ranking/promotion logic is clear, and the test coverage covers the key paths well. There are a few correctness and reliability issues that should be addressed before merge.

Blocking

#	File	Issue
1	`experiments_darwin_command.go:129`	`buildDarwinPlan` errors are swallowed (`return nil`) — CLI exits 0 on failure

Non-blocking (recommended fixes)

#	File	Issue
2	`experiments_darwin_command.go:427`	`printDarwinPlan` writes all output to `stderr`; human-readable output should go to `stdout`
3	`experiments_darwin_command.go:119`	`StringSlice` splits on commas; `StringArray` is safer for variant names
4	`experiments_darwin_command.go:136`	No rollback if archive succeeds but frontmatter update fails
5	`experiments_darwin_command.go:201`	Fragile error-substring sentinel; prefer a typed `sentinel error`

See inline comments for details and suggested fixes.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 59.1 AIC · ⌖ 6.86 AIC · ⊞ 4.8K

github-actions · 2026-06-28T12:56:40Z

+	plan, archive, err := buildDarwinPlan(config)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error()))
+		return nil


Error silently swallowed — exit code will be 0 on failure.

Returning nil after printing the error means the CLI exits with code 0 even when buildDarwinPlan fails. Callers (scripts, CI jobs) cannot detect the failure. Change to return err so cobra propagates the non-zero exit code:

if err != nil { return err }

If the intent is to suppress the stack trace (showing only the formatted message), consider wrapping as a display error type handled by the root command, consistent with other subcommands in this package.

@copilot please address this.

github-actions · 2026-06-28T12:56:40Z

+}
+
+func printDarwinPlan(plan *DarwinPlan) {
+	fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Darwin mode: "+plan.WorkflowID+"/"+plan.ExperimentName))


Human-readable output written to stderr instead of stdout.

All plan output in printDarwinPlan goes to os.Stderr. Non-error output should go to os.Stdout so users can pipe or redirect results. Sending informational output to stderr is reserved for errors and diagnostics:

// current — informational output on stderr prevents piping fmt.Fprintln(os.Stderr, console.FormatInfoMessage(...)) // fix — write human-readable plan output to stdout fmt.Fprintln(os.Stdout, console.FormatInfoMessage(...))

Note that the JSON path already correctly uses os.Stdout (line 145). The text path should be consistent.

@copilot please address this.

github-actions · 2026-06-28T12:56:40Z

+	cmd.Flags().Bool("apply", false, "Archive the current generation and update the workflow file in place")
+	cmd.Flags().String("winner", "", "Override the promoted winner variant")
+	cmd.Flags().String("archive-dir", defaultDarwinArchiveDir, "Directory where Darwin archives are written when --apply is used")
+	cmd.Flags().StringSlice("variant", nil, "Variant names for the next generation (repeatable)")


StringSlice splits on commas — use StringArray instead.

cobra.StringSlice treats comma-separated values as multiple elements: --variant a,b becomes ["a", "b"] rather than ["a,b"]. This can surprise users and may silently misbehave if a future variant name contains a comma. StringArray treats each --variant flag as a single value:

// current — splits on commas unexpectedly cmd.Flags().StringSlice("variant", nil, "...") // fix — one flag = one value cmd.Flags().StringArray("variant", nil, "...")

The GetStringSlice call on line 102 should also change to GetStringArray.

@copilot please address this.

github-actions · 2026-06-28T12:56:41Z

+		if err := writeDarwinArchive(plan.ArchivePath, archive); err != nil {
+			return fmt.Errorf("failed to write Darwin archive: %w", err)
+		}
+		if err := applyDarwinPromotion(plan.WorkflowPath, plan.ExperimentName, plan.NextVariants); err != nil {


Partial failure leaves archive written but workflow not updated (no rollback).

If writeDarwinArchive succeeds but applyDarwinPromotion fails, the archive JSON is persisted on disk while the workflow frontmatter is not updated — leaving the state inconsistent with no way to know whether to re-run or not.

Options to consider:

Write the archive last (after frontmatter is updated), so a failure before archive creation leaves the workflow clean.

Write the archive to a temp path, apply the frontmatter, then rename the temp file atomically.

Document this limitation with a // TODO noting that partial failure leaves an orphaned archive.

@copilot please address this.

github-actions · 2026-06-28T12:56:41Z

+	if err != nil {
+		state := emptyExperimentState()
+		details = experimentDetailsFromState(workflowID, branchName, state)
+		if !strings.Contains(err.Error(), "not found locally") {


Fragile error-message substring check as a sentinel.

Using strings.Contains(err.Error(), "not found locally") to distinguish "branch not found" from a real error couples this code to the exact phrasing returned by fetchLocalExperimentDetails. If that message ever changes (even capitalisation), this silently breaks by propagating the error instead of falling back gracefully.

Prefer a typed sentinel error (e.g., errors.Is(err, ErrBranchNotFound)) or a boolean return value from fetchLocalExperimentDetails to express the "not found" case explicitly:

// in fetchLocalExperimentDetails: var ErrBranchNotFound = errors.New("experiments branch not found locally") // here: if !errors.Is(err, ErrBranchNotFound) { return nil, nil, err }

@copilot please address this.

github-actions · 2026-06-28T12:59:24Z

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 70/100 — Acceptable

Analyzed 10 test(s) across 2 changed test files: 8 design, 2 implementation, 0 hard coding-guideline violations; test inflation detected on experiments_command_test.go.

📊 Metrics & Test Classification (10 tests analyzed)

Metric	Value
New/modified tests analyzed	10
✅ Design tests (behavioral contracts)	8 (80%)
⚠️ Implementation tests (low value)	2 (20%)
Tests with error/edge cases	6 (60%)
Duplicate test clusters	0
Test inflation detected	YES — `experiments_command_test.go` (20 test additions / 4 prod additions = 5:1)
🚨 Coding-guideline violations	0 hard violations; ⚠️ 7 soft — missing assertion messages in `experiments_darwin_command_test.go`

Test	File	Classification	Issues Detected
`TestRankDarwinVariants`	`pkg/cli/experiments_darwin_command_test.go:17`	✅ Design	Missing assertion messages on all asserts
`TestBuildDarwinNextVariantsDefaultsToWinnerFirst`	`pkg/cli/experiments_darwin_command_test.go:30`	✅ Design	Missing assertion message
`TestBuildDarwinNextVariantsUsesRequestedPopulation`	`pkg/cli/experiments_darwin_command_test.go:41`	✅ Design	Missing assertion message
`TestFindOrBuildExperimentStatsAddsMissingDeclaredVariants`	`pkg/cli/experiments_darwin_command_test.go:48`	✅ Design	Missing assertion messages
`TestApplyDarwinPromotionUpdatesBareArrayExperiment`	`pkg/cli/experiments_darwin_command_test.go:62`	✅ Design	—
`TestApplyDarwinPromotionUpdatesObjectExperiment`	`pkg/cli/experiments_darwin_command_test.go:86`	✅ Design	—
`TestWriteDarwinArchive`	`pkg/cli/experiments_darwin_command_test.go:113`	✅ Design	Happy path only — no write-failure error coverage
`TestExperimentsDarwinSubcommandFlags`	`pkg/cli/experiments_command_test.go:340`	⚠️ Implementation	Checks flag registration, not behavioral output
`TestExperimentsDarwinRequiresArgs`	`pkg/cli/experiments_command_test.go:351`	✅ Design	—
`TestNewExperimentsCommand` (+darwin line)	`pkg/cli/experiments_command_test.go:313`	⚠️ Implementation	Checks subcommand wiring, not behavioral output

Go: 10 (*_test.go); JavaScript: 0. Other languages: none detected.

⚠️ Flagged Tests — Soft Issues (9 issue(s))

TestRankDarwinVariants (pkg/cli/experiments_darwin_command_test.go:17) — ⚠️ Missing assertion messages: require.Len(t, ranking, 3), assert.Equal(t, "challenger", ranking[0].Name), assert.Equal(t, "third", ranking[1].Name), assert.Equal(t, "control", ranking[2].Name), assert.True(t, ranking[2].CurrentControl) all lack context strings. Suggested fix: add a descriptive message to each, e.g. assert.Equal(t, "challenger", ranking[0].Name, "highest-count variant should rank first").

TestBuildDarwinNextVariantsDefaultsToWinnerFirst (pkg/cli/experiments_darwin_command_test.go:30) — ⚠️ assert.Equal(t, []string{"challenger", "control", "third"}, next) lacks message. Suggested fix: assert.Equal(t, ..., "winner should be promoted to position 0").

TestBuildDarwinNextVariantsUsesRequestedPopulation (pkg/cli/experiments_darwin_command_test.go:41) — ⚠️ assert.Equal(t, []string{"control", "mutant-a", "mutant-b"}, next) lacks message.

TestFindOrBuildExperimentStatsAddsMissingDeclaredVariants (pkg/cli/experiments_darwin_command_test.go:48) — ⚠️ assert.Equal(t, 4, stats.Total), assert.Equal(t, 4, stats.Variants["control"]), assert.Equal(t, 0, stats.Variants["challenger"]) all lack messages.

TestWriteDarwinArchive (pkg/cli/experiments_darwin_command_test.go:113) — ⚠️ Happy path only. Does not test error scenarios (e.g. non-writable archive directory, corrupt JSON). Suggested fix: add a sub-case using a read-only temp directory to assert an error is returned.

TestExperimentsDarwinSubcommandFlags (pkg/cli/experiments_command_test.go:340) — ⚠️ Implementation test: only checks that flags are registered (cmd.Flag("json") != nil), not that the command behaves correctly when flags are passed. This test would survive a refactor that renames flags without changing behavior. Low but non-zero value as a smoke-test of the CLI surface.

TestNewExperimentsCommand (darwin line) (pkg/cli/experiments_command_test.go:313) — ⚠️ Implementation test: verifies subcommand registration, not observable output. Acceptable as a registration sanity check.

Test Inflation — experiments_command_test.go: 20 test lines added vs 4 production lines added (5:1 ratio, threshold 2:1). The ratio is high because the production wiring is minimal (one line to register the darwin subcommand) while the tests comprehensively cover the new CLI surface. Not a quality problem in practice, but inflates the metric.

Verdict

✅ Check passed. 20% implementation tests (threshold: 30%). All new Go test files have the required //go:build tag; no mock libraries detected. Core darwin logic is well-covered by behavioral tests; the flagged soft issues (missing assertion messages, happy-path-only archive test) are recommended improvements but do not block merge.

References:

§28322748031

🧪 Test quality analysis by Test Quality Sentinel · 83.6 AIC · ⌖ 18 AIC · ⊞ 8.2K · ◷

github-actions

✅ Test Quality Sentinel: 70/100. Test quality is acceptable — 20% of new tests are implementation tests (threshold: 30%).

github-actions · 2026-06-28T13:00:22Z

Darwin Mode Review Summary 🧬

REQUEST_CHANGES — Two correctness issues need addressing before merge; everything else is medium or minor.

🔴 Blocking (fix before merge)

#	File	Line	Issue
1	`experiments_darwin_command.go`	129	`return nil` after `buildDarwinPlan` error → CLI exits 0 on failure
2	`experiments_darwin_command.go`	197	`details.Experiments` is empty when branch only exists on `origin` → ranking silently all-zero

🟡 Medium (should fix)

#	File	Line	Issue
3	`experiments_darwin_command.go`	201	`strings.Contains(err.Error(), ...)` — use typed sentinel error
4	`experiments_darwin_command.go`	119	`StringSlice` → `StringArray` for `--variant`
5	`experiments_darwin_command.go`	132	Side effects applied before JSON marshal — invert order
6	`experiments_darwin_command.go`	51	`Apply bool` belongs in command layer, not `DarwinPlan` output struct

🟢 Minor (nice to have)

Line 366: second-granularity timestamp allows archive collision within the same second
Line 430: Recommendation: missing trailing space in output
Test file: add TestSelectDarwinWinnerRejectsUnknownOverride and TestRankDarwinVariantsAllZero

Overall the architecture is clean — Darwin Mode integrates nicely into the existing experiments: surface and the separation between planning and side effects is a solid foundation. The --apply dry-run default is a great UX choice.

@copilot please address the review comments above.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 128.7 AIC · ⌖ 10.6 AIC · ⊞ 6.6K · ◷

github-actions

Skills-Based Review 🧠

Applied /tdd, /diagnose, /zoom-out, /grill-with-docs, and /improve-codebase-architecture — requesting changes on a correctness bug and a few medium-priority issues.

📋 Key Themes & Highlights

Blocking Issues

Exit-code bug (line 129): RunExperimentsDarwin returns nil on buildDarwinPlan failure — the CLI exits 0 even when the workflow or experiment is not found. Should be return err.
Ranking silently zeroed (line 197): when the experiments branch only exists on origin, details.Experiments is empty and all variants rank equally, potentially promoting the wrong winner without warning.

Medium Issues

Fragile error matching (line 201): strings.Contains(err.Error(), "not found locally") should use a typed sentinel error.
StringSlice vs StringArray (line 119): comma-separated --variant a,b is split into two variants unexpectedly; StringArray matches the documented one-flag-per-value usage.
Side-effect ordering (line 132): archive + frontmatter mutations happen before JSON marshaling; marshal first, then apply.
Apply in output struct (line 51): CLI flag leaked into DarwinPlan's JSON schema.

Minor Issues

Archive timestamp collision (line 366)
Missing space in Recommendation: output (line 430)
Two missing test cases: selectDarwinWinner error path, and all-zero counts (generation zero)

Positive Highlights

✅ Clean integration into the existing experiments: feature — no new top-level commands or parallel optimizer path
✅ Covers both bare-array and object-form experiment frontmatter
✅ Good separation of buildDarwinPlan (pure computation) from side effects (writeDarwinArchive, applyDarwinPromotion)
✅ Comprehensive archive schema with full ranking snapshot and raw state — useful for auditing
✅ Documentation is clear with concrete examples and an important note about new variant names

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 128.7 AIC · ⌖ 10.6 AIC · ⊞ 6.6K

github-actions · 2026-06-28T13:00:25Z

+	plan, archive, err := buildDarwinPlan(config)
+	if err != nil {
+		fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error()))
+		return nil


[/tdd] RunExperimentsDarwin returns nil on buildDarwinPlan failure, so the CLI always exits with code 0 even when the workflow or experiment is not found. This masks errors from callers and CI scripting.

💡 Fix

Change return nil to return err:

if err != nil { fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error())) return err }

Note that writeDarwinArchive and applyDarwinPromotion errors (lines 134–138) are already propagated correctly — this brings buildDarwinPlan errors into line.

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	if err != nil {
+		state := emptyExperimentState()
+		details = experimentDetailsFromState(workflowID, branchName, state)
+		if !strings.Contains(err.Error(), "not found locally") {


[/improve-codebase-architecture] Error classification via strings.Contains(err.Error(), "not found locally") is fragile — it couples this code to the exact wording of an error message from fetchLocalExperimentDetails. A rename or rewording in that function will silently break the fallback path here.

💡 Suggestion

Introduce a sentinel error in the fetchLocalExperimentDetails package and use errors.Is:

// In the fetch function's package: var ErrBranchNotFoundLocally = errors.New("branch not found locally") // Here: if err != nil { if !errors.Is(err, ErrBranchNotFoundLocally) { return nil, nil, err } details = experimentDetailsFromState(workflowID, branchName, emptyExperimentState()) }

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	cmd.Flags().Bool("apply", false, "Archive the current generation and update the workflow file in place")
+	cmd.Flags().String("winner", "", "Override the promoted winner variant")
+	cmd.Flags().String("archive-dir", defaultDarwinArchiveDir, "Directory where Darwin archives are written when --apply is used")
+	cmd.Flags().StringSlice("variant", nil, "Variant names for the next generation (repeatable)")


[/grill-with-docs] StringSlice silently splits comma-separated input, so --variant a,b is parsed as two variants (a and b). StringArray treats each flag invocation as a single atomic value, which matches the documented usage (--variant concise --variant detailed).

💡 Change

// Before cmd.Flags().StringSlice("variant", nil, "Variant names for the next generation (repeatable)") // After cmd.Flags().StringArray("variant", nil, "Variant names for the next generation (repeatable)")

Also update the flag getter in RunE:

// Before nextVariants, _ := cmd.Flags().GetStringSlice("variant") // After nextVariants, _ := cmd.Flags().GetStringArray("variant")

Variant names in this codebase use underscores (e.g. step_by_step) so a comma collision is unlikely but the semantic mismatch with the docs is still worth fixing.

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+		return nil
+	}
+
+	if config.Apply {


[/diagnose] Side effects (writeDarwinArchive + applyDarwinPromotion) are committed to disk before the JSON output block at line 141. If json.MarshalIndent were to fail (unlikely but possible), the archive file and workflow frontmatter would be mutated with no output to the caller. Consider marshaling first and returning early on error before writing to disk.

💡 Suggested ordering

var jsonBytes []byte if config.JSONOutput { var err error jsonBytes, err = json.MarshalIndent(plan, "", " ") if err != nil { return fmt.Errorf("failed to marshal JSON: %w", err) } } if config.Apply { if err := writeDarwinArchive(plan.ArchivePath, archive); err != nil { return fmt.Errorf("failed to write Darwin archive: %w", err) } if err := applyDarwinPromotion(plan.WorkflowPath, plan.ExperimentName, plan.NextVariants); err != nil { return fmt.Errorf("failed to update workflow file: %w", err) } } if config.JSONOutput { fmt.Fprintln(os.Stdout, string(jsonBytes)) return nil }

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	Branch          string               `json:"branch"`
+	ArchivePath     string               `json:"archive_path,omitempty"`
+	ArchivedAt      string               `json:"archived_at,omitempty"`
+	Apply           bool                 `json:"apply"`


[/zoom-out] Apply bool is a command-layer flag, not result data — including it in DarwinPlan (the JSON output type) leaks the CLI invocation detail into the output schema. Consumers of --json output would see "apply": true/false, which is redundant (they already know whether they passed --apply) and may cause confusion.

💡 Suggestion

Remove Apply from DarwinPlan. printDarwinPlan already receives the full plan and the plan.Apply check at line 433 can instead use the config.Apply value passed into RunExperimentsDarwin:

func printDarwinPlan(plan *DarwinPlan, applied bool) { ... if applied { fmt.Fprintf(os.Stderr, " Archive: %s\n", plan.ArchivePath) } }

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+}
+
+func darwinArchiveLocation(baseDir, workflowID, experimentName string) (string, string) {
+	timestamp := time.Now().UTC().Format("20060102T150405Z")


[/diagnose] The archive filename uses a second-granularity UTC timestamp. Two Darwin runs for the same workflow+experiment within the same wall-clock second will produce identical paths, and os.WriteFile (line 394) will silently overwrite the earlier archive — losing a generation's history.

💡 Options

Nanosecond precision: time.Now().UTC().Format("20060102T150405.000000000Z") — sortable and collision-resistant in practice.

Short random suffix: append fmt.Sprintf("-%04x", rand.Uint32()&0xffff) for a low-collision identifier.

Existence check: stat the path before writing and return an error if it already exists (os.O_CREATE|os.O_EXCL).

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Darwin mode: "+plan.WorkflowID+"/"+plan.ExperimentName))
+	fmt.Fprintf(os.Stderr, "  Branch:        %s\n", plan.Branch)
+	fmt.Fprintf(os.Stderr, "  Winner:        %s\n", plan.Winner)
+	fmt.Fprintf(os.Stderr, "  Recommendation:%s\n", plan.Analysis.Recommendation)


[/tdd] Minor display bug: " Recommendation:%s\n" is missing a trailing space, producing Recommendation:EXTEND instead of Recommendation: EXTEND. The other fields use " Branch: %s\n" (with alignment padding), so this one stands out.

💡 Fix

// Before fmt.Fprintf(os.Stderr, " Recommendation:%s\n", plan.Analysis.Recommendation) // After fmt.Fprintf(os.Stderr, " Recommendation: %s\n", plan.Analysis.Recommendation)

A snapshot test for printDarwinPlan output would catch regressions like this automatically.

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	assert.Equal(t, "third", ranking[1].Name)
+	assert.Equal(t, "control", ranking[2].Name)
+	assert.True(t, ranking[2].CurrentControl)
+}


[/tdd] selectDarwinWinner has no test for the error path: when --winner names a variant that is not in the declared list, the function returns an error. This branch is untested.

💡 Suggested test

func TestSelectDarwinWinnerRejectsUnknownOverride(t *testing.T) { ranking := []DarwinVariantScore{ {Name: "control", Count: 5}, {Name: "challenger", Count: 3}, } _, err := selectDarwinWinner("unknown", ranking) require.Error(t, err) assert.Contains(t, err.Error(), "not a declared variant") }

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	"github.com/stretchr/testify/require"
+)
+
+func TestRankDarwinVariants(t *testing.T) {


[/tdd] TestRankDarwinVariants covers only a case with mixed counts. The generation-zero scenario — where all variants have zero counts (no experiment runs yet) — should be explicitly tested to confirm that declaration order is the tiebreaker and no panics occur.

💡 Suggested test

func TestRankDarwinVariantsAllZero(t *testing.T) { ranking := rankDarwinVariants( []string{"alpha", "beta", "gamma"}, map[string]int{}, ) require.Len(t, ranking, 3) // With all zero counts, declaration order is preserved assert.Equal(t, "alpha", ranking[0].Name) assert.Equal(t, "beta", ranking[1].Name) assert.Equal(t, "gamma", ranking[2].Name) assert.True(t, ranking[0].CurrentControl) }

@copilot please address this.

github-actions · 2026-06-28T13:00:25Z

+	}
+
+	branchName := experimentsBranchPrefix + workflowID
+	details, err := fetchLocalExperimentDetails(branchName, workflowID)


[/diagnose] fetchLocalExperimentDetails is called without falling back to the remote ref when the local branch doesn't exist. The details.Experiments used for ranking at line 214 may be all-zero even when origin/<branchName> has real counts — while state (read separately at lines 209–212) may carry actual data. This means Darwin could silently rank all variants equally and promote by declaration order on machines that haven't fetched the experiments branch.

💡 Suggestion

Ensure fetchLocalExperimentDetails (or a wrapper) checks origin/<branchName> as a fallback, or build stats from state when details.Experiments is empty:

if len(details.Experiments) == 0 && state != nil { details = experimentDetailsFromState(workflowID, branchName, state) }

Also worth a test: run Darwin with a non-existent local branch but a populated state and confirm the ranking reflects the state's counts.

@copilot please address this.

Apply remaining changes

74b20b7

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 28, 2026

Copilot created this pull request from a session on behalf of pelikhan June 28, 2026 12:45 View session

pelikhan marked this pull request as ready for review June 28, 2026 12:48

Copilot AI review requested due to automatic review settings June 28, 2026 12:48

Copilot started reviewing on behalf of pelikhan June 28, 2026 12:48 View session

pelikhan closed this Jun 28, 2026

Copilot AI reviewed Jun 28, 2026

View reviewed changes

github-actions Bot requested changes Jun 28, 2026

View reviewed changes

github-actions Bot approved these changes Jun 28, 2026

View reviewed changes

github-actions Bot requested changes Jun 28, 2026

View reviewed changes

Uh oh!

Conversation

Copilot AI commented Jun 28, 2026

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Review details

Uh oh!

github-actions Bot commented Jun 28, 2026

Design Decision Gate — ADR Required

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Darwin Mode review

Blocking

Non-blocking (recommended fixes)

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 28, 2026

🧪 Test Quality Sentinel Report

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 28, 2026

Darwin Mode Review Summary 🧬

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Skills-Based Review 🧠

Blocking Issues

Medium Issues

Minor Issues

Positive Highlights

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot Jun 28, 2026

Choose a reason for hiding this comment

github-actions Bot commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading

github-actions Bot commented Jun 28, 2026 •

edited

Loading