Skip to content

Add experimental gh aw experiments darwin command#42081

Closed
Copilot wants to merge 1 commit into
mainfrom
copilot/add-darwin-mode-command
Closed

Add experimental gh aw experiments darwin command#42081
Copilot wants to merge 1 commit into
mainfrom
copilot/add-darwin-mode-command

Conversation

Copilot AI commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

This adds Darwin Mode as an extension of the existing experiments feature: a new gh aw experiments darwin subcommand that evaluates an experiment’s current variants, archives the current generation, and promotes a winner into the next generation. It keeps the workflow rooted in experiments: rather than introducing a separate optimizer path.

  • CLI surface

    • Adds gh aw experiments darwin <workflow> <experiment>
    • Wires the subcommand into the hidden experiments command group
    • Supports:
      • --winner to override the promoted variant
      • --variant to define the next generation explicitly
      • --archive-dir to control archive location
      • --apply to persist archive + frontmatter changes
      • --json for machine-readable output
  • Darwin evaluation and promotion

    • Reuses existing experiment state from experiments/<workflow> branch history
    • Ranks declared variants by observed counts, preserving declaration order for ties
    • Selects a winner and promotes it to control by moving it to the front of the variant list
    • Falls back cleanly when experiment state is absent by treating the current declaration as generation zero
  • Generation archive

    • Archives each Darwin run as JSON under .github/experiments/archive/<workflow>/<experiment>/
    • Records:
      • workflow/experiment identity
      • promoted winner
      • previous and next variant sets
      • ranking snapshot
      • computed experiment analysis
      • raw experiment state
  • Workflow mutation

    • Updates the existing experiments: frontmatter in-place
    • Supports both bare-array and object-form experiments
    • Rewrites only the target experiment’s variant population, preserving other experiment metadata
  • Docs and coverage

    • Documents Darwin Mode in the experimental experiments docs
    • Adds focused tests for ranking, next-generation construction, archive writing, and frontmatter promotion

Example:

gh aw experiments darwin myworkflow style --winner concise --variant concise --variant detailed --variant step_by_step --apply

This promotes concise to the control slot, archives the current generation, and rewrites the style experiment’s next variant population in the workflow frontmatter.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
@pelikhan pelikhan marked this pull request as ready for review June 28, 2026 12:48
Copilot AI review requested due to automatic review settings June 28, 2026 12:48
@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

⚠️ PR Code Quality Reviewer failed during code quality review.

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

@pelikhan pelikhan closed this Jun 28, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an experimental “Darwin mode” workflow-evolution command under the hidden gh aw experiments command group, allowing users to evaluate observed experiment counts, select a winner, archive a generation snapshot, and optionally rewrite the workflow’s experiments: frontmatter to promote the winner and define the next population.

Changes:

  • Introduces gh aw experiments darwin <workflow> <experiment> with --winner, repeatable --variant, --archive-dir, --apply, and --json output.
  • Implements Darwin planning: ranking variants by observed counts (tie-breaking by declaration order), winner selection, next-generation construction, optional archive writing, and frontmatter mutation.
  • Adds focused unit tests for ranking/next-generation/frontmatter update/archive writing, plus docs describing Darwin mode usage.
Show a summary per file
File Description
pkg/cli/experiments_darwin_command.go Implements the Darwin subcommand, ranking/promotion logic, archive writing, and workflow frontmatter mutation.
pkg/cli/experiments_darwin_command_test.go Adds unit tests covering ranking, next-generation construction, archive writing, and frontmatter updates.
pkg/cli/experiments_command.go Wires the new darwin subcommand into the existing hidden experiments command group and updates help text.
pkg/cli/experiments_command_test.go Extends command wiring/flag tests to include the new darwin subcommand.
docs/src/content/docs/experimental/experiments.md Documents Darwin mode usage and behavior within the experiments feature.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 5
  • Review effort level: Low

Comment on lines +158 to +163
if strings.TrimSpace(config.Experiment) == "" {
return nil, nil, errors.New("experiment name is required")
}
if strings.TrimSpace(config.ArchiveDir) == "" {
config.ArchiveDir = defaultDarwinArchiveDir
}
Comment on lines +222 to +255
nextVariants := buildDarwinNextVariants(winner, expCfg.Variants, ranking, config.NextVariants)
if len(nextVariants) < 2 {
return nil, nil, fmt.Errorf("darwin generation for %q must contain at least 2 variants", config.Experiment)
}
archivePath, archivedAt := darwinArchiveLocation(config.ArchiveDir, workflowID, config.Experiment)
rankedRows := darwinRankingRows(ranking, expCfg.Variants[0], winner)

plan := &DarwinPlan{
WorkflowID: workflowID,
WorkflowPath: workflowPath,
ExperimentName: config.Experiment,
Branch: branchName,
ArchivePath: archivePath,
ArchivedAt: archivedAt,
Apply: config.Apply,
Winner: winner,
CurrentVariants: slices.Clone(expCfg.Variants),
NextVariants: nextVariants,
Ranking: rankedRows,
Analysis: analysis,
}
archive := &DarwinArchive{
WorkflowID: workflowID,
WorkflowPath: workflowPath,
ExperimentName: config.Experiment,
Branch: branchName,
ArchivedAt: archivedAt,
Winner: winner,
CurrentVariants: slices.Clone(expCfg.Variants),
NextVariants: slices.Clone(nextVariants),
Ranking: rankedRows,
Analysis: analysis,
State: state,
}
Comment on lines +196 to +213
branchName := experimentsBranchPrefix + workflowID
details, err := fetchLocalExperimentDetails(branchName, workflowID)
if err != nil {
state := emptyExperimentState()
details = experimentDetailsFromState(workflowID, branchName, state)
if !strings.Contains(err.Error(), "not found locally") {
return nil, nil, err
}
}
stateRef := "origin/" + branchName
if !gitRefExists(stateRef) {
stateRef = branchName
}
state := emptyExperimentState()
if gitRefExists(stateRef) {
state = readLocalExperimentState(stateRef)
}

Comment on lines +92 to +96
Example: ` ` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style
` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style --winner concise
` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style --variant concise --variant detailed --apply
` + string(constants.CLIExtensionPrefix) + ` experiments darwin myworkflow style --archive-dir .github/experiments/archive --json`,
Args: cobra.ExactArgs(2),
fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Darwin mode: "+plan.WorkflowID+"/"+plan.ExperimentName))
fmt.Fprintf(os.Stderr, " Branch: %s\n", plan.Branch)
fmt.Fprintf(os.Stderr, " Winner: %s\n", plan.Winner)
fmt.Fprintf(os.Stderr, " Recommendation:%s\n", plan.Analysis.Recommendation)
@github-actions

Copy link
Copy Markdown
Contributor

Design Decision Gate — ADR Required

This PR makes significant changes to core business logic (621 new lines in pkg/cli/) but does not have a linked Architecture Decision Record (ADR).

Draft ADR committed: docs/adr/42081-add-darwin-mode-evolutionary-experiment-promotion.md — review and complete it before merging.

This PR cannot merge until an ADR is linked in the PR body.

What to do next
  1. Review the draft ADR committed to your branch — it was generated from the PR diff
  2. Complete the missing sections — add context the AI could not infer, refine the decision rationale, and list real alternatives you considered
  3. Commit the finalized ADR to docs/adr/ on your branch
  4. Reference the ADR in this PR body by adding a line such as:

    ADR: ADR-42081: Add Darwin Mode for Evolutionary Experiment Generation Promotion

Once an ADR is linked in the PR body, this gate will re-run and verify the implementation matches the decision.

Why ADRs Matter

ADRs create a searchable, permanent record of why the codebase looks the way it does. Future contributors (and your future self) will thank you.

Michael Nygard ADR Format Reference

An ADR must contain these four sections to be considered complete:

  • Context — What is the problem? What forces are at play?
  • Decision — What did you decide? Why?
  • Alternatives Considered — What else could have been done?
  • Consequences — What are the trade-offs (positive and negative)?

All ADRs are stored in docs/adr/ as Markdown files numbered by PR number (e.g., 42081-add-darwin-mode-evolutionary-experiment-promotion.md for PR #42081).

🏗️ ADR gate enforced by Design Decision Gate 🏗️ · 61.9 AIC · ⌖ 12.6 AIC · ⊞ 8.4K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darwin Mode review

The overall design is solid — Darwin Mode fits naturally into the existing experiments: feature, the ranking/promotion logic is clear, and the test coverage covers the key paths well. There are a few correctness and reliability issues that should be addressed before merge.

Blocking

# File Issue
1 experiments_darwin_command.go:129 buildDarwinPlan errors are swallowed (return nil) — CLI exits 0 on failure

Non-blocking (recommended fixes)

# File Issue
2 experiments_darwin_command.go:427 printDarwinPlan writes all output to stderr; human-readable output should go to stdout
3 experiments_darwin_command.go:119 StringSlice splits on commas; StringArray is safer for variant names
4 experiments_darwin_command.go:136 No rollback if archive succeeds but frontmatter update fails
5 experiments_darwin_command.go:201 Fragile error-substring sentinel; prefer a typed sentinel error

See inline comments for details and suggested fixes.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 59.1 AIC · ⌖ 6.86 AIC · ⊞ 4.8K

plan, archive, err := buildDarwinPlan(config)
if err != nil {
fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error()))
return nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error silently swallowed — exit code will be 0 on failure.

Returning nil after printing the error means the CLI exits with code 0 even when buildDarwinPlan fails. Callers (scripts, CI jobs) cannot detect the failure. Change to return err so cobra propagates the non-zero exit code:

if err != nil {
    return err
}

If the intent is to suppress the stack trace (showing only the formatted message), consider wrapping as a display error type handled by the root command, consistent with other subcommands in this package.

@copilot please address this.

}

func printDarwinPlan(plan *DarwinPlan) {
fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Darwin mode: "+plan.WorkflowID+"/"+plan.ExperimentName))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Human-readable output written to stderr instead of stdout.

All plan output in printDarwinPlan goes to os.Stderr. Non-error output should go to os.Stdout so users can pipe or redirect results. Sending informational output to stderr is reserved for errors and diagnostics:

// current — informational output on stderr prevents piping
fmt.Fprintln(os.Stderr, console.FormatInfoMessage(...))

// fix — write human-readable plan output to stdout
fmt.Fprintln(os.Stdout, console.FormatInfoMessage(...))

Note that the JSON path already correctly uses os.Stdout (line 145). The text path should be consistent.

@copilot please address this.

cmd.Flags().Bool("apply", false, "Archive the current generation and update the workflow file in place")
cmd.Flags().String("winner", "", "Override the promoted winner variant")
cmd.Flags().String("archive-dir", defaultDarwinArchiveDir, "Directory where Darwin archives are written when --apply is used")
cmd.Flags().StringSlice("variant", nil, "Variant names for the next generation (repeatable)")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringSlice splits on commas — use StringArray instead.

cobra.StringSlice treats comma-separated values as multiple elements: --variant a,b becomes ["a", "b"] rather than ["a,b"]. This can surprise users and may silently misbehave if a future variant name contains a comma. StringArray treats each --variant flag as a single value:

// current — splits on commas unexpectedly
cmd.Flags().StringSlice("variant", nil, "...")

// fix — one flag = one value
cmd.Flags().StringArray("variant", nil, "...")

The GetStringSlice call on line 102 should also change to GetStringArray.

@copilot please address this.

if err := writeDarwinArchive(plan.ArchivePath, archive); err != nil {
return fmt.Errorf("failed to write Darwin archive: %w", err)
}
if err := applyDarwinPromotion(plan.WorkflowPath, plan.ExperimentName, plan.NextVariants); err != nil {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial failure leaves archive written but workflow not updated (no rollback).

If writeDarwinArchive succeeds but applyDarwinPromotion fails, the archive JSON is persisted on disk while the workflow frontmatter is not updated — leaving the state inconsistent with no way to know whether to re-run or not.

Options to consider:

  1. Write the archive last (after frontmatter is updated), so a failure before archive creation leaves the workflow clean.
  2. Write the archive to a temp path, apply the frontmatter, then rename the temp file atomically.
  3. Document this limitation with a // TODO noting that partial failure leaves an orphaned archive.

@copilot please address this.

if err != nil {
state := emptyExperimentState()
details = experimentDetailsFromState(workflowID, branchName, state)
if !strings.Contains(err.Error(), "not found locally") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fragile error-message substring check as a sentinel.

Using strings.Contains(err.Error(), "not found locally") to distinguish "branch not found" from a real error couples this code to the exact phrasing returned by fetchLocalExperimentDetails. If that message ever changes (even capitalisation), this silently breaks by propagating the error instead of falling back gracefully.

Prefer a typed sentinel error (e.g., errors.Is(err, ErrBranchNotFound)) or a boolean return value from fetchLocalExperimentDetails to express the "not found" case explicitly:

// in fetchLocalExperimentDetails:
var ErrBranchNotFound = errors.New("experiments branch not found locally")

// here:
if !errors.Is(err, ErrBranchNotFound) {
    return nil, nil, err
}

@copilot please address this.

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 70/100 — Acceptable

Analyzed 10 test(s) across 2 changed test files: 8 design, 2 implementation, 0 hard coding-guideline violations; test inflation detected on experiments_command_test.go.

📊 Metrics & Test Classification (10 tests analyzed)
Metric Value
New/modified tests analyzed 10
✅ Design tests (behavioral contracts) 8 (80%)
⚠️ Implementation tests (low value) 2 (20%)
Tests with error/edge cases 6 (60%)
Duplicate test clusters 0
Test inflation detected YES — experiments_command_test.go (20 test additions / 4 prod additions = 5:1)
🚨 Coding-guideline violations 0 hard violations; ⚠️ 7 soft — missing assertion messages in experiments_darwin_command_test.go
Test File Classification Issues Detected
TestRankDarwinVariants pkg/cli/experiments_darwin_command_test.go:17 ✅ Design Missing assertion messages on all asserts
TestBuildDarwinNextVariantsDefaultsToWinnerFirst pkg/cli/experiments_darwin_command_test.go:30 ✅ Design Missing assertion message
TestBuildDarwinNextVariantsUsesRequestedPopulation pkg/cli/experiments_darwin_command_test.go:41 ✅ Design Missing assertion message
TestFindOrBuildExperimentStatsAddsMissingDeclaredVariants pkg/cli/experiments_darwin_command_test.go:48 ✅ Design Missing assertion messages
TestApplyDarwinPromotionUpdatesBareArrayExperiment pkg/cli/experiments_darwin_command_test.go:62 ✅ Design
TestApplyDarwinPromotionUpdatesObjectExperiment pkg/cli/experiments_darwin_command_test.go:86 ✅ Design
TestWriteDarwinArchive pkg/cli/experiments_darwin_command_test.go:113 ✅ Design Happy path only — no write-failure error coverage
TestExperimentsDarwinSubcommandFlags pkg/cli/experiments_command_test.go:340 ⚠️ Implementation Checks flag registration, not behavioral output
TestExperimentsDarwinRequiresArgs pkg/cli/experiments_command_test.go:351 ✅ Design
TestNewExperimentsCommand (+darwin line) pkg/cli/experiments_command_test.go:313 ⚠️ Implementation Checks subcommand wiring, not behavioral output

Go: 10 (*_test.go); JavaScript: 0. Other languages: none detected.

⚠️ Flagged Tests — Soft Issues (9 issue(s))

TestRankDarwinVariants (pkg/cli/experiments_darwin_command_test.go:17) — ⚠️ Missing assertion messages: require.Len(t, ranking, 3), assert.Equal(t, "challenger", ranking[0].Name), assert.Equal(t, "third", ranking[1].Name), assert.Equal(t, "control", ranking[2].Name), assert.True(t, ranking[2].CurrentControl) all lack context strings. Suggested fix: add a descriptive message to each, e.g. assert.Equal(t, "challenger", ranking[0].Name, "highest-count variant should rank first").

TestBuildDarwinNextVariantsDefaultsToWinnerFirst (pkg/cli/experiments_darwin_command_test.go:30) — ⚠️ assert.Equal(t, []string{"challenger", "control", "third"}, next) lacks message. Suggested fix: assert.Equal(t, ..., "winner should be promoted to position 0").

TestBuildDarwinNextVariantsUsesRequestedPopulation (pkg/cli/experiments_darwin_command_test.go:41) — ⚠️ assert.Equal(t, []string{"control", "mutant-a", "mutant-b"}, next) lacks message.

TestFindOrBuildExperimentStatsAddsMissingDeclaredVariants (pkg/cli/experiments_darwin_command_test.go:48) — ⚠️ assert.Equal(t, 4, stats.Total), assert.Equal(t, 4, stats.Variants["control"]), assert.Equal(t, 0, stats.Variants["challenger"]) all lack messages.

TestWriteDarwinArchive (pkg/cli/experiments_darwin_command_test.go:113) — ⚠️ Happy path only. Does not test error scenarios (e.g. non-writable archive directory, corrupt JSON). Suggested fix: add a sub-case using a read-only temp directory to assert an error is returned.

TestExperimentsDarwinSubcommandFlags (pkg/cli/experiments_command_test.go:340) — ⚠️ Implementation test: only checks that flags are registered (cmd.Flag("json") != nil), not that the command behaves correctly when flags are passed. This test would survive a refactor that renames flags without changing behavior. Low but non-zero value as a smoke-test of the CLI surface.

TestNewExperimentsCommand (darwin line) (pkg/cli/experiments_command_test.go:313) — ⚠️ Implementation test: verifies subcommand registration, not observable output. Acceptable as a registration sanity check.

Test Inflation — experiments_command_test.go: 20 test lines added vs 4 production lines added (5:1 ratio, threshold 2:1). The ratio is high because the production wiring is minimal (one line to register the darwin subcommand) while the tests comprehensively cover the new CLI surface. Not a quality problem in practice, but inflates the metric.

Verdict

Check passed. 20% implementation tests (threshold: 30%). All new Go test files have the required //go:build tag; no mock libraries detected. Core darwin logic is well-covered by behavioral tests; the flagged soft issues (missing assertion messages, happy-path-only archive test) are recommended improvements but do not block merge.

References:

🧪 Test quality analysis by Test Quality Sentinel · 83.6 AIC · ⌖ 18 AIC · ⊞ 8.2K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 70/100. Test quality is acceptable — 20% of new tests are implementation tests (threshold: 30%).

@github-actions

Copy link
Copy Markdown
Contributor

Darwin Mode Review Summary 🧬

REQUEST_CHANGES — Two correctness issues need addressing before merge; everything else is medium or minor.

🔴 Blocking (fix before merge)
# File Line Issue
1 experiments_darwin_command.go 129 return nil after buildDarwinPlan error → CLI exits 0 on failure
2 experiments_darwin_command.go 197 details.Experiments is empty when branch only exists on origin → ranking silently all-zero
🟡 Medium (should fix)
# File Line Issue
3 experiments_darwin_command.go 201 strings.Contains(err.Error(), ...) — use typed sentinel error
4 experiments_darwin_command.go 119 StringSliceStringArray for --variant
5 experiments_darwin_command.go 132 Side effects applied before JSON marshal — invert order
6 experiments_darwin_command.go 51 Apply bool belongs in command layer, not DarwinPlan output struct
🟢 Minor (nice to have)
  • Line 366: second-granularity timestamp allows archive collision within the same second
  • Line 430: Recommendation: missing trailing space in output
  • Test file: add TestSelectDarwinWinnerRejectsUnknownOverride and TestRankDarwinVariantsAllZero

Overall the architecture is clean — Darwin Mode integrates nicely into the existing experiments: surface and the separation between planning and side effects is a solid foundation. The --apply dry-run default is a great UX choice.

@copilot please address the review comments above.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 128.7 AIC · ⌖ 10.6 AIC · ⊞ 6.6K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd, /diagnose, /zoom-out, /grill-with-docs, and /improve-codebase-architecture — requesting changes on a correctness bug and a few medium-priority issues.

📋 Key Themes & Highlights

Blocking Issues

  • Exit-code bug (line 129): RunExperimentsDarwin returns nil on buildDarwinPlan failure — the CLI exits 0 even when the workflow or experiment is not found. Should be return err.
  • Ranking silently zeroed (line 197): when the experiments branch only exists on origin, details.Experiments is empty and all variants rank equally, potentially promoting the wrong winner without warning.

Medium Issues

  • Fragile error matching (line 201): strings.Contains(err.Error(), "not found locally") should use a typed sentinel error.
  • StringSlice vs StringArray (line 119): comma-separated --variant a,b is split into two variants unexpectedly; StringArray matches the documented one-flag-per-value usage.
  • Side-effect ordering (line 132): archive + frontmatter mutations happen before JSON marshaling; marshal first, then apply.
  • Apply in output struct (line 51): CLI flag leaked into DarwinPlan's JSON schema.

Minor Issues

  • Archive timestamp collision (line 366)
  • Missing space in Recommendation: output (line 430)
  • Two missing test cases: selectDarwinWinner error path, and all-zero counts (generation zero)

Positive Highlights

  • ✅ Clean integration into the existing experiments: feature — no new top-level commands or parallel optimizer path
  • ✅ Covers both bare-array and object-form experiment frontmatter
  • ✅ Good separation of buildDarwinPlan (pure computation) from side effects (writeDarwinArchive, applyDarwinPromotion)
  • ✅ Comprehensive archive schema with full ranking snapshot and raw state — useful for auditing
  • ✅ Documentation is clear with concrete examples and an important note about new variant names

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 128.7 AIC · ⌖ 10.6 AIC · ⊞ 6.6K

plan, archive, err := buildDarwinPlan(config)
if err != nil {
fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error()))
return nil

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] RunExperimentsDarwin returns nil on buildDarwinPlan failure, so the CLI always exits with code 0 even when the workflow or experiment is not found. This masks errors from callers and CI scripting.

💡 Fix

Change return nil to return err:

if err != nil {
    fmt.Fprintln(os.Stderr, console.FormatErrorMessage(err.Error()))
    return err
}

Note that writeDarwinArchive and applyDarwinPromotion errors (lines 134–138) are already propagated correctly — this brings buildDarwinPlan errors into line.

@copilot please address this.

if err != nil {
state := emptyExperimentState()
details = experimentDetailsFromState(workflowID, branchName, state)
if !strings.Contains(err.Error(), "not found locally") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/improve-codebase-architecture] Error classification via strings.Contains(err.Error(), "not found locally") is fragile — it couples this code to the exact wording of an error message from fetchLocalExperimentDetails. A rename or rewording in that function will silently break the fallback path here.

💡 Suggestion

Introduce a sentinel error in the fetchLocalExperimentDetails package and use errors.Is:

// In the fetch function's package:
var ErrBranchNotFoundLocally = errors.New("branch not found locally")

// Here:
if err != nil {
    if !errors.Is(err, ErrBranchNotFoundLocally) {
        return nil, nil, err
    }
    details = experimentDetailsFromState(workflowID, branchName, emptyExperimentState())
}

@copilot please address this.

cmd.Flags().Bool("apply", false, "Archive the current generation and update the workflow file in place")
cmd.Flags().String("winner", "", "Override the promoted winner variant")
cmd.Flags().String("archive-dir", defaultDarwinArchiveDir, "Directory where Darwin archives are written when --apply is used")
cmd.Flags().StringSlice("variant", nil, "Variant names for the next generation (repeatable)")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/grill-with-docs] StringSlice silently splits comma-separated input, so --variant a,b is parsed as two variants (a and b). StringArray treats each flag invocation as a single atomic value, which matches the documented usage (--variant concise --variant detailed).

💡 Change
// Before
cmd.Flags().StringSlice("variant", nil, "Variant names for the next generation (repeatable)")

// After
cmd.Flags().StringArray("variant", nil, "Variant names for the next generation (repeatable)")

Also update the flag getter in RunE:

// Before
nextVariants, _ := cmd.Flags().GetStringSlice("variant")
// After
nextVariants, _ := cmd.Flags().GetStringArray("variant")

Variant names in this codebase use underscores (e.g. step_by_step) so a comma collision is unlikely but the semantic mismatch with the docs is still worth fixing.

@copilot please address this.

return nil
}

if config.Apply {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] Side effects (writeDarwinArchive + applyDarwinPromotion) are committed to disk before the JSON output block at line 141. If json.MarshalIndent were to fail (unlikely but possible), the archive file and workflow frontmatter would be mutated with no output to the caller. Consider marshaling first and returning early on error before writing to disk.

💡 Suggested ordering
var jsonBytes []byte
if config.JSONOutput {
    var err error
    jsonBytes, err = json.MarshalIndent(plan, "", "  ")
    if err != nil {
        return fmt.Errorf("failed to marshal JSON: %w", err)
    }
}

if config.Apply {
    if err := writeDarwinArchive(plan.ArchivePath, archive); err != nil {
        return fmt.Errorf("failed to write Darwin archive: %w", err)
    }
    if err := applyDarwinPromotion(plan.WorkflowPath, plan.ExperimentName, plan.NextVariants); err != nil {
        return fmt.Errorf("failed to update workflow file: %w", err)
    }
}

if config.JSONOutput {
    fmt.Fprintln(os.Stdout, string(jsonBytes))
    return nil
}

@copilot please address this.

Branch string `json:"branch"`
ArchivePath string `json:"archive_path,omitempty"`
ArchivedAt string `json:"archived_at,omitempty"`
Apply bool `json:"apply"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] Apply bool is a command-layer flag, not result data — including it in DarwinPlan (the JSON output type) leaks the CLI invocation detail into the output schema. Consumers of --json output would see "apply": true/false, which is redundant (they already know whether they passed --apply) and may cause confusion.

💡 Suggestion

Remove Apply from DarwinPlan. printDarwinPlan already receives the full plan and the plan.Apply check at line 433 can instead use the config.Apply value passed into RunExperimentsDarwin:

func printDarwinPlan(plan *DarwinPlan, applied bool) {
    ...
    if applied {
        fmt.Fprintf(os.Stderr, "  Archive: %s\n", plan.ArchivePath)
    }
}

@copilot please address this.

}

func darwinArchiveLocation(baseDir, workflowID, experimentName string) (string, string) {
timestamp := time.Now().UTC().Format("20060102T150405Z")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The archive filename uses a second-granularity UTC timestamp. Two Darwin runs for the same workflow+experiment within the same wall-clock second will produce identical paths, and os.WriteFile (line 394) will silently overwrite the earlier archive — losing a generation's history.

💡 Options
  1. Nanosecond precision: time.Now().UTC().Format("20060102T150405.000000000Z") — sortable and collision-resistant in practice.
  2. Short random suffix: append fmt.Sprintf("-%04x", rand.Uint32()&0xffff) for a low-collision identifier.
  3. Existence check: stat the path before writing and return an error if it already exists (os.O_CREATE|os.O_EXCL).

@copilot please address this.

fmt.Fprintln(os.Stderr, console.FormatInfoMessage("Darwin mode: "+plan.WorkflowID+"/"+plan.ExperimentName))
fmt.Fprintf(os.Stderr, " Branch: %s\n", plan.Branch)
fmt.Fprintf(os.Stderr, " Winner: %s\n", plan.Winner)
fmt.Fprintf(os.Stderr, " Recommendation:%s\n", plan.Analysis.Recommendation)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Minor display bug: " Recommendation:%s\n" is missing a trailing space, producing Recommendation:EXTEND instead of Recommendation: EXTEND. The other fields use " Branch: %s\n" (with alignment padding), so this one stands out.

💡 Fix
// Before
fmt.Fprintf(os.Stderr, "  Recommendation:%s\n", plan.Analysis.Recommendation)

// After
fmt.Fprintf(os.Stderr, "  Recommendation: %s\n", plan.Analysis.Recommendation)

A snapshot test for printDarwinPlan output would catch regressions like this automatically.

@copilot please address this.

assert.Equal(t, "third", ranking[1].Name)
assert.Equal(t, "control", ranking[2].Name)
assert.True(t, ranking[2].CurrentControl)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] selectDarwinWinner has no test for the error path: when --winner names a variant that is not in the declared list, the function returns an error. This branch is untested.

💡 Suggested test
func TestSelectDarwinWinnerRejectsUnknownOverride(t *testing.T) {
    ranking := []DarwinVariantScore{
        {Name: "control", Count: 5},
        {Name: "challenger", Count: 3},
    }
    _, err := selectDarwinWinner("unknown", ranking)
    require.Error(t, err)
    assert.Contains(t, err.Error(), "not a declared variant")
}

@copilot please address this.

"github.com/stretchr/testify/require"
)

func TestRankDarwinVariants(t *testing.T) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] TestRankDarwinVariants covers only a case with mixed counts. The generation-zero scenario — where all variants have zero counts (no experiment runs yet) — should be explicitly tested to confirm that declaration order is the tiebreaker and no panics occur.

💡 Suggested test
func TestRankDarwinVariantsAllZero(t *testing.T) {
    ranking := rankDarwinVariants(
        []string{"alpha", "beta", "gamma"},
        map[string]int{},
    )
    require.Len(t, ranking, 3)
    // With all zero counts, declaration order is preserved
    assert.Equal(t, "alpha", ranking[0].Name)
    assert.Equal(t, "beta", ranking[1].Name)
    assert.Equal(t, "gamma", ranking[2].Name)
    assert.True(t, ranking[0].CurrentControl)
}

@copilot please address this.

}

branchName := experimentsBranchPrefix + workflowID
details, err := fetchLocalExperimentDetails(branchName, workflowID)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] fetchLocalExperimentDetails is called without falling back to the remote ref when the local branch doesn't exist. The details.Experiments used for ranking at line 214 may be all-zero even when origin/<branchName> has real counts — while state (read separately at lines 209–212) may carry actual data. This means Darwin could silently rank all variants equally and promote by declaration order on machines that haven't fetched the experiments branch.

💡 Suggestion

Ensure fetchLocalExperimentDetails (or a wrapper) checks origin/<branchName> as a fallback, or build stats from state when details.Experiments is empty:

if len(details.Experiments) == 0 && state != nil {
    details = experimentDetailsFromState(workflowID, branchName, state)
}

Also worth a test: run Darwin with a non-existent local branch but a populated state and confirm the ranking reflects the state's counts.

@copilot please address this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants