refactor: split large monolithic files identified by semantic clustering analysis by Copilot · Pull Request #4813 · github/gh-aw-mcpg

Copilot · 2026-04-29T16:34:57Z

Two files flagged by automated semantic function clustering (wasm.go at 1,225 lines, guard_policy.go at 797 lines) each mixed 4 distinct concerns. Split both along their natural seams.

`internal/guard/wasm.go` → 3 files

File	Responsibility
`wasm.go`	Struct, lifecycle, host functions, public dispatch (LabelAgent/LabelResource/LabelResponse/Close)
`wasm_payload.go`	Policy payload building and validation before WASM call
`wasm_parse.go`	Response parsing, WASM memory management (alloc/dealloc/call helpers)

The split mirrors the existing test file naming (wasm_test.go, wasm_parse_test.go, wasm_response_parse_test.go) — the logical separation was already in authors' heads.

`internal/config/guard_policy.go` → 3 files

File	Responsibility
`guard_policy.go`	Struct definitions, JSON marshal/unmarshal, `IsWriteSinkPolicy`
`guard_policy_validation.go`	`Validate`, `Normalize`, and character/token helpers
`guard_policy_parse.go`	`Parse`, `Build`, `NormalizeScopeKind`

Not addressed: `getTracer()` duplication

The identical one-liner on UnifiedServer and proxyHandler was left as-is — extracting a shared tracerHolder would require a new cross-package dependency for two trivial methods, not worth the structural cost.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

example.com
- Triggering command: /tmp/go-build3049887614/b509/launcher.test /tmp/go-build3049887614/b509/launcher.test -test.testlogfile=/tmp/go-build3049887614/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 2/compile.go 2/equals.go x_amd64/compile (dns block)
- Triggering command: /tmp/go-build3765283910/b509/launcher.test /tmp/go-build3765283910/b509/launcher.test -test.testlogfile=/tmp/go-build3765283910/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ache/go/1.25.9/x64/src/os/user 64/src/net/http/httputil/dump.go .cfg --global user.name x_amd64/cgo ache/go/1.25.9/x64/pkg/tool/linux_amd64/compile -o 9887614/b468/_pkg_.a -trimpath .cfg -p g/grpc/internal/--version -lang=go1.24 ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
- Triggering command: /tmp/go-build27267908/b513/launcher.test /tmp/go-build27267908/b513/launcher.test -test.testlogfile=/tmp/go-build27267908/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s estl�� --version x_amd64/asm ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet rnal/errors/errobash .cfg 64/pkg/tool/linu--noprofile ache/go/1.25.9/x64/pkg/tool/linu-buildtags (dns block)
invalid-host-that-does-not-exist-12345.com
- Triggering command: /tmp/go-build3049887614/b491/config.test /tmp/go-build3049887614/b491/config.test -test.testlogfile=/tmp/go-build3049887614/b491/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ktype/networktype.go 6UTz/5ecS_4XL9_qgoogle.golang.org/grpc/credentials x_amd64/asm (dns block)
- Triggering command: /tmp/go-build3765283910/b491/config.test /tmp/go-build3765283910/b491/config.test -test.testlogfile=/tmp/go-build3765283910/b491/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ACCEPT /tmp/go-build3049887614/b100/vet.cfg 64/pkg/tool/linux_amd64/vet 9887614/b346/ --global ctor 64/pkg/tool/linux_amd64/vet -I t/example_test.go t/gateway_integration_test.go 64/pkg/tool/linux_amd64/compile --gdwarf-5 --64 -o 64/pkg/tool/linusecurity (dns block)
- Triggering command: /tmp/go-build27267908/b495/config.test /tmp/go-build27267908/b495/config.test -test.testlogfile=/tmp/go-build27267908/b495/testlog.txt -test.paniconexit0 -test.timeout=10m0s .cfg�� 9887614/b552/_pkg_.a -tests x_amd64/vet 64/src/runtime/c/opt/hostedtoolcache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
nonexistent.local
- Triggering command: /tmp/go-build3049887614/b509/launcher.test /tmp/go-build3049887614/b509/launcher.test -test.testlogfile=/tmp/go-build3049887614/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 2/compile.go 2/equals.go x_amd64/compile (dns block)
- Triggering command: /tmp/go-build3765283910/b509/launcher.test /tmp/go-build3765283910/b509/launcher.test -test.testlogfile=/tmp/go-build3765283910/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ache/go/1.25.9/x64/src/os/user 64/src/net/http/httputil/dump.go .cfg --global user.name x_amd64/cgo ache/go/1.25.9/x64/pkg/tool/linux_amd64/compile -o 9887614/b468/_pkg_.a -trimpath .cfg -p g/grpc/internal/--version -lang=go1.24 ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
- Triggering command: /tmp/go-build27267908/b513/launcher.test /tmp/go-build27267908/b513/launcher.test -test.testlogfile=/tmp/go-build27267908/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s estl�� --version x_amd64/asm ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet rnal/errors/errobash .cfg 64/pkg/tool/linu--noprofile ache/go/1.25.9/x64/pkg/tool/linu-buildtags (dns block)
slow.example.com
- Triggering command: /tmp/go-build3049887614/b509/launcher.test /tmp/go-build3049887614/b509/launcher.test -test.testlogfile=/tmp/go-build3049887614/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 2/compile.go 2/equals.go x_amd64/compile (dns block)
- Triggering command: /tmp/go-build3765283910/b509/launcher.test /tmp/go-build3765283910/b509/launcher.test -test.testlogfile=/tmp/go-build3765283910/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ache/go/1.25.9/x64/src/os/user 64/src/net/http/httputil/dump.go .cfg --global user.name x_amd64/cgo ache/go/1.25.9/x64/pkg/tool/linux_amd64/compile -o 9887614/b468/_pkg_.a -trimpath .cfg -p g/grpc/internal/--version -lang=go1.24 ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
- Triggering command: /tmp/go-build27267908/b513/launcher.test /tmp/go-build27267908/b513/launcher.test -test.testlogfile=/tmp/go-build27267908/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s estl�� --version x_amd64/asm ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet rnal/errors/errobash .cfg 64/pkg/tool/linu--noprofile ache/go/1.25.9/x64/pkg/tool/linu-buildtags (dns block)
this-host-does-not-exist-12345.com
- Triggering command: /tmp/go-build3049887614/b518/mcp.test /tmp/go-build3049887614/b518/mcp.test -test.testlogfile=/tmp/go-build3049887614/b518/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true rnal/encoding/ta-errorsas .cfg x_amd64/compile -p path/filepath -lang=go1.25 x_amd64/compile (dns block)
- Triggering command: /tmp/go-build3765283910/b518/mcp.test /tmp/go-build3765283910/b518/mcp.test -test.testlogfile=/tmp/go-build3765283910/b518/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true olang.org/grpc@v1.80.0/health/gr-c=4 olang.org/grpc@v1.80.0/health/gr-nolocalimports x_amd64/compile -p g/protobuf/runti/usr/bin/runc -lang=go1.25 x_amd64/compile -W 3sHF/gPXvu-EnuLHrnQI63sHF pkg/mod/go.opentelemetry.io/otel@v1.43.0/internal/global/instrum-ifaceassert x_amd64/vet . --gdwarf2 --64 x_amd64/vet (dns block)
- Triggering command: /tmp/go-build27267908/b522/mcp.test /tmp/go-build27267908/b522/mcp.test -test.testlogfile=/tmp/go-build27267908/b522/testlog.txt -test.paniconexit0 -test.timeout=10m0s (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

- Split internal/guard/wasm.go (1225 lines) into: - wasm.go: core struct, lifecycle, host functions, public operations - wasm_payload.go: policy payload building functions - wasm_parse.go: response parsing utilities and WASM call helpers - Split internal/config/guard_policy.go (797 lines) into: - guard_policy.go: struct definitions and JSON serialization - guard_policy_validation.go: Validate*, Normalize* functions and helpers - guard_policy_parse.go: Parse*, Build*, NormalizeScopeKind functions All tests pass (make agent-finished)." Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/f6540a78-8492-4b25-8424-d316c017d230 Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

Copilot

Pull request overview

Refactors two previously large files by splitting them into smaller, responsibility-focused units to improve navigability and isolate payload-building/validation and parsing concerns.

Changes:

Split internal/guard/wasm.go into wasm.go (guard lifecycle/dispatch) + wasm_payload.go (label_agent payload building/validation) + wasm_parse.go (response parsing + WASM call/memory helpers).
Split internal/config/guard_policy.go into guard_policy.go (types + JSON marshal/unmarshal) + guard_policy_validation.go (Validate*/Normalize*) + guard_policy_parse.go (Parse*/Build* helpers).
Kept behavior largely consistent while relocating logic to better-matched files.

Show a summary per file

File	Description
internal/guard/wasm.go	Removes payload/parse/call helper implementations from the monolith, leaving the guard struct, lifecycle, and public dispatch methods.
internal/guard/wasm_payload.go	Introduces policy normalization + strict label_agent payload validation + payload augmentation (trusted bots/users).
internal/guard/wasm_parse.go	Introduces response parsing helpers and WASM call/memory management helpers (alloc/dealloc, retries, trap detection).
internal/config/guard_policy.go	Keeps core policy types and JSON marshal/unmarshal behavior (including legacy key support).
internal/config/guard_policy_validation.go	Moves allow-only/write-sink validation + normalization and scope/token helpers into a dedicated file.
internal/config/guard_policy_parse.go	Moves policy parsing/building helpers and scope_kind normalization into a dedicated file.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (2)

internal/guard/wasm_payload.go:148

The comment says approval-labels must be a non-empty array when present, but an empty array currently passes validation. Either add a len(arr)==0 check or update the comment to reflect that empty is allowed.

	// Validate approval-labels if present: must be a non-empty array of non-empty strings.
	if approvalLabelsRaw, ok := allowOnly["approval-labels"]; ok {
		arr, ok := approvalLabelsRaw.([]interface{})
		if !ok {
			return nil, fmt.Errorf("invalid approval-labels value: expected array of strings")
		}
		for _, entry := range arr {
			if s, ok := entry.(string); !ok || strings.TrimSpace(s) == "" {
				return nil, fmt.Errorf("invalid approval-labels value: each entry must be a non-empty string")
			}
		}
	}

internal/guard/wasm_payload.go:179

The comment says trusted-users must be a non-empty array when present, but the implementation allows an empty array. If empty should be invalid, add a len(arr)==0 check; otherwise adjust the comment so it matches the intended semantics.

	// Validate trusted-users if present inside allow-only.
	// Must be a non-empty array of non-empty strings when present.
	if trustedUsersRaw, ok := allowOnly["trusted-users"]; ok {
		arr, ok := trustedUsersRaw.([]interface{})
		if !ok {
			return nil, fmt.Errorf("invalid trusted-users value: expected array of strings")
		}
		for _, entry := range arr {
			if s, ok := entry.(string); !ok || strings.TrimSpace(s) == "" {
				return nil, fmt.Errorf("invalid trusted-users value: each entry must be a non-empty string")
			}
		}
	}

Files reviewed: 6/6 changed files
Comments generated: 6

Copilot · 2026-04-29T17:33:01Z

+	if !ok {
+		return nil, fmt.Errorf("label_agent policy must use top-level allow-only object (received policy.allow-only)")
+	}


The error message when the policy is missing the allow-only/allowonly key is misleading: it says "(received policy.allow-only)" even though allow-only wasn't present. Consider changing it to something like "missing top-level allow-only" or include the actual top-level keys received to aid debugging.

Copilot · 2026-04-29T17:33:01Z

+	// Validate blocked-users if present: must be a non-empty array of non-empty strings.
+	if blockedUsersRaw, ok := allowOnly["blocked-users"]; ok {
+		arr, ok := blockedUsersRaw.([]interface{})
+		if !ok {
+			return nil, fmt.Errorf("invalid blocked-users value: expected array of strings")
+		}
+		for _, entry := range arr {
+			if s, ok := entry.(string); !ok || strings.TrimSpace(s) == "" {
+				return nil, fmt.Errorf("invalid blocked-users value: each entry must be a non-empty string")
+			}
+		}
+	}


The comment says blocked-users must be a non-empty array when present, but the code doesn't reject an empty array (it only validates element values). Either enforce len(arr) > 0 or relax the comment so it matches the actual validation.

This issue also appears in the following locations of the same file:

line 137

line 167

Copilot · 2026-04-29T17:33:02Z

+		// If allow-only is absent, the injection is skipped and buildStrictLabelAgentPayload
+		// will reject the payload when called with the missing allow-only key.
+		users := make([]interface{}, len(trustedUsers))
+		for i, u := range trustedUsers {
+			users[i] = u
+		}
+		// Inject into allow-only object if present
+		if allowOnly, ok := payload["allow-only"].(map[string]interface{}); ok {
+			allowOnly["trusted-users"] = users
+		}


BuildLabelAgentPayload only injects trusted-users into payload["allow-only"]. If callers provide a legacy top-level "allowonly" policy map (which buildStrictLabelAgentPayload accepts), trusted-users will not be injected. Consider normalizing "allowonly" → "allow-only" inside this helper or injecting into both keys to make the behavior consistent.

Suggested change

// If allow-only is absent, the injection is skipped and buildStrictLabelAgentPayload

// will reject the payload when called with the missing allow-only key.

users := make([]interface{}, len(trustedUsers))

for i, u := range trustedUsers {

users[i] = u

}

// Inject into allow-only object if present

if allowOnly, ok := payload["allow-only"].(map[string]interface{}); ok {

allowOnly["trusted-users"] = users

}

// Support both the canonical "allow-only" key and the legacy "allowonly"

// key so trusted-users is injected consistently for either accepted shape.

users := make([]interface{}, len(trustedUsers))

for i, u := range trustedUsers {

users[i] = u

}

if allowOnly, ok := payload["allow-only"].(map[string]interface{}); ok {

allowOnly["trusted-users"] = users

}

if allowOnlyLegacy, ok := payload["allowonly"].(map[string]interface{}); ok {

allowOnlyLegacy["trusted-users"] = users

}

Copilot · 2026-04-29T17:33:02Z

+// isValidAllowOnlyRepos returns true if repos is either a recognised string
+// shorthand ("all" or "public") or a non-empty array of strings.
+func isValidAllowOnlyRepos(repos interface{}) bool {
+	switch value := repos.(type) {
+	case string:
+		trimmed := strings.TrimSpace(strings.ToLower(value))
+		return trimmed == "all" || trimmed == "public"
+	case []interface{}:
+		if len(value) == 0 {
+			return false
+		}
+		for _, entry := range value {
+			if _, ok := entry.(string); !ok {
+				return false
+			}
+		}
+		return true


isValidAllowOnlyRepos accepts any string entries in the repos array, including empty/whitespace-only strings. That can let an invalid scope through even though the error message says "non-empty array of scoped strings". Consider trimming and rejecting empty strings (and optionally validating scope syntax) before returning true.

Copilot · 2026-04-29T17:33:02Z

+// checkBoolFailure returns a non-nil error if the given raw response map
+// contains field key set to false, extracting the "error" message if present.
+func checkBoolFailure(raw map[string]interface{}, resultJSON []byte, key string) error {
+	val, ok := raw[key].(bool)
+	if !ok || val {
+		return nil // field absent or true — not a failure
+	}
+	if message, msgOK := raw["error"].(string); msgOK && strings.TrimSpace(message) != "" {
+		logWasm.Printf("label_agent response indicated failure: error=%s, response=%s", message, string(resultJSON))
+		return fmt.Errorf("label_agent rejected policy: %s", message)
+	}
+	logWasm.Printf("label_agent response indicated non-success status: response=%s", string(resultJSON))


checkBoolFailure logs the full raw label_agent response JSON via logWasm.Printf. Since logger.Logger doesn't sanitize values by default, this can leak sensitive policy details into debug/file logs. Consider sanitizing the JSON (e.g., via internal/logger/sanitize) or logging a truncated/summarized version instead.

Suggested change

// checkBoolFailure returns a non-nil error if the given raw response map

// contains field key set to false, extracting the "error" message if present.

func checkBoolFailure(raw map[string]interface{}, resultJSON []byte, key string) error {

val, ok := raw[key].(bool)

if !ok || val {

return nil // field absent or true — not a failure

}

if message, msgOK := raw["error"].(string); msgOK && strings.TrimSpace(message) != "" {

logWasm.Printf("label_agent response indicated failure: error=%s, response=%s", message, string(resultJSON))

return fmt.Errorf("label_agent rejected policy: %s", message)

}

logWasm.Printf("label_agent response indicated non-success status: response=%s", string(resultJSON))

// summarizeLabelAgentResponseForLog returns a minimal JSON summary of a

// label_agent response suitable for diagnostic logging without including the

// full raw payload.

func summarizeLabelAgentResponseForLog(raw map[string]interface{}, key string) string {

summary := map[string]interface{}{

"field_count": len(raw),

}

if val, ok := raw[key].(bool); ok {

summary[key] = val

}

if message, ok := raw["error"].(string); ok && strings.TrimSpace(message) != "" {

summary["error"] = message

}

encoded, err := json.Marshal(summary)

if err != nil {

return fmt.Sprintf("{\"field_count\":%d}", len(raw))

}

return string(encoded)

}

// checkBoolFailure returns a non-nil error if the given raw response map

// contains field key set to false, extracting the "error" message if present.

func checkBoolFailure(raw map[string]interface{}, resultJSON []byte, key string) error {

_ = resultJSON

val, ok := raw[key].(bool)

if !ok || val {

return nil // field absent or true — not a failure

}

responseSummary := summarizeLabelAgentResponseForLog(raw, key)

if message, msgOK := raw["error"].(string); msgOK && strings.TrimSpace(message) != "" {

logWasm.Printf("label_agent response indicated failure: error=%s, response_summary=%s", message, responseSummary)

return fmt.Errorf("label_agent rejected policy: %s", message)

}

logWasm.Printf("label_agent response indicated non-success status: response_summary=%s", responseSummary)

Copilot · 2026-04-29T17:33:03Z

+	// Preferred path: use guard allocator if exported to avoid overlapping
+	// host-managed buffers with guard heap allocations.
+	allocFn := g.module.ExportedFunction("alloc")
+	deallocFn := g.module.ExportedFunction("dealloc")
+	if allocFn != nil {
+		// Use a non-cancelable context for cleanup to avoid leaking WASM heap
+		// allocations if the request context is canceled or times out.
+		cleanupCtx := context.WithoutCancel(ctx)
+
+		inputPtr, err := g.wasmAlloc(ctx, allocFn, inputSize)
+		if err != nil {
+			return nil, 0, fmt.Errorf("failed to allocate WASM input buffer: %w", err)
+		}
+		defer g.wasmDealloc(cleanupCtx, deallocFn, inputPtr, inputSize)
+
+		outputPtr, err := g.wasmAlloc(ctx, allocFn, outputSize)
+		if err != nil {
+			return nil, 0, fmt.Errorf("failed to allocate WASM output buffer: %w", err)
+		}
+		defer g.wasmDealloc(cleanupCtx, deallocFn, outputPtr, outputSize)


tryCallWasmFunction uses the module's exported alloc function whenever it exists, but it doesn't require dealloc to be present. If a module ever exports alloc without dealloc, this path will leak WASM heap allocations (wasmDealloc becomes a no-op). Consider only taking the allocator path when both alloc and dealloc are exported, otherwise fall back to the host-managed buffer strategy (or return an explicit error).

Initial plan

4c6a674

Copilot AI assigned Copilot and lpcox Apr 29, 2026

Copilot started work on behalf of lpcox April 29, 2026 16:39 View session

Copilot AI linked an issue Apr 29, 2026 that may be closed by this pull request

[refactor] Semantic Function Clustering Analysis: Large Files and Refactoring Opportunities #4778

Closed

4 tasks

Copilot AI changed the title ~~[WIP] Refactor large files and consolidate identical methods~~ refactor: split large monolithic files identified by semantic clustering analysis Apr 29, 2026

Copilot finished work on behalf of lpcox April 29, 2026 16:56

Copilot AI requested a review from lpcox April 29, 2026 16:56

lpcox marked this pull request as ready for review April 29, 2026 17:27

Copilot AI review requested due to automatic review settings April 29, 2026 17:27

Copilot started reviewing on behalf of lpcox April 29, 2026 17:27 View session

Copilot AI reviewed Apr 29, 2026

View reviewed changes

lpcox merged commit 303ac98 into main Apr 29, 2026
32 checks passed

lpcox deleted the copilot/refactor-large-files-and-methods branch April 29, 2026 17:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: split large monolithic files identified by semantic clustering analysis#4813

refactor: split large monolithic files identified by semantic clustering analysis#4813
lpcox merged 2 commits intomainfrom
copilot/refactor-large-files-and-methods

Copilot AI commented Apr 29, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Copilot AI Apr 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Copilot AI commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

internal/guard/wasm.go → 3 files

internal/config/guard_policy.go → 3 files

Not addressed: getTracer() duplication

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Apr 29, 2026 •

edited

Loading

`internal/guard/wasm.go` → 3 files

`internal/config/guard_policy.go` → 3 files

Not addressed: `getTracer()` duplication