Skip to content

refactor: split large monolithic files identified by semantic clustering analysis#4813

Merged
lpcox merged 2 commits intomainfrom
copilot/refactor-large-files-and-methods
Apr 29, 2026
Merged

refactor: split large monolithic files identified by semantic clustering analysis#4813
lpcox merged 2 commits intomainfrom
copilot/refactor-large-files-and-methods

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 29, 2026

Two files flagged by automated semantic function clustering (wasm.go at 1,225 lines, guard_policy.go at 797 lines) each mixed 4 distinct concerns. Split both along their natural seams.

internal/guard/wasm.go → 3 files

File Responsibility
wasm.go Struct, lifecycle, host functions, public dispatch (LabelAgent/LabelResource/LabelResponse/Close)
wasm_payload.go Policy payload building and validation before WASM call
wasm_parse.go Response parsing, WASM memory management (alloc/dealloc/call helpers)

The split mirrors the existing test file naming (wasm_test.go, wasm_parse_test.go, wasm_response_parse_test.go) — the logical separation was already in authors' heads.

internal/config/guard_policy.go → 3 files

File Responsibility
guard_policy.go Struct definitions, JSON marshal/unmarshal, IsWriteSinkPolicy
guard_policy_validation.go Validate*, Normalize*, and character/token helpers
guard_policy_parse.go Parse*, Build*, NormalizeScopeKind

Not addressed: getTracer() duplication

The identical one-liner on UnifiedServer and proxyHandler was left as-is — extracting a shared tracerHolder would require a new cross-package dependency for two trivial methods, not worth the structural cost.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com
    • Triggering command: /tmp/go-build3049887614/b509/launcher.test /tmp/go-build3049887614/b509/launcher.test -test.testlogfile=/tmp/go-build3049887614/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 2/compile.go 2/equals.go x_amd64/compile (dns block)
    • Triggering command: /tmp/go-build3765283910/b509/launcher.test /tmp/go-build3765283910/b509/launcher.test -test.testlogfile=/tmp/go-build3765283910/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ache/go/1.25.9/x64/src/os/user 64/src/net/http/httputil/dump.go .cfg --global user.name x_amd64/cgo ache/go/1.25.9/x64/pkg/tool/linux_amd64/compile -o 9887614/b468/_pkg_.a -trimpath .cfg -p g/grpc/internal/--version -lang=go1.24 ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
    • Triggering command: /tmp/go-build27267908/b513/launcher.test /tmp/go-build27267908/b513/launcher.test -test.testlogfile=/tmp/go-build27267908/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s estl�� --version x_amd64/asm ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet rnal/errors/errobash .cfg 64/pkg/tool/linu--noprofile ache/go/1.25.9/x64/pkg/tool/linu-buildtags (dns block)
  • invalid-host-that-does-not-exist-12345.com
    • Triggering command: /tmp/go-build3049887614/b491/config.test /tmp/go-build3049887614/b491/config.test -test.testlogfile=/tmp/go-build3049887614/b491/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ktype/networktype.go 6UTz/5ecS_4XL9_qgoogle.golang.org/grpc/credentials x_amd64/asm (dns block)
    • Triggering command: /tmp/go-build3765283910/b491/config.test /tmp/go-build3765283910/b491/config.test -test.testlogfile=/tmp/go-build3765283910/b491/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ACCEPT /tmp/go-build3049887614/b100/vet.cfg 64/pkg/tool/linux_amd64/vet 9887614/b346/ --global ctor 64/pkg/tool/linux_amd64/vet -I t/example_test.go t/gateway_integration_test.go 64/pkg/tool/linux_amd64/compile --gdwarf-5 --64 -o 64/pkg/tool/linusecurity (dns block)
    • Triggering command: /tmp/go-build27267908/b495/config.test /tmp/go-build27267908/b495/config.test -test.testlogfile=/tmp/go-build27267908/b495/testlog.txt -test.paniconexit0 -test.timeout=10m0s .cfg�� 9887614/b552/_pkg_.a -tests x_amd64/vet 64/src/runtime/c/opt/hostedtoolcache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
  • nonexistent.local
    • Triggering command: /tmp/go-build3049887614/b509/launcher.test /tmp/go-build3049887614/b509/launcher.test -test.testlogfile=/tmp/go-build3049887614/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 2/compile.go 2/equals.go x_amd64/compile (dns block)
    • Triggering command: /tmp/go-build3765283910/b509/launcher.test /tmp/go-build3765283910/b509/launcher.test -test.testlogfile=/tmp/go-build3765283910/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ache/go/1.25.9/x64/src/os/user 64/src/net/http/httputil/dump.go .cfg --global user.name x_amd64/cgo ache/go/1.25.9/x64/pkg/tool/linux_amd64/compile -o 9887614/b468/_pkg_.a -trimpath .cfg -p g/grpc/internal/--version -lang=go1.24 ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
    • Triggering command: /tmp/go-build27267908/b513/launcher.test /tmp/go-build27267908/b513/launcher.test -test.testlogfile=/tmp/go-build27267908/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s estl�� --version x_amd64/asm ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet rnal/errors/errobash .cfg 64/pkg/tool/linu--noprofile ache/go/1.25.9/x64/pkg/tool/linu-buildtags (dns block)
  • slow.example.com
    • Triggering command: /tmp/go-build3049887614/b509/launcher.test /tmp/go-build3049887614/b509/launcher.test -test.testlogfile=/tmp/go-build3049887614/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true 2/compile.go 2/equals.go x_amd64/compile (dns block)
    • Triggering command: /tmp/go-build3765283910/b509/launcher.test /tmp/go-build3765283910/b509/launcher.test -test.testlogfile=/tmp/go-build3765283910/b509/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true ache/go/1.25.9/x64/src/os/user 64/src/net/http/httputil/dump.go .cfg --global user.name x_amd64/cgo ache/go/1.25.9/x64/pkg/tool/linux_amd64/compile -o 9887614/b468/_pkg_.a -trimpath .cfg -p g/grpc/internal/--version -lang=go1.24 ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet (dns block)
    • Triggering command: /tmp/go-build27267908/b513/launcher.test /tmp/go-build27267908/b513/launcher.test -test.testlogfile=/tmp/go-build27267908/b513/testlog.txt -test.paniconexit0 -test.timeout=10m0s estl�� --version x_amd64/asm ache/go/1.25.9/x64/pkg/tool/linux_amd64/vet rnal/errors/errobash .cfg 64/pkg/tool/linu--noprofile ache/go/1.25.9/x64/pkg/tool/linu-buildtags (dns block)
  • this-host-does-not-exist-12345.com
    • Triggering command: /tmp/go-build3049887614/b518/mcp.test /tmp/go-build3049887614/b518/mcp.test -test.testlogfile=/tmp/go-build3049887614/b518/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true rnal/encoding/ta-errorsas .cfg x_amd64/compile -p path/filepath -lang=go1.25 x_amd64/compile (dns block)
    • Triggering command: /tmp/go-build3765283910/b518/mcp.test /tmp/go-build3765283910/b518/mcp.test -test.testlogfile=/tmp/go-build3765283910/b518/testlog.txt -test.paniconexit0 -test.timeout=10m0s -test.v=true olang.org/grpc@v1.80.0/health/gr-c=4 olang.org/grpc@v1.80.0/health/gr-nolocalimports x_amd64/compile -p g/protobuf/runti/usr/bin/runc -lang=go1.25 x_amd64/compile -W 3sHF/gPXvu-EnuLHrnQI63sHF pkg/mod/go.opentelemetry.io/otel@v1.43.0/internal/global/instrum-ifaceassert x_amd64/vet . --gdwarf2 --64 x_amd64/vet (dns block)
    • Triggering command: /tmp/go-build27267908/b522/mcp.test /tmp/go-build27267908/b522/mcp.test -test.testlogfile=/tmp/go-build27267908/b522/testlog.txt -test.paniconexit0 -test.timeout=10m0s (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

- Split internal/guard/wasm.go (1225 lines) into:
  - wasm.go: core struct, lifecycle, host functions, public operations
  - wasm_payload.go: policy payload building functions
  - wasm_parse.go: response parsing utilities and WASM call helpers

- Split internal/config/guard_policy.go (797 lines) into:
  - guard_policy.go: struct definitions and JSON serialization
  - guard_policy_validation.go: Validate*, Normalize* functions and helpers
  - guard_policy_parse.go: Parse*, Build*, NormalizeScopeKind functions

All tests pass (make agent-finished)."

Agent-Logs-Url: https://github.com/github/gh-aw-mcpg/sessions/f6540a78-8492-4b25-8424-d316c017d230

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor large files and consolidate identical methods refactor: split large monolithic files identified by semantic clustering analysis Apr 29, 2026
Copilot AI requested a review from lpcox April 29, 2026 16:56
@lpcox lpcox marked this pull request as ready for review April 29, 2026 17:27
Copilot AI review requested due to automatic review settings April 29, 2026 17:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors two previously large files by splitting them into smaller, responsibility-focused units to improve navigability and isolate payload-building/validation and parsing concerns.

Changes:

  • Split internal/guard/wasm.go into wasm.go (guard lifecycle/dispatch) + wasm_payload.go (label_agent payload building/validation) + wasm_parse.go (response parsing + WASM call/memory helpers).
  • Split internal/config/guard_policy.go into guard_policy.go (types + JSON marshal/unmarshal) + guard_policy_validation.go (Validate*/Normalize*) + guard_policy_parse.go (Parse*/Build* helpers).
  • Kept behavior largely consistent while relocating logic to better-matched files.
Show a summary per file
File Description
internal/guard/wasm.go Removes payload/parse/call helper implementations from the monolith, leaving the guard struct, lifecycle, and public dispatch methods.
internal/guard/wasm_payload.go Introduces policy normalization + strict label_agent payload validation + payload augmentation (trusted bots/users).
internal/guard/wasm_parse.go Introduces response parsing helpers and WASM call/memory management helpers (alloc/dealloc, retries, trap detection).
internal/config/guard_policy.go Keeps core policy types and JSON marshal/unmarshal behavior (including legacy key support).
internal/config/guard_policy_validation.go Moves allow-only/write-sink validation + normalization and scope/token helpers into a dedicated file.
internal/config/guard_policy_parse.go Moves policy parsing/building helpers and scope_kind normalization into a dedicated file.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comments suppressed due to low confidence (2)

internal/guard/wasm_payload.go:148

  • The comment says approval-labels must be a non-empty array when present, but an empty array currently passes validation. Either add a len(arr)==0 check or update the comment to reflect that empty is allowed.
	// Validate approval-labels if present: must be a non-empty array of non-empty strings.
	if approvalLabelsRaw, ok := allowOnly["approval-labels"]; ok {
		arr, ok := approvalLabelsRaw.([]interface{})
		if !ok {
			return nil, fmt.Errorf("invalid approval-labels value: expected array of strings")
		}
		for _, entry := range arr {
			if s, ok := entry.(string); !ok || strings.TrimSpace(s) == "" {
				return nil, fmt.Errorf("invalid approval-labels value: each entry must be a non-empty string")
			}
		}
	}

internal/guard/wasm_payload.go:179

  • The comment says trusted-users must be a non-empty array when present, but the implementation allows an empty array. If empty should be invalid, add a len(arr)==0 check; otherwise adjust the comment so it matches the intended semantics.
	// Validate trusted-users if present inside allow-only.
	// Must be a non-empty array of non-empty strings when present.
	if trustedUsersRaw, ok := allowOnly["trusted-users"]; ok {
		arr, ok := trustedUsersRaw.([]interface{})
		if !ok {
			return nil, fmt.Errorf("invalid trusted-users value: expected array of strings")
		}
		for _, entry := range arr {
			if s, ok := entry.(string); !ok || strings.TrimSpace(s) == "" {
				return nil, fmt.Errorf("invalid trusted-users value: each entry must be a non-empty string")
			}
		}
	}
  • Files reviewed: 6/6 changed files
  • Comments generated: 6

Comment on lines +69 to +71
if !ok {
return nil, fmt.Errorf("label_agent policy must use top-level allow-only object (received policy.allow-only)")
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message when the policy is missing the allow-only/allowonly key is misleading: it says "(received policy.allow-only)" even though allow-only wasn't present. Consider changing it to something like "missing top-level allow-only" or include the actual top-level keys received to aid debugging.

Copilot uses AI. Check for mistakes.
Comment on lines +124 to +135
// Validate blocked-users if present: must be a non-empty array of non-empty strings.
if blockedUsersRaw, ok := allowOnly["blocked-users"]; ok {
arr, ok := blockedUsersRaw.([]interface{})
if !ok {
return nil, fmt.Errorf("invalid blocked-users value: expected array of strings")
}
for _, entry := range arr {
if s, ok := entry.(string); !ok || strings.TrimSpace(s) == "" {
return nil, fmt.Errorf("invalid blocked-users value: each entry must be a non-empty string")
}
}
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says blocked-users must be a non-empty array when present, but the code doesn't reject an empty array (it only validates element values). Either enforce len(arr) > 0 or relax the comment so it matches the actual validation.

This issue also appears in the following locations of the same file:

  • line 137
  • line 167

Copilot uses AI. Check for mistakes.
Comment on lines +258 to +267
// If allow-only is absent, the injection is skipped and buildStrictLabelAgentPayload
// will reject the payload when called with the missing allow-only key.
users := make([]interface{}, len(trustedUsers))
for i, u := range trustedUsers {
users[i] = u
}
// Inject into allow-only object if present
if allowOnly, ok := payload["allow-only"].(map[string]interface{}); ok {
allowOnly["trusted-users"] = users
}
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildLabelAgentPayload only injects trusted-users into payload["allow-only"]. If callers provide a legacy top-level "allowonly" policy map (which buildStrictLabelAgentPayload accepts), trusted-users will not be injected. Consider normalizing "allowonly" → "allow-only" inside this helper or injecting into both keys to make the behavior consistent.

Suggested change
// If allow-only is absent, the injection is skipped and buildStrictLabelAgentPayload
// will reject the payload when called with the missing allow-only key.
users := make([]interface{}, len(trustedUsers))
for i, u := range trustedUsers {
users[i] = u
}
// Inject into allow-only object if present
if allowOnly, ok := payload["allow-only"].(map[string]interface{}); ok {
allowOnly["trusted-users"] = users
}
// Support both the canonical "allow-only" key and the legacy "allowonly"
// key so trusted-users is injected consistently for either accepted shape.
users := make([]interface{}, len(trustedUsers))
for i, u := range trustedUsers {
users[i] = u
}
if allowOnly, ok := payload["allow-only"].(map[string]interface{}); ok {
allowOnly["trusted-users"] = users
}
if allowOnlyLegacy, ok := payload["allowonly"].(map[string]interface{}); ok {
allowOnlyLegacy["trusted-users"] = users
}

Copilot uses AI. Check for mistakes.
Comment on lines +273 to +289
// isValidAllowOnlyRepos returns true if repos is either a recognised string
// shorthand ("all" or "public") or a non-empty array of strings.
func isValidAllowOnlyRepos(repos interface{}) bool {
switch value := repos.(type) {
case string:
trimmed := strings.TrimSpace(strings.ToLower(value))
return trimmed == "all" || trimmed == "public"
case []interface{}:
if len(value) == 0 {
return false
}
for _, entry := range value {
if _, ok := entry.(string); !ok {
return false
}
}
return true
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isValidAllowOnlyRepos accepts any string entries in the repos array, including empty/whitespace-only strings. That can let an invalid scope through even though the error message says "non-empty array of scoped strings". Consider trimming and rejecting empty strings (and optionally validating scope syntax) before returning true.

Copilot uses AI. Check for mistakes.
Comment on lines +295 to +306
// checkBoolFailure returns a non-nil error if the given raw response map
// contains field key set to false, extracting the "error" message if present.
func checkBoolFailure(raw map[string]interface{}, resultJSON []byte, key string) error {
val, ok := raw[key].(bool)
if !ok || val {
return nil // field absent or true — not a failure
}
if message, msgOK := raw["error"].(string); msgOK && strings.TrimSpace(message) != "" {
logWasm.Printf("label_agent response indicated failure: error=%s, response=%s", message, string(resultJSON))
return fmt.Errorf("label_agent rejected policy: %s", message)
}
logWasm.Printf("label_agent response indicated non-success status: response=%s", string(resultJSON))
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkBoolFailure logs the full raw label_agent response JSON via logWasm.Printf. Since logger.Logger doesn't sanitize values by default, this can leak sensitive policy details into debug/file logs. Consider sanitizing the JSON (e.g., via internal/logger/sanitize) or logging a truncated/summarized version instead.

Suggested change
// checkBoolFailure returns a non-nil error if the given raw response map
// contains field key set to false, extracting the "error" message if present.
func checkBoolFailure(raw map[string]interface{}, resultJSON []byte, key string) error {
val, ok := raw[key].(bool)
if !ok || val {
return nil // field absent or true — not a failure
}
if message, msgOK := raw["error"].(string); msgOK && strings.TrimSpace(message) != "" {
logWasm.Printf("label_agent response indicated failure: error=%s, response=%s", message, string(resultJSON))
return fmt.Errorf("label_agent rejected policy: %s", message)
}
logWasm.Printf("label_agent response indicated non-success status: response=%s", string(resultJSON))
// summarizeLabelAgentResponseForLog returns a minimal JSON summary of a
// label_agent response suitable for diagnostic logging without including the
// full raw payload.
func summarizeLabelAgentResponseForLog(raw map[string]interface{}, key string) string {
summary := map[string]interface{}{
"field_count": len(raw),
}
if val, ok := raw[key].(bool); ok {
summary[key] = val
}
if message, ok := raw["error"].(string); ok && strings.TrimSpace(message) != "" {
summary["error"] = message
}
encoded, err := json.Marshal(summary)
if err != nil {
return fmt.Sprintf("{\"field_count\":%d}", len(raw))
}
return string(encoded)
}
// checkBoolFailure returns a non-nil error if the given raw response map
// contains field key set to false, extracting the "error" message if present.
func checkBoolFailure(raw map[string]interface{}, resultJSON []byte, key string) error {
_ = resultJSON
val, ok := raw[key].(bool)
if !ok || val {
return nil // field absent or true — not a failure
}
responseSummary := summarizeLabelAgentResponseForLog(raw, key)
if message, msgOK := raw["error"].(string); msgOK && strings.TrimSpace(message) != "" {
logWasm.Printf("label_agent response indicated failure: error=%s, response_summary=%s", message, responseSummary)
return fmt.Errorf("label_agent rejected policy: %s", message)
}
logWasm.Printf("label_agent response indicated non-success status: response_summary=%s", responseSummary)

Copilot uses AI. Check for mistakes.
Comment on lines +180 to +199
// Preferred path: use guard allocator if exported to avoid overlapping
// host-managed buffers with guard heap allocations.
allocFn := g.module.ExportedFunction("alloc")
deallocFn := g.module.ExportedFunction("dealloc")
if allocFn != nil {
// Use a non-cancelable context for cleanup to avoid leaking WASM heap
// allocations if the request context is canceled or times out.
cleanupCtx := context.WithoutCancel(ctx)

inputPtr, err := g.wasmAlloc(ctx, allocFn, inputSize)
if err != nil {
return nil, 0, fmt.Errorf("failed to allocate WASM input buffer: %w", err)
}
defer g.wasmDealloc(cleanupCtx, deallocFn, inputPtr, inputSize)

outputPtr, err := g.wasmAlloc(ctx, allocFn, outputSize)
if err != nil {
return nil, 0, fmt.Errorf("failed to allocate WASM output buffer: %w", err)
}
defer g.wasmDealloc(cleanupCtx, deallocFn, outputPtr, outputSize)
Copy link

Copilot AI Apr 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryCallWasmFunction uses the module's exported alloc function whenever it exists, but it doesn't require dealloc to be present. If a module ever exports alloc without dealloc, this path will leak WASM heap allocations (wasmDealloc becomes a no-op). Consider only taking the allocator path when both alloc and dealloc are exported, otherwise fall back to the host-managed buffer strategy (or return an explicit error).

Copilot uses AI. Check for mistakes.
@lpcox lpcox merged commit 303ac98 into main Apr 29, 2026
32 checks passed
@lpcox lpcox deleted the copilot/refactor-large-files-and-methods branch April 29, 2026 17:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[refactor] Semantic Function Clustering Analysis: Large Files and Refactoring Opportunities

3 participants