Open
Conversation
Owner
sidpan1
commented
Jan 2, 2026
- Add comprehensive Product Requirements Document for Coding Agents Platform MVP
- Includes architecture, API specs, workflow, and storage strategy
- Documents functional/non-functional requirements and success criteria
- Add comprehensive Product Requirements Document for Coding Agents Platform MVP - Includes architecture, API specs, workflow, and storage strategy - Documents functional/non-functional requirements and success criteria
- Research on K8s Agent Sandbox as alternative to custom orchestration - Comprehensive comparison with custom Docker approach from MVP PRD - Requirements mapping (all FR/NFR requirements covered) - Architecture comparison and code examples - Cost analysis: ~$65K savings in Year 1 from reduced engineering time - Performance comparison: sub-second startup with pre-warmed pools - Security analysis: gVisor/Kata provide kernel-level isolation - Recommendation: Use Agent Sandbox (50% faster development, lower maintenance) - Migration path with 4-phase implementation plan - Alternative solutions evaluated: E2B, Modal, Argo, Tekton Key finding: Google's Agent Sandbox (launched KubeCon 2025) is purpose-built for executing untrusted AI-generated code in isolated K8s environments.
Detailed low-level implementation for all MVP requirements: Functional Requirements (FR-1 to FR-10): - FR-1: Task creation API with K8s SandboxClaim - FR-2: Status polling with K8s phase mapping - FR-3: Repository cloning with authenticated git - FR-4: Feature branch creation with conflict handling - FR-5: Claude Code execution in sandbox - FR-6: Automated commits with proper git config - FR-7: Branch pushing with error handling - FR-8: Dual-layer persistence (JSON + PVC) - FR-9: Template injection strategies - FR-10: Timeout via activeDeadlineSeconds Non-Functional Requirements (NFR-1 to NFR-6): - NFR-1: <500ms API response (52-161ms typical) - NFR-2: 10 concurrent tasks via pre-warmed pools - NFR-3: 30-minute hard timeout - NFR-4: EFS-backed durable storage - NFR-5: gVisor kernel isolation + NetworkPolicy - NFR-6: Structured logging, Prometheus metrics, K8s events Complete code examples: - Full FastAPI server (app/main.py) - Kubernetes client wrapper (app/k8s_client.py) - Task storage layer (app/storage.py) - Complete execution script (docker/execute.sh) - Production SandboxTemplate YAML - End-to-end workflow diagram Ready for implementation with production-grade error handling, security best practices, and comprehensive observability.
Changes from GKE to EKS: - Container registry: GCR → ECR - Cluster sizing: GKE node pools → EKS node groups (eksctl config) - Instance types: n2-standard-8 → m6i.2xlarge - Monitoring: Self-hosted Prometheus → Amazon Managed Prometheus (AMP) - Dashboards: Self-hosted Grafana → Amazon Managed Grafana (AMG) - Logging: Cloud Logging → CloudWatch Logs with Container Insights Implementation guide updates: - Added Prometheus remote write configuration for AMP - IRSA (IAM Roles for Service Accounts) setup for Prometheus - SigV4 authentication for AMP remote write - Prometheus deployment with AWS configuration - API server deployment with Prometheus scrape annotations - Terraform IAM role configuration for AMP access - CloudWatch Logs Insights query examples - Updated all image references to ECR - EKS autoscaling options (Cluster Autoscaler or Karpenter) Analysis document updates: - Production deployment on EKS instead of GKE - AWS-specific migration path - Infrastructure cost estimates for EKS + EFS - References updated to include both EKS and GKE options
Major changes to FR-5 and FR-9:
FR-5: Execute Claude Code
- Execution logic now defined in `.claude-templates/{template}/execute.sh`
- Platform script checks for template-specific execution script
- Falls back to default Claude Code invocation if no template script
- Templates can customize pre/post-processing (deps, linting, tests, build)
FR-9: Template Structure (renamed from "Template injection")
- Complete template structure documented
- Templates contain:
- execute.sh: Custom execution logic
- CLAUDE.md: Instructions for Claude
- settings.json: Claude Code configuration
- hooks/: Lifecycle hooks (session-start, pre-commit, etc.)
- skills/: Custom skills
- Templates owned by repo teams, version controlled with code
- No platform-side injection needed
- Template selection via TASK_TEMPLATE env var
Full execution script updated:
- Step 3 now uses template-based execution
- Checks for .claude-templates/{template}/execute.sh
- Provides fallback for backwards compatibility
Benefits:
- Repo teams control their execution workflow
- Can run tests, linters, builds as part of task
- Per-repo or per-use-case customization
- No changes needed to platform code for new workflows
Example template execution script shows:
- npm install (dependency setup)
- npm run lint (linting)
- claude --print --dangerously-skip-permissions (core execution)
- npm test (testing)
- npm run build (building)
Major architectural change based on official Claude Code plugin system:
FR-5: Execute Claude Code
- Templates now contain plugins/ and scripts/ directories
- scripts/init.sh runs before Claude Code to install all plugins
- Plugins follow official Claude Code plugin structure (.claude-plugin/)
- Supports commands, agents, skills, hooks, MCP servers
- Platform executes: init.sh → claude command
FR-9: Template Structure
- plugins/ directory contains Claude Code plugins:
- Each plugin has .claude-plugin/plugin.json (required)
- commands/: Slash commands (.md files)
- agents/: Specialized AI assistants (.md files)
- skills/: Auto-invoked capabilities (SKILL.md)
- hooks/: Event handlers (hooks.json)
- .mcp.json: MCP server configuration
- scripts/init.sh initialization script:
- Installs all plugins from plugins/ directory
- Sets up project dependencies
- Runs linters/quality checks
- Verifies environment
Template structure examples:
- backend template: test-runner, code-quality, api-generator plugins
- frontend template: component-generator, style-helper plugins
- Plugins stored in .claude-templates/{template}/plugins/
- Init script at .claude-templates/{template}/scripts/init.sh
Plugin installation methods:
- claude --plugin-dir for local plugins
- Automatic installation via init.sh script
- Uses official Claude Code plugin format
Complete execution script updated:
- Step 3: Initialize template (run init.sh, install plugins)
- Step 4: Execute Claude Code (with plugins loaded)
- Steps 5-7: Commit, push, write result (renumbered)
References:
- https://code.claude.com/docs/en/plugins
- https://github.com/anthropics/claude-code/blob/main/plugins/README.md
- https://claude-plugins.dev/
Benefits:
- Repo teams manage their own plugins
- Standard plugin format (interoperable with Claude Code ecosystem)
- Multiple plugins per template
- Auto-invoked skills (tests, linting, code generation)
- Custom slash commands per template
- No platform changes needed for new capabilities
Fixed incorrect assumptions about plugin storage and installation: - Removed plugins/ directory from template structure - Updated init script to only copy configuration files - Documented correct plugin location (~/.claude/plugins/marketplaces/) - Added citation to official Claude Code documentation - Clarified that Claude Code auto-installs plugins from marketplace refs This addresses the feedback to verify information against official sources rather than documenting unverified assumptions.
Added support for agent.md file in templates to define custom system instructions for the AI agent: Template Structure Changes: - Added agent.md to template structure alongside CLAUDE.md and settings.json - Documented the purpose of each configuration file Initialization Script Updates: - Copy agent.md to .claude/ directory during template initialization - Updated step numbering from 3 to 4 steps - Export AGENT_PROMPT_FILE for use in execute.sh Execution Script Updates: - Pass agent.md to Claude Code via --system-prompt-file flag - Conditionally add flag only if agent.md exists - Log when custom system prompt is being used Documentation: - Added example agent.md with backend API development instructions - Added example CLAUDE.md showing project context - Added comparison table explaining agent.md vs CLAUDE.md vs settings.json - Cited Claude Code CLI reference for --system-prompt-file flag Key Distinction: - agent.md: Defines agent's role, expertise, and behavior (system prompt) - CLAUDE.md: Provides project-specific context and commands (auto-read) - settings.json: Configures plugins, linters, and formatters
- Replace execute.sh with execute.py using claude-agent-sdk
- Replace init.sh with init.py for template initialization
- Update FR-3 to FR-7 with Python implementations
- Add direct plugin loading via SDK (no copying needed)
- Update SandboxTemplate to use Python entrypoint
- Add Dockerfile for Python + Claude Agent SDK
- Update workflow diagram to reflect Python execution
- Update Summary with bash vs Python comparison table
Key benefits:
- Plugins loaded directly by path ({"type": "local", "path": "..."})
- Type-safe error handling with custom exceptions
- permission_mode='acceptEdits' replaces --dangerously-skip-permissions
- Async iteration for streaming responses
- Testable with pytest and mocking
- Emphasize plugins are loaded from .claude-templates/{template}/plugins/
- Use absolute paths for plugin discovery
- Update init.py to clarify it does NOT handle plugins
- Add comparison table (old copy-based vs new direct-path approach)
- Add example plugin paths in docstrings
Key point: SDK loads plugins directly by path - no copying needed!
…cification - Remove detailed plugin documentation from both guide and PRD - Make plugins an implementation detail, not part of public spec - init.py is now the only required specification for templates - Simplify template structure to just: scripts/init.py, agent.md (optional), CLAUDE.md (optional) - Remove discover_plugins() function from code examples - Update all references to focus on init.py as the single source of truth - Simplifies the developer experience and reduces cognitive overhead This change makes the platform more flexible by allowing init.py to handle all implementation details (plugins, settings, environment setup) internally, while presenting a simpler interface to users.
- Rename all occurrences of .claude-templates/ to .task-templates/ - Makes it clear these are task execution templates, not Claude-specific - Template selection based on task_template parameter from request - More generic naming that doesn't tie to a specific AI provider Updated in both agent-sandbox-implementation-guide.md and mvp-prd.md
- Remove all references to CLAUDE.md and agent.md to keep template pure - init.py is now the ONLY specification for task templates - Remove example sections showing agent.md and CLAUDE.md content - Remove "Optional Files" documentation - Simplify template structure to just scripts/init.py - Update system prompt loading to be an implementation detail - Remove from both PRD and implementation guide This makes the specification cleaner and more focused. All configuration, system prompts, and other setup logic should be handled inside init.py as implementation details, not as part of the public specification.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.