Forge is a 6-layer AI development orchestration system designed for scalability, modularity, and extensibility. Each layer has a specific responsibility and communicates through well-defined interfaces.
- Separation of Concerns - Each layer handles one aspect of development
- Progressive Enhancement - Layers build upon previous layers
- Fail-Fast - Errors caught early prevent wasted resources
- Parallel Execution - Independent tasks run concurrently
- Pattern-Driven - Knowledge patterns guide generation
- Test-Driven - Testing integrated at every step
User Input (Natural Language)
↓
┌────────────────────────────────────┐
│ Layer 1: Decomposition │ ← KnowledgeForge Patterns
│ - Conversational planning │
│ - Task breakdown │
│ - Dependency analysis │
└────────────────────────────────────┘
↓ TaskPlan
┌────────────────────────────────────┐
│ Layer 2: Planning │ ← Tech Stack Selection
│ - File structure design │
│ - Technology selection │
│ - Module organization │
└────────────────────────────────────┘
↓ ProjectPlan
┌────────────────────────────────────┐
│ Layer 3: Generation │ ← Multi-Agent Generation
│ - Parallel code generation │
│ - Pattern application │
│ - Quality assurance │
└────────────────────────────────────┘
↓ GeneratedCode
┌────────────────────────────────────┐
│ Layer 4: Testing │ ← Docker Isolation
│ - Unit tests │
│ - Integration tests │
│ - Security scanning │
│ - Performance benchmarking │
└────────────────────────────────────┘
↓ TestResults
┌────────────────────────────────────┐
│ Layer 5: Review │ ← Iterative Refinement
│ - Failure analysis │
│ - Fix generation │
│ - Learning database │
└────────────────────────────────────┘
↓ Fixes (if needed, loop to Layer 4)
┌────────────────────────────────────┐
│ Layer 6: Deployment │ ← Git & Deployment
│ - Git workflows │
│ - PR creation │
│ - Deployment configs │
└────────────────────────────────────┘
↓
Production-Ready Code
Purpose: Transform user requirements into actionable tasks
Components:
- DecompositionLayer - Main orchestrator
- ConversationalPlanner - Interactive requirement gathering
- TaskDecomposer - Break down into tasks
- DependencyAnalyzer - Build dependency graph
Key Data Structures:
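from dataclasses import dataclass
from typing import List

@dataclass
class Task:
    id: str
    title: str
    description: str
    dependencies: List[str]   # ids of tasks that must finish first
    complexity: Complexity
    estimated_time: int       # minutes
    tech_stack: List[str]
    file_outputs: List[str]   # files this task is expected to produce

Pattern Usage: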
- Loads KB3 patterns for best practices
- Uses pattern library to identify task types
- Applies complexity estimation patterns
Flow:
- User provides natural language description
- System asks clarifying questions
- Decomposes into tasks with KF patterns
- Analyzes dependencies
- Estimates complexity
- Returns TaskPlan
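The dependency-analysis step of this flow can be illustrated with a short sketch. It is not Forge's actual DependencyAnalyzer: the trimmed-down `Task` and the `order_tasks` helper are assumptions made for illustration, and the ordering relies on the standard-library `graphlib` module (Python 3.9+).

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


@dataclass
class Task:
    # Trimmed-down stand-in for the Task structure (illustration only).
    id: str
    title: str
    dependencies: List[str] = field(default_factory=list)


def order_tasks(tasks: List[Task]) -> List[str]:
    """Return task ids in an order that respects their dependencies."""
    known = {t.id for t in tasks}
    for t in tasks:
        missing = [d for d in t.dependencies if d not in known]
        if missing:
            raise ValueError(f"Task {t.id!r} depends on unknown tasks: {missing}")

    # TopologicalSorter expects a mapping of node -> predecessors.
    graph: Dict[str, List[str]] = {t.id: t.dependencies for t in tasks}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"Dependency cycle detected: {exc.args[1]}") from exc


tasks = [
    Task("api", "REST API", ["models"]),
    Task("models", "Data models"),
    Task("ui", "Frontend", ["api"]),
]
print(order_tasks(tasks))
```

Rejecting unknown ids and cycles at this stage matches the Fail-Fast principle: a broken plan is caught before any generation work starts.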
Purpose: Design project structure and select technologies
Components:
- PlanningLayer - Structure designer
- TechStackSelector - Technology selection
- FileStructureGenerator - Directory layout
- DependencyResolver - Package management
Key Data Structures:
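from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ProjectPlan:
    tasks: List[Task]
    file_structure: Dict[str, FileSpec]   # path -> file specification
    tech_stack: TechStack
    dependencies: List[str]               # packages to install
    entry_points: List[str]

Pattern Usage: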
- Project structure patterns
- Tech stack best practices
- Naming conventions
Flow:
- Receives TaskPlan
- Selects appropriate tech stack
- Designs file structure
- Plans module organization
- Returns ProjectPlan
Purpose: Generate code using AI and patterns
Components:
- GenerationLayer - Generation orchestrator
- CodeGenAPI - API-based generation
- ClaudeCodeGenerator - Claude-based generation
- GeneratorFactory - Provider abstraction
- QualityChecker - Code validation
Providers:
- Anthropic Claude (primary)
- OpenAI GPT-4 (fallback)
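One way to read the primary/fallback arrangement is as an ordered list of providers tried in turn. The sketch below uses plain callables as stand-ins for real provider clients; `generate_with_fallback` and the stub functions are hypothetical names, not part of Forge or of either vendor's SDK.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable maps a prompt to code.
Provider = Tuple[str, Callable[[str], str]]


def generate_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, output)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))


# Stub providers for illustration only.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("request timed out")


def stub_fallback(prompt: str) -> str:
    return f"# generated for: {prompt}"


name, output = generate_with_fallback(
    "add a health endpoint",
    [("anthropic", flaky_primary), ("openai", stub_fallback)],
)
print(name, output)
```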
Key Features:
- Parallel Generation - Multiple tasks simultaneously
- Pattern Integration - KF patterns in prompts
- Context Management - Cross-file dependencies
- Quality Checks - Syntax validation, best practices
Flow:
- Receives ProjectPlan
- Groups tasks by dependencies
- Generates code in parallel waves
- Validates each output
- Returns GeneratedCode
Purpose: Comprehensive testing and validation
Components:
- TestingOrchestrator - Test coordination
- DockerRunner - Isolated test execution
- TestGenerator - Test code generation
- SecurityScanner - Vulnerability detection
- PerformanceBenchmark - Performance testing
Test Types:
- Unit Tests - Individual function/class testing
- Integration Tests - Component interaction testing
- Security Scans - Vulnerability detection
- Performance Tests - Latency/throughput benchmarks
Docker Isolation:
- Each test suite runs in isolated container
- Clean environment per test
- Reproducible results
- Resource limits
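As a rough illustration of these isolation properties, the sketch below assembles a `docker run` command with memory and CPU limits, a read-only mount, and networking disabled. The function name, the default image, the limit values, and the use of `unittest` are all assumptions; Forge's DockerRunner may work quite differently.

```python
from pathlib import Path
from typing import List


def docker_test_command(
    code_dir: Path,
    image: str = "python:3.12-slim",  # assumed image, not specified by Forge
    memory: str = "512m",             # assumed limit
    cpus: str = "1.0",                # assumed limit
    network: bool = False,
) -> List[str]:
    """Build (but do not run) a docker command for one isolated test suite."""
    cmd = ["docker", "run", "--rm", "--memory", memory, "--cpus", cpus]
    if not network:
        cmd += ["--network", "none"]  # no network access inside the container
    cmd += [
        "--volume", f"{code_dir.resolve()}:/app:ro",  # mount the code read-only
        "--workdir", "/app",
        image,
        "python", "-m", "unittest", "discover",
    ]
    return cmd


print(" ".join(docker_test_command(Path("generated"))))
```

Because `--rm` discards the container after each run, every suite starts from the same clean image, which is what makes the results reproducible.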
Flow:
- Receives GeneratedCode
- Generates test code
- Builds Docker environment
- Runs test suites
- Scans for vulnerabilities
- Benchmarks performance
- Returns ComprehensiveTestReport
Purpose: Iterative refinement until tests pass
Components:
- ReviewLayer - Iteration controller
- FailureAnalyzer - Root cause detection
- FixGenerator - AI-powered fix generation
- LearningDatabase - Success pattern storage
Iteration Process:
- Run tests
- Analyze failures (14 failure types)
- Generate fixes (top 3 per iteration)
- Apply fixes
- Repeat (max 5 iterations)
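The iteration process above can be sketched as a bounded loop. The callables `run_tests` and `apply_fix` are hypothetical stand-ins for the testing layer and the fix generator, and the sketch assumes `run_tests` returns failures already sorted by priority, so the "top 3" are simply the first three.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 5       # "max 5 iterations"
FIXES_PER_ITERATION = 3  # "top 3 per iteration"


def review_loop(
    run_tests: Callable[[], List[str]],
    apply_fix: Callable[[str], None],
) -> Tuple[int, List[str]]:
    """Fix failures in rounds; return (iterations used, remaining failures)."""
    failures = run_tests()
    iterations = 0
    while failures and iterations < MAX_ITERATIONS:
        for failure in failures[:FIXES_PER_ITERATION]:
            apply_fix(failure)
        iterations += 1
        failures = run_tests()
    return iterations, failures


# Toy example: seven failures, each fix removes exactly one of them.
pending = [f"failure-{i}" for i in range(7)]
iterations, remaining = review_loop(lambda: list(pending), pending.remove)
print(iterations, remaining)  # 3 []
```

The caller decides what to do when failures remain after the last iteration, for example reporting them instead of proceeding to deployment.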
Failure Types (a subset of the 14):
- Syntax errors
- Import errors
- Type errors
- Logic errors (assertions)
- Security vulnerabilities
- Performance degradation
Learning System:
- Stores successful fix patterns
- Tracks fix success rate
- Calculates average iterations
- Improves over time
Flow:
- Receives test failures
- Categorizes and analyzes
- Generates targeted fixes
- Applies fixes
- Re-runs tests
- Updates learning database
Purpose: Git workflows and deployment automation
Components:
- ForgeRepository - Git operations
- GitHubClient - PR management
- DeploymentGenerator - Platform configs
- ConventionalCommit - Commit formatting
Features:
- Branch Management - forge/* naming
- Conventional Commits - Structured messages
- PR Creation - With checklists
- Multi-Platform - 5 deployment targets
Platforms:
- fly.io
- Vercel
- AWS Lambda
- Docker/Docker Compose
- Kubernetes
Flow:
- Create feature branch
- Generate deployment configs
- Commit with conventional format
- Push to remote
- Create PR with checklist
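The branch-naming and commit-formatting steps are simple enough to sketch. The helpers below are illustrative assumptions rather than the ConventionalCommit API; the set of allowed types is a commonly used subset of Conventional Commits types, not a list taken from Forge.

```python
import re

# Commonly used commit types (an assumed subset).
ALLOWED_TYPES = {"feat", "fix", "docs", "refactor", "test", "chore"}


def branch_name(title: str) -> str:
    """Turn a task title into a forge/* branch name."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"forge/{slug}"


def commit_message(type_: str, description: str, scope: str = "") -> str:
    """Format a '<type>(<scope>): <description>' commit subject line."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"Unknown commit type: {type_!r}")
    prefix = f"{type_}({scope})" if scope else type_
    return f"{prefix}: {description}"


print(branch_name("Add User Auth!"))                        # forge/add-user-auth
print(commit_message("feat", "add login endpoint", "api"))  # feat(api): add login endpoint
```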
User Description
↓
┌─────────────────────┐
│ 1. Decomposition │
└─────────────────────┘
↓ TaskPlan
┌─────────────────────┐
│ 2. Planning │
└─────────────────────┘
↓ ProjectPlan
┌─────────────────────┐
│ 3. Generation │ ← Parallel execution
└─────────────────────┘
↓ GeneratedCode
┌─────────────────────┐
│ 4. Testing │ ← Docker isolation
└─────────────────────┘
↓ TestResults
│
├─ Tests Pass
│      ↓
│  ┌─────────────────────┐
│  │ 6. Deployment       │
│  └─────────────────────┘
│      ↓
│  Production
│
└─ Tests Fail
       ↓
   ┌─────────────────────┐
   │ 5. Review & Fix     │
   └─────────────────────┘
       ↓
   (Loop to Testing)
Forge maintains state across layers using StateManager:
class StateManager:
    def save_task_plan(self, project_id: str, plan: TaskPlan) -> None: ...
    def load_task_plan(self, project_id: str) -> TaskPlan: ...
    def save_project_plan(self, project_id: str, plan: ProjectPlan) -> None: ...
    def load_project_plan(self, project_id: str) -> ProjectPlan: ...
    def save_generated_code(self, project_id: str, code: GeneratedCode) -> None: ...
    def load_generated_code(self, project_id: str) -> GeneratedCode: ...
    def save_test_results(self, project_id: str, results: TestResults) -> None: ...
    def load_test_results(self, project_id: str) -> TestResults: ...

State stored in .forge/state/<project_id>/:
- task_plan.json
- project_plan.json
- generated_code/
- test_results.json
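A minimal sketch of how JSON-backed state under `.forge/state/<project_id>/` could be written and read is shown below. `JsonStateManager` and its generic `save`/`load` methods are assumptions for illustration; the real StateManager exposes typed methods per artifact, and its serialization details are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict


class JsonStateManager:
    """Illustrative JSON persistence, one file per artifact and project."""

    def __init__(self, root: Path = Path(".forge/state")) -> None:
        self.root = root

    def _path(self, project_id: str, name: str) -> Path:
        return self.root / project_id / f"{name}.json"

    def save(self, project_id: str, name: str, data: Dict[str, Any]) -> None:
        path = self._path(project_id, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file first so a crash never leaves a
        # half-written state file behind.
        tmp = path.with_suffix(".json.tmp")
        tmp.write_text(json.dumps(data, indent=2))
        tmp.replace(path)

    def load(self, project_id: str, name: str) -> Dict[str, Any]:
        return json.loads(self._path(project_id, name).read_text())
```

Usage would look like `JsonStateManager().save("demo", "task_plan", {"tasks": []})` followed by `load("demo", "task_plan")`, which lets a later layer resume from a previous layer's output.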
PatternStore - Centralized pattern access
class PatternStore:
    def search_patterns(self, query: str) -> List[Pattern]: ...
    def get_pattern_by_id(self, id: str) -> Pattern: ...
    def get_similar_patterns(self, pattern: Pattern) -> List[Pattern]: ...

Uses semantic search with embeddings for relevance.
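Embedding-based search usually comes down to ranking stored vectors by similarity to a query vector. The sketch below assumes cosine similarity and precomputed embeddings held in a dictionary; both are assumptions, since the similarity measure and storage used by PatternStore are not stated.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_patterns(
    query_vec: Sequence[float],
    pattern_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (pattern_id, score) pairs, best match first."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in pattern_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy two-dimensional embeddings with hypothetical pattern ids.
vectors = {"rest-api": [1.0, 0.0], "cli-tool": [0.0, 1.0], "web-app": [1.0, 1.0]}
print([pid for pid, _ in rank_patterns([1.0, 0.0], vectors, top_k=2)])
# ['rest-api', 'web-app']
```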
ForgeError - Base exception class
class ForgeError(Exception):
    """Base exception for Forge"""

class DecompositionError(ForgeError):
    """Task decomposition errors"""

class GenerationError(ForgeError):
    """Code generation errors"""

class TestingError(ForgeError):
    """Testing errors"""

All errors include:
- Clear error message
- Fix suggestions
- Documentation links
- Example solutions
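One possible shape for an exception that carries these four pieces of information is sketched below. The constructor parameters (`suggestions`, `docs_url`, `example`) are hypothetical names chosen for illustration; the actual ForgeError signature is not shown in this document.

```python
from typing import List, Optional


class ForgeError(Exception):
    """Base exception carrying guidance alongside the message (sketch)."""

    def __init__(
        self,
        message: str,
        suggestions: Optional[List[str]] = None,
        docs_url: Optional[str] = None,
        example: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.suggestions = suggestions or []
        self.docs_url = docs_url
        self.example = example

    def __str__(self) -> str:
        lines = [self.message]
        lines += [f"  Suggestion: {s}" for s in self.suggestions]
        if self.docs_url:
            lines.append(f"  Docs: {self.docs_url}")
        if self.example:
            lines.append(f"  Example: {self.example}")
        return "\n".join(lines)


class GenerationError(ForgeError):
    """Code generation errors"""


try:
    raise GenerationError(
        "Provider timed out",
        suggestions=["Retry with a smaller task"],
    )
except ForgeError as err:
    print(err)
```

Catching the base class, as in the usage above, lets callers handle any layer's failure in one place while still printing the layer-specific guidance.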
Structured logging throughout:
logger.info("Starting code generation", extra={
    "project_id": project_id,
    "task_count": len(tasks),
    "provider": "anthropic"
})

Log levels:
- DEBUG - Detailed diagnostic info
- INFO - Normal operation milestones
- WARNING - Unexpected but handled
- ERROR - Operation failures
- CRITICAL - System failures
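Fields passed through `extra` become attributes on the standard-library `LogRecord`, but the default formatter does not print them. The sketch below shows one way to surface them as JSON; `JsonFormatter` and the `CONTEXT_FIELDS` tuple are assumptions, not Forge's logging configuration.

```python
import json
import logging

# Context keys to copy from the record into the output (assumed set).
CONTEXT_FIELDS = ("project_id", "task_count", "provider")


class JsonFormatter(logging.Formatter):
    """Render a log record, plus selected `extra` fields, as one JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key in CONTEXT_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("forge.example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "Starting code generation",
    extra={"project_id": "demo", "task_count": 3, "provider": "anthropic"},
)
```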
Tasks executed in dependency waves:
Wave 1: [Task A, Task B, Task C] ← No dependencies
↓
Wave 2: [Task D, Task E] ← Depend on Wave 1
↓
Wave 3: [Task F] ← Depends on Wave 2
Max parallelism: 4 workers (configurable)
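Wave scheduling of this kind can be sketched with a dependency map and a thread pool. `build_waves` and `run_waves` are hypothetical helpers, and using `ThreadPoolExecutor` is an assumption about how the workers are realized; the default of four workers mirrors the limit stated above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def build_waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group tasks into waves; each wave depends only on earlier waves."""
    remaining: Dict[str, Set[str]] = {task: set(d) for task, d in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("Dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


def run_waves(
    deps: Dict[str, List[str]],
    run_task: Callable[[str], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run each wave in parallel, finishing it before starting the next."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in build_waves(deps):
            # pool.map preserves input order and blocks until the wave is done.
            for task, output in zip(wave, pool.map(run_task, wave)):
                results[task] = output
    return results


deps = {"A": [], "B": [], "C": [], "D": ["A", "B"], "E": ["C"], "F": ["D", "E"]}
print(build_waves(deps))  # [['A', 'B', 'C'], ['D', 'E'], ['F']]
```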
Pattern Embeddings - Cached to avoid recomputation
~/.forge/cache/
embeddings/
patterns.pkl
last_updated.txt
Test Results - Cached until code changes
.forge/cache/<project_id>/
test_results_<hash>.json
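Caching "until code changes" suggests a cache key derived from the code itself. The sketch below hashes every file under a directory and builds a matching `test_results_<hash>.json` path. The choice of SHA-256, the 16-character truncation, and the helper names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def code_hash(code_dir: Path) -> str:
    """Hash relative paths and contents of all files under code_dir."""
    digest = hashlib.sha256()
    for path in sorted(p for p in code_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(code_dir).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()[:16]  # shortened for readable file names


def cache_path(
    project_id: str,
    code_dir: Path,
    cache_root: Path = Path(".forge/cache"),
) -> Path:
    """Location of the cached test results for the current code state."""
    return cache_root / project_id / f"test_results_{code_hash(code_dir)}.json"
```

Any edit to a file changes the hash, so a stale result is never looked up; it is simply left behind under its old name until cleanup.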
- Streaming - Large files processed incrementally
- Batching - Tasks grouped to reduce API calls
- Cleanup - Temporary files deleted after use
class CustomGenerator(BaseGenerator):
    def generate_code(
        self,
        task: Task,
        context: GenerationContext
    ) -> GeneratedCode:
        # Custom generation logic
        pass

# Register
GeneratorFactory.register("custom", CustomGenerator)

class CustomTestRunner(BaseTestRunner):
    def run_tests(
        self,
        code_dir: Path,
        test_dir: Path
    ) -> TestResult:
        # Custom test execution
        pass

class CustomPlatform(DeploymentGenerator):
    def generate_configs(
        self,
        config: DeploymentConfig
    ) -> List[Path]:
        # Generate platform configs
        pass

- Never logged or stored in files
- Environment variables only
- Encrypted in memory if possible
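Assuming the three points above describe how provider API keys are handled, a minimal sketch of the "environment variables only, never logged" rule could look like this. The helper names are hypothetical, and the variable name passed in is up to the caller.

```python
import os


def load_api_key(var_name: str) -> str:
    """Read a key from the environment; never from a file on disk."""
    key = os.environ.get(var_name, "")
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key


def mask(secret: str, visible: int = 4) -> str:
    """Return a log-safe form showing only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


print(mask("sk-example-123456"))  # *************3456
```

Logging only the masked form keeps enough of the key visible to tell two keys apart without exposing either.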
- All tests run in isolated Docker containers
- Resource limits enforced
- Network isolation optional
- Security scanning mandatory
- Vulnerability database checks
- Best practice validation
- Stateless generation workers
- Task queue for distribution
- Shared state storage
- Configurable worker count
- Memory limits per task
- Timeout controls
- Task completion time
- API call count/latency
- Test pass rate
- Fix success rate
- Memory usage
- Error rates
- Pattern store connectivity
- API provider status
- Docker daemon status
- Disk space available
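Two of these checks can be sketched with the standard library alone. The function names and the 1 GB threshold are assumptions; the Docker check shells out to `docker info`, which exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess
from typing import Dict


def disk_ok(path: str = ".", min_free_bytes: int = 1_000_000_000) -> bool:
    """True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def docker_ok(timeout: float = 5.0) -> bool:
    """True if the docker CLI exists and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"], capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def health_report() -> Dict[str, bool]:
    """Collect the individual checks into one status mapping."""
    return {"disk_space": disk_ok(), "docker_daemon": docker_ok()}


print(health_report())
```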
- Web UI - Visual project planning
- Cloud Execution - Serverless generation
- Team Features - Shared projects
- Plugin System - Third-party extensions
- IDE Integration - VSCode/PyCharm plugins
- Microservices for layers
- Message queue between layers
- Distributed state management
- Multi-tenancy support