From b8dfe8db6307fc8fae05507e7042173e3f111f77 Mon Sep 17 00:00:00 2001 From: TG-Techie Date: Thu, 19 Feb 2026 21:17:27 -0500 Subject: [PATCH 01/12] init commit --- .gitignore | 1 + LICENSE | 201 +++++++++++++++++++++++++++++++ NLSpec - Grouning Document.md | 220 ++++++++++++++++++++++++++++++++++ 3 files changed, 422 insertions(+) create mode 100644 .gitignore create mode 100644 LICENSE create mode 100644 NLSpec - Grouning Document.md diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..496ee2c --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +.DS_Store \ No newline at end of file diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..261eeb9 --- /dev/null +++ b/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. 
+ + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." 
+ + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/NLSpec - Grouning Document.md b/NLSpec - Grouning Document.md new file mode 100644 index 0000000..7f620a7 --- /dev/null +++ b/NLSpec - Grouning Document.md @@ -0,0 +1,220 @@ +# What an NLSpec Is + +A grounding document for agents that read, write, implement, and evaluate natural language specifications. + +--- + +## The Nature of the Artifact + +An NLSpec is a **prescriptive, generative document written in natural language that fully determines the construction of a software system**. It sits between human intent and working code. It is the authoritative source of truth from which implementation is derived — not a description of something that already exists, not a proposal for discussion, not a guide for users. It is the thing the code must satisfy. + +The relationship is directional and non-negotiable: + +**Intent → NLSpec → Implementation** + +Intent is what someone wants. It's rough, directional, incomplete. "I need a unified LLM client." "Build me a pipeline runner that uses DOT graphs." Intent contains the *why* and broad *what*, but leaves *how* and *precisely what* unresolved. + +Implementation is code. It's maximally specific, executable, testable. It contains every decision, including ones the spec author never thought about (buffer sizes, import ordering, variable names). + +The NLSpec occupies the space between. 
Its job is to resolve every decision that matters for correctness and interoperability while leaving every decision that doesn't matter to the implementer. This is a hard, deliberate line to draw, and drawing it well is what separates a good spec from a bad one. + +An NLSpec is not a fuzzy artifact. The word "natural language" might suggest informality. It does not. The language is natural; the precision is engineering-grade. When a spec says "the adapter must inject `cache_control` breakpoints automatically for agentic workloads," that sentence has the same binding force as a type signature. It is a requirement. The implementation either does it or violates the spec. + + +## Why NLSpecs Work When They Work + +An NLSpec enables faithful implementation — by a human or an agent — when it has five properties. These are not stylistic preferences. They are structural requirements. Remove any one and the spec fails at its job. + +### 1. Behavioral Completeness + +Every externally observable behavior of the system is specified. Not every internal implementation detail — every *behavior*. The distinction matters. + +A spec that says "the Client routes requests to the correct provider adapter based on the `provider` field" is behaviorally complete for routing. It doesn't say whether routing uses a hash map, a switch statement, or a linear scan. It doesn't need to. The behavior — correct dispatch — is fully determined. The mechanism is not, and that's fine, because the mechanism doesn't affect callers. + +A spec that says "handle errors appropriately" is behaviorally incomplete. It forces the implementer to make judgment calls about what "appropriate" means, and different implementers will make different calls, producing implementations that are mutually incompatible. That's a spec failure. 
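The routing example above can be made concrete with a minimal sketch. It assumes three hypothetical adapter functions and assumes the `provider` field takes the string values `"openai"`, `"anthropic"`, and `"gemini"`; the spec text quoted here names neither. Raising `ValueError` for an unknown provider is also an assumption, chosen only so that both versions have the same defined behavior. The two functions use different mechanisms (a lookup table and a branch chain) yet are indistinguishable to a caller.

```python
from typing import Callable, Dict

# Hypothetical adapters: stand-ins for provider-specific clients.
def openai_adapter(prompt: str) -> str:
    return f"openai:{prompt}"

def anthropic_adapter(prompt: str) -> str:
    return f"anthropic:{prompt}"

def gemini_adapter(prompt: str) -> str:
    return f"gemini:{prompt}"

Adapter = Callable[[str], str]

# Assumed provider field values; not taken from any real spec.
ADAPTERS: Dict[str, Adapter] = {
    "openai": openai_adapter,
    "anthropic": anthropic_adapter,
    "gemini": gemini_adapter,
}

def route_with_table(provider: str, prompt: str) -> str:
    """Mechanism A: dictionary lookup."""
    adapter = ADAPTERS.get(provider)
    if adapter is None:
        # Assumed error behavior for an unknown provider.
        raise ValueError(f"unknown provider: {provider}")
    return adapter(prompt)

def route_with_branches(provider: str, prompt: str) -> str:
    """Mechanism B: explicit branching, same observable behavior."""
    if provider == "openai":
        return openai_adapter(prompt)
    if provider == "anthropic":
        return anthropic_adapter(prompt)
    if provider == "gemini":
        return gemini_adapter(prompt)
    raise ValueError(f"unknown provider: {provider}")
```

Note that the sketch only stays interchangeable because the unknown-provider case was pinned down. Had one version returned `None` and the other raised, the two would diverge exactly where "handle errors appropriately" leaves room for interpretation.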
+ +Behavioral completeness means: **if two competent implementers independently build from this spec without communicating, their implementations are interchangeable from the perspective of any caller.** Internal structure may differ. Observable behavior does not. + +### 2. Unambiguous Interfaces + +Every boundary where components meet is specified precisely. Interfaces include: function signatures, data structures, wire formats, error types, event schemas, configuration surfaces, and state contracts. + +This is where NLSpecs do their heaviest lifting. A spec defines a `Response` record with fields `id`, `model`, `provider`, `message`, `finish_reason`, `usage`, `raw`, `warnings`, `rate_limit`. Each field has a type. Each type is defined elsewhere in the spec. The relationships between them are explicit. An implementer reading this can write the data structure in any language without guessing. + +The same precision applies to behavioral interfaces. A spec that defines an edge selection algorithm — "Step 1: condition-matching edges. Step 2: preferred label match. Step 3: suggested next IDs. Step 4: highest weight. Step 5: lexical tiebreak" — has eliminated all ambiguity about what the engine does at a routing decision point. The algorithm is deterministic. Two implementations will select the same edge given the same inputs. + +### 3. Explicit Defaults and Boundaries + +Every configurable value has a default. Every range has bounds. Every optional parameter has documented behavior when omitted. + +This sounds obvious. In practice, it's where most specs fall apart. A spec that introduces a `timeout` parameter without specifying the default, the maximum, and what happens when the timeout fires has created three implementation-divergence points. A good NLSpec closes all three: "default 10 seconds, maximum 10 minutes, on timeout: SIGTERM to process group, wait 2 seconds, SIGKILL, return partial output with timeout message." + +Defaults are not details. 
Defaults are the behavior most users experience. Leaving them unspecified means leaving the most common case to the implementer's imagination. + +### 4. Mapping Tables for Translation + +When a system interacts with multiple external systems that model the same concept differently, the spec provides explicit translation tables. + +A unified LLM client talks to OpenAI, Anthropic, and Gemini. Each has a different name for "the model stopped generating." OpenAI says `stop`. Anthropic says `end_turn`. Gemini says `STOP`. The spec doesn't say "map these appropriately." It provides a table: OpenAI `stop` → unified `stop`. Anthropic `end_turn` → unified `stop`. Gemini `STOP` → unified `stop`. Row by row, exhaustively. + +This matters because translation is where subtle bugs hide. An implementer who doesn't know that Gemini has no dedicated "tool_calls" finish reason will fail to detect tool calls in Gemini responses. The spec prevents this by specifying: "Gemini does not have a dedicated 'tool_calls' finish reason. The adapter infers it from the presence of `functionCall` parts in the response." That sentence eliminates an entire class of bugs. + +### 5. Testable Acceptance Criteria + +The spec includes a "Definition of Done" — a concrete, checkable list of properties the implementation must satisfy. This is not a test plan (which defines *how* to test). It's a contract: *what must be true* when the implementation is correct. + +Good acceptance criteria are binary. "Simple text generation works across all providers" is checkable. "The system handles errors well" is not. The cross-provider parity matrix — a grid of test cases × providers where every cell must pass — is the gold standard. It makes completeness visible and gaps impossible to ignore. + +Acceptance criteria serve a second purpose: they bound the scope. If something isn't in the Definition of Done, it's not required. An implementer who builds everything in the checklist and nothing more has done the job. 
This protects against scope creep and gold-plating. + + +## What an NLSpec Is Not + +Several artifacts look like NLSpecs. They share surface features — prose, technical content, descriptions of systems. They are not the same thing. The distinctions are functional, not stylistic. + +**A README** describes something that exists. An NLSpec describes something that should be built. The arrow of authority is reversed: a README is derived from code; code is derived from an NLSpec. When code and README conflict, you update the README. When code and NLSpec conflict, you update the code. + +**A design document** proposes an approach for evaluation. It says "here's how we could build this; let's discuss." An NLSpec says "here's how this will be built; go build it." A design doc invites disagreement before commitment. An NLSpec is the result of that disagreement being resolved. Shipping a design doc to an implementer produces negotiation. Shipping an NLSpec produces code. + +**An architecture decision record (ADR)** captures *why* a decision was made, after the decision is made. An NLSpec captures *what* was decided and *how it behaves*. ADRs are retrospective. NLSpecs are prospective. A good NLSpec may include rationale (the Attractor specs have "Design Decision Rationale" appendices), but the rationale serves understanding, not authority. The spec is authoritative whether or not you agree with the rationale. + +**API reference documentation** describes the surface of an existing system for consumers. An NLSpec describes the surface *and internals* of a system that doesn't exist yet, for builders. API docs say "here's what you can call and what it returns." NLSpecs say that, plus "here's the algorithm inside, here's the error hierarchy, here's the retry policy, here's the provider translation table, here's what happens at every edge case." + +**A test plan** defines how to verify a system. An NLSpec defines what the system must do. 
A test plan is derived from a spec; a spec is not derived from a test plan. The Definition of Done in a spec looks like a test plan but serves a different purpose: it's an acceptance contract, not a testing strategy. It says *what* to verify, not *how* to verify it. + +**A tutorial or guide** teaches a human how to use or build something, optimizing for learning. It may omit details, simplify, reorder for pedagogy. An NLSpec optimizes for completeness and precision, not for learning. It may be hard to read linearly. That's acceptable. A spec that sacrifices precision for readability has failed at its primary job. + +The critical distinction across all of these: **an NLSpec is the source of truth that generates an implementation. Every other document type is either upstream of it (intent, design docs) or downstream of it (API docs, READMEs, tutorials, test plans). Confusing the direction produces the wrong artifact.** + + +## What Completeness Means + +An NLSpec is complete when an agent reading it has everything needed to act correctly, and nothing is left to arbitrary choice that shouldn't be. + +This is not the same as specifying everything. A spec that dictates variable names, indentation, internal function decomposition, and memory allocation strategy is not more complete — it's more brittle and less useful. It has wasted precision on decisions that don't affect correctness or interoperability. + +Completeness operates at three levels: + +**Behavioral completeness.** Every observable behavior is determined. For any valid input, the spec determines the correct output (or the correct set of acceptable outputs, when nondeterminism is intentional). For any error condition, the spec determines the error type, whether it's retryable, and what the caller sees. + +**Interface completeness.** Every point where two components meet is defined with enough precision that the components can be built independently and still connect correctly. 
Data structures have fields with types. Functions have signatures with parameter and return types. Protocols have message formats and sequencing rules. + +**Boundary completeness.** Every limit, default, timeout, maximum, and edge case is specified. What happens when the context window overflows? What happens when a tool call references an unknown tool? What happens when all retry attempts are exhausted? A complete spec has answers. + +The judgment call is always: **does this decision affect correctness or interoperability?** If yes, specify it. If no, leave it to the implementer. When in doubt, specify it. Over-specification is less harmful than under-specification, because over-specification produces implementations that are merely constrained, while under-specification produces implementations that are incompatible. + + +## Intentional vs. Accidental Ambiguity + +Every NLSpec contains ambiguity. Some is intentional and valuable. Some is accidental and destructive. Distinguishing the two is a core skill for anyone working with specs. + +### Intentional Ambiguity + +Intentional ambiguity is **freedom granted to the implementer in areas where the spec author has determined that any reasonable choice is acceptable.** It is characterized by: + +- **The spec could have specified this but chose not to.** The absence of a requirement is a deliberate design decision. "The library does not invent its own model namespace" — this is the spec explicitly choosing not to specify a model naming convention, and saying why. +- **Different implementations making different choices here remain interoperable.** If one implementation uses a hash map for provider routing and another uses a match statement, no caller can tell the difference. The ambiguity is below the abstraction boundary. 
+- **The spec often signals it explicitly.** Phrases like "implementations may," "the mechanism is not specified," "any combination of them," or "this is an implementation detail" are markers of intentional ambiguity. + +Examples: choice of programming language, internal data structure selection, HTTP client library, concurrency primitives, file system layout for source code, error message wording (as long as the error type is correct). + +### Accidental Ambiguity + +Accidental ambiguity is **a gap in the spec where the author intended to specify behavior but failed to, or where the specified behavior admits multiple incompatible interpretations.** It is characterized by: + +- **Different implementations making different choices here produce incompatible or incorrect behavior.** If one implementation retries on timeout and another doesn't, callers experience different reliability characteristics from the same spec. +- **The spec appears to assume something it never states.** "Process the messages in order" — does "in order" mean insertion order, chronological order, or priority order? The author probably meant one of these but didn't say which. +- **An implementer must guess to proceed.** Intentional ambiguity lets the implementer choose freely. Accidental ambiguity forces the implementer to guess what the author meant. The difference is felt in the implementer's experience: freedom feels like freedom; guessing feels like anxiety. + +### How to Tell the Difference + +The test is: **would two implementations that resolve this ambiguity differently be interchangeable to a caller?** + +If yes, the ambiguity is intentional or at least harmless. If no, the ambiguity is accidental and the spec needs to be tightened. + +There is a gray zone. A spec might leave ambiguous whether an error message says "File not found: /path/to/file" or "No such file: /path/to/file." Strictly, these are different outputs. Practically, no caller depends on error message text. 
The ambiguity is harmless. Judgment is required; the test is a guide, not a theorem. + + +## The Structure of NLSpecs in Practice + +NLSpecs share structural patterns not because someone prescribed a template, but because these patterns solve recurring problems in specification writing. + +**Progressive disclosure of detail.** Specs open with an overview (what is this system, what problem does it solve, what are the design principles) and progressively deepen into architecture, data models, algorithms, and edge cases. This isn't pedagogical ordering — it's dependency ordering. You can't understand the edge selection algorithm until you understand what nodes and edges are. You can't understand what nodes and edges are until you understand what a pipeline is. + +**Pseudocode for algorithms, prose for contracts.** When behavior is algorithmic (a loop, a selection process, a retry policy), pseudocode specifies it precisely. When behavior is contractual (an interface, a data structure, an error hierarchy), prose with structured definitions specifies it. The choice of representation matches the nature of the thing being specified. + +**Tables for mappings.** When the spec must define how concept X in system A corresponds to concept Y in system B, a table is the right representation. Tables are exhaustive, scannable, and make gaps visible. Prose descriptions of mappings hide gaps. + +**Appendices for rationale and reference.** The main body of the spec says *what*. Appendices explain *why* (design decision rationale) and provide reference material (complete attribute tables, format specifications, examples). The main body is authoritative. Appendices are supportive. + +**A Definition of Done.** The spec ends with a concrete, checkable list of acceptance criteria. This list is the operational definition of "this implementation is correct." It transforms the spec from a document you interpret into a contract you satisfy. 
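As an illustration of the "tables for mappings" pattern, here is a minimal sketch that encodes the three finish-reason rows quoted earlier in this document, plus the Gemini inference rule for tool calls. The lowercase provider keys, the function name `unify_finish_reason`, the representation of response parts as dictionaries, and the choice to raise `KeyError` on an unmapped pair are all assumptions made for the example, not part of any spec.

```python
from typing import Any, Dict, List, Optional, Tuple

# Rows quoted in the document: (provider, native reason) -> unified reason.
# Provider keys are assumed spellings chosen for this sketch.
FINISH_REASON_TABLE: Dict[Tuple[str, str], str] = {
    ("openai", "stop"): "stop",
    ("anthropic", "end_turn"): "stop",
    ("gemini", "STOP"): "stop",
}

def unify_finish_reason(
    provider: str,
    native_reason: str,
    parts: Optional[List[Dict[str, Any]]] = None,
) -> str:
    """Translate a provider-native finish reason into the unified one."""
    # Gemini has no dedicated tool-call finish reason, so it is inferred
    # from the presence of functionCall parts (parts modeled as dicts here).
    if provider == "gemini" and parts:
        if any("functionCall" in part for part in parts):
            return "tool_calls"
    key = (provider, native_reason)
    if key not in FINISH_REASON_TABLE:
        # Assumed policy: a missing row fails loudly rather than guessing.
        raise KeyError(f"no mapping for {provider!r} reason {native_reason!r}")
    return FINISH_REASON_TABLE[key]
```

Keeping the rows in one data structure mirrors the point made about tables: a missing row is visible on inspection, and in this sketch it also surfaces at runtime instead of being silently mapped to a default.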
+ + +## How NLSpecs Relate to Each Other + +Complex systems require multiple NLSpecs that reference each other. The coding agent loop spec depends on the unified LLM client spec. The pipeline runner spec can use either as a backend. These relationships have rules: + +**Dependency is explicit.** A spec that depends on another says so directly: "This spec layers on top of the Unified LLM Client Specification, which handles all LLM communication." The dependency is named, the interface boundary is clear, and the importing spec states exactly which types and functions it uses from the dependency. + +**Specs don't reach into each other's internals.** The coding agent loop spec uses `Client.complete()` and `Client.stream()` from the LLM SDK. It does not reach into the SDK's provider adapters, SSE parsing, or retry logic. The boundary between specs is an API, not a gentleman's agreement. + +**Composition is through interfaces, not inheritance.** The pipeline runner doesn't extend the coding agent — it defines a `CodergenBackend` interface and says "implement this however you want; the pipeline doesn't care." This is the same principle as intentional ambiguity applied at the system level: the spec determines the contract, not the mechanism. + + +--- + +## Appendix A: When Implementing from an NLSpec + +When you are an agent (or human) building code from a spec, these are the judgment calls that matter: + +**The spec is not a suggestion.** Every requirement in the spec is a requirement. If the spec says "the adapter must generate synthetic unique IDs," you generate synthetic unique IDs. You don't decide it's unnecessary. You don't skip it because your language makes it awkward. If you believe the spec is wrong, you raise the issue — you don't silently deviate. + +**Intentional ambiguity is your design space.** Where the spec doesn't specify, you choose. 
Choose well: prefer simplicity, prefer the idiomatic approach for your language, prefer the choice that makes future spec compliance easier. But do choose — don't treat every unspecified detail as a question to ask. + +**The Definition of Done is your completion criterion.** When every item in the checklist passes, you're done. When items remain unchecked, you're not. Resist the urge to add features the spec doesn't require, even good ones. Your job is faithful implementation, not improvement. + +**When the spec is ambiguous and you can't tell if it's intentional:** apply the interchangeability test. If your choice doesn't affect callers, make the choice and move on. If it might affect callers, flag it. The phrase you want is: "The spec doesn't specify X. I chose Y because Z, but this is a point where the spec may need tightening." + +**Map the spec's structure to your implementation's structure.** The spec's sections often correspond to modules, packages, or files. The spec's data models correspond to types. The spec's algorithms correspond to functions. This correspondence should be legible — someone reading the spec should be able to find the corresponding code without a treasure map. + +**Provider-specific translation tables are not optional.** If the spec provides a mapping table, implement every row. Don't implement "the common ones" and plan to add others later. The table exists because every row matters; the uncommon cases are often where the subtlest bugs hide. + + +## Appendix B: When Authoring an NLSpec + +When you are writing a spec, these are the judgment calls that matter: + +**Specify behavior, not mechanism.** Say what the system does, not how it does it internally — unless the internal mechanism is load-bearing. A retry policy's backoff formula is load-bearing (it affects observable timing). A retry policy's implementation using a loop vs. recursion is not. + +**When you don't care, say so.** Intentional ambiguity should be visible. 
"The library does not prescribe internal data structures for provider routing" is better than silence, because silence could be an accidental gap. + +**When in doubt, over-specify.** An implementer can always choose to ignore a spec detail that turns out to be unnecessary. An implementer cannot invent a spec detail that turns out to be missing. Over-specification constrains; under-specification breaks. + +**Defaults are requirements.** Every configurable value needs a default. Every optional parameter needs defined behavior when omitted. "Default: 10 seconds" is a requirement as binding as any other. If you don't specify the default, you've left the most common user experience to chance. + +**Tables beat prose for mappings.** When you're describing how concept X maps to concept Y across multiple systems, use a table. Tables are exhaustive by visual inspection — you can see when a row is missing. Prose descriptions of the same mapping hide gaps behind sentences. + +**Include a Definition of Done.** Without it, the spec has no operational boundary. An implementer doesn't know when they're finished. A reviewer doesn't know what to check. A Definition of Done converts a document into a contract. + +**Design rationale belongs in an appendix.** The main body says *what*. Rationale says *why*. Mixing them makes the spec harder to use as a reference. An implementer re-reading the spec for the third time doesn't need to re-encounter the justification for every design choice. + +**Write for the implementer who disagrees with you.** The spec should be followable even by someone who thinks your design is wrong. This means the spec must be precise enough that "I would have done it differently" doesn't lead to "so I did it differently." Precision is the antidote to well-intentioned deviation. 
+ + +## Appendix C: When Evaluating an NLSpec + +When you are assessing whether a spec is good — whether as a reviewer, an editor, or an agent deciding whether a spec is ready to implement from — these are the judgment calls that matter: + +**Apply the two-implementer test.** For every requirement, ask: if two competent implementers built this independently, would their implementations be interchangeable to a caller? If yes, the spec is sufficiently precise at that point. If no, find the ambiguity and flag it. + +**Check the boundaries.** The highest-risk areas in any spec are interfaces between components, default values, error handling, and edge cases. These are where accidental ambiguity hides. Read these sections with maximum skepticism. + +**Look for missing rows in mapping tables.** If a spec maps concepts across systems and a known case isn't in the table, that's a gap. If Gemini has no "tool_calls" finish reason and the finish reason mapping table doesn't account for that, the spec is incomplete. + +**Verify the Definition of Done against the spec body.** Every requirement in the spec body should have a corresponding item in the Definition of Done. Requirements that aren't testable aren't enforceable. Checklist items that don't trace back to spec requirements are scope creep. + +**Distinguish "hard to implement" from "ambiguous."** A requirement can be perfectly clear and extremely difficult. "Execute all tool calls concurrently, wait for all results, send all results in a single continuation request, preserve ordering, handle partial failures gracefully" — this is hard to implement correctly. It is not ambiguous. Don't flag difficulty as a spec problem. + +**Assess whether intentional ambiguity is actually intentional.** When the spec is silent on something, ask: is this silence deliberate (the spec author doesn't care about this choice) or accidental (the spec author forgot)? 
Clues: if the surrounding area is highly specified and one detail is missing, it's probably accidental. If the spec explicitly says "this is left to the implementer" or "the mechanism is not specified," it's deliberate. + +**Check temporal assumptions.** Specs that reference specific model names, API versions, or provider capabilities will go stale. Good specs handle this by providing both concrete current values (for immediate use) and the principle for updating them (so future maintainers know what to change). A spec that says "use GPT-5.2" without context is fragile. A spec that says "prefer the latest available model; at time of writing, GPT-5.2" is maintainable. From 397685c9ea39ab6a4dfe9baff6b14a9ef0d2787f Mon Sep 17 00:00:00 2001 From: TG-Techie Date: Thu, 19 Feb 2026 21:43:36 -0500 Subject: [PATCH 02/12] v0.2.1 --- NLSpec - Grouning Document.md | 48 ++++++++++++++++++++++++++++++++++- README.md | 44 ++++++++++++++++++++++++++++++++ cspell.config.yaml | 8 ++++++ 3 files changed, 99 insertions(+), 1 deletion(-) create mode 100644 README.md create mode 100644 cspell.config.yaml diff --git a/NLSpec - Grouning Document.md b/NLSpec - Grouning Document.md index 7f620a7..fbf2fa9 100644 --- a/NLSpec - Grouning Document.md +++ b/NLSpec - Grouning Document.md @@ -93,7 +93,7 @@ An NLSpec is complete when an agent reading it has everything needed to act corr This is not the same as specifying everything. A spec that dictates variable names, indentation, internal function decomposition, and memory allocation strategy is not more complete — it's more brittle and less useful. It has wasted precision on decisions that don't affect correctness or interoperability. -Completeness operates at three levels: +Completeness operates at three levels. These are not strictly disjoint — boundary completeness is a special case of behavioral completeness at the edges of the input space. 
The distinction is worth preserving because each level corresponds to a different failure mode in practice, and each tends to be overlooked for different reasons: **Behavioral completeness.** Every observable behavior is determined. For any valid input, the spec determines the correct output (or the correct set of acceptable outputs, when nondeterminism is intentional). For any error condition, the spec determines the error type, whether it's retryable, and what the caller sees. @@ -103,6 +103,20 @@ Completeness operates at three levels: The judgment call is always: **does this decision affect correctness or interoperability?** If yes, specify it. If no, leave it to the implementer. When in doubt, specify it. Over-specification is less harmful than under-specification, because over-specification produces implementations that are merely constrained, while under-specification produces implementations that are incompatible. +### The Precision-Completeness Distinction + +The interchangeability test — would two independent implementations be interchangeable to a caller? — measures **precision**. It asks whether the spec eliminates ambiguity at each point it addresses. But a spec can be perfectly precise at every point it addresses and still be incomplete. Precision means the spec says things clearly. Completeness means the spec says enough things. + +A spec that precisely defines request routing, response normalization, and error hierarchy — but never mentions retry behavior — is precise and incomplete. Two implementations would agree on everything the spec covers and diverge on everything it doesn't. The spec eliminated ambiguity within its scope but left scope gaps. + +**The recreatability test.** If the implementation were destroyed and only the specification remained, could a competent implementer faithfully recreate the system? This is a completeness proof. It doesn't ask whether the spec is clear (precision). 
It asks whether the spec is *sufficient* — whether it contains, in itself, everything needed to generate a faithful implementation. + +The recreatability test does not require the spec to mention everything. It requires that anything the spec *doesn't* mention falls cleanly into the category of intentional ambiguity — genuine implementation choices where any reasonable decision is acceptable. If a competent implementer, working from the spec alone, would reach a point where they cannot proceed without information the spec doesn't contain, and the missing information isn't a free choice — that's a completeness defect. + +This is what makes the NLSpec's claim to be a "prescriptive, generative document that fully determines the construction of a software system" falsifiable. The document asserts that the spec fully determines construction. The recreatability test is how you check that assertion. Without it, "fully determines" is a claim. With it, it's a testable property. + +The two tests are complementary. Precision without completeness produces a spec that is clear about what it covers but silent on critical behaviors. Completeness without precision produces a spec that covers everything but leaves room for incompatible interpretations. A well-formed spec passes both: it addresses everything that isn't a free choice (complete), and it addresses each thing without ambiguity (precise). + ## Intentional vs. Accidental Ambiguity @@ -115,6 +129,7 @@ Intentional ambiguity is **freedom granted to the implementer in areas where the - **The spec could have specified this but chose not to.** The absence of a requirement is a deliberate design decision. "The library does not invent its own model namespace" — this is the spec explicitly choosing not to specify a model naming convention, and saying why. 
- **Different implementations making different choices here remain interoperable.** If one implementation uses a hash map for provider routing and another uses a match statement, no caller can tell the difference. The ambiguity is below the abstraction boundary. - **The spec often signals it explicitly.** Phrases like "implementations may," "the mechanism is not specified," "any combination of them," or "this is an implementation detail" are markers of intentional ambiguity. +- **Boundary clarifications sharpen the edges of the spec's scope.** A spec may explicitly exclude adjacent functionality to prevent misreading — "this spec does not cover prompt construction or conversation memory" — not because those are unimportant, but because their absence from this spec is a deliberate boundary, not an accidental gap. Such exclusions are a form of intentional ambiguity: the spec is intentionally silent on those areas and is naming that silence so it isn't mistaken for an omission. Examples: choice of programming language, internal data structure selection, HTTP client library, concurrency primitives, file system layout for source code, error message wording (as long as the error type is correct). @@ -161,6 +176,27 @@ Complex systems require multiple NLSpecs that reference each other. The coding a **Composition is through interfaces, not inheritance.** The pipeline runner doesn't extend the coding agent — it defines a `CodergenBackend` interface and says "implement this however you want; the pipeline doesn't care." This is the same principle as intentional ambiguity applied at the system level: the spec determines the contract, not the mechanism. +## Specs in Iterative Development + +The `Intent → NLSpec → Implementation` chain is not a single pass. In practice — particularly in conversational and agent-assisted development — intent is revealed progressively. A user may not know what they want until they see what they don't want. 
Requirements emerge, shift, and sharpen as work proceeds. + +This does not weaken the spec-first principle. It changes the cadence. + +In iterative development, implementation can serve as an epistemological tool — you build something small to discover whether a concept is sound, whether an interface feels right, whether an edge case matters. This is not implementation-of-spec. It is exploration. It lives in the intent phase of the `Intent → NLSpec → Implementation` chain, even though it looks like code. Its purpose is to surface insight, not to satisfy requirements. + +When exploration reveals something real — a requirement, a constraint, a behavioral expectation — that insight enters the spec. But it enters **as if the exploration never happened.** The spec is a vacuum artifact. It is written in a world where only intent and domain knowledge exist, not prototypes. The spec does not say "based on our prototype, we discovered X." It says "X." It absorbs the insight and presents it as a freestanding requirement, authoritative on its own terms. + +The discipline is: **at no point should the implementation encode a behavioral requirement that the spec does not reflect.** When exploration produces insight, the spec absorbs it before or simultaneously with the implementation encoding it. The spec leads or keeps pace. It never trails. And when the spec absorbs a new requirement, it must do so in a way that preserves self-consistency — the spec at every point in time is a coherent, complete document, not a patchwork of incremental additions. + +This connects to the recreatability test. If the spec has successfully absorbed every insight from iterative exploration, then the exploration artifacts (prototypes, experiments, intermediate code) can be discarded. The spec alone is sufficient to regenerate the system. 
If it can't — if there's knowledge in the prototype that never made it into the spec — the spec has a completeness defect, regardless of how that knowledge was originally discovered. + +A spec in an iterative context is a living document, but "living" does not mean "loose." It means the spec accretes deliberately as requirements are discovered, while remaining internally coherent at every revision. Early versions may specify only core behaviors. As edge cases surface, the spec grows to cover them. This is healthy — provided each addition is deliberate and the spec remains self-consistent. + +Self-consistency is non-negotiable. A spec that has been updated piecemeal can develop internal contradictions. At any point in time, the spec must be internally coherent. If an update to one section contradicts another, the contradiction must be resolved in the spec before implementation proceeds. + +The spec captures the current state of decisions, not the history of how decisions were made. When a requirement changes — "actually, retry three times, not five" — the spec is updated to say three. Rationale for changes, if worth preserving, belongs in commit messages, ADRs, or appendix notes — not in the spec body. The spec is always a present-tense document. + + --- ## Appendix A: When Implementing from an NLSpec @@ -175,6 +211,16 @@ When you are an agent (or human) building code from a spec, these are the judgme **When the spec is ambiguous and you can't tell if it's intentional:** apply the interchangeability test. If your choice doesn't affect callers, make the choice and move on. If it might affect callers, flag it. The phrase you want is: "The spec doesn't specify X. I chose Y because Z, but this is a point where the spec may need tightening." +**Understand the failure modes of a spec.** Not all spec problems are the same, and each kind requires different handling: + +- **Ambiguity** — the spec admits multiple incompatible interpretations at a specific point. 
The implementer must judge which interpretation is intended, or flag it. This is the most common failure mode and is addressed by the interchangeability test above. + +- **Malformation (self-contradiction)** — the spec asserts two incompatible requirements. Section A says retry three times; Section B says retry five times. This is a document-level defect. Self-contradiction in a specification is never intentional — it is always an error in the document itself. No amount of implementer judgment resolves a contradiction, because any implementation satisfies one requirement by violating the other. Only the spec author can decide which side of the contradiction reflects actual intent. When you identify a self-contradiction, flag it and request repair. Do not pick a side silently. + +- **Incorrectness** — the spec is internally consistent but prescribes wrong behavior. The spec says "timeout after 60 seconds" and that's unambiguous, but the correct value for the domain is 10 seconds. This failure mode may not be detectable from within the spec itself — it requires domain knowledge external to the document. The onus of correctness is on the spec author, not the implementer. An implementer faithfully implementing an incorrect spec has done their job correctly; the spec was wrong, not the implementation. + +The response to each failure mode is different. Ambiguity calls for judgment. Contradiction calls for repair. Incorrectness calls for domain authority. An operating environment (project conventions, agent instructions, team process) should provide specific protocols for what to do when each failure mode is encountered — particularly when the spec author is unavailable and the defect is blocking. That operational guidance is outside the scope of this grounding document, but its necessity should be anticipated. + **Map the spec's structure to your implementation's structure.** The spec's sections often correspond to modules, packages, or files. 
The spec's data models correspond to types. The spec's algorithms correspond to functions. This correspondence should be legible — someone reading the spec should be able to find the corresponding code without a treasure map. **Provider-specific translation tables are not optional.** If the spec provides a mapping table, implement every row. Don't implement "the common ones" and plan to add others later. The table exists because every row matters; the uncommon cases are often where the subtlest bugs hide. diff --git a/README.md b/README.md new file mode 100644 index 0000000..4c2385a --- /dev/null +++ b/README.md @@ -0,0 +1,44 @@ +# NLSpec Grounding Document + +## What This Is + +A grounding reference that establishes what natural language +specifications (NLSpecs) are, what makes them effective, and how to +author, implement, and evaluate them. It is written for both human +engineers and AI agents. + +## Version + +0.2.1 + +## Provenance + +The NLSpec concept and term originate from StrongDM, whose work +on natural language specifications is prior art for this document. + +This document was developed independently and in parallel, arriving +at many of the same principles through convergent engineering +practice. It uses StrongDM's framing as a starting point and extends +it to address contexts their original work did not cover — +particularly iterative and conversational development workflows, +agent-oriented implementation, and the relationship between +exploratory implementation and specification authorship. + +The result shares lineage with StrongDM's NLSpec concept but carries +its own emphases and extensions. The NLSpec name is retained as the +term of art for this class of artifact. + +## Document Structure + +The grounding document (`nlspec.md`) is the primary artifact. It is +self-contained. This README provides metadata and context that do not +belong in the document body. 
+ +## Relationship to Operational Documents + +This grounding document defines *what specs are*. Operational +documents (such as project-specific code maxims, agent instructions, +or team process docs) define *how specs are used in a specific +context*. Operational documents should reference this grounding +document for philosophical and definitional content rather than +re-deriving it. diff --git a/cspell.config.yaml b/cspell.config.yaml new file mode 100644 index 0000000..d3109ff --- /dev/null +++ b/cspell.config.yaml @@ -0,0 +1,8 @@ +version: '0.2' +ignorePaths: [] +dictionaryDefinitions: [] +dictionaries: [] +words: + - recreatability +ignoreWords: [] +import: [] From a572b08fbb0195b64be35d0eaf5b0bc591351c62 Mon Sep 17 00:00:00 2001 From: TG-Techie Date: Thu, 19 Feb 2026 21:43:56 -0500 Subject: [PATCH 03/12] fix file name --- NLSpec - Grouning Document.md => NLSpec - Grounding Document.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename NLSpec - Grouning Document.md => NLSpec - Grounding Document.md (100%) diff --git a/NLSpec - Grouning Document.md b/NLSpec - Grounding Document.md similarity index 100% rename from NLSpec - Grouning Document.md rename to NLSpec - Grounding Document.md From 11e2aa670484817ef6fb65bdaae8dc9093c87562 Mon Sep 17 00:00:00 2001 From: TG-Techie Date: Thu, 19 Feb 2026 22:27:13 -0500 Subject: [PATCH 04/12] add prior art --- Prior Art/StrongDM - attractor-spec.md | 2083 ++++++++++++++++ .../StrongDM - coding-agent-loop-spec.md | 1451 +++++++++++ Prior Art/StrongDM - unified-llm-spec.md | 2153 +++++++++++++++++ Prior Art/TG-Techie - CodeMaxims.md | 172 ++ 4 files changed, 5859 insertions(+) create mode 100644 Prior Art/StrongDM - attractor-spec.md create mode 100644 Prior Art/StrongDM - coding-agent-loop-spec.md create mode 100644 Prior Art/StrongDM - unified-llm-spec.md create mode 100644 Prior Art/TG-Techie - CodeMaxims.md diff --git a/Prior Art/StrongDM - attractor-spec.md b/Prior Art/StrongDM - attractor-spec.md new file mode 
100644 index 0000000..0846d86 --- /dev/null +++ b/Prior Art/StrongDM - attractor-spec.md @@ -0,0 +1,2083 @@ +# Attractor Specification + +A DOT-based pipeline runner that uses directed graphs (defined in Graphviz DOT syntax) to orchestrate multi-stage AI workflows. Each node in the graph is an AI task (LLM call, human review, conditional branch, parallel fan-out, etc.) and edges define the flow between them. + +--- + +## Table of Contents + +1. [Overview and Goals](#1-overview-and-goals) +2. [DOT DSL Schema](#2-dot-dsl-schema) +3. [Pipeline Execution Engine](#3-pipeline-execution-engine) +4. [Node Handlers](#4-node-handlers) +5. [State and Context](#5-state-and-context) +6. [Human-in-the-Loop (Interviewer Pattern)](#6-human-in-the-loop-interviewer-pattern) +7. [Validation and Linting](#7-validation-and-linting) +8. [Model Stylesheet](#8-model-stylesheet) +9. [Transforms and Extensibility](#9-transforms-and-extensibility) +10. [Condition Expression Language](#10-condition-expression-language) +11. [Definition of Done](#11-definition-of-done) + +--- + +## 1. Overview and Goals + +### 1.1 Problem Statement + +AI-powered software workflows -- code generation, code review, testing, deployment planning -- often require multiple LLM calls chained together with conditional logic, human approvals, and parallel execution. Without a structured orchestration layer, developers either write fragile imperative scripts or build ad-hoc state machines that are difficult to visualize, version, or debug. + +Attractor solves this by letting pipeline authors define multi-stage AI workflows as directed graphs using Graphviz DOT syntax. The graph is the workflow: nodes are tasks, edges are transitions, and attributes configure behavior. The result is a declarative, visual, version-controllable pipeline definition that an execution engine can traverse deterministically. 
+ +### 1.2 Why DOT Syntax + +DOT is chosen as the pipeline definition format for several reasons: + +- **DOT is inherently a graph description language.** Workflow pipelines are directed graphs. Using DOT means the structure (nodes and edges) maps directly to the language's primary construct, rather than being encoded in a data format like YAML or JSON that has no native concept of graphs. +- **Existing tooling.** DOT files can be rendered to SVG/PNG with standard Graphviz tooling, giving pipeline authors immediate visual feedback. Editors, linters, and parsers already exist. +- **Declarative and human-readable.** A `.dot` file is a complete, self-contained workflow definition that can be version-controlled, diffed, and reviewed in pull requests. +- **Constrained extensibility.** By restricting to a well-defined DOT subset (directed graphs only, typed attributes, no HTML labels), the DSL remains predictable while being extensible through custom attributes. + +For reference on DOT syntax, see the Graphviz DOT language specification: https://graphviz.org/doc/info/lang.html + +### 1.3 Design Principles + +**Declarative pipelines.** The `.dot` file declares what the workflow looks like and what each stage should do. The execution engine decides how and when to run each stage. Pipeline authors do not write control flow; they declare graph structure. + +**Pluggable handlers.** Each node type (LLM call, human gate, parallel fan-out) is backed by a handler that implements a common interface. New node types are added by registering new handlers. The execution engine does not know about handler internals. + +**Checkpoint and resume.** After each node completes, the execution engine saves a serializable checkpoint. If the process crashes, execution resumes from the last checkpoint. + +**Human-in-the-loop.** The pipeline can pause at designated nodes, present choices to a human operator, and route based on the human's decision. 
This supports approval gates, code review, and manual override -- critical for AI workflows where automated judgment may not be sufficient. + +**Edge-based routing.** Transitions between nodes are controlled by conditions, labels, and weights on edges, with runtime condition evaluation. + +### 1.4 Layering and LLM Backends + +Attractor defines the orchestration layer: graph definition, traversal, state management, and extensibility. It does NOT require any specific LLM integration. The codergen handler (Section 4.5) needs a way to call an LLM and get a response -- how you provide that is up to you. + +The codergen handler takes a backend that conforms to the `CodergenBackend` interface (Section 4.5). What that backend does internally is entirely up to the implementor -- use the companion [Coding Agent Loop](./coding-agent-loop-spec.md) and [Unified LLM Client](./unified-llm-spec.md) specs, spawn CLI agents (Claude Code, Codex, Gemini CLI) in subprocesses, run agents in tmux panes with a manager attaching to them, call an LLM API directly, or anything else. The pipeline definition (the DOT file) does not change regardless of backend choice. + +Attractor pipelines are driven by an event stream (Section 9.6). TUI, web, and IDE frontends consume events and submit human-in-the-loop answers. The pipeline engine is headless; the presentation layer is separate. + +--- + +## 2. DOT DSL Schema + +### 2.1 Supported Subset + +Attractor accepts a strict subset of the Graphviz DOT language. The restrictions exist for predictability: one graph per file, directed edges only, no HTML labels, and typed attributes with defaults. + +### 2.2 BNF-Style Grammar + +``` +Graph ::= 'digraph' Identifier '{' Statement* '}' + +Statement ::= GraphAttrStmt + | NodeDefaults + | EdgeDefaults + | SubgraphStmt + | NodeStmt + | EdgeStmt + | GraphAttrDecl + +GraphAttrStmt ::= 'graph' AttrBlock ';'? +NodeDefaults ::= 'node' AttrBlock ';'? +EdgeDefaults ::= 'edge' AttrBlock ';'? 
+GraphAttrDecl ::= Identifier '=' Value ';'? + +SubgraphStmt ::= 'subgraph' Identifier? '{' Statement* '}' + +NodeStmt ::= Identifier AttrBlock? ';'? +EdgeStmt ::= Identifier ( '->' Identifier )+ AttrBlock? ';'? + +AttrBlock ::= '[' Attr ( ',' Attr )* ']' +Attr ::= Key '=' Value + +Key ::= Identifier | QualifiedId +QualifiedId ::= Identifier ( '.' Identifier )+ + +Value ::= String | Integer | Float | Boolean | Duration +Identifier ::= [A-Za-z_][A-Za-z0-9_]* +String ::= '"' ( '\\"' | '\\n' | '\\t' | '\\\\' | [^"\\] )* '"' +Integer ::= '-'? [0-9]+ +Float ::= '-'? [0-9]* '.' [0-9]+ +Boolean ::= 'true' | 'false' +Duration ::= Integer ( 'ms' | 's' | 'm' | 'h' | 'd' ) + +Direction ::= 'TB' | 'LR' | 'BT' | 'RL' +``` + +### 2.3 Key Constraints + +- **One digraph per file.** Multiple graphs, undirected graphs, and `strict` modifiers are rejected. +- **Bare identifiers for node IDs.** Node IDs must match `[A-Za-z_][A-Za-z0-9_]*`. Human-readable names go in the `label` attribute. +- **Commas required between attributes.** Inside attribute blocks, commas separate key-value pairs for unambiguous parsing. +- **Directed edges only.** `->` is the only edge operator. `--` (undirected) is rejected. +- **Comments supported.** Both `// line` and `/* block */` comments are stripped before parsing. +- **Semicolons optional.** Statement-terminating semicolons are accepted but not required. + +### 2.4 Value Types + +| Type | Syntax | Examples | +|----------|---------------------------------|--------------------------------------| +| String | Double-quoted with escapes | `"Hello world"`, `"line1\nline2"` | +| Integer | Optional sign, digits | `42`, `-1`, `0` | +| Float | Decimal number | `0.5`, `-3.14` | +| Boolean | Literal keywords | `true`, `false` | +| Duration | Integer + unit suffix | `900s`, `15m`, `2h`, `250ms`, `1d` | + +### 2.5 Graph-Level Attributes + +Graph attributes are declared in a `graph [ ... ]` block or as top-level `key = value` declarations. 
They configure the entire workflow. + +| Key | Type | Default | Description | +|---------------------------|----------|-----------|-------------| +| `goal` | String | `""` | Human-readable goal for the pipeline. Exposed as `$goal` in prompt templates and mirrored into the run context as `graph.goal`. | +| `label` | String | `""` | Display name for the graph (used in visualization). | +| `model_stylesheet` | String | `""` | CSS-like stylesheet for per-node LLM model/provider defaults. See Section 8. | +| `default_max_retry` | Integer | `50` | Global retry ceiling for nodes that omit `max_retries`. | +| `retry_target` | String | `""` | Node ID to jump to if exit is reached with unsatisfied goal gates. | +| `fallback_retry_target` | String | `""` | Secondary jump target if `retry_target` is missing or invalid. | +| `default_fidelity` | String | `""` | Default context fidelity mode (see Section 5.4). | + +### 2.6 Node Attributes + +| Key | Type | Default | Description | +|---------------------|----------|-----------------|-------------| +| `label` | String | node ID | Display name shown in UI, prompts, and telemetry. | +| `shape` | String | `"box"` | Graphviz shape. Determines the default handler type (see mapping table below). | +| `type` | String | `""` | Explicit handler type override. Takes precedence over shape-based resolution. | +| `prompt` | String | `""` | Primary instruction for the stage. Supports `$goal` variable expansion. Falls back to `label` if empty for LLM stages. | +| `max_retries` | Integer | `0` | Number of additional attempts beyond the initial execution. `max_retries=3` means up to 4 total executions. | +| `goal_gate` | Boolean | `false` | If `true`, this node must reach SUCCESS before the pipeline can exit. | +| `retry_target` | String | `""` | Node ID to jump to if this node fails and retries are exhausted. | +| `fallback_retry_target` | String | `""` | Secondary retry target. 
| +| `fidelity` | String | inherited | Context fidelity mode for this node's LLM session. See Section 5.4. | +| `thread_id` | String | derived | Explicit thread identifier for LLM session reuse under `full` fidelity. | +| `class` | String | `""` | Comma-separated class names for model stylesheet targeting. | +| `timeout` | Duration | unset | Maximum execution time for this node. | +| `llm_model` | String | inherited | LLM model identifier. Overridable by stylesheet. | +| `llm_provider` | String | auto-detected | LLM provider key. Auto-detected from model if unset. | +| `reasoning_effort` | String | `"high"` | LLM reasoning effort: `low`, `medium`, `high`. | +| `auto_status` | Boolean | `false` | If `true` and the handler writes no status, the engine auto-generates a SUCCESS outcome. | +| `allow_partial` | Boolean | `false` | Accept PARTIAL_SUCCESS when retries are exhausted instead of failing. | + +### 2.7 Edge Attributes + +| Key | Type | Default | Description | +|--------------|----------|---------|-------------| +| `label` | String | `""` | Human-facing caption and routing key. Used for preferred-label matching in edge selection. | +| `condition` | String | `""` | Boolean guard expression evaluated against the current context and outcome. See Section 10. | +| `weight` | Integer | `0` | Numeric priority for edge selection. Higher weight wins among equally eligible edges. | +| `fidelity` | String | unset | Override fidelity mode for the target node. Highest precedence in fidelity resolution. | +| `thread_id` | String | unset | Override thread ID for session reuse at the target node. | +| `loop_restart` | Boolean | `false` | When `true`, terminates the current run and re-launches with a fresh log directory. | + +### 2.8 Shape-to-Handler-Type Mapping + +The `shape` attribute on a node determines which handler executes it, unless overridden by an explicit `type` attribute. 
This table defines the canonical mapping: + +| Shape | Handler Type | Description | +|-------------------|-----------------------|-------------| +| `Mdiamond` | `start` | Pipeline entry point. No-op handler. Every graph must have exactly one. | +| `Msquare` | `exit` | Pipeline exit point. No-op handler. Every graph must have exactly one. | +| `box` | `codergen` | LLM task (code generation, analysis, planning). The default for all nodes without an explicit shape. | +| `hexagon` | `wait.human` | Human-in-the-loop gate. Blocks until a human selects an option. | +| `diamond` | `conditional` | Conditional routing point. Routes based on edge conditions against current context. | +| `component` | `parallel` | Parallel fan-out. Executes multiple branches concurrently. | +| `tripleoctagon` | `parallel.fan_in` | Parallel fan-in. Waits for all branches and consolidates results. | +| `parallelogram` | `tool` | External tool execution (shell command, API call). | +| `house` | `stack.manager_loop` | Supervisor loop. Orchestrates observe/steer/wait cycles over a child pipeline. | + +### 2.9 Chained Edges + +Chained edge declarations are syntactic sugar. The statement: + +``` +A -> B -> C [label="next"] +``` + +expands to two edges: + +``` +A -> B [label="next"] +B -> C [label="next"] +``` + +Edge attributes in a chained declaration apply to all edges in the chain. + +### 2.10 Subgraphs + +Subgraphs serve two purposes: **scoping defaults** and **deriving classes** for the model stylesheet. + +**Scoping defaults:** Attributes declared in a subgraph's `node [ ... ]` block apply to nodes within that subgraph unless the node explicitly overrides them. + +``` +subgraph cluster_loop { + label = "Loop A" + node [thread_id="loop-a", timeout="900s"] + + Plan [label="Plan next step"] + Implement [label="Implement", timeout="1800s"] +} +``` + +Here `Plan` inherits `thread_id="loop-a"` and `timeout="900s"`, while `Implement` inherits `thread_id` but overrides `timeout`. 
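The scoping rule can be sketched as a dictionary merge in which a node's explicit attributes win over the enclosing `node [ ... ]` defaults. This is an illustrative Python sketch under that assumption, not the engine's parser; `resolve_node_attrs` is a hypothetical helper name that does not appear in the spec.

```python
def resolve_node_attrs(scope_defaults: dict, explicit: dict) -> dict:
    """Merge subgraph-level node defaults with a node's own attributes.

    Hypothetical helper: the spec only states the override rule, not an API.
    """
    merged = dict(scope_defaults)  # start from the inherited defaults
    merged.update(explicit)        # explicit attributes override them
    return merged


# Values taken from the cluster_loop example above.
loop_defaults = {"thread_id": "loop-a", "timeout": "900s"}
plan = resolve_node_attrs(loop_defaults, {"label": "Plan next step"})
implement = resolve_node_attrs(
    loop_defaults, {"label": "Implement", "timeout": "1800s"}
)
```

`plan` keeps both inherited values, while `implement` keeps `thread_id` and replaces `timeout`.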
+ +**Class derivation:** Subgraph labels can produce CSS-like classes for model stylesheet matching. Nodes inside a subgraph receive the derived class. The class name is derived by lowercasing the label, replacing spaces with hyphens, and stripping non-alphanumeric characters (except hyphens). For example, `label="Loop A"` yields class `loop-a`. + +### 2.11 Node and Edge Default Blocks + +Default blocks set baseline attributes for all subsequent nodes or edges within their scope: + +``` +node [shape=box, timeout="900s"] +edge [weight=0] +``` + +Explicit attributes on individual nodes or edges override these defaults. + +### 2.12 Class Attribute + +The `class` attribute assigns one or more CSS-like class names to a node for model stylesheet targeting: + +``` +review_code [shape=box, class="code,critical", prompt="Review the code"] +``` + +Classes are comma-separated. They can be referenced in the model stylesheet with dot-prefix selectors (`.code`, `.critical`). + +### 2.13 Minimal Examples + +**Simple linear workflow:** + +``` +digraph Simple { + graph [goal="Run tests and report"] + rankdir=LR + + start [shape=Mdiamond, label="Start"] + exit [shape=Msquare, label="Exit"] + + run_tests [label="Run Tests", prompt="Run the test suite and report results"] + report [label="Report", prompt="Summarize the test results"] + + start -> run_tests -> report -> exit +} +``` + +**Branching workflow with conditions:** + +``` +digraph Branch { + graph [goal="Implement and validate a feature"] + rankdir=LR + node [shape=box, timeout="900s"] + + start [shape=Mdiamond, label="Start"] + exit [shape=Msquare, label="Exit"] + plan [label="Plan", prompt="Plan the implementation"] + implement [label="Implement", prompt="Implement the plan"] + validate [label="Validate", prompt="Run tests"] + gate [shape=diamond, label="Tests passing?"] + + start -> plan -> implement -> validate -> gate + gate -> exit [label="Yes", condition="outcome=success"] + gate -> implement [label="No", 
condition="outcome!=success"] +} +``` + +**Human gate:** + +``` +digraph Review { + rankdir=LR + + start [shape=Mdiamond, label="Start"] + exit [shape=Msquare, label="Exit"] + + review_gate [ + shape=hexagon, + label="Review Changes", + type="wait.human" + ] + + start -> review_gate + review_gate -> ship_it [label="[A] Approve"] + review_gate -> fixes [label="[F] Fix"] + ship_it -> exit + fixes -> review_gate +} +``` + +--- + +## 3. Pipeline Execution Engine + +### 3.1 Run Lifecycle + +The execution lifecycle proceeds through five phases: + +``` +PARSE -> VALIDATE -> INITIALIZE -> EXECUTE -> FINALIZE +``` + +1. **Parse:** Read the `.dot` source and produce an in-memory Graph model (nodes, edges, attributes). +2. **Validate:** Run lint rules (Section 7). Reject invalid graphs. Warn on suspicious patterns. +3. **Initialize:** Create the run directory, initial context, and checkpoint. Mirror graph attributes into the context. Apply transforms (stylesheet, variable expansion). +4. **Execute:** Traverse the graph from the start node, executing handlers and selecting edges. +5. **Finalize:** Write the final checkpoint, emit completion events, and clean up resources (close sessions, release files). + +### 3.2 Core Execution Loop + +The following pseudocode defines the execution engine's traversal algorithm. This is the heart of the system. 
+ +``` +FUNCTION run(graph, config): + context = new Context() + mirror_graph_attributes(graph, context) + checkpoint = new Checkpoint() + completed_nodes = [] + node_outcomes = {} + last_outcome = NONE + + current_node = find_start_node(graph) + -- Resolves by: (1) shape=Mdiamond, (2) id="start" or "Start" + -- Raises error if not found + + WHILE true: + node = graph.nodes[current_node.id] + + -- Step 1: Check for terminal node + IF is_terminal(node): + gate_ok, failed_gate = check_goal_gates(graph, node_outcomes) + IF NOT gate_ok AND failed_gate exists: + retry_target = get_retry_target(failed_gate, graph) + IF retry_target exists: + current_node = graph.nodes[retry_target] + CONTINUE + ELSE: + RAISE "Goal gate unsatisfied and no retry target" + BREAK -- Exit the loop; pipeline complete + + -- Step 2: Execute node handler with retry policy + retry_policy = build_retry_policy(node, graph) + outcome = execute_with_retry(node, context, graph, retry_policy) + + -- Step 3: Record completion + completed_nodes.append(node.id) + node_outcomes[node.id] = outcome + last_outcome = outcome + + -- Step 4: Apply context updates from outcome + FOR EACH (key, value) IN outcome.context_updates: + context.set(key, value) + context.set("outcome", outcome.status) + IF outcome.preferred_label is not empty: + context.set("preferred_label", outcome.preferred_label) + + -- Step 5: Save checkpoint + checkpoint = create_checkpoint(context, current_node.id, completed_nodes) + save_checkpoint(checkpoint, logs_root) + + -- Step 6: Select next edge + next_edge = select_edge(node, outcome, context, graph) + IF next_edge is NONE: + IF outcome.status == FAIL: + RAISE "Stage failed with no outgoing fail edge" + BREAK + + -- Step 7: Handle loop_restart + IF next_edge has loop_restart=true: + restart_run(graph, config, start_at=next_edge.to_node) + RETURN + + -- Step 8: Advance to next node + current_node = graph.nodes[next_edge.to_node] + + RETURN last_outcome +``` + +### 3.3 Edge Selection Algorithm + +After a node completes, the engine 
selects the next edge from the node's outgoing edges. The selection is deterministic and follows a five-step priority order: + +**Step 1: Condition-matching edges.** Evaluate each edge's `condition` expression (see Section 10) against the current context and outcome. Edges whose condition evaluates to `true` are eligible. Edges with no condition are not considered in this step; they proceed to later steps. + +**Step 2: Preferred label match.** If the node's outcome includes a `preferred_label`, find the first eligible edge (condition-passing or unconditional) whose `label` matches after normalization. Label normalization: lowercase, trim whitespace, strip accelerator prefixes (patterns like `[Y] `, `Y) `, `Y - `). + +**Step 3: Suggested next IDs.** If no label match and the outcome includes `suggested_next_ids`, find the first eligible edge whose target node ID appears in the list. + +**Step 4: Highest weight.** Among remaining eligible unconditional edges, choose the one with the highest `weight` attribute (default 0). + +**Step 5: Lexical tiebreak.** If weights are equal, choose the edge whose target node ID comes first lexicographically. 
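The label normalization described in Step 2 can be sketched in Python as follows. This is a minimal sketch, assuming accelerator keys are a single word character; the regular expression and the name `_ACCELERATOR` are illustrative choices, not part of the spec.

```python
import re

# Assumed accelerator prefixes, per Step 2: "[Y] ", "Y) ", "Y - ".
# The key is assumed to be exactly one word character.
_ACCELERATOR = re.compile(r"^(?:\[\w\]\s*|\w\)\s*|\w\s+-\s+)")


def normalize_label(label: str) -> str:
    """Trim whitespace, strip one accelerator prefix, and lowercase."""
    trimmed = label.strip()
    without_prefix = _ACCELERATOR.sub("", trimmed, count=1)
    return without_prefix.strip().lower()


def labels_match(edge_label: str, preferred_label: str) -> bool:
    """Compare an edge label with an outcome's preferred label."""
    return normalize_label(edge_label) == normalize_label(preferred_label)
```

With this sketch, an outcome whose `preferred_label` is `approve` matches an edge labeled `[A] Approve`, because both normalize to the same string.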
+ +``` +FUNCTION select_edge(node, outcome, context, graph): + edges = graph.outgoing_edges(node.id) + IF edges is empty: + RETURN NONE + + -- Step 1: Condition matching + condition_matched = [] + FOR EACH edge IN edges: + IF edge.condition is not empty: + IF evaluate_condition(edge.condition, outcome, context) == true: + condition_matched.append(edge) + IF condition_matched is not empty: + RETURN best_by_weight_then_lexical(condition_matched) + + -- Step 2: Preferred label + IF outcome.preferred_label is not empty: + FOR EACH edge IN edges: + IF normalize_label(edge.label) == normalize_label(outcome.preferred_label): + RETURN edge + + -- Step 3: Suggested next IDs + IF outcome.suggested_next_ids is not empty: + FOR EACH suggested_id IN outcome.suggested_next_ids: + FOR EACH edge IN edges: + IF edge.to_node == suggested_id: + RETURN edge + + -- Step 4 & 5: Weight with lexical tiebreak (unconditional edges only) + unconditional = [e FOR e IN edges WHERE e.condition is empty] + IF unconditional is not empty: + RETURN best_by_weight_then_lexical(unconditional) + + -- Fallback: any edge + RETURN best_by_weight_then_lexical(edges) + + +FUNCTION best_by_weight_then_lexical(edges): + SORT edges BY (weight DESCENDING, to_node ASCENDING) + RETURN edges[0] +``` + +### 3.4 Goal Gate Enforcement + +Nodes with `goal_gate=true` represent critical stages that must succeed before the pipeline can exit. When the traversal reaches a terminal node (shape=Msquare): + +1. Check all visited nodes that have `goal_gate=true`. +2. If any goal gate node has a non-success outcome (not SUCCESS or PARTIAL_SUCCESS), the pipeline cannot exit. +3. Instead, jump to the `retry_target` of the unsatisfied goal gate node. If that is not set, try `fallback_retry_target`. If that is also not set, try the graph-level `retry_target` and `fallback_retry_target`. +4. If no retry target exists at any level, the pipeline fails with an error. 
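The retry-target lookup in step 3 is a fixed fallback chain: node `retry_target`, node `fallback_retry_target`, then the two graph-level attributes. A hedged Python sketch of that chain follows. The function signature and the `known_node_ids` parameter are assumptions made for illustration; skipping IDs that do not name a node reflects the "missing or invalid" wording in the graph attribute table rather than a stated algorithm.

```python
from typing import Mapping, Optional, Set


def get_retry_target(
    node_attrs: Mapping[str, str],
    graph_attrs: Mapping[str, str],
    known_node_ids: Set[str],
) -> Optional[str]:
    """Return the first usable retry target, or None if no level defines one."""
    candidates = (
        node_attrs.get("retry_target", ""),
        node_attrs.get("fallback_retry_target", ""),
        graph_attrs.get("retry_target", ""),
        graph_attrs.get("fallback_retry_target", ""),
    )
    for candidate in candidates:
        # Assumption: an empty value or an unknown node ID is skipped.
        if candidate and candidate in known_node_ids:
            return candidate
    return None
```

A `None` result corresponds to step 4: the pipeline fails because no retry target exists at any level.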
+ +``` +FUNCTION check_goal_gates(graph, node_outcomes): + FOR EACH (node_id, outcome) IN node_outcomes: + node = graph.nodes[node_id] + IF node.goal_gate == true: + IF outcome.status NOT IN {SUCCESS, PARTIAL_SUCCESS}: + RETURN (false, node) + RETURN (true, NONE) +``` + +### 3.5 Retry Logic + +Each node has a retry policy determined by: + +1. Node attribute `max_retries` (if set) -- number of additional attempts beyond the initial execution +2. Graph attribute `default_max_retry` (fallback) +3. Built-in default: 0 (no retries) + +The `max_retries` attribute specifies additional attempts. So `max_retries=3` means a total of 4 executions (1 initial + 3 retries). Internally this maps to `max_attempts = max_retries + 1`. + +``` +FUNCTION execute_with_retry(node, context, graph, retry_policy): + FOR attempt FROM 1 TO retry_policy.max_attempts: + TRY: + outcome = handler.execute(node, context, graph, logs_root) + CATCH exception: + IF retry_policy.should_retry(exception) AND attempt < retry_policy.max_attempts: + delay = retry_policy.backoff.delay_for_attempt(attempt) + sleep(delay) + CONTINUE + ELSE: + RETURN Outcome(status=FAIL, failure_reason=str(exception)) + + IF outcome.status IN {SUCCESS, PARTIAL_SUCCESS}: + reset_retry_counter(node.id) + RETURN outcome + + IF outcome.status == RETRY: + IF attempt < retry_policy.max_attempts: + increment_retry_counter(node.id) + delay = retry_policy.backoff.delay_for_attempt(attempt) + sleep(delay) + CONTINUE + ELSE: + IF node.allow_partial == true: + RETURN Outcome(status=PARTIAL_SUCCESS, notes="retries exhausted, partial accepted") + RETURN Outcome(status=FAIL, failure_reason="max retries exceeded") + + IF outcome.status == FAIL: + RETURN outcome + + RETURN Outcome(status=FAIL, failure_reason="max retries exceeded") +``` + +### 3.6 Retry Policy + +``` +RetryPolicy: + max_attempts : Integer -- minimum 1 (1 means no retries) + backoff : BackoffConfig -- delay calculation between retries + should_retry : Function(Error) -> Boolean 
-- predicate for retryable errors + +BackoffConfig: + initial_delay_ms : Integer -- first retry delay in milliseconds (default: 200) + backoff_factor : Float -- multiplier for subsequent delays (default: 2.0) + max_delay_ms : Integer -- cap on delay in milliseconds (default: 60000) + jitter : Boolean -- add random jitter to prevent thundering herd (default: true) +``` + +**Delay calculation:** + +``` +FUNCTION delay_for_attempt(attempt, config): + -- attempt is 1-indexed (first retry is attempt=1) + delay = config.initial_delay_ms * (config.backoff_factor ^ (attempt - 1)) + delay = MIN(delay, config.max_delay_ms) + IF config.jitter: + delay = delay * random_uniform(0.5, 1.5) + RETURN delay +``` + +**Preset policies:** + +| Name | Max Attempts | Initial Delay | Factor | Description | +|--------------|-------------|---------------|--------|-------------| +| `none` | 1 | -- | -- | No retries. Fail immediately on error. | +| `standard` | 5 | 200ms | 2.0 | General-purpose. Nominal delays between the 5 attempts: 200, 400, 800, 1600ms. | +| `aggressive` | 5 | 500ms | 2.0 | For unreliable operations. Nominal delays between the 5 attempts: 500, 1000, 2000, 4000ms. | +| `linear` | 3 | 500ms | 1.0 | Fixed delay between attempts. Nominal delays between the 3 attempts: 500, 500ms. | +| `patient` | 3 | 2000ms | 3.0 | Long-running operations. Nominal delays between the 3 attempts: 2000, 6000ms. | + +Nominal delays are the values before jitter is applied; a policy with N attempts sleeps at most N - 1 times. + +**Default should_retry predicate:** Returns `true` for network errors, rate limit errors (HTTP 429), server errors (HTTP 5xx), and provider-reported transient failures. Returns `false` for authentication errors (HTTP 401, 403), bad request errors (HTTP 400), validation errors, and configuration errors. + +### 3.7 Failure Routing + +When a stage returns FAIL (or retries are exhausted), the engine attempts failure routing in this order: + +1. **Fail edge:** An outgoing edge with `condition="outcome=fail"`. If found, follow it. +2. **Retry target:** Node attribute `retry_target`. Jump to that node. +3. **Fallback retry target:** Node attribute `fallback_retry_target`. 
Jump to that node. +4. **Pipeline termination:** No failure route found. The pipeline fails with the stage's failure reason. + +### 3.8 Concurrency Model + +The graph traversal is single-threaded. Only one node executes at a time in the top-level graph. This simplifies reasoning about context state and avoids race conditions. + +Parallelism exists within specific node handlers (`parallel`, `parallel.fan_in`) that manage concurrent execution internally. Each parallel branch receives an isolated clone of the context. Branch results are collected but individual branch context changes are not merged back into the parent -- only the handler's outcome and its `context_updates` are applied. + +--- + +## 4. Node Handlers + +### 4.1 Handler Interface + +Every node handler implements a common interface. The execution engine dispatches to the appropriate handler based on the node's `type` attribute (or shape-based resolution if `type` is empty). + +``` +INTERFACE Handler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome + + -- Parameters: + -- node : The parsed Node with all its attributes + -- context : The shared key-value Context for the pipeline run (read/write) + -- graph : The full parsed Graph (for reading outgoing edges, etc.) + -- logs_root : Filesystem path for this run's log/artifact directory + + -- Returns: + -- Outcome : The result of execution (see Section 5.2) +``` + +### 4.2 Handler Registry + +The handler registry maps type strings to handler instances. Resolution follows this order: + +1. **Explicit `type` attribute** on the node (e.g., `type="wait.human"`) +2. **Shape-based resolution** using the shape-to-handler-type mapping table (Section 2.8) +3. 
**Default handler** (the codergen/LLM handler) + +``` +HandlerRegistry: + handlers : Map -- type string -> handler instance + default_handler : Handler -- fallback handler (typically codergen) + + FUNCTION register(type_string, handler): + handlers[type_string] = handler + -- Registering for an already-registered type replaces the previous handler + + FUNCTION resolve(node) -> Handler: + -- 1. Explicit type attribute + IF node.type is not empty AND node.type IN handlers: + RETURN handlers[node.type] + + -- 2. Shape-based resolution + handler_type = SHAPE_TO_TYPE[node.shape] + IF handler_type IN handlers: + RETURN handlers[handler_type] + + -- 3. Default + RETURN default_handler +``` + +### 4.3 Start Handler + +A no-op handler for the pipeline entry point. Returns SUCCESS immediately without performing any work. + +``` +StartHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + RETURN Outcome(status=SUCCESS) +``` + +Every graph must have exactly one start node (shape=Mdiamond). The lint rules enforce this. + +### 4.4 Exit Handler + +A no-op handler for the pipeline exit point. Returns SUCCESS immediately. Goal gate enforcement is handled by the execution engine (Section 3.4), not by this handler. + +``` +ExitHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + RETURN Outcome(status=SUCCESS) +``` + +Every graph must have exactly one exit node (shape=Msquare). + +### 4.5 Codergen Handler (LLM Task) + +The codergen handler is the default for all nodes that invoke an LLM. It reads the node's prompt, expands template variables, calls the LLM backend (see Section 1.4 for backend options), writes the prompt and response to the logs directory, and returns the outcome. + +``` +CodergenHandler: + backend : CodergenBackend | None + -- The LLM execution backend. Any implementation of the + -- CodergenBackend interface (Section 4.5). None = simulation mode. + + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + -- 1. 
Build prompt + prompt = node.prompt + IF prompt is empty: + prompt = node.label + prompt = expand_variables(prompt, graph, context) + + -- 2. Write prompt to logs + stage_dir = logs_root + "/" + node.id + "/" + create_directory(stage_dir) + write_file(stage_dir + "prompt.md", prompt) + + -- 3. Call LLM backend + IF backend is not NONE: + TRY: + result = backend.run(node, prompt, context) + IF result is an Outcome: + write_status(stage_dir, result) + RETURN result + response_text = string(result) + CATCH exception: + RETURN Outcome(status=FAIL, failure_reason=str(exception)) + ELSE: + response_text = "[Simulated] Response for stage: " + node.id + + -- 4. Write response to logs + write_file(stage_dir + "response.md", response_text) + + -- 5. Write status and return outcome + outcome = Outcome( + status=SUCCESS, + notes="Stage completed: " + node.id, + context_updates={ + "last_stage": node.id, + "last_response": truncate(response_text, 200) + } + ) + write_status(stage_dir, outcome) + RETURN outcome +``` + +**Variable expansion:** The only built-in template variable is `$goal`, which resolves to the graph-level `goal` attribute. Variable expansion is simple string replacement, not a templating engine. + +**Status file:** The handler writes `status.json` in the stage directory with the Outcome fields serialized as JSON. This file serves as an audit trail and enables the status-file contract: external tools or agents can write `status.json` to communicate outcomes back to the engine. + +#### CodergenBackend Interface + +``` +INTERFACE CodergenBackend: + FUNCTION run(node: Node, prompt: String, context: Context) -> String | Outcome +``` + +How you implement this interface is up to you. The pipeline engine only cares that it gets a String or Outcome back. + +### 4.6 Wait For Human Handler + +Blocks pipeline execution until a human selects an option derived from the node's outgoing edges. 
This implements the human-in-the-loop pattern (see Section 6 for the full Interviewer protocol). + +``` +WaitForHumanHandler: + interviewer : Interviewer -- the human interaction frontend + + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + -- 1. Derive choices from outgoing edges + edges = graph.outgoing_edges(node.id) + choices = [] + FOR EACH edge IN edges: + label = edge.label OR edge.to_node + key = parse_accelerator_key(label) + choices.append(Choice(key=key, label=label, to=edge.to_node)) + + IF choices is empty: + RETURN Outcome(status=FAIL, failure_reason="No outgoing edges for human gate") + + -- 2. Build question from choices + options = [Option(key=c.key, label=c.label) FOR c IN choices] + question = Question( + text=node.label OR "Select an option:", + type=MULTIPLE_CHOICE, + options=options, + stage=node.id + ) + + -- 3. Present to interviewer and wait for answer + answer = interviewer.ask(question) + + -- 4. Handle timeout/skip + IF answer is TIMEOUT: + default_choice = node.attrs["human.default_choice"] + IF default_choice exists: + -- Use default + ELSE: + RETURN Outcome(status=RETRY, failure_reason="human gate timeout, no default") + + IF answer is SKIPPED: + RETURN Outcome(status=FAIL, failure_reason="human skipped interaction") + + -- 5. Find matching choice + selected = find_choice_matching(answer, choices) + IF selected is NONE: + selected = choices[0] -- fallback to first + + -- 6. 
Record in context and return + RETURN Outcome( + status=SUCCESS, + suggested_next_ids=[selected.to], + context_updates={ + "human.gate.selected": selected.key, + "human.gate.label": selected.label + } + ) +``` + +**Accelerator key parsing** extracts shortcut keys from edge labels using these patterns: + +| Pattern | Example | Extracted Key | +|-------------------|-------------------|---------------| +| `[K] Label` | `[Y] Yes, deploy` | `Y` | +| `K) Label` | `Y) Yes, deploy` | `Y` | +| `K - Label` | `Y - Yes, deploy` | `Y` | +| First character | `Yes, deploy` | `Y` | + +### 4.7 Conditional Handler + +For diamond-shaped nodes that act as conditional routing points. The handler itself is a no-op that returns SUCCESS; the actual routing is handled by the execution engine's edge selection algorithm (Section 3.3), which evaluates conditions on outgoing edges. + +``` +ConditionalHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + RETURN Outcome( + status=SUCCESS, + notes="Conditional node evaluated: " + node.id + ) +``` + +This design keeps routing logic in the engine (where it can be deterministic and inspectable) rather than in the handler. + +### 4.8 Parallel Handler + +Fans out execution to multiple branches concurrently. Each parallel branch receives an isolated clone of the parent context and runs independently. The handler waits for all branches to complete (or applies a configurable join policy) before returning. + +``` +ParallelHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + -- 1. Identify fan-out edges (all outgoing edges from this node) + branches = graph.outgoing_edges(node.id) + + -- 2. Determine join policy from node attributes + join_policy = node.attrs.get("join_policy", "wait_all") + error_policy = node.attrs.get("error_policy", "continue") + max_parallel = integer(node.attrs.get("max_parallel", "4")) + + -- 3. 
Execute branches concurrently with bounded parallelism + results = [] + FOR EACH branch IN branches (up to max_parallel at a time): + branch_context = context.clone() + branch_outcome = execute_subgraph(branch.to_node, branch_context, graph, logs_root) + results.append(branch_outcome) + + -- 4. Store results in context for downstream fan-in + -- (done before the join policy returns, so every path records them) + context.set("parallel.results", serialize_results(results)) + + -- 5. Evaluate join policy + success_count = count(r FOR r IN results WHERE r.status == SUCCESS) + fail_count = count(r FOR r IN results WHERE r.status == FAIL) + + IF join_policy == "wait_all": + IF fail_count == 0: + RETURN Outcome(status=SUCCESS) + ELSE: + RETURN Outcome(status=PARTIAL_SUCCESS) + + IF join_policy == "first_success": + IF success_count > 0: + RETURN Outcome(status=SUCCESS) + ELSE: + RETURN Outcome(status=FAIL) + + RETURN Outcome(status=SUCCESS) +``` + +**Join policies:** + +| Policy | Behavior | +|------------------|----------| +| `wait_all` | All branches must complete. Join satisfied when all are done. | +| `k_of_n` | At least K branches must succeed. | +| `first_success` | Join satisfied as soon as one branch succeeds. Others may be cancelled. | +| `quorum` | At least a configurable fraction of branches must succeed. | + +**Error policies:** + +| Policy | Behavior | +|---------------------|----------| +| `fail_fast` | Cancel all remaining branches on first failure. | +| `continue` | Continue remaining branches. Collect all results. | +| `ignore` | Ignore failures entirely. Return only successful results. | + +### 4.9 Fan-In Handler + +Consolidates results from a preceding parallel node and selects the best candidate. + +``` +FanInHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + -- 1. Read parallel results + results = context.get("parallel.results") + IF results is empty: + RETURN Outcome(status=FAIL, failure_reason="No parallel results to evaluate") + + -- 2. 
Evaluate candidates + IF node.prompt is not empty: + -- LLM-based evaluation: call LLM to rank candidates + best = llm_evaluate(node.prompt, results) + ELSE: + -- Heuristic: rank by outcome status, then by score + best = heuristic_select(results) + + -- 3. Record winner in context + context_updates = { + "parallel.fan_in.best_id": best.id, + "parallel.fan_in.best_outcome": best.outcome + } + + RETURN Outcome( + status=SUCCESS, + context_updates=context_updates, + notes="Selected best candidate: " + best.id + ) + + +FUNCTION heuristic_select(candidates): + outcome_rank = {SUCCESS: 0, PARTIAL_SUCCESS: 1, RETRY: 2, FAIL: 3} + SORT candidates BY (outcome_rank[c.outcome], -c.score, c.id) + RETURN candidates[0] +``` + +Fan-in runs even when some candidates failed, as long as at least one candidate is available. Only when all candidates fail does fan-in return FAIL. + +### 4.10 Tool Handler + +Executes an external tool (shell command, API call, or other non-LLM operation) configured via node attributes. + +``` +ToolHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + command = node.attrs.get("tool_command", "") + IF command is empty: + RETURN Outcome(status=FAIL, failure_reason="No tool_command specified") + + -- Execute the command + TRY: + result = run_shell_command(command, timeout=node.timeout) + RETURN Outcome( + status=SUCCESS, + context_updates={"tool.output": result.stdout}, + notes="Tool completed: " + command + ) + CATCH exception: + RETURN Outcome(status=FAIL, failure_reason=str(exception)) +``` + +### 4.11 Manager Loop Handler + +Orchestrates sprint-based iteration by supervising a child pipeline. The manager observes the child's telemetry, evaluates progress via a guard function, and optionally steers the child through intervention. 
+ +``` +ManagerLoopHandler: + FUNCTION execute(node, context, graph, logs_root) -> Outcome: + child_dotfile = graph.attrs.get("stack.child_dotfile") + poll_interval = parse_duration(node.attrs.get("manager.poll_interval", "45s")) + max_cycles = integer(node.attrs.get("manager.max_cycles", "1000")) + stop_condition = node.attrs.get("manager.stop_condition", "") + actions = split(node.attrs.get("manager.actions", "observe,wait"), ",") + + -- 1. Auto-start child if configured + IF node.attrs.get("stack.child_autostart", "true") == "true": + start_child_pipeline(child_dotfile) + + -- 2. Observation loop + FOR cycle FROM 1 TO max_cycles: + IF "observe" IN actions: + ingest_child_telemetry(context) + + IF "steer" IN actions AND steer_cooldown_elapsed(): + steer_child(context, node) + + -- Evaluate stop conditions + child_status = context.get_string("context.stack.child.status") + IF child_status IN {"completed", "failed"}: + child_outcome = context.get_string("context.stack.child.outcome") + IF child_outcome == "success": + RETURN Outcome(status=SUCCESS, notes="Child completed") + IF child_status == "failed": + RETURN Outcome(status=FAIL, failure_reason="Child failed") + + IF stop_condition is not empty: + IF evaluate_condition(stop_condition, ..., context): + RETURN Outcome(status=SUCCESS, notes="Stop condition satisfied") + + IF "wait" IN actions: + sleep(poll_interval) + + RETURN Outcome(status=FAIL, failure_reason="Max cycles exceeded") +``` + +The manager pattern implements a **supervisor architecture** where: +- **Observe** ingests worker telemetry (active stage, outcomes, retry counts, artifacts) +- **Guard** scores worker progress and routes to continue, intervene, or escalate +- **Steer** writes intervention instructions to the child's active stage directory + +### 4.12 Custom Handlers + +New handler types are added by implementing the Handler interface and registering with the registry: + +``` +-- Define a custom handler +MyCustomHandler: + FUNCTION 
execute(node, context, graph, logs_root) -> Outcome: + -- Custom logic here + RETURN Outcome(status=SUCCESS) + +-- Register it +registry.register("my_custom_type", MyCustomHandler()) + +-- Reference in DOT file +my_node [type="my_custom_type", shape=box, custom_attr="value"] +``` + +**Handler contract:** +- Handlers MUST be stateless or protect shared mutable state with synchronization. +- Handler panics/exceptions MUST be caught by the engine and converted to FAIL outcomes. +- Handlers SHOULD NOT embed provider-specific logic; LLM orchestration is delegated to the integrated SDK. + +--- + +## 5. State and Context + +### 5.1 Context + +The context is a thread-safe key-value store shared across all stages during a pipeline run. It is the primary mechanism for passing data between nodes. + +``` +Context: + values : Map -- key-value store + lock : ReadWriteLock -- thread safety for parallel access + logs : List -- append-only run log + + FUNCTION set(key, value): + ACQUIRE write lock + values[key] = value + RELEASE write lock + + FUNCTION get(key, default=NONE) -> Any: + ACQUIRE read lock + result = values.get(key, default) + RELEASE read lock + RETURN result + + FUNCTION get_string(key, default="") -> String: + value = get(key) + IF value is NONE: RETURN default + RETURN string(value) + + FUNCTION append_log(entry): + ACQUIRE write lock + logs.append(entry) + RELEASE write lock + + FUNCTION snapshot() -> Map: + -- Returns a serializable copy of all values + ACQUIRE read lock + result = shallow_copy(values) + RELEASE read lock + RETURN result + + FUNCTION clone() -> Context: + -- Deep copy for parallel branch isolation: nested values must not be + -- shared between the parent and a branch + ACQUIRE read lock + new_context = new Context() + new_context.values = deep_copy(values) + new_context.logs = copy(logs) + RELEASE read lock + RETURN new_context + + FUNCTION apply_updates(updates): + -- Merge a dictionary of updates into the context + ACQUIRE write lock + FOR EACH (key, value) IN updates: + values[key] = value + RELEASE write lock 
+```
+
+**Built-in context keys:**
+
+| Key                     | Type    | Set By  | Description |
+|-------------------------|---------|---------|-------------|
+| `outcome`               | String  | Engine  | Last handler outcome status (`success`, `fail`, etc.) |
+| `preferred_label`       | String  | Engine  | Last handler's preferred edge label |
+| `graph.goal`            | String  | Engine  | Mirrored from graph `goal` attribute |
+| `current_node`          | String  | Engine  | ID of the currently executing node |
+| `last_stage`            | String  | Handler | ID of the last completed stage |
+| `last_response`         | String  | Handler | Truncated text of the last LLM response |
+| `internal.retry_count.` | Integer | Engine  | Retry counter for a specific node |
+
+**Context key namespace conventions:**
+
+| Prefix         | Purpose                                      |
+|----------------|----------------------------------------------|
+| `context.*`    | Semantic state shared between nodes          |
+| `graph.*`      | Graph attributes mirrored at initialization  |
+| `internal.*`   | Engine bookkeeping (retry counters, timing)  |
+| `parallel.*`   | Parallel handler state (results, counts)     |
+| `stack.*`      | Supervisor/worker state                      |
+| `human.gate.*` | Human interaction state                      |
+| `work.*`       | Per-item context for parallel work items     |
+
+### 5.2 Outcome
+
+The outcome is the result of executing a node handler. It drives routing decisions and state updates.
+
+```
+Outcome:
+    status             : StageStatus -- SUCCESS, FAIL, PARTIAL_SUCCESS, RETRY, SKIPPED
+    preferred_label    : String      -- which edge label to follow (optional)
+    suggested_next_ids : List        -- explicit next node IDs (optional)
+    context_updates    : Map         -- key-value pairs to merge into context
+    notes              : String      -- human-readable execution summary
+    failure_reason     : String      -- reason for failure (when status is FAIL or RETRY)
+```
+
+**StageStatus values:**
+
+| Status            | Meaning |
+|-------------------|---------|
+| `SUCCESS`         | Stage completed its work. Proceed to next edge. Reset retry counter. |
+| `PARTIAL_SUCCESS` | Stage completed with caveats. Treated as success for routing, but notes describe what was incomplete. |
+| `RETRY`           | Stage requests re-execution. Engine increments retry counter and re-executes if within limits. |
+| `FAIL`            | Stage failed permanently. Engine looks for a fail edge or terminates the pipeline. |
+| `SKIPPED`         | Stage was skipped (e.g., condition not met). Proceed without recording an outcome. |
+
+### 5.3 Checkpoint
+
+A serializable snapshot of execution state, saved after each node completes. Enables crash recovery and resume.
+
+```
+Checkpoint:
+    timestamp       : Timestamp -- when this checkpoint was created
+    current_node    : String    -- ID of the last completed node
+    completed_nodes : List      -- IDs of all completed nodes in order
+    node_retries    : Map       -- retry counters per node
+    context_values  : Map       -- serialized snapshot of the context
+    logs            : List      -- run log entries
+
+    FUNCTION save(path):
+        -- Serialize to JSON and write to filesystem
+        data = {
+            "timestamp": timestamp,
+            "current_node": current_node,
+            "completed_nodes": completed_nodes,
+            "node_retries": node_retries,
+            "context": serialize_to_json(context_values),
+            "logs": logs
+        }
+        write_json_file(path, data)
+
+    FUNCTION load(path) -> Checkpoint:
+        -- Deserialize from JSON file
+        data = read_json_file(path)
+        RETURN new Checkpoint from data
+```
+
+**Resume behavior:**
+
+1. Load the checkpoint from `{logs_root}/checkpoint.json`.
+2. Restore context state from `context_values`.
+3. Restore `completed_nodes` to skip already-finished work.
+4. Restore retry counters from `node_retries`.
+5. Determine the next node to execute (the one after `current_node` in the traversal).
+6. If the previous node used `full` fidelity, degrade to `summary:high` for the first resumed node, because in-memory LLM sessions cannot be serialized. After this one degraded hop, subsequent nodes may use `full` fidelity again.
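
The save/load round trip and resume step 6 above can be sketched in Python. This is a minimal sketch under stated assumptions, not the reference implementation: the `Checkpoint` dataclass mirrors the fields listed in the pseudocode, the timestamp is stored as an ISO-8601 string so it serializes to JSON (an assumption), and `next_fidelity_after_resume` is a hypothetical helper name introduced only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Checkpoint:
    """Serializable snapshot of execution state (fields follow the pseudocode)."""

    current_node: str
    completed_nodes: List[str] = field(default_factory=list)
    node_retries: Dict[str, int] = field(default_factory=dict)
    context_values: Dict[str, Any] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)
    # Assumption: an ISO-8601 string keeps the timestamp JSON-friendly.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def save(self, path: Path) -> None:
        data = asdict(self)
        # The pseudocode stores the context snapshot under the "context" key.
        data["context"] = data.pop("context_values")
        path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "Checkpoint":
        data = json.loads(path.read_text(encoding="utf-8"))
        data["context_values"] = data.pop("context")
        return cls(**data)


def next_fidelity_after_resume(previous_fidelity: str, requested: str) -> str:
    """Hypothetical helper for resume step 6.

    An in-memory LLM session cannot be restored from disk, so the first
    resumed node falls back to summary:high when the previous node ran
    with full fidelity. Otherwise the requested fidelity is used unchanged.
    """
    if previous_fidelity == "full":
        return "summary:high"
    return requested
```

Only the first node after a resume would go through `next_fidelity_after_resume`; later nodes resolve fidelity normally, which matches the "one degraded hop" wording above.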
+
+### 5.4 Context Fidelity
+
+Context fidelity controls how much prior conversation and state is carried into the next node's LLM session. This is a core mechanism for managing context window usage across multi-stage pipelines.
+
+```
+FidelityMode ::= 'full'
+               | 'truncate'
+               | 'compact'
+               | 'summary:low'
+               | 'summary:medium'
+               | 'summary:high'
+```
+
+| Mode             | Session              | Context Carried | Approximate Token Budget |
+|------------------|----------------------|-----------------|--------------------------|
+| `full`           | Reused (same thread) | Full conversation history preserved | Unbounded (uses compaction) |
+| `truncate`       | Fresh                | Minimal: only graph goal and run ID | Minimal |
+| `compact`        | Fresh                | Structured bullet-point summary: completed stages, outcomes, key context values | Moderate |
+| `summary:low`    | Fresh                | Brief textual summary with minimal event counts | ~600 tokens |
+| `summary:medium` | Fresh                | Moderate detail: recent stage outcomes, active context values, notable events | ~1500 tokens |
+| `summary:high`   | Fresh                | Detailed: many recent events, tool call summaries, comprehensive context | ~3000 tokens |
+
+**Fidelity resolution precedence (highest to lowest):**
+
+1. Edge `fidelity` attribute (on the incoming edge)
+2. Target node `fidelity` attribute
+3. Graph `default_fidelity` attribute
+4. Default: `compact`
+
+**Thread resolution (for `full` fidelity):**
+
+When fidelity resolves to `full`, the engine determines a thread key for session reuse:
+
+1. Target node `thread_id` attribute
+2. Edge `thread_id` attribute
+3. Graph-level default thread
+4. Derived class from enclosing subgraph
+5. Fallback: previous node ID
+
+Nodes that share the same thread key reuse the same LLM session. Nodes with different thread keys start fresh sessions.
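
The two precedence lists above amount to "first non-empty value wins" lookups. The following is a minimal Python sketch, assuming edge, node, and graph attributes are available as plain dictionaries. The function names and the `graph_default_thread` and `subgraph_class` parameters are hypothetical, introduced only for illustration; the spec does not define how those two values are obtained.

```python
from typing import Mapping, Optional


def resolve_fidelity(
    edge_attrs: Mapping[str, str],
    node_attrs: Mapping[str, str],
    graph_attrs: Mapping[str, str],
) -> str:
    """Return the fidelity mode using edge > node > graph > 'compact'."""
    candidates = (
        edge_attrs.get("fidelity"),           # 1. incoming edge
        node_attrs.get("fidelity"),           # 2. target node
        graph_attrs.get("default_fidelity"),  # 3. graph default
    )
    for value in candidates:
        if value:
            return value
    return "compact"                          # 4. built-in default


def resolve_thread_key(
    node_attrs: Mapping[str, str],
    edge_attrs: Mapping[str, str],
    graph_default_thread: Optional[str],
    subgraph_class: Optional[str],
    previous_node_id: str,
) -> str:
    """Return the session thread key; only meaningful for 'full' fidelity."""
    candidates = (
        node_attrs.get("thread_id"),  # 1. target node
        edge_attrs.get("thread_id"),  # 2. edge
        graph_default_thread,         # 3. graph-level default (hypothetical input)
        subgraph_class,               # 4. class derived from enclosing subgraph
    )
    for value in candidates:
        if value:
            return value
    return previous_node_id           # 5. fallback


# Example: the edge overrides both the node and the graph default.
mode = resolve_fidelity(
    {"fidelity": "full"},
    {"fidelity": "truncate"},
    {"default_fidelity": "summary:low"},
)
```

Note that the node attribute outranks the edge attribute for `thread_id`, while the edge outranks the node for `fidelity`; the sketch keeps the two orders separate for that reason.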
+
+### 5.5 Artifact Store
+
+The artifact store provides named, typed storage for large stage outputs that do not belong in the context (which should contain only small scalar values for routing and checkpoint serialization).
+
+```
+ArtifactStore:
+    artifacts : Map
+    lock      : ReadWriteLock
+    base_dir  : String or NONE -- filesystem directory for file-backed artifacts
+
+    FUNCTION store(artifact_id, name, data) -> ArtifactInfo:
+        size = byte_size(data)
+        is_file_backed = (size > FILE_BACKING_THRESHOLD) AND (base_dir is not NONE)
+        IF is_file_backed:
+            file_path = "{base_dir}/artifacts/{artifact_id}.json"
+            write_json_file(file_path, data)
+            stored_data = file_path
+        ELSE:
+            stored_data = data
+        info = ArtifactInfo(id=artifact_id, name=name, size_bytes=size,
+                            stored_at=now(), is_file_backed=is_file_backed)
+        artifacts[artifact_id] = (info, stored_data)
+        RETURN info
+
+    FUNCTION retrieve(artifact_id) -> Any:
+        IF artifact_id NOT IN artifacts:
+            RAISE "Artifact not found"
+        (info, data) = artifacts[artifact_id]
+        IF info.is_file_backed:
+            -- For file-backed artifacts, data holds the file path
+            RETURN read_json_file(data)
+        RETURN data
+
+    FUNCTION has(artifact_id) -> Boolean
+    FUNCTION list() -> List
+    FUNCTION remove(artifact_id)
+    FUNCTION clear()
+
+ArtifactInfo:
+    id             : String
+    name           : String
+    size_bytes     : Integer
+    stored_at      : Timestamp
+    is_file_backed : Boolean
+```
+
+The default file-backing threshold is 100KB. Artifacts at or below this threshold are stored in memory; larger artifacts are written to disk when `base_dir` is set, and otherwise remain in memory.
+
+### 5.6 Run Directory Structure
+
+Each pipeline execution produces a directory tree for logging, checkpoints, and artifacts:
+
+```
+{logs_root}/
+    checkpoint.json          -- Serialized checkpoint after each node
+    manifest.json            -- Pipeline metadata (name, goal, start time)
+    {node_id}/
+        status.json          -- Node execution outcome
+        prompt.md            -- Rendered prompt sent to LLM
+        response.md          -- LLM response text
+    artifacts/
+        {artifact_id}.json   -- File-backed artifacts
+```
+
+---
+
+## 6. Human-in-the-Loop (Interviewer Pattern)
+
+### 6.1 Interviewer Interface
+
+All human interaction in Attractor goes through an Interviewer interface. This abstraction allows the pipeline to present questions to a human and receive answers through any frontend: CLI, web UI, Slack bot, or a programmatic queue for testing.
+
+```
+INTERFACE Interviewer:
+    FUNCTION ask(question: Question) -> Answer
+    FUNCTION ask_multiple(questions: List) -> List
+    FUNCTION inform(message: String, stage: String) -> Void
+```
+
+### 6.2 Question Model
+
+```
+Question:
+    text    : String       -- the question to present to the human
+    type    : QuestionType -- determines the UI and valid answers
+    options : List