GeneralAgent - Flexible Agent System

An opinionated LangGraph-based architecture for building various types of agents. This repository provides the general-purpose agent loop implementation. Future additions will include workflow-based agents and other specialized agent types.

Current Implementation: General-purpose agent with dynamic tool calling, skill loading, and multi-model routing.

Features

  • Model registry & routing – register five core model classes (base, reasoning, vision, code, chat) and pick the right model per phase (plan, decompose, delegate, etc.).
  • Skill packages – discoverable skills/<id>/SKILL.yaml descriptors with progressive disclosure and tool allowlists.
  • Governed tool runtime – declarative metadata (ToolMeta) for risk tagging, global read-only utilities, and skill-scoped business tools.
  • Context Management ⭐ NEW – Intelligent conversation compression with progressive warnings (75% info → 85% warning → 95% auto-compress). Combines Gemini-style summarization with Kimi-style truncation for robust token management.
  • Document Search ⭐ OPTIMIZED – Industry best practices: BM25 ranking, jieba Chinese segmentation, 400-char smart chunking with 20% overlap. Details
  • MCP Integration – Model Context Protocol support with lazy server startup, manual tool control, and stdio/SSE modes. Details
  • LangGraph flow – plan → guard → tools → post → (decompose|delegate) → guard → tools → after → … with deliverable verification and budgets.
  • Delegation loop – decomposition into structured plans, delegation to scoped delegated agents, and per-step verification.
  • Observability hooks – optional LangSmith tracing + Postgres checkpointer.

Directory Layout

generalAgent/
├── agents/           # Agent factories and model resolver protocol
├── config/           # Pydantic settings objects (.env-aware)
├── graph/            # State, prompts, plan schema, routing, node factories
├── models/           # Model registry & routing heuristics
├── persistence/      # Optional checkpointer integration
├── runtime/          # High-level app assembly (`build_application`)
├── skills/           # Skill registry + loader (expects skills/<id>/SKILL.yaml)
├── telemetry/        # LangSmith / tracing configuration
└── tools/            # Base tools, business stubs, registry, skill tools

main.py shows a CLI stub that wires the app with a placeholder model resolver; replace it with real LangChain-compatible models before invoking the flow.

Configuration

All runtime configuration is sourced from .env via Pydantic BaseSettings with automatic environment variable loading.

Settings Structure

Settings (generalAgent/config/settings.py)
├── ModelRoutingSettings     # Model IDs and API credentials
├── GovernanceSettings       # Runtime controls (auto_approve, max_loops)
└── ObservabilitySettings    # Tracing, logging, persistence
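
A minimal sketch of how this structure could be declared with pydantic-settings (not the project's actual settings.py; field names mirror the governance variables listed below, and defaults are illustrative):

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class GovernanceSettings(BaseSettings):
    """Reads AUTO_APPROVE_WRITES, MAX_LOOPS, MAX_MESSAGE_HISTORY from .env / environment."""
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")
    auto_approve_writes: bool = False
    max_loops: int = Field(default=100, ge=1, le=500)
    max_message_history: int = Field(default=40, ge=10, le=100)

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")
    governance: GovernanceSettings = Field(default_factory=GovernanceSettings)

settings = Settings()
print(settings.governance.max_loops)   # 100 unless MAX_LOOPS overrides it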

Key Environment Variables

Model Configuration:

# Five model slots with flexible aliasing
MODEL_BASE=deepseek-chat                    # Or: MODEL_BASE_ID, MODEL_BASIC_ID
MODEL_BASE_API_KEY=sk-xxx                   # Or: MODEL_BASIC_API_KEY
MODEL_BASE_URL=https://api.deepseek.com     # Or: MODEL_BASIC_BASE_URL

MODEL_REASON=deepseek-reasoner              # Or: MODEL_REASON_ID, MODEL_REASONING_ID
MODEL_REASON_API_KEY=sk-xxx                 # Or: MODEL_REASONING_API_KEY
MODEL_REASON_URL=https://api.deepseek.com   # Or: MODEL_REASONING_BASE_URL

MODEL_VISION=glm-4.5v                       # Or: MODEL_VISION_ID, MODEL_MULTIMODAL_ID
MODEL_VISION_API_KEY=xxx                    # Or: MODEL_MULTIMODAL_API_KEY
MODEL_VISION_URL=https://open.bigmodel.cn/api/paas/v4

MODEL_CODE=code-pro                         # Or: MODEL_CODE_ID
MODEL_CODE_API_KEY=xxx

MODEL_CHAT=kimi-k2-0905-preview             # Or: MODEL_CHAT_ID
MODEL_CHAT_API_KEY=xxx
MODEL_CHAT_URL=https://api.moonshot.cn/v1

Governance:

AUTO_APPROVE_WRITES=false
MAX_LOOPS=100                   # Max agent loop iterations (1-500)
MAX_MESSAGE_HISTORY=40          # Message history size (10-100)

Context Management ⭐ NEW:

Automatic context compression with silent operation. When token usage exceeds 95%, the system automatically compresses older messages via LLM summarization while preserving recent context.

# Enable/disable context management
CONTEXT_MANAGEMENT_ENABLED=true

# Token monitoring thresholds
CONTEXT_INFO_THRESHOLD=0.75        # 75% - Log info message
CONTEXT_WARNING_THRESHOLD=0.85     # 85% - Log warning
CONTEXT_CRITICAL_THRESHOLD=0.95    # 95% - Trigger auto-compression

# Recent message preservation (hybrid strategy)
CONTEXT_KEEP_RECENT_RATIO=0.15     # Keep 15% of context window as recent
CONTEXT_KEEP_RECENT_MESSAGES=10    # Or keep at least 10 messages (whichever is reached first)

# Compression trigger condition
CONTEXT_MIN_MESSAGES_TO_COMPRESS=15  # Minimum messages before compression

# Emergency fallback (if LLM compression fails)
CONTEXT_MAX_HISTORY=100            # Keep last 100 messages max

How it works:

  1. Token usage is monitored after each LLM call
  2. When usage exceeds 95%, the system routes to a dedicated summarization node
  3. Older messages are compressed via LLM summarization; recent messages are preserved
  4. The agent continues answering the user's question seamlessly

User Experience: Completely silent - no notifications. Example: 302 messages (~123K tokens, 96% usage) → 13 messages (~6.5K tokens, 95% reduction).
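
A minimal sketch of the threshold logic described above (illustrative only; it assumes a 128K-token window for the example, and the real system performs compression inside a dedicated graph node):

# Hypothetical helper mirroring the progressive thresholds; not the project's actual code.
INFO, WARNING, CRITICAL = 0.75, 0.85, 0.95

def check_context(used_tokens: int, context_window: int) -> str:
    """Return the action implied by current token usage."""
    usage = used_tokens / context_window
    if usage >= CRITICAL:
        return "compress"   # route to the summarization node, keep recent messages
    if usage >= WARNING:
        return "warn"       # log a warning only
    if usage >= INFO:
        return "info"       # log an informational message
    return "ok"

print(check_context(123_000, 128_000))   # ~96% usage -> "compress"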

For detailed architecture, see docs/ARCHITECTURE.md - Section 1.5

Observability:

LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=my-project
LANGCHAIN_API_KEY=xxx           # Or: LANGSMITH_API_KEY
SESSION_DB_PATH=./data/sessions.db  # SQLite session storage
LOG_PROMPT_MAX_LENGTH=500       # Truncate logged prompts

Configuration Features

  • Automatic .env loading - All settings inherit from BaseSettings
  • Multiple aliases - Provider-specific names (DeepSeek: MODEL_BASIC_*, GLM: MODEL_MULTIMODAL_*, etc.)
  • Type validation - Pydantic validates types and ranges
  • No fallbacks needed - Settings load directly from environment

Usage Example

from generalAgent.config.settings import get_settings

settings = get_settings()  # Cached singleton
api_key = settings.models.reason_api_key  # Automatically from .env
max_loops = settings.governance.max_loops  # Default: 100

See CLAUDE.md - Settings Architecture for implementation details.

Skills

Skills are knowledge packages (documentation + scripts), NOT tool containers. Each skill provides:

  • SKILL.md - Main documentation with usage guide
  • scripts/ - Python scripts for specific tasks (e.g., fill_pdf_form.py)
  • Reference docs - Additional documentation (forms.md, reference.md, etc.)

Example structure:

skills/pdf/
├── SKILL.md           # Main skill documentation
├── forms.md           # PDF form filling guide
├── reference.md       # Advanced usage reference
└── scripts/           # Executable Python scripts
    ├── fill_fillable_fields.py
    ├── extract_form_field_info.py
    └── convert_pdf_to_images.py

When a user mentions @pdf, the system:

  1. Loads the skill into the session workspace (symlink)
  2. Generates a reminder for the agent to read SKILL.md
  3. Agent reads documentation and executes scripts as needed

Important: Skills do NOT have allowed_tools - they are documentation packages that guide the agent.
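
A minimal sketch of the @-mention loading step (illustrative helper; paths follow the workspace layout shown in the next section, and the real loader may copy instead of symlink):

from pathlib import Path

def load_skill(session_id: str, skill_id: str) -> str:
    """Symlink skills/<id>/ into the session workspace and return a reminder for the agent."""
    source = Path("skills") / skill_id                                  # e.g. skills/pdf/
    target = Path("data/workspace") / session_id / "skills" / skill_id
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.symlink_to(source.resolve(), target_is_directory=True)
    return f"Skill '{skill_id}' loaded. Read skills/{skill_id}/SKILL.md before using its scripts."

# A "@pdf" mention would trigger something like load_skill(session_id, "pdf").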

Workspace Isolation

Each session gets an isolated workspace directory for safe file operations:

data/workspace/{session_id}/
├── skills/           # Symlinked skills (read-only)
│   └── pdf/
│       ├── SKILL.md
│       └── scripts/
├── uploads/          # User-uploaded files
├── outputs/          # Agent-generated files
├── temp/             # Temporary files
└── .metadata.json    # Session metadata

File operation tools:

  • read_file - Read files from workspace (skills/, uploads/, outputs/)
  • write_file - Write files to workspace (outputs/, temp/)
  • list_workspace_files - List workspace directory contents
  • run_bash_command - Execute bash commands and Python scripts (optional, disabled by default)

Security features:

  • Path traversal protection (cannot access files outside workspace)
  • Write restrictions (can only write to outputs/, temp/, uploads/)
  • Skills are read-only (symlinked or copied)
  • Automatic cleanup on exit (workspaces older than 7 days)
  • Manual cleanup via /clean command
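
A minimal sketch of the path traversal and write-restriction checks listed above (illustrative; not the project's actual file-tool code):

from pathlib import Path

WRITABLE = {"outputs", "temp", "uploads"}   # write restrictions from the list above

def resolve_in_workspace(workspace: Path, relative: str, writing: bool = False) -> Path:
    """Resolve a user-supplied path and reject anything that escapes the workspace."""
    root = workspace.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{relative!r} escapes the workspace")
    if writing:
        parts = candidate.relative_to(root).parts
        if not parts or parts[0] not in WRITABLE:
            raise PermissionError(f"writes are limited to {sorted(WRITABLE)}")
    return candidate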

File Upload

Users can upload files to the agent using #filename syntax from the uploads/ directory:

# Put files in uploads/ directory
uploads/
├── document.pdf
├── screenshot.png
└── data.txt

# Reference in conversation
You> 分析这张图 #screenshot.png
You> 处理这个文档 #document.pdf

Automatic handling:

  • Images (.png, .jpg, etc.): Base64 encoded + injected into message → vision model
  • PDFs (.pdf): Copied to workspace + auto-load @pdf skill
  • Text files (<10KB): Content directly injected into message
  • Others: Copied to workspace for agent tool processing
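
A minimal sketch of this dispatch (hypothetical helper; the extension sets and the 10KB text cutoff reflect the rules above, and the size limits below are not enforced here):

from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
TEXT_EXTS = {".txt", ".md", ".csv"}   # assumed set of "text file" extensions

def classify_upload(path: Path) -> str:
    """Decide how a #filename reference is handled."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXTS:
        return "inject_base64"            # routed to the vision model
    if ext == ".pdf":
        return "copy_and_load_pdf_skill"  # copied to workspace, @pdf skill auto-loaded
    if ext in TEXT_EXTS and path.stat().st_size < 10 * 1024:
        return "inject_text"              # small text injected directly into the message
    return "copy_to_workspace"            # everything else handled via agent tools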

File type limits:

  • Images: 10MB
  • PDFs: 50MB
  • Text/Code: 5MB
  • Office docs: 20MB

See uploads/README.md for examples and detailed usage.

Tools

Core tools (always enabled):

  • now - Get current UTC time
  • todo_write, todo_read - Task tracking
  • delegate_task - Delegate tasks to delegated agents
  • read_file, write_file, list_workspace_files - File operations
  • fetch_web - Fetch web pages and convert to LLM-friendly markdown (Jina Reader)
  • web_search - Search the web with LLM-optimized results (Jina Search)

Optional tools (can be enabled via tools.yaml):

  • http_fetch - HTTP requests (stub, deprecated - use fetch_web instead)
  • extract_links - Link extraction (stub)
  • ask_vision - Vision perception (stub)
  • run_bash_command - Execute bash commands and Python scripts (disabled by default)

Tool Development:

  • Tools are automatically discovered by scanning generalAgent/tools/builtin/
  • Multiple tools can be defined in a single file using __all__ export
  • Configuration is managed via generalAgent/config/tools.yaml
  • See generalAgent/tools/builtin/file_ops.py for multi-tool file example
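
A minimal sketch of a multi-tool builtin module (hypothetical file; it assumes tools are plain LangChain @tool functions and that discovery keys off __all__ as described above):

# generalAgent/tools/builtin/example_tools.py (hypothetical)
from langchain_core.tools import tool

@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())

@tool
def reverse_text(text: str) -> str:
    """Return the input text reversed."""
    return text[::-1]

# Discovery scans the module and registers everything exported here.
__all__ = ["word_count", "reverse_text"]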

LangGraph Flow

generalAgent.graph.builder.build_state_graph assembles the full flow with these nodes:

  1. plan – governed planner (scoped tools, Skill discovery).
  2. guard – policy enforcement & HITL gate.
  3. tools – executes actual tool calls.
  4. post – updates active skill and allowlists.
  5. decompose (conditional) – produces a structured plan (Pydantic validated).
  6. delegate – runs scoped delegated agents per step.
  7. after – verifies deliverables, advances plan, enforces budgets.

Routing helpers in generalAgent.graph.routing decide whether to decompose and when to finish loops.
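
A schematic sketch of that wiring (node functions, state schema, and routing predicates are placeholders; the real assembly lives in generalAgent.graph.builder and its edges are more nuanced):

from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class SketchState(TypedDict):
    messages: Annotated[list, add_messages]

def build_sketch(nodes: dict, route_after_post, route_after_after):
    g = StateGraph(SketchState)
    for name, fn in nodes.items():      # plan, guard, tools, post, decompose, delegate, after
        g.add_node(name, fn)
    g.add_edge(START, "plan")
    g.add_edge("plan", "guard")
    g.add_edge("guard", "tools")
    g.add_edge("tools", "post")
    g.add_conditional_edges("post", route_after_post,
                            {"decompose": "decompose", "delegate": "delegate"})
    g.add_edge("decompose", "delegate")
    g.add_edge("delegate", "after")
    g.add_conditional_edges("after", route_after_after,
                            {"continue": "plan", "finish": END})
    return g.compile()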

Extending the System

  1. Override the model resolver (optional)
    By default, build_application() reads .env and creates compatible ChatOpenAI clients via langchain-openai (DeepSeek/Moonshot/GLM and other OpenAI-style APIs). To customize caching, retries, or to use another SDK, implement ModelResolver and pass it in (a minimal resolver sketch follows this list).
  2. Add skills
    Drop new skill folders under skills/ with SKILL.yaml, templates, scripts, etc. Call SkillRegistry.reload() when hot-reloading.
  3. Register tools
    Add tool functions/classes, register them with ToolRegistry, and maintain their ToolMeta entries.
  4. Delegated agent catalogs & deliverables
    Expand delegated agent_catalog in runtime/app.py and extend deliverable_checkers for domain-specific outputs.
  5. Observability & persistence
    Set PG_DSN for Postgres checkpoints and enable tracing via LangSmith env vars.
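
A minimal resolver sketch for step 1 (hypothetical class; check the ModelResolver protocol in generalAgent/agents for the real interface — only the use of langchain-openai's ChatOpenAI for OpenAI-style APIs comes from the text above):

from langchain_openai import ChatOpenAI

class CachedResolver:
    """Map a model slot (base, reason, vision, code, chat) to a cached ChatOpenAI client."""

    def __init__(self, slot_configs: dict[str, dict]):
        self._configs = slot_configs    # e.g. {"reason": {"model": "deepseek-reasoner", ...}}
        self._cache: dict[str, ChatOpenAI] = {}

    def resolve(self, slot: str) -> ChatOpenAI:
        if slot not in self._cache:
            cfg = self._configs[slot]
            self._cache[slot] = ChatOpenAI(
                model=cfg["model"],
                api_key=cfg["api_key"],
                base_url=cfg.get("base_url"),
            )
        return self._cache[slot]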

Testing

The project includes a comprehensive test suite organized into four tiers:

# Quick validation before commits (< 30s)
python tests/run_tests.py smoke

# Run specific test types
python tests/run_tests.py unit          # Module-level tests
python tests/run_tests.py integration   # Module interaction tests
python tests/run_tests.py e2e           # Complete business workflows

# Run all tests
python tests/run_tests.py all

# Generate coverage report
python tests/run_tests.py coverage

Test organization:

  • tests/smoke/ - Fast critical-path validation tests
  • tests/unit/ - Unit tests for individual modules (HITL, MCP, Tools, etc.)
  • tests/integration/ - Integration tests for module interactions
  • tests/e2e/ - End-to-end business workflow tests
  • tests/fixtures/ - Test infrastructure (test MCP servers, etc.)

For detailed testing guidelines and best practices, see docs/TESTING.md.

Documentation

Comprehensive documentation is organized into six core documents by topic and audience:

For New Users

  • docs/README.md - Documentation index with quick start guides and topic finder
  • docs/FEATURES.md - User-facing features (Workspace, @Mentions, File Upload, MCP, HITL)

For Developers

For Advanced Topics

  • docs/OPTIMIZATION.md - Performance optimization (KV Cache, Document Search, Text Indexer)
  • docs/TESTING.md - Comprehensive testing guide (Smoke, Unit, Integration, E2E, HITL)

Quick links:

Note: Previous documentation has been archived in docs/archive/ with a mapping guide.

Next Steps

  • Install Python 3.12 and run uv sync (or pip install -e .) to pull dependencies (including langchain-openai and python-dotenv).
  • Run python main.py to start the multi-turn CLI, which initializes the conversation from the model configuration in .env; you can also call build_application() in your own script and drive app.invoke(state).
  • Add skill packages and tool risk tags for your domain, and extend test coverage for governance and routing.

Recent Updates

2025-10-31

Product Requirements Document (PRD) - P2 Extension (Latest)

  • Extended PRD to 5,866 lines (v3.2, +22% from v3.0)
  • P2 Chapters (2/3):
    • ✅ Product Overview (410 lines) - Framework positioning, competitive analysis, quick start guide
    • ✅ Architecture Optimization (663 lines) - 8 optimization strategies with quantified metrics ⭐ NEW
      • KV Cache optimization: 70-90% token reuse, 60-80% cost reduction
      • Context auto-compression: 95% compression ratio (302 messages → 13)
      • Document indexing: First search 3s, subsequent <100ms
      • 5 troubleshooting guides with executable commands
  • Updated Statistics:
    • Total: 43+ functional requirements
    • Code references: 70+ precise file paths
    • Optimization strategies: 8 production-proven techniques

PRD Evolution (2025-10-31):

  • v3.0 (4,789 lines): P0+P1 complete
  • v3.1 (5,203 lines): Added Product Overview (+410 lines)
  • v3.2 (5,866 lines): Added Architecture Optimization (+663 lines) ⭐ CURRENT

PRD Completion (Earlier 2025-10-31)

  • Completed comprehensive PRD: docs/桌面 AI 框架需求.md
  • P0 Core Chapters (6/6): Tool System, Skill System, Agent Templates, Agent Flow & State, HITL, Context Management
  • P1 Important Chapters (5/5): Model Routing, Multi-Agent Collaboration, Workspace Management, File Processing, Session Management
  • Complete maintenance guide, terminology glossary, and version history

Key PRD Features:

  • Unified chapter structure (Product Positioning → Scenarios → Requirements → NFR → Code References)
  • Cross-references between chapters ("See Chapter X")
  • Version tracking (v1.0 → v2.0 → v3.0 → v3.2)
  • Quality checklist and documentation maintenance guide
  • Production-grade optimization strategies with quantified ROI

2025-10-27

Documentation Reorganization

  • Consolidated 14 documents → 6 core documents (50% reduction)
  • Created comprehensive maintenance guide in docs/README.md
  • Archived old files with migration mapping

TODO Tool State Synchronization Fix

  • Fixed critical bug: todo_write now correctly updates state["todos"] using LangGraph Command objects
  • Enhanced TODO reminder to display ALL incomplete tasks with priority tags
  • 16 comprehensive tests, 100% passing

Document Search Optimization

  • Upgraded with BM25 ranking, jieba Chinese segmentation, smart chunking (400 chars with 20% overlap)
  • Performance gains: +40-60% precision, +30-40% Chinese accuracy
  • Added find_files and search_file tools with index-based search

Document Reading Support

  • Enhanced read_file to support PDF, DOCX, XLSX, PPTX with automatic format detection
  • Smart preview for large files with search hints
  • Global MD5-based indexing system for efficient search

For complete version history and detailed technical explanations, see CHANGELOG.md.
