An opinionated LangGraph-based architecture for building various types of agents. This repository provides the general-purpose agent loop implementation. Future additions will include workflow-based agents and other specialized agent types.
Current Implementation: General-purpose agent with dynamic tool calling, skill loading, and multi-model routing.
- Model registry & routing – register five core model classes (base, reasoning, vision, code, chat) and pick the right model per phase (`plan`, `decompose`, `delegate`, etc.).
- Skill packages – discoverable `skills/<id>/SKILL.yaml` descriptors with progressive disclosure and tool allowlists.
- Governed tool runtime – declarative metadata (`ToolMeta`) for risk tagging, global read-only utilities, and skill-scoped business tools.
- Context Management ⭐ NEW – Intelligent conversation compression with progressive warnings (75% info → 85% warning → 95% auto-compress). Combines Gemini-style summarization with Kimi-style truncation for robust token management.
- Document Search ⭐ OPTIMIZED – Industry best practices: BM25 ranking, jieba Chinese segmentation, 400-char smart chunking with 20% overlap. Details
- MCP Integration – Model Context Protocol support with lazy server startup, manual tool control, and stdio/SSE modes. Details
- LangGraph flow – `plan → guard → tools → post → (decompose|delegate) → guard → tools → after → …` with deliverable verification and budgets.
- Delegation loop – decomposition into structured plans, delegated agents with scoped tools, and per-step verification.
- Observability hooks – optional LangSmith tracing + Postgres checkpointer.
generalAgent/
├── agents/ # Agent factories and model resolver protocol
├── config/ # Pydantic settings objects (.env-aware)
├── graph/ # State, prompts, plan schema, routing, node factories
├── models/ # Model registry & routing heuristics
├── persistence/ # Optional checkpointer integration
├── runtime/ # High-level app assembly (`build_application`)
├── skills/ # Skill registry + loader (expects skills/<id>/SKILL.yaml)
├── telemetry/ # LangSmith / tracing configuration
└── tools/ # Base tools, business stubs, registry, skill tools
main.py shows a CLI stub that wires the app with a placeholder model resolver; replace it with real LangChain-compatible models before invoking the flow.
All runtime configuration is sourced from .env via Pydantic BaseSettings with automatic environment variable loading.
Settings (generalAgent/config/settings.py)
├── ModelRoutingSettings # Model IDs and API credentials
├── GovernanceSettings # Runtime controls (auto_approve, max_loops)
└── ObservabilitySettings # Tracing, logging, persistence

Model Configuration:
# Five model slots with flexible aliasing
MODEL_BASE=deepseek-chat # Or: MODEL_BASE_ID, MODEL_BASIC_ID
MODEL_BASE_API_KEY=sk-xxx # Or: MODEL_BASIC_API_KEY
MODEL_BASE_URL=https://api.deepseek.com # Or: MODEL_BASIC_BASE_URL
MODEL_REASON=deepseek-reasoner # Or: MODEL_REASON_ID, MODEL_REASONING_ID
MODEL_REASON_API_KEY=sk-xxx # Or: MODEL_REASONING_API_KEY
MODEL_REASON_URL=https://api.deepseek.com # Or: MODEL_REASONING_BASE_URL
MODEL_VISION=glm-4.5v # Or: MODEL_VISION_ID, MODEL_MULTIMODAL_ID
MODEL_VISION_API_KEY=xxx # Or: MODEL_MULTIMODAL_API_KEY
MODEL_VISION_URL=https://open.bigmodel.cn/api/paas/v4
MODEL_CODE=code-pro # Or: MODEL_CODE_ID
MODEL_CODE_API_KEY=xxx
MODEL_CHAT=kimi-k2-0905-preview # Or: MODEL_CHAT_ID
MODEL_CHAT_API_KEY=xxx
MODEL_CHAT_URL=https://api.moonshot.cn/v1

Governance:
AUTO_APPROVE_WRITES=false
MAX_LOOPS=100 # Max agent loop iterations (1-500)
MAX_MESSAGE_HISTORY=40 # Message history size (10-100)

Context Management ⭐ NEW:
Automatic context compression with silent operation. When token usage exceeds 95%, the system automatically compresses older messages via LLM summarization while preserving recent context.
# Enable/disable context management
CONTEXT_MANAGEMENT_ENABLED=true
# Token monitoring thresholds
CONTEXT_INFO_THRESHOLD=0.75 # 75% - Log info message
CONTEXT_WARNING_THRESHOLD=0.85 # 85% - Log warning
CONTEXT_CRITICAL_THRESHOLD=0.95 # 95% - Trigger auto-compression
# Recent message preservation (hybrid strategy)
CONTEXT_KEEP_RECENT_RATIO=0.15 # Keep 15% of context window as recent
CONTEXT_KEEP_RECENT_MESSAGES=10 # Or keep at least 10 messages (whichever is reached first)
# Compression trigger condition
CONTEXT_MIN_MESSAGES_TO_COMPRESS=15 # Minimum messages before compression
# Emergency fallback (if LLM compression fails)
CONTEXT_MAX_HISTORY=100 # Keep last 100 messages max

How it works:
- Token usage monitored after each LLM call
- When usage exceeds 95%, the system routes to a dedicated summarization node
- Old messages compressed via LLM, recent messages preserved
- Agent continues answering user's question seamlessly
User Experience: Completely silent - no notifications. Example: 302 messages (~123K tokens, 96% usage) → 13 messages (~6.5K tokens, 95% reduction).
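The thresholds above boil down to a simple decision rule. A minimal sketch of that rule, assuming hypothetical `count_tokens` and `summarize_messages` helpers (not the framework's actual API):

```python
# Sketch of the threshold logic described above; helper names are illustrative.
import logging

logger = logging.getLogger("context")
INFO, WARNING, CRITICAL = 0.75, 0.85, 0.95

def maybe_compress(messages, context_window, count_tokens, summarize_messages,
                   keep_recent=10, min_to_compress=15):
    usage = count_tokens(messages) / context_window
    if usage >= CRITICAL and len(messages) >= min_to_compress:
        old, recent = messages[:-keep_recent], messages[-keep_recent:]
        summary = summarize_messages(old)   # LLM-generated summary of older turns
        return [summary, *recent]           # compressed history, recent turns intact
    if usage >= WARNING:
        logger.warning("context usage at %.0f%%", usage * 100)
    elif usage >= INFO:
        logger.info("context usage at %.0f%%", usage * 100)
    return messages
```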
For detailed architecture, see docs/ARCHITECTURE.md - Section 1.5
Observability:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=my-project
LANGCHAIN_API_KEY=xxx # Or: LANGSMITH_API_KEY
SESSION_DB_PATH=./data/sessions.db # SQLite session storage
LOG_PROMPT_MAX_LENGTH=500 # Truncate logged prompts

- Automatic .env loading - All settings inherit from `BaseSettings`
- Multiple aliases - Provider-specific names (DeepSeek: `MODEL_BASIC_*`, GLM: `MODEL_MULTIMODAL_*`, etc.)
- Type validation - Pydantic validates types and ranges
- No fallbacks needed - Settings load directly from environment
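A minimal sketch of how such aliases can be declared with pydantic-settings; the class and field names mirror the examples above but are illustrative, not the project's actual settings classes:

```python
from pydantic import AliasChoices, Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class ModelRoutingSettings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    # Either MODEL_BASE, MODEL_BASE_ID, or MODEL_BASIC_ID satisfies this field.
    base_id: str = Field(
        default="deepseek-chat",
        validation_alias=AliasChoices("MODEL_BASE", "MODEL_BASE_ID", "MODEL_BASIC_ID"),
    )
    base_api_key: str = Field(
        default="",
        validation_alias=AliasChoices("MODEL_BASE_API_KEY", "MODEL_BASIC_API_KEY"),
    )
```

At runtime, the cached settings object is consumed like this: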
from generalAgent.config.settings import get_settings
settings = get_settings() # Cached singleton
api_key = settings.models.reason_api_key # Automatically from .env
max_loops = settings.governance.max_loops # Default: 100

See CLAUDE.md - Settings Architecture for implementation details.
Skills are knowledge packages (documentation + scripts), NOT tool containers. Each skill provides:
- SKILL.md - Main documentation with usage guide
- scripts/ - Python scripts for specific tasks (e.g., `fill_pdf_form.py`)
- Reference docs - Additional documentation (forms.md, reference.md, etc.)
Example structure:
skills/pdf/
├── SKILL.md # Main skill documentation
├── forms.md # PDF form filling guide
├── reference.md # Advanced usage reference
└── scripts/ # Executable Python scripts
├── fill_fillable_fields.py
├── extract_form_field_info.py
└── convert_pdf_to_images.py
When a user mentions @pdf, the system:
- Loads the skill into the session workspace (symlink)
- Generates a reminder for the agent to read `SKILL.md`
- Agent reads documentation and executes scripts as needed
Important: Skills do NOT have allowed_tools - they are documentation packages that guide the agent.
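A minimal sketch of the mount-and-remind step described above; the function name, workspace layout, and reminder text are illustrative, not the real loader:

```python
from pathlib import Path

def mount_skill(skill_id: str, workspace: Path, skills_root: Path = Path("skills")) -> str:
    """Symlink a skill into the session workspace and return a reminder string (sketch)."""
    src = (skills_root / skill_id).resolve()
    dst = workspace / "skills" / skill_id
    dst.parent.mkdir(parents=True, exist_ok=True)
    if not dst.exists():
        dst.symlink_to(src, target_is_directory=True)   # read-only mount via symlink
    return f"Skill '{skill_id}' mounted at {dst}. Read {dst / 'SKILL.md'} before running its scripts."
```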
Each session gets an isolated workspace directory for safe file operations:
data/workspace/{session_id}/
├── skills/ # Symlinked skills (read-only)
│ └── pdf/
│ ├── SKILL.md
│ └── scripts/
├── uploads/ # User-uploaded files
├── outputs/ # Agent-generated files
├── temp/ # Temporary files
└── .metadata.json # Session metadata
File operation tools:
- `read_file` - Read files from workspace (skills/, uploads/, outputs/)
- `write_file` - Write files to workspace (outputs/, temp/)
- `list_workspace_files` - List workspace directory contents
- `run_bash_command` - Execute bash commands and Python scripts (optional, disabled by default)
Security features:
- Path traversal protection (cannot access files outside workspace)
- Write restrictions (can only write to outputs/, temp/, uploads/)
- Skills are read-only (symlinked or copied)
- Automatic cleanup on exit (workspaces older than 7 days)
- Manual cleanup via `/clean` command
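A minimal sketch of the path-traversal and write-restriction checks, under the workspace layout shown above; this is illustrative, not the project's actual implementation:

```python
from pathlib import Path

WRITABLE = {"outputs", "temp", "uploads"}   # write targets allowed by the sketch

def resolve_in_workspace(workspace: Path, user_path: str, for_write: bool = False) -> Path:
    """Resolve a user-supplied path and refuse anything outside the session workspace."""
    root = workspace.resolve()
    target = (root / user_path).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{user_path!r} escapes the session workspace")
    rel = target.relative_to(root)
    if for_write and (not rel.parts or rel.parts[0] not in WRITABLE):
        raise PermissionError(f"writes are restricted to {sorted(WRITABLE)}/")
    return target
```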
Users can upload files to the agent using #filename syntax from the uploads/ directory:
# Put files in uploads/ directory
uploads/
├── document.pdf
├── screenshot.png
└── data.txt
# Reference in conversation
You> Analyze this image #screenshot.png
You> Process this document #document.pdf

Automatic handling:
- Images (.png, .jpg, etc.): Base64 encoded + injected into message → vision model
- PDFs (.pdf): Copied to workspace + auto-load @pdf skill
- Text files (<10KB): Content directly injected into message
- Others: Copied to workspace for agent tool processing
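A minimal sketch of that routing decision, with illustrative handler names and extension sets (the real dispatcher may differ):

```python
from pathlib import Path

IMAGE_EXT = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
TEXT_EXT = {".txt", ".md", ".csv", ".json"}
TEXT_INLINE_LIMIT = 10 * 1024   # inline text files under 10KB, per the rules above

def route_upload(path: Path) -> str:
    """Decide how an uploaded file is attached to the conversation (sketch)."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXT:
        return "inline_base64_image"                 # handed to the vision model
    if ext == ".pdf":
        return "copy_to_workspace+load_pdf_skill"    # auto-load @pdf skill
    if ext in TEXT_EXT and path.stat().st_size < TEXT_INLINE_LIMIT:
        return "inline_text"                         # content injected into the message
    return "copy_to_workspace"                       # processed via file tools
```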
File type limits:
- Images: 10MB
- PDFs: 50MB
- Text/Code: 5MB
- Office docs: 20MB
See uploads/README.md for examples and detailed usage.
Core tools (always enabled):
- `now` - Get current UTC time
- `todo_write`, `todo_read` - Task tracking
- `delegate_task` - Delegate tasks to delegated agents
- `read_file`, `write_file`, `list_workspace_files` - File operations
- `fetch_web` - Fetch web pages and convert to LLM-friendly markdown (Jina Reader)
- `web_search` - Search the web with LLM-optimized results (Jina Search)
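For orientation, a minimal sketch of how a `fetch_web`-style tool can wrap Jina Reader's public `https://r.jina.ai/` prefix endpoint; the real tool's implementation and options may differ:

```python
import httpx
from langchain_core.tools import tool

@tool
def fetch_web_sketch(url: str) -> str:
    """Fetch a web page as LLM-friendly markdown via Jina Reader (illustrative sketch)."""
    resp = httpx.get(f"https://r.jina.ai/{url}", timeout=30.0)
    resp.raise_for_status()
    return resp.text
```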
Optional tools (can be enabled via tools.yaml):
- `http_fetch` - HTTP requests (stub, deprecated - use `fetch_web` instead)
- `extract_links` - Link extraction (stub)
- `ask_vision` - Vision perception (stub)
- `run_bash_command` - Execute bash commands and Python scripts (disabled by default)
Tool Development:
- Tools are automatically discovered by scanning `generalAgent/tools/builtin/`
- Multiple tools can be defined in a single file using `__all__` export
- Configuration is managed via `generalAgent/config/tools.yaml`
- See `generalAgent/tools/builtin/file_ops.py` for a multi-tool file example
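A minimal sketch of a builtin tool module following that convention; the `@tool` decorator is LangChain's, while the module path and tool names here are illustrative:

```python
# generalAgent/tools/builtin/example_tools.py (illustrative module)
from datetime import datetime, timezone

from langchain_core.tools import tool

@tool
def utc_now() -> str:
    """Return the current UTC time in ISO-8601 format."""
    return datetime.now(timezone.utc).isoformat()

@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words in the given text."""
    return len(text.split())

# The registry scans this module and registers everything listed here.
__all__ = ["utc_now", "word_count"]
```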
generalAgent.graph.builder.build_state_graph assembles the full flow with these nodes:
- plan – governed planner (scoped tools, Skill discovery).
- guard – policy enforcement & HITL gate.
- tools – executes actual tool calls.
- post – updates active skill and allowlists.
- decompose (conditional) – produces a structured plan (Pydantic validated).
- delegate – runs scoped delegated agents per step.
- after – verifies deliverables, advances plan, enforces budgets.
Routing helpers in generalAgent.graph.routing decide whether to decompose and when to finish loops.
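A simplified sketch of how such a graph is wired with LangGraph; the node bodies, state schema, and router are placeholders, not the project's real builder (the actual flow also runs a second guard/tools pass before `after`):

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

class AgentState(TypedDict, total=False):
    messages: list
    plan: dict

def _noop(state: AgentState) -> AgentState:
    return state   # placeholder node body

def route_after_post(state: AgentState) -> str:
    # Placeholder: the real helpers in generalAgent.graph.routing inspect the state.
    return "decompose"

builder = StateGraph(AgentState)
for name in ("plan", "guard", "tools", "post", "decompose", "delegate", "after"):
    builder.add_node(name, _noop)

builder.set_entry_point("plan")
builder.add_edge("plan", "guard")
builder.add_edge("guard", "tools")
builder.add_edge("tools", "post")
builder.add_conditional_edges(
    "post", route_after_post,
    {"decompose": "decompose", "delegate": "delegate", "end": END},
)
builder.add_edge("decompose", "delegate")
builder.add_edge("delegate", "after")
builder.add_edge("after", "plan")   # loop until budgets or deliverables end it
graph = builder.compile()
```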
- Override the model resolver (optional)
  By default, `build_application()` reads `.env` and creates compatible `ChatOpenAI` clients via `langchain-openai` (DeepSeek, Moonshot, GLM, and other OpenAI-style APIs). To customize caching or retries, or to use another SDK, implement a `ModelResolver` and pass it in.
- Add skills
  Drop new skill folders under `skills/` with `SKILL.yaml`, templates, scripts, etc. Call `SkillRegistry.reload()` when hot-reloading.
- Register tools
  Add tool functions/classes, register them with `ToolRegistry`, and maintain their `ToolMeta` entries.
- Delegated agent catalogs & deliverables
  Expand the delegated-agent catalog in `runtime/app.py` and extend `deliverable_checkers` for domain-specific outputs.
- Observability & persistence
  Set `PG_DSN` for Postgres checkpoints and enable tracing via the LangSmith env vars.
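A minimal sketch of a custom resolver built on `langchain-openai`; the exact `ModelResolver` protocol (and how `build_application()` accepts it) is an assumption based on this README, so treat the interface as illustrative:

```python
from langchain_openai import ChatOpenAI

class MyModelResolver:
    """Map a routing role (base/reason/vision/code/chat) to a chat model (sketch)."""

    def __init__(self) -> None:
        self._cache: dict[str, ChatOpenAI] = {}

    def resolve(self, role: str) -> ChatOpenAI:
        if role not in self._cache:
            self._cache[role] = ChatOpenAI(
                model="deepseek-chat",                  # pick per-role model IDs in practice
                base_url="https://api.deepseek.com",
                api_key="sk-...",                        # load from your own secret store
                max_retries=3,
            )
        return self._cache[role]
```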
The project includes a comprehensive test suite organized into four tiers:
# Quick validation before commits (< 30s)
python tests/run_tests.py smoke
# Run specific test types
python tests/run_tests.py unit # Module-level tests
python tests/run_tests.py integration # Module interaction tests
python tests/run_tests.py e2e # Complete business workflows
# Run all tests
python tests/run_tests.py all
# Generate coverage report
python tests/run_tests.py coverage

Test organization:
- `tests/smoke/` - Fast critical-path validation tests
- `tests/unit/` - Unit tests for individual modules (HITL, MCP, Tools, etc.)
- `tests/integration/` - Integration tests for module interactions
- `tests/e2e/` - End-to-end business workflow tests
- `tests/fixtures/` - Test infrastructure (test MCP servers, etc.)
For detailed testing guidelines and best practices, see docs/TESTING.md.
Comprehensive documentation is organized into six core documents by topic and audience:
- docs/README.md - Documentation index with quick start guides and topic finder
- docs/FEATURES.md - User-facing features (Workspace, @Mentions, File Upload, MCP, HITL)
- docs/DEVELOPMENT.md - Environment setup, tool/skill development, best practices
- docs/ARCHITECTURE.md - Core architecture, tool system, skill system, design patterns
- docs/OPTIMIZATION.md - Performance optimization (KV Cache, Document Search, Text Indexer)
- docs/TESTING.md - Comprehensive testing guide (Smoke, Unit, Integration, E2E, HITL)
Quick links:
- Architecture overview → docs/ARCHITECTURE.md - Part 1
- Tool development → docs/DEVELOPMENT.md - Part 2
- Skill creation → docs/DEVELOPMENT.md - Part 3
- Performance tuning → docs/OPTIMIZATION.md - Part 1
Note: Previous documentation has been archived in docs/archive/ with a mapping guide.
- Install Python 3.12 and run `uv sync` (or `pip install -e .`) to pull dependencies (including `langchain-openai` and `python-dotenv`).
- Run `python main.py` to start the multi-turn CLI, which initializes the conversation from the model configuration in `.env`; alternatively, call `build_application()` from your own script and drive `app.invoke(state)`, as sketched below.
- Add business-specific skill packages and tool risk tags, and extend test coverage for governance and routing.
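A minimal driver sketch under the assumptions above; the module path of `build_application` and the state shape are inferred from this README's layout, not a verified API:

```python
# Hypothetical driver; adjust the import path and state keys to the actual API.
from generalAgent.runtime.app import build_application

app = build_application()   # wires models, tools, and skills from .env
state = {"messages": [("user", "Summarize #document.pdf in three bullet points")]}
result = app.invoke(state)
print(result["messages"][-1])
```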
Product Requirements Document (PRD) - P2 Extension ⭐ LATEST
- Extended PRD to 5,866 lines (v3.2, +22% from v3.0)
- P2 Chapters (2/3):
- ✅ Product Overview (410 lines) - Framework positioning, competitive analysis, quick start guide
- ✅ Architecture Optimization (663 lines) - 8 optimization strategies with quantified metrics ⭐ NEW
- KV Cache optimization: 70-90% token reuse, 60-80% cost reduction
- Context auto-compression: 95% compression ratio (302 messages → 13)
- Document indexing: First search 3s, subsequent <100ms
- 5 troubleshooting guides with executable commands
- Updated Statistics:
- Total: 43+ functional requirements
- Code references: 70+ precise file paths
- Optimization strategies: 8 production-proven techniques
PRD Evolution (2025-10-31):
- v3.0 (4,789 lines): P0+P1 complete
- v3.1 (5,203 lines): Added Product Overview (+410 lines)
- v3.2 (5,866 lines): Added Architecture Optimization (+663 lines) ⭐ CURRENT
PRD Completion (Earlier 2025-10-31)
- Completed comprehensive PRD: docs/桌面 AI 框架需求.md
- P0 Core Chapters (6/6): Tool System, Skill System, Agent Templates, Agent Flow & State, HITL, Context Management
- P1 Important Chapters (5/5): Model Routing, Multi-Agent Collaboration, Workspace Management, File Processing, Session Management
- Complete maintenance guide, terminology glossary, and version history
Key PRD Features:
- Unified chapter structure (Product Positioning → Scenarios → Requirements → NFR → Code References)
- Cross-references between chapters ("See Chapter X")
- Version tracking (v1.0 → v2.0 → v3.0 → v3.2)
- Quality checklist and documentation maintenance guide
- Production-grade optimization strategies with quantified ROI
Documentation Reorganization ⭐
- Consolidated 14 documents → 6 core documents (50% reduction)
- Created comprehensive maintenance guide in docs/README.md
- Archived old files with migration mapping
TODO Tool State Synchronization Fix ⭐
- Fixed critical bug: `todo_write` now correctly updates `state["todos"]` using LangGraph `Command` objects
- Enhanced TODO reminder to display ALL incomplete tasks with priority tags
- 16 comprehensive tests, 100% passing
Document Search Optimization ⭐
- Upgraded with BM25 ranking, jieba Chinese segmentation, smart chunking (400 chars with 20% overlap)
- Performance gains: +40-60% precision, +30-40% Chinese accuracy
- Added `find_files` and `search_file` tools with index-based search
Document Reading Support
- Enhanced `read_file` to support PDF, DOCX, XLSX, PPTX with automatic format detection
- Smart preview for large files with search hints
- Global MD5-based indexing system for efficient search
For complete version history and detailed technical explanations, see CHANGELOG.md.