GeneralAgent - Flexible Agent System

An opinionated LangGraph-based architecture for building various types of agents. This repository provides the general-purpose agent loop implementation. Future additions will include workflow-based agents and other specialized agent types.

Current Implementation: General-purpose agent with dynamic tool calling, skill loading, and multi-model routing.

Features

  • Model registry & routing – register five core model classes (base, reasoning, vision, code, chat) and pick the right model per phase (plan, decompose, delegate, etc.).
  • Skill packages – discoverable skills/<id>/SKILL.yaml descriptors with progressive disclosure and tool allowlists.
  • Governed tool runtime – declarative metadata (ToolMeta) for risk tagging, global read-only utilities, and skill-scoped business tools.
  • Context Management ⭐ NEW – Intelligent conversation compression with progressive warnings (75% info → 85% warning → 95% auto-compress). Combines Gemini-style summarization with Kimi-style truncation for robust token management.
  • Document Search ⭐ OPTIMIZED – Industry best practices: BM25 ranking, jieba Chinese segmentation, 400-char smart chunking with 20% overlap. Details
  • MCP Integration – Model Context Protocol support with lazy server startup, manual tool control, and stdio/SSE modes. Details
  • LangGraph flow – plan → guard → tools → post → (decompose|delegate) → guard → tools → after → … with deliverable verification and budgets.
  • Delegation loop – decomposition into structured plans, delegation to scoped delegated agents, and per-step verification.
  • Observability hooks – optional LangSmith tracing + Postgres checkpointer.

Directory Layout

generalAgent/
├── agents/           # Agent factories and model resolver protocol
├── config/           # Pydantic settings objects (.env-aware)
├── graph/            # State, prompts, plan schema, routing, node factories
├── models/           # Model registry & routing heuristics
├── persistence/      # Optional checkpointer integration
├── runtime/          # High-level app assembly (`build_application`)
├── skills/           # Skill registry + loader (expects skills/<id>/SKILL.yaml)
├── telemetry/        # LangSmith / tracing configuration
└── tools/            # Base tools, business stubs, registry, skill tools

main.py shows a CLI stub that wires the app with a placeholder model resolver; replace it with real LangChain-compatible models before invoking the flow.

Configuration

All runtime configuration is sourced from .env via Pydantic BaseSettings with automatic environment variable loading.

Settings Structure

Settings (generalAgent/config/settings.py)
├── ModelRoutingSettings     # Model IDs and API credentials
├── GovernanceSettings       # Runtime controls (auto_approve, max_loops)
└── ObservabilitySettings    # Tracing, logging, persistence
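
A minimal sketch of how this structure could be declared with pydantic-settings (not the project's actual settings.py; field names mirror the governance variables listed below, and defaults are illustrative):

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict

class GovernanceSettings(BaseSettings):
    """Reads AUTO_APPROVE_WRITES, MAX_LOOPS, MAX_MESSAGE_HISTORY from .env / environment."""
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")
    auto_approve_writes: bool = False
    max_loops: int = Field(default=100, ge=1, le=500)
    max_message_history: int = Field(default=40, ge=10, le=100)

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")
    governance: GovernanceSettings = Field(default_factory=GovernanceSettings)

settings = Settings()
print(settings.governance.max_loops)   # 100 unless MAX_LOOPS overrides it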

Key Environment Variables

Model Configuration:

# Five model slots with flexible aliasing
MODEL_BASE=deepseek-chat                    # Or: MODEL_BASE_ID, MODEL_BASIC_ID
MODEL_BASE_API_KEY=sk-xxx                   # Or: MODEL_BASIC_API_KEY
MODEL_BASE_URL=https://api.deepseek.com     # Or: MODEL_BASIC_BASE_URL

MODEL_REASON=deepseek-reasoner              # Or: MODEL_REASON_ID, MODEL_REASONING_ID
MODEL_REASON_API_KEY=sk-xxx                 # Or: MODEL_REASONING_API_KEY
MODEL_REASON_URL=https://api.deepseek.com   # Or: MODEL_REASONING_BASE_URL

MODEL_VISION=glm-4.5v                       # Or: MODEL_VISION_ID, MODEL_MULTIMODAL_ID
MODEL_VISION_API_KEY=xxx                    # Or: MODEL_MULTIMODAL_API_KEY
MODEL_VISION_URL=https://open.bigmodel.cn/api/paas/v4

MODEL_CODE=code-pro                         # Or: MODEL_CODE_ID
MODEL_CODE_API_KEY=xxx

MODEL_CHAT=kimi-k2-0905-preview             # Or: MODEL_CHAT_ID
MODEL_CHAT_API_KEY=xxx
MODEL_CHAT_URL=https://api.moonshot.cn/v1

Governance:

AUTO_APPROVE_WRITES=false
MAX_LOOPS=100                   # Max agent loop iterations (1-500)
MAX_MESSAGE_HISTORY=40          # Message history size (10-100)

Context Management ⭐ NEW:

Automatic context compression with silent operation. When token usage exceeds 95%, the system automatically compresses older messages via LLM summarization while preserving recent context.

# Enable/disable context management
CONTEXT_MANAGEMENT_ENABLED=true

# Token monitoring thresholds
CONTEXT_INFO_THRESHOLD=0.75        # 75% - Log info message
CONTEXT_WARNING_THRESHOLD=0.85     # 85% - Log warning
CONTEXT_CRITICAL_THRESHOLD=0.95    # 95% - Trigger auto-compression

# Recent message preservation (hybrid strategy)
CONTEXT_KEEP_RECENT_RATIO=0.15     # Keep 15% of context window as recent
CONTEXT_KEEP_RECENT_MESSAGES=10    # Or keep at least 10 messages (whichever is reached first)

# Compression trigger condition
CONTEXT_MIN_MESSAGES_TO_COMPRESS=15  # Minimum messages before compression

# Emergency fallback (if LLM compression fails)
CONTEXT_MAX_HISTORY=100            # Keep last 100 messages max

How it works:

  1. Token usage is monitored after each LLM call
  2. When usage exceeds 95%, the system routes to a dedicated summarization node
  3. Older messages are compressed via LLM summarization; recent messages are preserved
  4. The agent continues answering the user's question seamlessly

User Experience: Completely silent - no notifications. Example: 302 messages (~123K tokens, 96% usage) → 13 messages (~6.5K tokens, 95% reduction).
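
A minimal sketch of the threshold logic described above (illustrative only; it assumes a 128K-token window for the example, and the real system performs compression inside a dedicated graph node):

# Hypothetical helper mirroring the progressive thresholds; not the project's actual code.
INFO, WARNING, CRITICAL = 0.75, 0.85, 0.95

def check_context(used_tokens: int, context_window: int) -> str:
    """Return the action implied by current token usage."""
    usage = used_tokens / context_window
    if usage >= CRITICAL:
        return "compress"   # route to the summarization node, keep recent messages
    if usage >= WARNING:
        return "warn"       # log a warning only
    if usage >= INFO:
        return "info"       # log an informational message
    return "ok"

print(check_context(123_000, 128_000))   # ~96% usage -> "compress"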

For detailed architecture, see docs/ARCHITECTURE.md - Section 1.5

Observability:

LANGCHAIN_TRACING_V2=true
LANGCHAIN_PROJECT=my-project
LANGCHAIN_API_KEY=xxx           # Or: LANGSMITH_API_KEY
SESSION_DB_PATH=./data/sessions.db  # SQLite session storage
LOG_PROMPT_MAX_LENGTH=500       # Truncate logged prompts

Configuration Features

  • Automatic .env loading - All settings inherit from BaseSettings
  • Multiple aliases - Provider-specific names (DeepSeek: MODEL_BASIC_*, GLM: MODEL_MULTIMODAL_*, etc.)
  • Type validation - Pydantic validates types and ranges
  • No fallbacks needed - Settings load directly from environment

Usage Example

from generalAgent.config.settings import get_settings

settings = get_settings()  # Cached singleton
api_key = settings.models.reason_api_key  # Automatically from .env
max_loops = settings.governance.max_loops  # Default: 100

See CLAUDE.md - Settings Architecture for implementation details.

Skills

Skills are knowledge packages (documentation + scripts), NOT tool containers. Each skill provides:

  • SKILL.md - Main documentation with usage guide
  • scripts/ - Python scripts for specific tasks (e.g., fill_pdf_form.py)
  • Reference docs - Additional documentation (forms.md, reference.md, etc.)

Example structure:

skills/pdf/
├── SKILL.md           # Main skill documentation
├── forms.md           # PDF form filling guide
├── reference.md       # Advanced usage reference
└── scripts/           # Executable Python scripts
    ├── fill_fillable_fields.py
    ├── extract_form_field_info.py
    └── convert_pdf_to_images.py

When a user mentions @pdf, the system:

  1. Loads the skill into the session workspace (symlink)
  2. Generates a reminder for the agent to read SKILL.md
  3. Agent reads documentation and executes scripts as needed

Important: Skills do NOT have allowed_tools - they are documentation packages that guide the agent.
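
A minimal sketch of the @-mention loading step (illustrative helper; paths follow the workspace layout shown in the next section, and the real loader may copy instead of symlink):

from pathlib import Path

def load_skill(session_id: str, skill_id: str) -> str:
    """Symlink skills/<id>/ into the session workspace and return a reminder for the agent."""
    source = Path("skills") / skill_id                                  # e.g. skills/pdf/
    target = Path("data/workspace") / session_id / "skills" / skill_id
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        target.symlink_to(source.resolve(), target_is_directory=True)
    return f"Skill '{skill_id}' loaded. Read skills/{skill_id}/SKILL.md before using its scripts."

# A "@pdf" mention would trigger something like load_skill(session_id, "pdf").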

Workspace Isolation

Each session gets an isolated workspace directory for safe file operations:

data/workspace/{session_id}/
├── skills/           # Symlinked skills (read-only)
│   └── pdf/
│       ├── SKILL.md
│       └── scripts/
├── uploads/          # User-uploaded files
├── outputs/          # Agent-generated files
├── temp/             # Temporary files
└── .metadata.json    # Session metadata

File operation tools:

  • read_file - Read files from workspace (skills/, uploads/, outputs/)
  • write_file - Write files to workspace (outputs/, temp/)
  • list_workspace_files - List workspace directory contents
  • run_bash_command - Execute bash commands and Python scripts (optional, disabled by default)

Security features:

  • Path traversal protection (cannot access files outside workspace)
  • Write restrictions (can only write to outputs/, temp/, uploads/)
  • Skills are read-only (symlinked or copied)
  • Automatic cleanup on exit (workspaces older than 7 days)
  • Manual cleanup via /clean command
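
A minimal sketch of the path traversal and write-restriction checks listed above (illustrative; not the project's actual file-tool code):

from pathlib import Path

WRITABLE = {"outputs", "temp", "uploads"}   # write restrictions from the list above

def resolve_in_workspace(workspace: Path, relative: str, writing: bool = False) -> Path:
    """Resolve a user-supplied path and reject anything that escapes the workspace."""
    root = workspace.resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{relative!r} escapes the workspace")
    if writing:
        parts = candidate.relative_to(root).parts
        if not parts or parts[0] not in WRITABLE:
            raise PermissionError(f"writes are limited to {sorted(WRITABLE)}")
    return candidate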

File Upload

Users can upload files to the agent using #filename syntax from the uploads/ directory:

# Put files in uploads/ directory
uploads/
├── document.pdf
├── screenshot.png
└── data.txt

# Reference in conversation
You> 分析这张图 #screenshot.png
You> 处理这个文档 #document.pdf

Automatic handling:

  • Images (.png, .jpg, etc.): Base64 encoded + injected into message → vision model
  • PDFs (.pdf): Copied to workspace + auto-load @pdf skill
  • Text files (<10KB): Content directly injected into message
  • Others: Copied to workspace for agent tool processing
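
A minimal sketch of this dispatch (hypothetical helper; the extension sets and the 10KB text cutoff reflect the rules above, and the size limits below are not enforced here):

from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
TEXT_EXTS = {".txt", ".md", ".csv"}   # assumed set of "text file" extensions

def classify_upload(path: Path) -> str:
    """Decide how a #filename reference is handled."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXTS:
        return "inject_base64"            # routed to the vision model
    if ext == ".pdf":
        return "copy_and_load_pdf_skill"  # copied to workspace, @pdf skill auto-loaded
    if ext in TEXT_EXTS and path.stat().st_size < 10 * 1024:
        return "inject_text"              # small text injected directly into the message
    return "copy_to_workspace"            # everything else handled via agent tools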

File type limits:

  • Images: 10MB
  • PDFs: 50MB
  • Text/Code: 5MB
  • Office docs: 20MB

See uploads/README.md for examples and detailed usage.

Tools

Core tools (always enabled):

  • now - Get current UTC time
  • todo_write, todo_read - Task tracking
  • delegate_task - Delegate tasks to delegated agents
  • read_file, write_file, list_workspace_files - File operations
  • fetch_web - Fetch web pages and convert to LLM-friendly markdown (Jina Reader)
  • web_search - Search the web with LLM-optimized results (Jina Search)

Optional tools (can be enabled via tools.yaml):

  • http_fetch - HTTP requests (stub, deprecated - use fetch_web instead)
  • extract_links - Link extraction (stub)
  • ask_vision - Vision perception (stub)
  • run_bash_command - Execute bash commands and Python scripts (disabled by default)

Tool Development:

  • Tools are automatically discovered by scanning generalAgent/tools/builtin/
  • Multiple tools can be defined in a single file using __all__ export
  • Configuration is managed via generalAgent/config/tools.yaml
  • See generalAgent/tools/builtin/file_ops.py for multi-tool file example
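
A minimal sketch of a multi-tool builtin module (hypothetical file; it assumes tools are plain LangChain @tool functions and that discovery keys off __all__ as described above):

# generalAgent/tools/builtin/example_tools.py (hypothetical)
from langchain_core.tools import tool

@tool
def word_count(text: str) -> int:
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())

@tool
def reverse_text(text: str) -> str:
    """Return the input text reversed."""
    return text[::-1]

# Discovery scans the module and registers everything exported here.
__all__ = ["word_count", "reverse_text"]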

LangGraph Flow

generalAgent.graph.builder.build_state_graph assembles the full flow with these nodes:

  1. plan – governed planner (scoped tools, Skill discovery).
  2. guard – policy enforcement & HITL gate.
  3. tools – executes actual tool calls.
  4. post – updates active skill and allowlists.
  5. decompose (conditional) – produces a structured plan (Pydantic validated).
  6. delegate – runs scoped delegated agents per step.
  7. after – verifies deliverables, advances plan, enforces budgets.

Routing helpers in generalAgent.graph.routing decide whether to decompose and when to finish loops.
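
A schematic sketch of that wiring (node functions, state schema, and routing predicates are placeholders; the real assembly lives in generalAgent.graph.builder and its edges are more nuanced):

from typing import Annotated, TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class SketchState(TypedDict):
    messages: Annotated[list, add_messages]

def build_sketch(nodes: dict, route_after_post, route_after_after):
    g = StateGraph(SketchState)
    for name, fn in nodes.items():      # plan, guard, tools, post, decompose, delegate, after
        g.add_node(name, fn)
    g.add_edge(START, "plan")
    g.add_edge("plan", "guard")
    g.add_edge("guard", "tools")
    g.add_edge("tools", "post")
    g.add_conditional_edges("post", route_after_post,
                            {"decompose": "decompose", "delegate": "delegate"})
    g.add_edge("decompose", "delegate")
    g.add_edge("delegate", "after")
    g.add_conditional_edges("after", route_after_after,
                            {"continue": "plan", "finish": END})
    return g.compile()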

Extending the System

  1. Override the model resolver (optional)
    By default, build_application() reads .env and creates compatible ChatOpenAI clients via langchain-openai (DeepSeek/Moonshot/GLM and other OpenAI-style APIs). To customize caching, retries, or to use another SDK, implement ModelResolver and pass it in (a minimal resolver sketch follows this list).
  2. Add skills
    Drop new skill folders under skills/ with SKILL.yaml, templates, scripts, etc. Call SkillRegistry.reload() when hot-reloading.
  3. Register tools
    Add tool functions/classes, register them with ToolRegistry, and maintain their ToolMeta entries.
  4. Delegated agent catalogs & deliverables
    Expand delegated agent_catalog in runtime/app.py and extend deliverable_checkers for domain-specific outputs.
  5. Observability & persistence
    Set PG_DSN for Postgres checkpoints and enable tracing via LangSmith env vars.
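
A minimal resolver sketch for step 1 (hypothetical class; check the ModelResolver protocol in generalAgent/agents for the real interface — only the use of langchain-openai's ChatOpenAI for OpenAI-style APIs comes from the text above):

from langchain_openai import ChatOpenAI

class CachedResolver:
    """Map a model slot (base, reason, vision, code, chat) to a cached ChatOpenAI client."""

    def __init__(self, slot_configs: dict[str, dict]):
        self._configs = slot_configs    # e.g. {"reason": {"model": "deepseek-reasoner", ...}}
        self._cache: dict[str, ChatOpenAI] = {}

    def resolve(self, slot: str) -> ChatOpenAI:
        if slot not in self._cache:
            cfg = self._configs[slot]
            self._cache[slot] = ChatOpenAI(
                model=cfg["model"],
                api_key=cfg["api_key"],
                base_url=cfg.get("base_url"),
            )
        return self._cache[slot]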

Testing

The project includes a comprehensive test suite organized into four tiers:

# Quick validation before commits (< 30s)
python tests/run_tests.py smoke

# Run specific test types
python tests/run_tests.py unit          # Module-level tests
python tests/run_tests.py integration   # Module interaction tests
python tests/run_tests.py e2e           # Complete business workflows

# Run all tests
python tests/run_tests.py all

# Generate coverage report
python tests/run_tests.py coverage

Test organization:

  • tests/smoke/ - Fast critical-path validation tests
  • tests/unit/ - Unit tests for individual modules (HITL, MCP, Tools, etc.)
  • tests/integration/ - Integration tests for module interactions
  • tests/e2e/ - End-to-end business workflow tests
  • tests/fixtures/ - Test infrastructure (test MCP servers, etc.)

For detailed testing guidelines and best practices, see docs/TESTING.md.

Documentation

Comprehensive documentation is organized into six core documents by topic and audience:

For New Users

  • docs/README.md - Documentation index with quick start guides and topic finder
  • docs/FEATURES.md - User-facing features (Workspace, @Mentions, File Upload, MCP, HITL)

For Developers

For Advanced Topics

  • docs/OPTIMIZATION.md - Performance optimization (KV Cache, Document Search, Text Indexer)
  • docs/TESTING.md - Comprehensive testing guide (Smoke, Unit, Integration, E2E, HITL)

Quick links:

Note: Previous documentation has been archived in docs/archive/ with a mapping guide.

Next Steps

  • Install Python 3.12 and run uv sync (or pip install -e .) to pull dependencies (including langchain-openai and python-dotenv).
  • Run python main.py to start the multi-turn CLI, which initializes the conversation from the model configuration in .env; you can also call build_application() in your own script and drive app.invoke(state).
  • Add skill packages and tool risk tags for your domain, and extend test coverage for governance and routing.

Recent Updates

2025-10-31

Product Requirements Document (PRD) - P2 Extension (Latest)

  • Extended PRD to 5,866 lines (v3.2, +22% from v3.0)
  • P2 Chapters (2/3):
    • ✅ Product Overview (410 lines) - Framework positioning, competitive analysis, quick start guide
    • ✅ Architecture Optimization (663 lines) - 8 optimization strategies with quantified metrics ⭐ NEW
      • KV Cache optimization: 70-90% token reuse, 60-80% cost reduction
      • Context auto-compression: 95% compression ratio (302 messages → 13)
      • Document indexing: First search 3s, subsequent <100ms
      • 5 troubleshooting guides with executable commands
  • Updated Statistics:
    • Total: 43+ functional requirements
    • Code references: 70+ precise file paths
    • Optimization strategies: 8 production-proven techniques

PRD Evolution (2025-10-31):

  • v3.0 (4,789 lines): P0+P1 complete
  • v3.1 (5,203 lines): Added Product Overview (+410 lines)
  • v3.2 (5,866 lines): Added Architecture Optimization (+663 lines) ⭐ CURRENT

PRD Completion (Earlier 2025-10-31)

  • Completed comprehensive PRD: docs/桌面 AI 框架需求.md
  • P0 Core Chapters (6/6): Tool System, Skill System, Agent Templates, Agent Flow & State, HITL, Context Management
  • P1 Important Chapters (5/5): Model Routing, Multi-Agent Collaboration, Workspace Management, File Processing, Session Management
  • Complete maintenance guide, terminology glossary, and version history

Key PRD Features:

  • Unified chapter structure (Product Positioning → Scenarios → Requirements → NFR → Code References)
  • Cross-references between chapters ("See Chapter X")
  • Version tracking (v1.0 → v2.0 → v3.0 → v3.2)
  • Quality checklist and documentation maintenance guide
  • Production-grade optimization strategies with quantified ROI

2025-10-27

Documentation Reorganization

  • Consolidated 14 documents → 6 core documents (50% reduction)
  • Created comprehensive maintenance guide in docs/README.md
  • Archived old files with migration mapping

TODO Tool State Synchronization Fix

  • Fixed critical bug: todo_write now correctly updates state["todos"] using LangGraph Command objects
  • Enhanced TODO reminder to display ALL incomplete tasks with priority tags
  • 16 comprehensive tests, 100% passing

Document Search Optimization

  • Upgraded with BM25 ranking, jieba Chinese segmentation, smart chunking (400 chars with 20% overlap)
  • Performance gains: +40-60% precision, +30-40% Chinese accuracy
  • Added find_files and search_file tools with index-based search

Document Reading Support

  • Enhanced read_file to support PDF, DOCX, XLSX, PPTX with automatic format detection
  • Smart preview for large files with search hints
  • Global MD5-based indexing system for efficient search

For complete version history and detailed technical explanations, see CHANGELOG.md.
