Merged
2 changes: 1 addition & 1 deletion content/_index.md
@@ -20,7 +20,7 @@ sections:
text: Read the Technical Report
url: blog/technical-deep-dive/
announcement:
text: "Sirchmunk v0.0.6post1OpenClaw Skill, SSE Streaming API & Multi-turn Conversation"
text: "Sirchmunk v0.0.8Knowledge Compile (Beta), DEEP Mode Generalization & I/O Optimization"
link:
text: "View all releases"
url: "https://github.com/modelscope/sirchmunk/releases"
2 changes: 1 addition & 1 deletion content/_index.zh.md
@@ -20,7 +20,7 @@ sections:
text: 阅读技术报告
url: blog/technical-deep-dive/
announcement:
text: "Sirchmunk v0.0.6post1OpenClaw 技能支持、SSE 流式 API 与多轮对话"
text: "Sirchmunk v0.0.8知识编译(Beta)、DEEP 模式泛化增强与 I/O 优化"
link:
text: "查看所有版本"
url: "https://github.com/modelscope/sirchmunk/releases"
9 changes: 9 additions & 0 deletions content/docs/guide/architecture.md
@@ -18,6 +18,8 @@ Sirchmunk's architecture is organized into cleanly separated layers, following t
| **EvidenceProcessor** | Evidence processing based on the Monte Carlo Importance Sampling |
| **GrepRetriever** | High-performance _indexless_ file search with parallel processing |
| **OpenAIChat** | Unified LLM interface supporting streaming and usage tracking |
| **KnowledgeCompiler** | Offline document compilation into tree indices and knowledge clusters (Beta) |
| **KnowledgeLint** | Knowledge health checks and auto-repair |
| **MonitorTracker** | Real-time system and application metrics collection |

## Multi-Phase Search Pipeline
@@ -166,6 +168,13 @@ All persistent data is stored in the configured `SIRCHMUNK_WORK_PATH` (default:
│ └── chat_history.db
├── knowledge/ # Knowledge clusters (Parquet)
│ └── knowledge_clusters.parquet
├── compile/ # Compile artifacts (Beta)
│ ├── manifest.json # File manifest with hashes
│ ├── document_catalog.json
│ ├── summary_index.json
│ ├── trees/ # Hierarchical tree indices
│ ├── table_digests/ # Table extraction digests
│ └── xlsx_digests/ # Spreadsheet digests
└── settings/ # User settings (DuckDB)
└── settings.db
```
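
The tree above lists `manifest.json` as a "file manifest with hashes", and the CLI docs in this PR describe compilation as incremental by default. As a rough illustration of how a hash manifest can drive incremental work, here is a minimal Python sketch. The manifest shape (a flat mapping of relative path to digest), the choice of SHA-256, and the helper names `file_digest` and `changed_files` are all assumptions made for illustration; none of them are taken from Sirchmunk's actual `KnowledgeCompiler`.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (hash choice is an assumption)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(doc_root: Path, manifest_path: Path) -> list[Path]:
    """List files under doc_root whose digest differs from the manifest entry."""
    # Hypothetical manifest shape: {"relative/path": "hex digest", ...}
    known = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    changed = []
    for path in sorted(p for p in doc_root.rglob("*") if p.is_file()):
        rel = path.relative_to(doc_root).as_posix()
        # New files (no entry) and modified files (digest mismatch) both qualify.
        if known.get(rel) != file_digest(path):
            changed.append(path)
    return changed
```

With no manifest on disk every file counts as changed, which loosely mirrors a first or full compile; afterwards only new or modified files would be reprocessed.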
9 changes: 9 additions & 0 deletions content/docs/guide/architecture.zh.md
@@ -18,6 +18,8 @@ Sirchmunk 采用清晰分离的层次化架构,遵循**关注点分离**原则
| **EvidenceProcessor** | 基于蒙特卡洛重要性采样的证据处理 |
| **GrepRetriever** | 高性能 _无索引_ 文件检索,支持并行处理 |
| **OpenAIChat** | 统一 LLM 接口,支持流式与用量统计 |
| **KnowledgeCompiler** | 离线文档编译为树索引和知识簇(Beta) |
| **KnowledgeLint** | 知识健康检查与自动修复 |
| **MonitorTracker** | 实时系统与应用指标采集 |

## 多阶段搜索管线
@@ -166,6 +168,13 @@ KnowledgeCluster 是一个丰富标注的对象,完整记录了单次搜索周
│ └── chat_history.db
├── knowledge/ # 知识簇(Parquet)
│ └── knowledge_clusters.parquet
├── compile/ # 编译产物(Beta)
│ ├── manifest.json # 文件清单与哈希
│ ├── document_catalog.json
│ ├── summary_index.json
│ ├── trees/ # 层次化树索引
│ ├── table_digests/ # 表格提取摘要
│ └── xlsx_digests/ # 电子表格摘要
└── settings/ # 用户设置(DuckDB)
└── settings.db
```
42 changes: 41 additions & 1 deletion content/docs/guide/cli.md
@@ -58,7 +58,7 @@ sirchmunk search "How does authentication work?"
# Search in specific paths
sirchmunk search "find all API endpoints" ./src ./docs

# DEEP mode: comprehensive analysis with Monte Carlo sampling
# DEEP mode: comprehensive agentic retrieval analysis
sirchmunk search "database architecture" --mode DEEP

# Quick filename search (no LLM required)
@@ -103,6 +103,45 @@ sirchmunk mcp serve
sirchmunk mcp serve --transport http --port 3000
```

### `sirchmunk compile` (Beta)

Pre-process document collections into hierarchical tree indices and knowledge clusters. This is an **optional** step — search works without it, but compile artifacts can significantly boost retrieval precision for large document sets.

```bash
# Compile documents (incremental by default)
sirchmunk compile --paths /path/to/documents

# Full recompile (ignore cache)
sirchmunk compile --paths /path/to/documents --full

# Shallow mode (skip tree indexing, faster)
sirchmunk compile --paths /path/to/documents --shallow

# Check compile status
sirchmunk compile --paths /path/to/documents --status

# Run knowledge health checks
sirchmunk compile --lint --work-path ~/.sirchmunk

# Auto-fix lint issues
sirchmunk compile --lint --fix --work-path ~/.sirchmunk
Review comment on lines +123 to +127 [medium]:

In the options table below, `--paths` is marked as (required), but the `--lint` examples above omit it. If `--paths` is required for every `compile` invocation, add it to these examples. If it is optional when running with `--lint`, update the options table to state that `--paths` is required only for compilation.

```

| Option | Description |
|--------|-------------|
| `--paths` | Directories or files to compile (required) |
| `--full` | Force full recompile, ignoring incremental cache |
| `--shallow` | Skip tree indexing, use direct LLM summarization only (faster) |
| `--max-files` | Max files to process (triggers importance sampling for large sets) |
| `--concurrency` | Max parallel file compilations (default: 3) |
| `--status` | Show compile status instead of running compile |
| `--lint` | Run knowledge health checks |
| `--fix` | Auto-fix lint issues (use with `--lint`) |
| `--work-path` | Working directory (default: `~/.sirchmunk`) |
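
For scripting around the options listed above, a small Python helper can assemble the argument list before handing it to `subprocess.run`. This is only a sketch: `build_compile_argv` is a hypothetical name, the `--flag value` spelling for `--max-files` and `--concurrency` is assumed, and only a single path is accepted because the examples in this diff never show how multiple paths are passed. Whether `--paths` is mandatory together with `--lint` is the open question raised in the review comment, so the helper does not enforce it either way.

```python
from typing import Optional


def build_compile_argv(
    path: Optional[str] = None,
    *,
    full: bool = False,
    shallow: bool = False,
    max_files: Optional[int] = None,
    concurrency: Optional[int] = None,
    status: bool = False,
    lint: bool = False,
    fix: bool = False,
    work_path: Optional[str] = None,
) -> list[str]:
    """Assemble an argument list for `sirchmunk compile` from the documented flags."""
    if fix and not lint:
        # The options table says --fix is used together with --lint.
        raise ValueError("--fix only makes sense together with --lint")
    argv = ["sirchmunk", "compile"]
    if path is not None:
        argv += ["--paths", path]
    # Boolean switches, in the order the options table lists them.
    for enabled, flag in ((full, "--full"), (shallow, "--shallow")):
        if enabled:
            argv.append(flag)
    # Valued options; the "--flag value" spelling is an assumption.
    for value, flag in ((max_files, "--max-files"), (concurrency, "--concurrency")):
        if value is not None:
            argv += [flag, str(value)]
    for enabled, flag in ((status, "--status"), (lint, "--lint"), (fix, "--fix")):
        if enabled:
            argv.append(flag)
    if work_path is not None:
        argv += ["--work-path", work_path]
    return argv
```

For example, `build_compile_argv("/path/to/documents", full=True)` yields the same tokens as the "Full recompile" command in the bash block.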

> [!NOTE]
> Compile artifacts are automatically detected by the search pipeline — no additional configuration is needed after compilation. When no compile artifacts exist, search falls back to the standard retrieval pipeline.
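
The note above describes a detect-then-fall-back behaviour. The sketch below shows one way such a check could look in Python, using only the artifact names listed in the storage layout in this PR. It is not Sirchmunk's implementation: the function names are hypothetical, `data_dir` stands for whichever directory actually contains `compile/`, and treating `manifest.json` as the signal that artifacts exist is an assumption.

```python
from pathlib import Path

# Artifact names as listed in the storage layout documented in this PR.
EXPECTED_FILES = ("manifest.json", "document_catalog.json", "summary_index.json")
EXPECTED_DIRS = ("trees", "table_digests", "xlsx_digests")


def compile_artifact_report(data_dir: Path) -> dict[str, bool]:
    """Report which documented compile artifacts exist under data_dir/compile."""
    compile_dir = data_dir / "compile"
    report = {name: (compile_dir / name).is_file() for name in EXPECTED_FILES}
    report.update({f"{name}/": (compile_dir / name).is_dir() for name in EXPECTED_DIRS})
    return report


def choose_pipeline(data_dir: Path) -> str:
    """Pick 'compiled' when a manifest is present, else 'standard' (assumed rule)."""
    return "compiled" if compile_artifact_report(data_dir)["manifest.json"] else "standard"
```

Because `--shallow` skips tree indexing, a real check would probably tolerate a missing `trees/` directory, which is why this sketch keys only on the manifest.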

### `sirchmunk version`

Display version information.
@@ -119,6 +158,7 @@ sirchmunk mcp version
| `sirchmunk init` | Initialize working directory, .env, and MCP config |
| `sirchmunk serve` | Start the backend API server |
| `sirchmunk search` | Perform search queries |
| `sirchmunk compile` | Compile documents into knowledge indices **(Beta)** |
| `sirchmunk web init` | Build WebUI frontend (requires Node.js 18+) |
| `sirchmunk web serve` | Start API + WebUI (single port) |
| `sirchmunk web serve --dev` | Start API + Next.js dev server (hot-reload) |
42 changes: 41 additions & 1 deletion content/docs/guide/cli.zh.md
@@ -58,7 +58,7 @@ sirchmunk search "How does authentication work?"
# 在指定路径中搜索
sirchmunk search "find all API endpoints" ./src ./docs

# DEEP 模式:蒙特卡洛证据采样全面分析
# DEEP 模式:智能体检索全面分析
sirchmunk search "数据库架构" --mode DEEP

# 快速文件名搜索(无需 LLM)
@@ -103,6 +103,45 @@ sirchmunk mcp serve
sirchmunk mcp serve --transport http --port 3000
```

### `sirchmunk compile`(Beta)

将文档集预处理为层次化树索引和知识簇。这是一个**可选**步骤 — 无需编译即可搜索,但编译产物可显著提升大型文档集的检索精度。

```bash
# 编译文档(默认增量模式)
sirchmunk compile --paths /path/to/documents

# 全量重新编译(忽略缓存)
sirchmunk compile --paths /path/to/documents --full

# 浅层模式(跳过树索引,更快)
sirchmunk compile --paths /path/to/documents --shallow

# 查看编译状态
sirchmunk compile --paths /path/to/documents --status

# 运行知识健康检查
sirchmunk compile --lint --work-path ~/.sirchmunk

# 自动修复检查问题
sirchmunk compile --lint --fix --work-path ~/.sirchmunk
Review comment on lines +123 to +127 [medium]:

In the options table below, `--paths` is marked as (必填), but the `--lint` examples above omit it. If `--paths` is required for every `compile` invocation, add it to these examples. If it is optional when running with `--lint`, update the options table to state that `--paths` is required only for compilation.

```

| 选项 | 描述 |
|------|------|
| `--paths` | 要编译的目录或文件(必填) |
| `--full` | 强制全量重编译,忽略增量缓存 |
| `--shallow` | 跳过树索引,仅使用 LLM 直接摘要(更快) |
| `--max-files` | 最大处理文件数(超出时触发重要性采样) |
| `--concurrency` | 最大并行编译数(默认:3) |
| `--status` | 显示编译状态而非执行编译 |
| `--lint` | 运行知识健康检查 |
| `--fix` | 自动修复检查问题(需配合 `--lint`) |
| `--work-path` | 工作目录(默认:`~/.sirchmunk`) |

> [!NOTE]
> 编译产物会被搜索管线自动检测 — 编译完成后无需额外配置。当不存在编译产物时,搜索会回退到标准检索管线。

### `sirchmunk version`

显示版本信息。
@@ -119,6 +158,7 @@ sirchmunk mcp version
| `sirchmunk init` | 初始化工作目录、.env 和 MCP 配置 |
| `sirchmunk serve` | 启动后端 API 服务器 |
| `sirchmunk search` | 执行搜索查询 |
| `sirchmunk compile` | 将文档编译为知识索引 **(Beta)** |
| `sirchmunk web init` | 构建 WebUI 前端(需要 Node.js 18+) |
| `sirchmunk web serve` | 启动 API + WebUI(单端口) |
| `sirchmunk web serve --dev` | 启动 API + Next.js 开发服务器(热重载) |
9 changes: 8 additions & 1 deletion content/docs/guide/configuration.md
@@ -53,6 +53,13 @@ All persistent data is stored under `SIRCHMUNK_WORK_PATH`:
│ │ └── chat_history.db
│ ├── knowledge/ # Knowledge clusters (Parquet)
│ │ └── knowledge_clusters.parquet
│ ├── compile/ # Compile artifacts (Beta)
│ │ ├── manifest.json # File manifest with hashes
│ │ ├── document_catalog.json
│ │ ├── summary_index.json
│ │ ├── trees/ # Hierarchical tree indices
│ │ ├── table_digests/ # Table extraction digests
│ │ └── xlsx_digests/ # Spreadsheet digests
│ └── settings/ # User settings (DuckDB)
│ └── settings.db
├── .env # Environment configuration
@@ -67,7 +74,7 @@ When invoking search (via SDK, CLI, or API), the following parameters are availa
|-----------|------|---------|-------------|
| `query` | `string` | *required* | Search query or question |
| `paths` | `string \| string[]` | *optional* | Directories or files to search; falls back to `SIRCHMUNK_SEARCH_PATHS`, then cwd |
| `mode` | `string` | `FAST` | `FAST` (greedy, 2-5s), `DEEP` (Monte Carlo, 10-30s), or `FILENAME_ONLY` |
| `mode` | `string` | `FAST` | `FAST` (greedy, 2-5s), `DEEP` (agentic retrieval, 10-30s), or `FILENAME_ONLY` |
| `max_depth` | `int` | `null` | Maximum directory depth |
| `top_k_files` | `int` | `null` | Number of top files to return |
| `enable_dir_scan` | `bool` | `true` | Enable directory scanning |
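
The rows visible in this hunk can be mirrored as a small typed container, which makes the defaults and the allowed `mode` values explicit. This is an illustrative sketch only: `SearchParams` is a hypothetical class, not part of the Sirchmunk SDK, and it covers just the six parameters shown here (the table continues past this hunk).

```python
from dataclasses import dataclass
from typing import Optional, Union

# Mode names as documented in the parameter table.
VALID_MODES = ("FAST", "DEEP", "FILENAME_ONLY")


@dataclass
class SearchParams:
    """Hypothetical container for the search parameters visible in this hunk."""

    query: str
    paths: Union[str, list[str], None] = None  # falls back to env/cwd when None
    mode: str = "FAST"
    max_depth: Optional[int] = None
    top_k_files: Optional[int] = None
    enable_dir_scan: bool = True

    def __post_init__(self) -> None:
        if not self.query:
            raise ValueError("query is required")
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}, got {self.mode!r}")
        # Normalise a single path string to a one-element list.
        if isinstance(self.paths, str):
            self.paths = [self.paths]
```

Validating `mode` up front keeps a typo such as `"deep"` from silently selecting an unintended pipeline.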
9 changes: 8 additions & 1 deletion content/docs/guide/configuration.zh.md
@@ -53,6 +53,13 @@ Sirchmunk 通过存储在 `.env` 文件中的环境变量进行配置。运行 `
│ │ └── chat_history.db
│ ├── knowledge/ # 知识簇(Parquet)
│ │ └── knowledge_clusters.parquet
│ ├── compile/ # 编译产物(Beta)
│ │ ├── manifest.json # 文件清单与哈希
│ │ ├── document_catalog.json
│ │ ├── summary_index.json
│ │ ├── trees/ # 层次化树索引
│ │ ├── table_digests/ # 表格提取摘要
│ │ └── xlsx_digests/ # 电子表格摘要
│ └── settings/ # 用户设置(DuckDB)
│ └── settings.db
├── .env # 环境配置
@@ -67,7 +74,7 @@ Sirchmunk 通过存储在 `.env` 文件中的环境变量进行配置。运行 `
|------|------|--------|------|
| `query` | `string` | *必填* | 搜索查询或问题 |
| `paths` | `string \| string[]` | *可选* | 要搜索的目录或文件;未设置时依次回退到 `SIRCHMUNK_SEARCH_PATHS`、当前工作目录 |
| `mode` | `string` | `FAST` | `FAST`(贪心搜索,2-5s)、`DEEP`(蒙特卡洛采样,10-30s)或 `FILENAME_ONLY` |
| `mode` | `string` | `FAST` | `FAST`(贪心搜索,2-5s)、`DEEP`(智能体检索,10-30s)或 `FILENAME_ONLY` |
| `max_depth` | `int` | `null` | 最大目录深度 |
| `top_k_files` | `int` | `null` | 返回的文件数量 |
| `enable_dir_scan` | `bool` | `true` | 是否启用目录扫描 |
8 changes: 4 additions & 4 deletions content/docs/guide/docker.md
@@ -9,14 +9,14 @@ Pre-built Docker images are available on Alibaba Cloud Container Registry for qu

| Region | Image |
|---|---|
| US West | `modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6` |
| China Beijing | `modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6` |
| US West | `modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7` |
| China Beijing | `modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7` |

## Quick Start

```bash
# Pull the image
docker pull modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6
docker pull modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7

# Start the service
docker run -d \
@@ -35,7 +35,7 @@ docker run -d \
-e SIRCHMUNK_SEARCH_PATHS=/mnt/docs \
-v /path/to/your_work_path:/data/sirchmunk \
-v /path/to/your/docs:/mnt/docs:ro \
modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6
modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7
```

Open http://localhost:8584 to access the WebUI, or call the API directly:
8 changes: 4 additions & 4 deletions content/docs/guide/docker.zh.md
@@ -9,14 +9,14 @@ weight: 8

| 区域 | 镜像 |
|---|---|
| 美西 | `modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6` |
| 北京 | `modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6` |
| 美西 | `modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7` |
| 北京 | `modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7` |

## 快速开始

```bash
# 拉取镜像(根据地理位置选择最近的 Registry)
docker pull modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6
docker pull modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7

# 启动服务
docker run -d \
@@ -35,7 +35,7 @@ docker run -d \
-e SIRCHMUNK_SEARCH_PATHS=/mnt/docs \
-v /path/to/your_work_path:/data/sirchmunk \
-v /path/to/your/docs:/mnt/docs:ro \
modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.6
modelscope-registry.cn-beijing.cr.aliyuncs.com/modelscope-repo/sirchmunk:ubuntu22.04-py312-0.0.7
```

打开 http://localhost:8584 访问 WebUI,或直接调用 API:
2 changes: 1 addition & 1 deletion content/docs/guide/mcp.md
@@ -89,7 +89,7 @@ List all stored knowledge clusters.
| Mode | Description | LLM Required |
|------|-------------|:------------:|
| **FAST** | Greedy search with 2-level keyword cascade and early stopping (2-5s, ~10x faster than DEEP) | Yes |
| **DEEP** | Full multi-phase analysis with Monte Carlo evidence sampling (10-30s) | Yes |
| **DEEP** | Full multi-phase agentic retrieval analysis with self-correction (10-30s) | Yes |
| **FILENAME_ONLY** | Filename-based search without content analysis | No |

## OpenClaw Integration
2 changes: 1 addition & 1 deletion content/docs/guide/mcp.zh.md
@@ -89,7 +89,7 @@ sirchmunk mcp serve --transport http --port 3000
| 模式 | 描述 | 需要 LLM |
|------|------|:--------:|
| **FAST** | 贪心搜索,两级关键词级联 + early stopping(2-5s,速度约为 DEEP 的 10 倍) | 是 |
| **DEEP** | 蒙特卡洛证据采样完整多阶段分析(10-30s) | 是 |
| **DEEP** | 智能体检索全面分析,支持自纠正(10-30s) | 是 |
| **FILENAME_ONLY** | 文件名搜索,无内容分析 | 否 |

## OpenClaw 集成
8 changes: 6 additions & 2 deletions content/docs/guide/project-structure.md
@@ -16,7 +16,7 @@ sirchmunk/
│ │ │ └── components/ # History, monitor, settings storage
│ │ ├── cli/ # CLI entry point and web launcher
│ │ ├── insight/ # Text insight extraction
│ │ ├── learnings/ # Evidence processing, knowledge base
│ │ ├── learnings/ # Evidence processing, knowledge base, compiler
│ │ ├── llm/ # LLM interface (OpenAI-compatible)
│ │ ├── retrieve/ # Indexless retrieval engine
│ │ ├── scan/ # Directory and file scanners
@@ -57,9 +57,11 @@ The search pipeline coordinator:
- **SearchContext** (`schema/search_context.py`) — Budget, state, and audit management

### Intelligence Layer
Evidence extraction and knowledge synthesis:
Evidence extraction, knowledge synthesis, and offline compilation:
- **EvidenceProcessor** (`learnings/evidence_processor.py`) — Monte Carlo sampling
- **KnowledgeBase** (`learnings/knowledge_base.py`) — Knowledge cluster management
- **KnowledgeCompiler** (`learnings/compiler.py`) — Offline document compilation into tree indices and knowledge clusters (Beta)
- **KnowledgeLint** (`learnings/lint.py`) — Knowledge health checks and auto-repair
- **ReActAgent** (`agentic/react_agent.py`) — Autonomous exploration
- **OpenAIChat** (`llm/openai_chat.py`) — Unified LLM interface

@@ -79,4 +81,6 @@ Persistence and caching:
| **DirScanner** | `scan/dir_scanner.py` | Directory structure analysis |
| **ReActAgent** | `agentic/react_agent.py` | Budget-bounded autonomous exploration |
| **OpenAIChat** | `llm/openai_chat.py` | Unified LLM interface with streaming and usage tracking |
| **KnowledgeCompiler** | `learnings/compiler.py` | Offline document compilation into tree indices (Beta) |
| **KnowledgeLint** | `learnings/lint.py` | Knowledge health checks and auto-repair |
| **MonitorTracker** | `api/components/monitor_tracker.py` | Real-time system metrics |
8 changes: 6 additions & 2 deletions content/docs/guide/project-structure.zh.md
@@ -16,7 +16,7 @@ sirchmunk/
│ │ │ └── components/ # 历史、监控、设置存储
│ │ ├── cli/ # CLI 入口和 Web 启动器
│ │ ├── insight/ # 文本洞察提取
│ │ ├── learnings/ # 证据处理、知识库
│ │ ├── learnings/ # 证据处理、知识库、编译器
│ │ ├── llm/ # LLM 接口(OpenAI 兼容)
│ │ ├── retrieve/ # 无索引检索引擎
│ │ ├── scan/ # 目录和文件扫描器
@@ -57,9 +57,11 @@ Sirchmunk 遵循严格的**关注点分离**模式,分为四个独立层次:
- **SearchContext** (`schema/search_context.py`) — 预算、状态和审计管理

### 智能层
证据提取和知识合成
证据提取、知识合成与离线编译
- **EvidenceProcessor** (`learnings/evidence_processor.py`) — 蒙特卡洛采样
- **KnowledgeBase** (`learnings/knowledge_base.py`) — 知识簇管理
- **KnowledgeCompiler** (`learnings/compiler.py`) — 离线文档编译为树索引和知识簇(Beta)
- **KnowledgeLint** (`learnings/lint.py`) — 知识健康检查与自动修复
- **ReActAgent** (`agentic/react_agent.py`) — 自主探索
- **OpenAIChat** (`llm/openai_chat.py`) — 统一 LLM 接口

@@ -79,4 +81,6 @@ Sirchmunk 遵循严格的**关注点分离**模式,分为四个独立层次:
| **DirScanner** | `scan/dir_scanner.py` | 目录结构分析 |
| **ReActAgent** | `agentic/react_agent.py` | 预算约束下的自主探索 |
| **OpenAIChat** | `llm/openai_chat.py` | 统一 LLM 接口(支持流式和使用量跟踪) |
| **KnowledgeCompiler** | `learnings/compiler.py` | 离线文档编译为树索引(Beta) |
| **KnowledgeLint** | `learnings/lint.py` | 知识健康检查与自动修复 |
| **MonitorTracker** | `api/components/monitor_tracker.py` | 实时系统指标 |