Skip to content

Releases: synthesisengineering/ragbot

Ragbot v3.5.0 — substrate cleanup

15 May 15:56

Choose a tag to compare

v3.5.0 — 2026-05-15

Substrate cleanup. Pgvector is the only vector backend, the agent loop
wires at startup, OTLP metric and trace export are independently
configurable, app loggers surface under uvicorn, and the regression
suite no longer fails on dev machines without heavy ML dependencies.
Breaking change: RAGBOT_VECTOR_BACKEND=qdrant and the bundled Qdrant
backend are removed. Operators who ran v3.4 with the Qdrant opt-in
must reindex their workspaces into pgvector before upgrading.

Removed

  • Qdrant vector backend. Deleted synthesis_engine.vectorstore.QdrantBackend,
    the embedded qdrant_data/ storage path, the qdrant-client dependency,
    the RAGBOT_VECTOR_BACKEND environment variable, and the ragbot-qdrant
    Docker volume. Dead _qdrant_client / _get_qdrant_client /
    get_qdrant_point_id helpers in rag.py and chunking/ removed.
    The VectorStore ABC at synthesis_engine.vectorstore is retained so
    substrate consumers outside Ragbot can plug in alternative backends
    behind the same contract.

Changed

  • Agent loop wires at startup. The FastAPI lifespan now constructs an
    AgentLoop with the lifespan's LLM backend, the resolved MCP client, and
    a FilesystemCheckpointStore, then calls
    api.routers.agent.set_default_loop() to register the singleton. The
    /api/agent/run endpoint resolves against a real loop on a fresh install
    — through v3.4 it returned "Agent loop is not configured". Shutdown
    clears the singleton.
  • OTLP metric export is independently configurable. The substrate now
    honours the OTEL standard per-signal env-var hierarchy:
    OTEL_EXPORTER_OTLP_METRICS_ENDPOINT (per-signal override; accepts the
    literal "none" to disable metric OTLP export) falls back to
    OTEL_EXPORTER_OTLP_ENDPOINT. The bundled docker-compose stack sets the
    metrics endpoint to "none" because Jaeger only accepts traces — the
    UNIMPLEMENTED errors from earlier deployments are gone. Prometheus
    exposition at /api/metrics is unaffected.
  • App loggers surface under uvicorn. src/api/main.py calls
    logging.basicConfig at module-import time before uvicorn takes over,
    so api.main, api.routers.*, and synthesis_engine.* log lines now
    flow to docker logs ragbot-api alongside uvicorn's own access logs.
    Override the level with RAGBOT_LOG_LEVEL.
  • get_vector_store() returns None when pgvector is unreachable
    instead of falling back through a backend chain. Callers in rag.py
    treat None as "RAG unavailable; chat-only mode," so the user-facing
    failure mode is graceful.

Fixed

  • /api/agent/run returns 503 "not configured" — fixed by the agent
    loop wiring change above.
  • OTLP metric export prints UNIMPLEMENTED — fixed by the per-signal
    endpoint split.
  • App-namespace logger lines invisible in container logs — fixed by
    logging.basicConfig in src/api/main.py.
  • test_sentence_transformers_imports_cleanly fails on dev machines
    without sentence_transformers installed
    — wrapped both Bug5
    regression tests in pytest.importorskip("sentence_transformers") so
    they skip cleanly on lightweight dev installs and still run in Docker /
    CI where the dependency is present.

Test-suite delta

  • v3.4.0 baseline: 871 passing, 25 skipped, 4 failing (3 Qdrant tests + 1
    sentence_transformers env gap).
  • v3.5.0: 850 passing, 14 skipped, 0 failing (Qdrant tests gone with the
    backend; sentence_transformers tests skip cleanly).

Ragbot v3.4.0 — agent loop, first-class MCP, skills runtime, cross-workspace synthesis

14 May 15:38

Choose a tag to compare


title: "Ragbot v3.4.0 — Ragbot becomes the conversational reference runtime of synthesis engineering"
slug: ragbot-v3-4-0
date: 2026-05-14
canonical_url: https://synthesisengineering.org/posts/2026/05/14/ragbot-v3-4-0/
categories:

  • synthesis-engineering
  • ragbot
  • releases
    tags:
  • ragbot
  • mcp
  • skills
  • agent-loop
  • synthesis-engineering
    author: Rajiv Pant

Ragbot v3.4.0 — Ragbot becomes the conversational reference runtime of synthesis engineering

Ragbot v3.4 is the next-major-features release. It moves the project from a
polished 2024-paradigm chat-with-RAG product to a 2026-shaped conversational
AI runtime: explicit agent loop, first-class MCP in both directions, an
executable skills runtime, cross-workspace synthesis with visible
confidentiality boundaries, durable memory beyond vector RAG, and the
production-grade signals that make the architecture legible to engineering
leadership.

The synthesis-engineering positioning that the architecture had quietly
named for a year is now visible in the README, the ragbot.ai homepage, and
the in-product chrome. Ragbot is the reference runtime for the
conversational interaction primitive inside synthesis engineering, with
sibling reference implementations covering direct manipulation
(synthesis-console),
procedural execution
(Ragenie), and the
portable capability format
(synthesis-skills).
The family will grow as the methodology and the AI landscape evolve.

Ragbot v3.4 — Jaeger trace tree for a chat.request showing retrieval and chat-completion children, one of the production-grade observability signals new in this release

The architectural rationale behind v3.4 — the synthesis-ecosystem
framing, the HCI-primitives map, the deterministic-enforcement thesis —
will be published as a blog series at
synthesisengineering.org and
synthesiscoding.org following this
release.

Why this release matters

The 2026 center of gravity for chat-with-tools products has shifted from
"chat with retrieval" to stateful, tool-using agents with durable memory,
governed execution, and async background work. MCP is industry-default
infrastructure at 97M monthly SDK downloads and 5,800+ public servers.
SKILL.md is a cross-vendor standard adopted by Codex CLI, Gemini CLI, GitHub
Copilot, and Cursor. Memory has three layers in production. Observability
is a buying criterion, not polish. Local inference crossed the credibility
threshold with the Ollama 0.19 MLX backend on Apple Silicon.

Ragbot v3.3 was a polished 2024-paradigm product. It read clean. It worked
well. It did not look like a 2026 product to anyone who had seen the new
shape of the category. v3.4 closes that gap. Every architectural commitment
v3.4 ships traces back to a concrete shift in the category, not to feature
preference.

The 1.6%/98.4% ratio from VILA-Lab's Dive into Claude Code analysis
(source) is the design
instruction. AI decision logic is the small slice. Deterministic
infrastructure — permission gates, context management pipelines, tool
routing, recovery mechanisms — is the rest. v3.4 builds that
infrastructure for the conversational primitive.

Headliner 1: Agent loop runtime

Ragbot is now an execution surface, not just a chat surface. A hand-rolled
graph-state agent loop replaces the single-turn prompt → retrieve → call LLM → return path. The agent can decide between answering directly,
dispatching retrieval, calling a tool, running a skill, or fanning out to
sub-agents.

Ragbot v3.4 agent panel — substantive multi-step response drawing on indexed workspace chunks, demonstrating the agent loop in action

The loop is hand-rolled — no LangGraph, no CrewAI, no AutoGen. Use what
those frameworks teach; do not depend on them. The Plan-and-Execute pattern
is the default for compound questions with explicit replanning on failure.
The lead-agent-with-sub-agents pattern handles parallel research across
workspaces. Sandboxed code execution runs through E2B microVM or
self-hosted Daytona Docker for air-gapped installations; the disabled
sandbox fails closed with an actionable error rather than dropping into a
plain subprocess.

Permission gates fail closed at the tool boundary. The default behavior
denies any tool that does not match a read-only pattern or have an
explicit gate. The 98:2 ratio is enforced literally — most of the work is
the deterministic plumbing. State checkpoints are durable and replayable
through the ragbot agent replay command.

The chat-only no-tools mode remains available. The agent loop is opt-in
per session via the picker toggle or agent=true on /api/chat. Test
coverage gates the rollout: 45 tests across the agent loop core and the
agent capabilities surface (sub-agent dispatcher, sandbox, self-grader).

The {"$ref": "step_id.field"} placeholder syntax in plan-step inputs
lets a multi-step plan thread outputs without a separate scratchpad. The
self-grading loop ("Outcomes" pattern borrowed from Anthropic's
Code w/ Claude 2026 talks) lets the agent score its own output against a
written rubric and iterate.

Headliner 2: First-class MCP — client and server

Ragbot is now both an MCP client and an MCP server.

As a client, Ragbot covers all six MCP primitives — tools, resources,
prompts, Roots, Sampling, and Elicitation — plus MCP Tasks for
long-running calls. Most "MCP-supporting" chat products only implement
tools; doing all six is the engineering-judgment signal. OAuth 2.1 with
Dynamic Client Registration is supported for remote servers, with a
stdio + HTTP/SSE proxy so local stdio-default servers work without
leaking complexity into the user's setup.

Ragbot v3.4 MCP settings panel — empty state with the Add server form expanded, showing the stdio / http / sse transport selector

The MCP settings panel lists configured MCP servers, their connection
state, and the tools and resources they expose. Per-server toggles. Per
workspace allow/deny rules. The configured server list lives at
~/.synthesis/mcp-clients.yaml.

As a server, Ragbot exposes its workspace surface to other MCP-aware
agents — Claude Code, Cursor, ChatGPT desktop, Gemini CLI, and any other
client that speaks the protocol. Two transports: stdio for desktop
integrations, HTTP/SSE via StreamableHTTPSessionManager for network
clients. Bearer-token auth via ~/.synthesis/mcp-server.yaml with
per-token allowed_tools glob filtering.

Five exposed tools:

  • workspace_search(workspace, query, k) — vector + FTS search inside a
    single workspace.
  • workspace_search_multi(workspaces, query, k, budget_tokens)
    multi-workspace search with the confidentiality gate firing before
    retrieval, so denied workspace combinations never read content.
  • document_get(workspace, document_id) — retrieve a single document.
  • skill_run(skill_name, inputs) — execute a discovered skill.
  • agent_run_start(prompt, workspaces, ...) — start an agent run, return
    a task ID. Pair with the existing agent endpoints for status/replay.

Three exposed resources: ragbot://workspaces, ragbot://skills,
ragbot://audit/recent.

Headliner 3: Skills as runtime

In v3.3, Ragbot read skills as markdown and indexed them for RAG. v3.4
makes skills executable in the progressive-disclosure model: names
and descriptions in the system prompt, full body on selection, scripts
and templates on tool call.

Ragbot v3.4 skills panel — seven skills visible: the six bundled starter-pack skills plus the demo skill, filtered to the demo workspace scope

SKILL.md is now Ragbot's native extensibility format. A skill written
for Claude Code, Codex CLI, Cursor, or Gemini CLI runs on Ragbot without
modification. This makes Ragbot the third compatible runtime for the
SKILL.md format (after Claude Code and Codex CLI), which is the kind of
cheap-cost interoperability signal that compounds.

Six starter skills ship in the box:

  • workspace-search-with-citations — search the active workspace and
    return results with [workspace:document_id] citations.
  • draft-and-revise — multi-turn drafting with explicit revision passes.
  • fact-check-claims — verify claims against the workspace and surface
    uncertainty honestly.
  • summarize-document — structured summarization with citation
    retention.
  • agent-self-review — the self-grading skill that powers the
    "Outcomes" pattern.
  • cross-workspace-synthesis — the brand-defining skill (described in
    Headliner 5 below).

The npx skills add synthesisengineering/synthesis-skills install path
turns the 32 public synthesis-skills into runnable capabilities. Skills
discovery walks five roots in priority order:
synthesis_engine/skills/starter_pack/ (built-in),
~/.synthesis/skills/ (synthesis-engineering shared install),
~/.claude/skills/ (Claude Code private skills),
~/.claude/plugins/cache/<vendor>/skills/ (plugin-installed), and
per-workspace skill roots declared in compile-config.yaml. Later wins
on name collision, so operator-installed skills override built-ins.

CLI: ragbot skills list/info/run. REST: /api/skills with
?workspace=W filtering. UI: skills panel with one-click execution and
structured output rendering.

Headliner 4: Synthesis ecosystem positioning and rebrand

Ragbot is officially the reference runtime for the co...

Read more

v3.3.0 — Local Gemma support + redesigned model picker

12 May 22:39

Choose a tag to compare

Highlights

  • Local model support via Ollama. New ollama engine adds Google's Gemma 4 family (E4B, 26B MoE, 31B Dense) as first-class models alongside the cloud providers. LiteLLM routes via the ollama_chat/ prefix. Docker stack reaches host Ollama via host.docker.internal:11434 out of the box (configurable with OLLAMA_API_BASE).
  • Model picker redesign. A single rich dropdown replaces the three-step Provider → Category → Model cascade. Display names ("Claude Opus 4.7" instead of claude-opus-4-7). Pinned and Recent sections at the top. Type-ahead search. ⌘K / Ctrl+K global shortcut. Per-row badges for tier (Fast / Balanced / Powerful), context window, 🧠 thinking, 🏠 local. Pin/unpin persisted server-side.
  • User preferences API. New /api/preferences/pinned-models (GET / PUT) and /api/preferences/recent-models (GET / POST). Pinned and recently-used model selections persist across sessions in ~/.synthesis/ragbot.yaml.
  • Thinking control moves adjacent to Model. Renders inline below the picker, only for thinking-capable models.
  • Security. LiteLLM pinned >=1.83.0 in requirements (excludes the compromised 1.82.7/1.82.8 range from the March 2026 supply-chain incident).

What's new

Local models

  • engines.yaml ships a fourth engine block, ollama, with gemma4:e4b, gemma4:26b, gemma4:31b. No API key required.
  • is_local: true flag on local-provider models and providers, surfaced in /api/models/* so UIs can render the 🏠 badge and skip API-key plumbing.
  • Optional OLLAMA_API_BASE env var if you want to point the API container at a LAN-shared Ollama instance instead of host.docker.internal.

Model metadata

  • display_name, supports_thinking, and is_local added to model responses in /api/models and /api/models/all.
  • engines.yaml now supports a display_name field per model. Falls back to the canonical name if omitted.
  • _normalize_model_idnormalize_model_id (public), now idempotent for user-supplied prefixed IDs.

Preferences API

Endpoint Behaviour
GET /api/preferences/pinned-models Returns current pinned model IDs
PUT /api/preferences/pinned-models Replace the pinned list (deduped, order preserved)
GET /api/preferences/recent-models Newest first, capped at 10
POST /api/preferences/recent-models Record one model use (move-to-front + cap)

Persisted via the new keystore.set_user_config / _save_user_config write path; reads continue to work via the existing ~/.synthesis~/.config/ragbot fallback.

UI

  • Trigger button shows the selected model's display name + provider name + tier badge + context + 🧠/🏠 icons.
  • Open: search bar (autofocus) + Pinned + Recent + by-provider sections. Unavailable rows (no API key) render grayed with "🔒 No key".
  • Keyboard: ↑↓ navigates, Enter selects, Esc closes, ⌘K opens.
  • Cascading Provider / Category dropdowns removed.

Bug fixes

  • Empty-output regression on non-flagship GPT-5.x and Gemini models on long-context RAG calls. The default thinking-effort policy was "off" for non-flagship models with thinking metadata. For models whose modes: list declares no off option (OpenAI / Gemini reasoning is always-on), this meant no reasoning_effort was sent and the provider's own default (medium) consumed the entire output-token budget on internal reasoning, leaving zero visible content. The default policy now uses the lowest listed mode (minimal) for these models. Claude (with mode: adaptive) continues to default to off (correct Anthropic behavior).
  • CLI prefix-mapping consolidated. The hand-rolled if/elif block in src/ragbot.py::run_chat now calls the shared normalize_model_id helper, eliminating drift between the CLI and API code paths.

Docker

  • docker-compose.yml: ~/.synthesis mount switched to read-write so the preferences API can persist ragbot.yaml. keys.yaml still 0600 on the host; the chat path already reads keys, so write capability does not materially weaken the threat model.
  • Added OLLAMA_API_BASE env (defaults to http://host.docker.internal:11434) on ragbot-api.
  • Added a legacy ~/.config/ragbot read-only mount so users on the pre-~/.synthesis/ layout keep working without migration.

Internal

  • tests/test_models_integration.py skips Ollama models that aren't currently pulled (graceful for clean clones).
  • Test suite: 270 → 327 passing.
  • PROVIDER_LABELS gained ollama.
  • Backend Pydantic ModelInfo extended with display_name, supports_thinking, is_local.

Quick start

Same as v3.2.x. For local Gemma:

brew install ollama
brew services start ollama
ollama pull gemma4:e4b      # 9.6 GB — fast tier
ollama pull gemma4:26b      # 18 GB — MoE
ollama pull gemma4:31b      # 20 GB — dense workhorse

docker compose up -d
# Open http://localhost:3000 and select an Ollama model from the picker.

Commits since v3.2.0

  • b211d65 Fix default reasoning effort for non-flagship models with discrete modes
  • d2efbed Update model picker UI
  • 6f107f9 Update model API and preferences
  • 4d6feef Update Docker config and keystore
  • af4f92c Refactor model ID normalization to shared helper
  • 9d5bd22 Update LLM engine configuration and tests

v3.2.0 — Demo mode + refreshed screenshots

25 Apr 15:13

Choose a tag to compare

Highlights

  • Demo mode via `RAGBOT_DEMO=1` or `ragbot --demo`. Hard-isolates discovery to the bundled `demo/ai-knowledge-demo/` workspace and `demo/skills/ragbot-demo-skill/`. Real workspaces declared in `/.synthesis/console.yaml` and any glob-discovered repos under `/workspaces/*/` are invisible while demo mode is on. Auto-indexes the bundled content on first invocation.
  • `/health` and `/api/config` report `demo_mode: true` when active. The healthcheck's `vector_backend.workspaces` count is filtered to demo-visible collections so a real workspace count can't leak through the UI when other collections coexist on the same vector store.
  • Web UI banner. A yellow "🎭 Demo mode" strip renders above the workspace picker whenever the server reports `demo_mode: true` — so screenshots taken in demo mode are unmistakably demo.
  • Refreshed screenshots in `screenshots/v3.2/`, captured against an isolated Playwright Chromium instance running against demo mode. The README hero shots reflect the current Web UI.

Quick start

```bash
git clone https://github.com/rajivpant/ragbot.git
cd ragbot
cp .env.example .env

Add at least one API key (Anthropic, OpenAI, or Google).

pip install -r requirements.txt
docker compose up -d # bundled Postgres + ragbot stack

RAGBOT_DEMO=1 python3 src/ragbot.py chat -p "What is ragbot?"
```

Or for the Web UI:

```bash
RAGBOT_DEMO=1 python3 -m uvicorn src.api.main:app --port 8000 &
cd web && npm install && NEXT_PUBLIC_API_URL=http://localhost:8000 npm run dev

Open http://localhost:3000 — yellow demo banner confirms demo mode.

```

Tests

20 new demo-mode regression tests cover: env-var truthy/falsy recognition, bundled-path discovery, discovery isolation for both workspaces and skills, the demo-workspace-name constants. Final pass: 304 passed, 5 skipped.

Confidentiality posture

The bundled demo content was written from scratch for the demo, not derived from any real workspace. The discovery isolation guarantees real workspace names cannot leak into demo screenshots. The same screenshots in this release were captured by an isolated Playwright Chromium instance with no chrome / tab-bar / sidebar in frame, so even browser-level metadata is clean.

Setup

See README.md, CONFIGURE.md, and README-DOCKER.md.

v3.1.0 — LLM backend abstraction

25 Apr 14:25

Choose a tag to compare

Highlights

  • LLM backend abstraction. Every LLM call now routes through a swappable backend interface (src/ragbot/llm/). Two backends ship:

    • litellm (default) — wraps litellm.completion(). Best provider/model coverage, handles long-tail provider quirks. Pinned >=1.83.0 (post-March-2026 supply-chain incident range).
    • direct — opt-in. Calls anthropic, openai, and google-genai SDKs directly. Smaller dependency surface, no third-party gateway.

    Selection: RAGBOT_LLM_BACKEND={litellm|direct}. Adding a new backend (Bifrost, Portkey, OpenRouter, etc.) is one new file plus one selection arm.

  • Web UI controls for reasoning effort (auto, off, minimal, low, medium, high) and the cross-workspace skills auto-include toggle.

  • /api/chat accepts thinking_effort and additional_workspaces — the same surface as the CLI flags.

  • Provider-quirk handling lives in backends. Backends absorb GPT-5.x max_completion_tokens rename, Claude 4.7+ thinking.type.adaptive shape, and the Anthropic-thinking-requires-temperature=1 constraint. The chat code path stays clean.

Bug fixes (caught during manual testing)

  • Chat CLI workspace lookup ignored ~/.synthesis/console.yaml discovery; now honours overrides only and falls through to the full discovery chain.
  • get_relevant_context budget loop used break instead of continue, blocking smaller results when an oversized whole-file script chunk topped the rankings.
  • Anthropic + thinking now forces temperature=1.0 automatically.
  • Claude 4.7+ uses the new thinking.type.adaptive shape instead of LiteLLM's older enabled shape (which 4.7+ rejects).
  • transformers>=5.6.0 pinned to track tokenizers 0.22.x compatibility (older transformers crashed sentence_transformers import).

API additions

  • ChatRequest.thinking_effort: 'auto'|'off'|'minimal'|'low'|'medium'|'high'
  • ChatRequest.additional_workspaces: string[] (empty array opts out of skills auto-include; omitted lets the server auto-include)

Tests

  • 26 new tests for the LLM backend abstraction (contract, selection, kwargs builder, direct healthcheck).
  • 6 regression tests covering the bug fixes above.
  • Final pass: 284 passed, 5 skipped.

Strategic note

LiteLLM remains a defensible default for ragbot in April 2026, primarily because of its provider/model coverage breadth. The March-2026 supply-chain incident (1.82.7–1.82.8) is bounded by the >=1.83.0 pin, but the API-compatibility lag for Claude 4.7+ shows the recurring tax of multi-provider gateways. The abstraction layer in this release is the response: keep LiteLLM as the default, make swap-in alternatives a configuration change rather than a code rewrite.

Setup

See README.md, CONFIGURE.md, and README-DOCKER.md.

v3.0.0 — Modernization release

25 Apr 07:14

Choose a tag to compare

Highlights

  • Pgvector by default. PostgreSQL with the pgvector extension is the default vector backend, replacing embedded Qdrant. Native full-text search via tsvector + GIN replaces in-process BM25. The legacy embedded Qdrant backend remains as an opt-in fallback (RAGBOT_VECTOR_BACKEND=qdrant).
  • Agent Skills as first-class content. Discover and index Agent Skills (SKILL.md plus references and scripts) from ~/.synthesis/skills, ~/.claude/skills, and plugin caches. New ragbot skills {list,info,index} CLI. The compiler can include skills via a sources.skills block in compile-config.yaml.
  • Workspace-rooted layout. AI Knowledge repos are discovered across ~/workspaces/*/ai-knowledge-* and via the synthesis-engineering shared ~/.synthesis/console.yaml source list. Configuration moved to ~/.synthesis/ (legacy ~/.config/ragbot/ falls through).
  • Reasoning / thinking modes. Models that advertise thinking support (Claude Sonnet 4.6, Claude Opus 4.7, GPT-5.5, GPT-5.5-pro, Gemini 3.x) are wired through LiteLLM's reasoning_effort. Flagship models default to medium; override per-call (--thinking-effort high) or globally (RAGBOT_THINKING_EFFORT).
  • Cross-workspace retrieval. get_relevant_context auto-merges results from the user's selected workspace and the canonical skills workspace via reciprocal rank fusion. Disable with --no-skills.

API additions

  • /api/chat accepts thinking_effort and additional_workspaces fields.
  • /api/config and /health report the active vector_backend (backend name, health, version, workspaces count).

Models (engines.yaml)

  • Anthropic: Haiku 4.5, Sonnet 4.6 (default), Opus 4.7.
  • OpenAI: GPT-5.4-mini, GPT-5.5, GPT-5.5-pro (flagship).
  • Google: Gemini 3.1 Flash Lite, Gemini 3 Flash (default), Gemini 3.1 Pro (flagship).

Dependency notes

  • litellm pinned >=1.83.0 (post-supply-chain-incident range; avoid 1.82.7-1.82.8).
  • Direct google-generativeai usage migrated to google-genai (the legacy SDK is end-of-life).
  • 259 tests pass on the release SHA.

Breaking changes

  • Vector backend default flipped to pgvector. Existing Qdrant users can opt back in with RAGBOT_VECTOR_BACKEND=qdrant. Postgres + pgvector is required for the default path; see CONFIGURE.md.
  • Configuration home moved to ~/.synthesis/. Legacy ~/.config/ragbot/{keys,config}.yaml is read as a fallback.

Setup

See README.md, CONFIGURE.md, and README-DOCKER.md for setup instructions.