```
 _____             _       _
|   | |___ ___ ___| |_ ___| |_
| | | | .'|   | . | . | . |  _|
|_|___|__,|_|_|___|___|___|_|
```
A personal AI assistant that runs on your terms. Cloud or local. Text or voice. Your machine, your models, your data.
Rust port of nanobot by HKUDS -- rebuilt from scratch for speed, portability, and offline-first operation.
Most AI assistants are cloud-locked SaaS products. nanobot is a single binary that talks to whatever LLM you point it at -- Claude, GPT, Gemini, Groq, or a GGUF running on your own hardware. Add voice and it becomes a conversational assistant you can interrupt mid-sentence. Add channels and it lives in your Telegram, WhatsApp, or Feishu.
No containers. No Python. No dependencies beyond what cargo build pulls in.
```
cargo build --release

# Initialize config and workspace
nanobot onboard

# Add your API key to ~/.nanobot/config.json

# Start chatting
nanobot agent
```

All providers speak the same OpenAI-compatible protocol. First API key found wins:
OpenRouter / DeepSeek / Anthropic / OpenAI / Gemini / Groq / vLLM
```
You: What's the weather like?
```
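The key itself goes in `~/.nanobot/config.json`. The exact provider key names aren't documented here, so treat this as a hypothetical sketch and confirm with `nanobot status`:

```
{
  "providers": {
    "openrouter": { "apiKey": "sk-or-..." }
  }
}
```

(`providers.openrouter.apiKey` is illustrative, not a documented schema.)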
Toggle between cloud and local inference mid-conversation. nanobot connects to LM Studio, loads a model, and switches over.
```
You: /local
Starting LM Studio server on port 1234...
Loading model...

LOCAL MODE  LM Studio on port 1234
Model: NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M.gguf

You: /model
Available models:
  [1] gemma-3n-E4B-it-Q4_K_S.gguf (3923 MB)
  [2] Ministral-8B-Instruct-Q4_K_M.gguf (4815 MB)
  [3] NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M.gguf (5352 MB) (active)
  ...
Select model [1-12] or Enter to cancel:
```
Switch models on the fly. The server process is monitored -- if it crashes during loading, you get the error immediately instead of waiting for a timeout. Stale servers from previous sessions are cleaned up automatically.
```
cargo build --release --features voice
```

```
You: /voice
Voice mode ON. Ctrl+Space or Enter to speak, type for text.

Recording... (press Enter or Ctrl+Space to stop)
You said: "What time is it in Tokyo?"
It's currently about two in the morning in Tokyo.
```
Voice mode uses on-device models -- no cloud STT/TTS:
- Speech-to-text: Whisper (via jack-voice)
- Text-to-speech: Pocket TTS (Candle, 24kHz, CPU real-time)
Audio is streamed sentence-by-sentence through PulseAudio. First audio plays in ~300-500ms while remaining sentences synthesize in the background.
Interrupt anytime: press Enter during playback to cut the response short and start speaking. The assistant stops talking and listens.
```
nanobot realtime --engine pocket --voice alba
```

Hands-free conversation with VAD-based turn detection. No keys needed -- just speak. The pipeline:
- Listen -- Silero VAD detects speech, SmartTurn v3 determines when you're done
- Process -- Whisper transcribes, LLM streams response
- Speak -- Sentences stream to TTS as they arrive (~300ms to first audio)
- Barge-in -- Start speaking during a response to interrupt immediately
Switch to push-to-talk with --mode ptt (hold Space to record).
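Both modes side by side:

```
# Continuous: VAD turn detection, barge-in enabled
nanobot realtime --engine pocket --voice alba

# Push-to-talk: hold Space to record
nanobot realtime --mode ptt
```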
Run a language model directly on your Mac's GPU -- no server, no HTTP, no separate process. The model lives in nanobot's memory and serves inference, perplexity scoring, and LoRA fine-tuning from the same worker thread.
```
cargo build --release --features mlx
```

Set `inferenceEngine` to `"mlx"` in `~/.nanobot/config.json`:
```
{
  "agents": {
    "defaults": {
      "inferenceEngine": "mlx",
      "mlxModelDir": "~/.cache/lm-studio/models/mlx-community/Qwen3.5-2B-MLX-8bit",
      "mlxPreset": "qwen3.5-2b"
    }
  }
}
```

Default model: Qwen3.5-2B (8-bit, ~2GB). The model loads once at startup and stays in GPU memory. All entry points (REPL, gateway, voice, channels) use the same in-process provider.
Online learning: When MLX is active, the perplexity gate auto-enables. Each conversation turn is scored for surprise (cross-entropy loss); surprising exchanges are added to an experience buffer, and once enough have collected, a LoRA training pass fires in the background. The model learns from its mistakes -- the next inference uses the updated weights. No manual training step needed.
MLX server (standalone, OpenAI-compatible):

```
nanobot mlx-serve --port 8766
```

Exposes `/v1/chat/completions` (OpenAI) plus the Ex0bit protocol (`/chat` SSE, `/train`, `/status`, `/reset`).
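Any OpenAI-style client can talk to it. A quick smoke test with curl (the `model` value here is illustrative; the server answers with whichever model is loaded):

```
curl -s http://localhost:8766/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5-2b", "messages": [{"role": "user", "content": "Hello"}]}'
```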
The agent has hands. It can read and write files, run shell commands, search the web, spawn sub-agents, and schedule recurring tasks:
| Tool | What it does |
|---|---|
| File read/write/edit | Workspace file operations |
| Shell exec | Run commands with timeout and sandboxing |
| Web search + fetch | SearXNG (default) or Brave Search API + page fetching |
| Message | Send messages to channels |
| Spawn | Launch sub-agent conversations |
| Cron | Schedule recurring tasks with cron expressions |
By default, web_search uses SearXNG running locally. To set it up:
```
# Run SearXNG with JSON API enabled
docker run -d --name searxng -p 8888:8080 \
  -e SEARXNG_BASE_URL=http://localhost:8888 \
  searxng/searxng:latest

# Enable JSON format (required for API access)
docker exec searxng sed -i 's/^formats:$/formats:\n  - html\n  - json/' /etc/searxng/settings.yml
docker restart searxng
```

Add to `~/.nanobot/config.json`:
```
{
  "tools": {
    "web": {
      "provider": "searxng",
      "searxngUrl": "http://localhost:8888"
    }
  }
}
```

Alternatively, set `"provider": "brave"` and add a `braveApiKey` to use the Brave Search API (cloud).
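To verify the local SearXNG JSON API is up before pointing nanobot at it:

```
curl -s 'http://localhost:8888/search?q=nanobot&format=json' | head -c 300
```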
Deploy as a bot on your messaging platforms -- or start them right from the REPL:
| Channel | Transport | Quick start |
|---|---|---|
| Telegram | Long-polling (POST) | `/telegram` or `/tg` |
| WhatsApp | WebSocket bridge | `/whatsapp` or `/wa` |
| Email | IMAP polling + SMTP | `/email` |
| Feishu (Lark) | WebSocket | gateway mode |
Channels run in the background while you keep chatting. Inbound messages and bot responses are displayed in the REPL as they flow through:
```
[telegram] 4815162342: What's the capital of France?
[telegram] bot: The capital of France is Paris.

You: (you keep chatting locally)
```
With the voice feature enabled, voice messages sent via Telegram or WhatsApp are automatically transcribed using on-device STT (same Whisper model as /voice mode). The bot replies with both text and a voice note synthesized via TTS. No cloud transcription -- everything runs locally. Requires ffmpeg for audio codec conversion.
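If `ffmpeg` is missing, install it with your system package manager:

```
# Debian/Ubuntu
sudo apt install ffmpeg

# macOS
brew install ffmpeg
```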
Long conversations don't lose context. When history exceeds the token budget, nanobot summarizes older messages via a cheap LLM call instead of silently dropping them. The summary preserves key facts, decisions, and pending actions. Falls back to hard truncation if summarization fails.
In gateway mode, messages from different chats are processed in parallel (up to maxConcurrentChats, default 4). A WhatsApp user and a Telegram user get responses simultaneously instead of waiting in a queue. Messages within the same conversation stay serialized to preserve ordering.
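The limit comes from `agents.defaults.maxConcurrentChats` (see the config table below); for example, to allow eight parallel chats:

```
{
  "agents": {
    "defaults": {
      "maxConcurrentChats": 8
    }
  }
}
```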
- Memory: Daily notes + long-term `MEMORY.md`, loaded into every prompt
- Skills: Markdown files with YAML frontmatter at `{workspace}/skills/{name}/SKILL.md`. Skills marked `always: true` are always loaded; others appear as summaries the agent can read on demand (example below)
- Sessions: JSONL persistence at `~/.nanobot/sessions/`
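For instance, a minimal always-on skill at `{workspace}/skills/timezones/SKILL.md` might look like this (the `always` key is the one described above; the body text is illustrative):

```
---
always: true
---

# Timezones

When asked for the time in another city, compute it from UTC and
answer in words rather than raw offsets.
```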
| Command | Description |
|---|---|
| `/local`, `/l` | Toggle local/cloud mode |
| `/model`, `/m` | Select local GGUF model |
| `/think`, `/t`, `/thinking` | Toggle/adjust thinking (`on`, `off`, or budget tokens) |
| `/nothink`, `/nt` | Suppress streamed thinking output |
| `/voice`, `/v` | Toggle voice mode |
| `/telegram`, `/tg` | Start Telegram channel in background |
| `/whatsapp`, `/wa` | Start WhatsApp channel in background |
| `/email` | Start Email channel in background |
| `/paste`, `/p` | Paste mode -- multiline input until `---` |
| `/stop` | Stop all running channels |
| `/status`, `/s` | Show current mode, model, and channels |
| `/help`, `/h` | Show help |
| `Ctrl+C` | Exit |
| Command | Description |
|---|---|
| `nanobot onboard` | Initialize config and workspace |
| `nanobot agent` | Interactive chat |
| `nanobot agent -m "..."` | Single message |
| `nanobot gateway` | Start with channel adapters |
| `nanobot status` | Configuration status |
| `nanobot tune --input bench.json` | Pick best local profile from benchmark JSON |
| `nanobot channels status` | Channel status |
| `nanobot cron list` | List scheduled jobs |
| `nanobot cron add` | Add a scheduled job |
| `nanobot realtime` | Realtime voice session (continuous mode) |
| `nanobot realtime --mode ptt` | Realtime voice with push-to-talk |
| `nanobot mlx-serve` | Start MLX model server (OpenAI-compat + Ex0bit) |
```
# Standard build
cargo build --release

# With voice mode (requires jack-voice + pocket-tts)
cargo build --release --features voice

# With MLX in-process inference (Apple Silicon only)
cargo build --release --features mlx

# Debug with logging
RUST_LOG=debug cargo run -- agent -m "Hello"
```

Config lives at `~/.nanobot/config.json` (camelCase keys). Workspace defaults to `~/.nanobot/workspace/`.
Key agent settings in config.json:
| Key | Default | Description |
|---|---|---|
| `agents.defaults.model` | `anthropic/claude-opus-4-5` | LLM model |
| `agents.defaults.maxTokens` | `8192` | Max response tokens |
| `agents.defaults.maxContextTokens` | `128000` | Context window size |
| `agents.defaults.maxConcurrentChats` | `4` | Parallel chat limit (gateway) |
| `agents.defaults.inferenceEngine` | `auto` | Engine: `auto`, `lms`, or `mlx` |
| `agents.defaults.mlxModelDir` | (auto-detected) | Path to MLX model directory |
| `agents.defaults.mlxPreset` | `qwen3.5-2b` | Model config preset |
For local mode, install LM Studio and its CLI (lms). Models are managed through LM Studio.
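nanobot starts and stops the server itself via `/local`. To confirm the `lms` CLI is on your PATH and can see your models (command per LM Studio's CLI; verify against your installed version):

```
# List models LM Studio has downloaded
lms ls
```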
```
        Channels (Telegram / WhatsApp / Feishu)
                    |
                    v
User --> CLI / Voice / Realtime --> AgentLoop --> LLM Provider
                                     |    ^       (any OpenAI-compat API)
                                     |    |
                                     v    |
                               ToolRegistry --> file, shell, web,
                                                message, spawn, cron
```
Single-binary. No microservices. The agent loop is the core -- it takes a message, builds context (identity + memory + skills + history), calls the LLM, executes any tool calls, and returns a response. Voice mode wraps this with STT on input and streaming TTS on output.
On startup, the TUI clears the terminal, shows an ASCII splash with mode info, and renders LLM responses as styled markdown (headers, code blocks, bold/italic) via termimad. Input uses rustyline with arrow-key history.
Rust port of nanobot by HKUDS. Original Python implementation licensed under MIT.
MIT