```
 _____             _       _
|   | |___ ___ ___| |_ ___| |_
| | | | .'|   | . | . | . |  _|
|_|___|__,|_|_|___|___|___|_|
```
A personal AI assistant that runs on your terms. Cloud or local. Text or voice. Your machine, your models, your data.
Rust port of nanobot by HKUDS -- rebuilt from scratch for speed, portability, and offline-first operation.
Most AI assistants are cloud-locked SaaS products. nanobot is a single binary that talks to whatever LLM you point it at -- Claude, GPT, Gemini, Groq, or a GGUF running on your own hardware. Add voice and it becomes a conversational assistant you can interrupt mid-sentence. Add channels and it lives in your Telegram, WhatsApp, or Feishu.
No containers. No Python. No dependencies beyond what cargo build pulls in.
```
cargo build --release

# Initialize config and workspace
nanobot onboard

# Add your API key to ~/.nanobot/config.json

# Start chatting
nanobot agent
```

All providers speak the same OpenAI-compatible protocol. First API key found wins:
OpenRouter / DeepSeek / Anthropic / OpenAI / Gemini / Groq / vLLM
```
You: What's the weather like?
```
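The key itself goes in `~/.nanobot/config.json`. The exact provider key names aren't documented here, so treat this as a hypothetical sketch and confirm with `nanobot status`:

```
{
  "providers": {
    "openrouter": { "apiKey": "sk-or-..." }
  }
}
```

(`providers.openrouter.apiKey` is illustrative, not a documented schema.)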
Toggle between cloud and local inference mid-conversation. nanobot connects to LM Studio, loads a model, and switches over.
```
You: /local
Starting LM Studio server on port 1234...
Loading model...

LOCAL MODE  LM Studio on port 1234
Model: NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M.gguf

You: /model
Available models:
  [1] gemma-3n-E4B-it-Q4_K_S.gguf (3923 MB)
  [2] Ministral-8B-Instruct-Q4_K_M.gguf (4815 MB)
  [3] NVIDIA-Nemotron-Nano-9B-v2-Q4_K_M.gguf (5352 MB) (active)
  ...
Select model [1-12] or Enter to cancel:
```
Switch models on the fly. The server process is monitored -- if it crashes during loading, you get the error immediately instead of waiting for a timeout. Stale servers from previous sessions are cleaned up automatically.
```
cargo build --release --features voice
```

```
You: /voice
Voice mode ON. Ctrl+Space or Enter to speak, type for text.

Recording... (press Enter or Ctrl+Space to stop)
You said: "What time is it in Tokyo?"
It's currently about two in the morning in Tokyo.
```
Voice mode uses on-device models -- no cloud STT/TTS:
- Speech-to-text: Whisper (via jack-voice)
- Text-to-speech: Pocket TTS (Candle, 24kHz, CPU real-time)
Audio is streamed sentence-by-sentence through PulseAudio. First audio plays in ~300-500ms while remaining sentences synthesize in the background.
Interrupt anytime: press Enter during playback to cut the response short and start speaking. The assistant stops talking and listens.
```
nanobot realtime --engine pocket --voice alba
```

Hands-free conversation with VAD-based turn detection. No keys needed -- just speak. The pipeline:
- Listen -- Silero VAD detects speech, SmartTurn v3 determines when you're done
- Process -- Whisper transcribes, LLM streams response
- Speak -- Sentences stream to TTS as they arrive (~300ms to first audio)
- Barge-in -- Start speaking during a response to interrupt immediately
Switch to push-to-talk with --mode ptt (hold Space to record).
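Both modes side by side:

```
# Continuous: VAD turn detection, barge-in enabled
nanobot realtime --engine pocket --voice alba

# Push-to-talk: hold Space to record
nanobot realtime --mode ptt
```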
Run a language model directly on your Mac's GPU -- no server, no HTTP, no separate process. The model lives in nanobot's memory and serves inference, perplexity scoring, and LoRA fine-tuning from the same worker thread.
```
cargo build --release --features mlx
```

Set `inferenceEngine` to `"mlx"` in `~/.nanobot/config.json`:
```
{
  "agents": {
    "defaults": {
      "inferenceEngine": "mlx",
      "mlxModelDir": "~/.cache/lm-studio/models/mlx-community/Qwen3.5-2B-MLX-8bit",
      "mlxPreset": "qwen3.5-2b"
    }
  }
}
```

Default model: Qwen3.5-2B (8-bit, ~2GB). The model loads once at startup and stays in GPU memory. All entry points (REPL, gateway, voice, channels) use the same in-process provider.
Online learning: When MLX is active, the perplexity gate auto-enables. Each conversation turn is scored for surprise (cross-entropy loss); surprising exchanges are added to an experience buffer, and once enough have collected, a LoRA training pass fires in the background. The model learns from its mistakes -- the next inference uses the updated weights. No manual training step needed.
MLX server (standalone, OpenAI-compatible):

```
nanobot mlx-serve --port 8766
```

Exposes `/v1/chat/completions` (OpenAI) plus the Ex0bit protocol (`/chat` SSE, `/train`, `/status`, `/reset`).
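Any OpenAI-style client can talk to it. A quick smoke test with curl (the `model` value here is illustrative; the server answers with whichever model is loaded):

```
curl -s http://localhost:8766/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3.5-2b", "messages": [{"role": "user", "content": "Hello"}]}'
```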
The agent has hands. It can read and write files, run shell commands, search the web, spawn sub-agents, and schedule recurring tasks:
| Tool | What it does |
|---|---|
| File read/write/edit | Workspace file operations |
| Shell exec | Run commands with timeout and sandboxing |
| Web search + fetch | SearXNG (default) or Brave Search API + page fetching |
| Message | Send messages to channels |
| Spawn | Launch sub-agent conversations |
| Cron | Schedule recurring tasks with cron expressions |
By default, web_search uses SearXNG running locally. To set it up:
```
# Run SearXNG with JSON API enabled
docker run -d --name searxng -p 8888:8080 \
  -e SEARXNG_BASE_URL=http://localhost:8888 \
  searxng/searxng:latest

# Enable JSON format (required for API access)
docker exec searxng sed -i 's/^formats:$/formats:\n  - html\n  - json/' /etc/searxng/settings.yml
docker restart searxng
```

Add to `~/.nanobot/config.json`:
```
{
  "tools": {
    "web": {
      "provider": "searxng",
      "searxngUrl": "http://localhost:8888"
    }
  }
}
```

Alternatively, set `"provider": "brave"` and add a `braveApiKey` to use the Brave Search API (cloud).
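To verify the local SearXNG JSON API is up before pointing nanobot at it:

```
curl -s 'http://localhost:8888/search?q=nanobot&format=json' | head -c 300
```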
Deploy as a bot on your messaging platforms -- or start them right from the REPL:
| Channel | Transport | Quick start |
|---|---|---|
| Telegram | Long-polling (POST) | `/telegram` or `/tg` |
| WhatsApp | WebSocket bridge | `/whatsapp` or `/wa` |
| Email | IMAP polling + SMTP | `/email` |
| Feishu (Lark) | WebSocket | gateway mode |
Channels run in the background while you keep chatting. Inbound messages and bot responses are displayed in the REPL as they flow through:
```
[telegram] 4815162342: What's the capital of France?
[telegram] bot: The capital of France is Paris.

You: (you keep chatting locally)
```
With the voice feature enabled, voice messages sent via Telegram or WhatsApp are automatically transcribed using on-device STT (same Whisper model as /voice mode). The bot replies with both text and a voice note synthesized via TTS. No cloud transcription -- everything runs locally. Requires ffmpeg for audio codec conversion.
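If `ffmpeg` is missing, install it with your system package manager:

```
# Debian/Ubuntu
sudo apt install ffmpeg

# macOS
brew install ffmpeg
```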
Long conversations don't lose context. When history exceeds the token budget, nanobot summarizes older messages via a cheap LLM call instead of silently dropping them. The summary preserves key facts, decisions, and pending actions. Falls back to hard truncation if summarization fails.
In gateway mode, messages from different chats are processed in parallel (up to maxConcurrentChats, default 4). A WhatsApp user and a Telegram user get responses simultaneously instead of waiting in a queue. Messages within the same conversation stay serialized to preserve ordering.
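The limit comes from `agents.defaults.maxConcurrentChats` (see the config table below); for example, to allow eight parallel chats:

```
{
  "agents": {
    "defaults": {
      "maxConcurrentChats": 8
    }
  }
}
```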
- Memory: Daily notes + long-term `MEMORY.md`, loaded into every prompt
- Skills: Markdown files with YAML frontmatter at `{workspace}/skills/{name}/SKILL.md`. Skills marked `always: true` are always loaded; others appear as summaries the agent can read on demand (example below)
- Sessions: JSONL persistence at `~/.nanobot/sessions/`
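For instance, a minimal always-on skill at `{workspace}/skills/timezones/SKILL.md` might look like this (the `always` key is the one described above; the body text is illustrative):

```
---
always: true
---

# Timezones

When asked for the time in another city, compute it from UTC and
answer in words rather than raw offsets.
```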
| Command | Description |
|---|---|
| `/local`, `/l` | Toggle local/cloud mode |
| `/model`, `/m` | Select local GGUF model |
| `/think`, `/t`, `/thinking` | Toggle/adjust thinking (`on`, `off`, or budget tokens) |
| `/nothink`, `/nt` | Suppress streamed thinking output |
| `/voice`, `/v` | Toggle voice mode |
| `/telegram`, `/tg` | Start Telegram channel in background |
| `/whatsapp`, `/wa` | Start WhatsApp channel in background |
| `/email` | Start Email channel in background |
| `/paste`, `/p` | Paste mode -- multiline input until `---` |
| `/stop` | Stop all running channels |
| `/status`, `/s` | Show current mode, model, and channels |
| `/help`, `/h` | Show help |
| `Ctrl+C` | Exit |
| Command | Description |
|---|---|
| `nanobot onboard` | Initialize config and workspace |
| `nanobot agent` | Interactive chat |
| `nanobot agent -m "..."` | Single message |
| `nanobot gateway` | Start with channel adapters |
| `nanobot status` | Configuration status |
| `nanobot tune --input bench.json` | Pick best local profile from benchmark JSON |
| `nanobot channels status` | Channel status |
| `nanobot cron list` | List scheduled jobs |
| `nanobot cron add` | Add a scheduled job |
| `nanobot realtime` | Realtime voice session (continuous mode) |
| `nanobot realtime --mode ptt` | Realtime voice with push-to-talk |
| `nanobot mlx-serve` | Start MLX model server (OpenAI-compat + Ex0bit) |
```
# Standard build
cargo build --release

# With voice mode (requires jack-voice + pocket-tts)
cargo build --release --features voice

# With MLX in-process inference (Apple Silicon only)
cargo build --release --features mlx

# Debug with logging
RUST_LOG=debug cargo run -- agent -m "Hello"
```

Config lives at `~/.nanobot/config.json` (camelCase keys). Workspace defaults to `~/.nanobot/workspace/`.
Key agent settings in config.json:
| Key | Default | Description |
|---|---|---|
| `agents.defaults.model` | `anthropic/claude-opus-4-5` | LLM model |
| `agents.defaults.maxTokens` | `8192` | Max response tokens |
| `agents.defaults.maxContextTokens` | `128000` | Context window size |
| `agents.defaults.maxConcurrentChats` | `4` | Parallel chat limit (gateway) |
| `agents.defaults.inferenceEngine` | `auto` | Engine: `auto`, `lms`, or `mlx` |
| `agents.defaults.mlxModelDir` | (auto-detected) | Path to MLX model directory |
| `agents.defaults.mlxPreset` | `qwen3.5-2b` | Model config preset |
For local mode, install LM Studio and its CLI (lms). Models are managed through LM Studio.
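nanobot starts and stops the server itself via `/local`. To confirm the `lms` CLI is on your PATH and can see your models (command per LM Studio's CLI; verify against your installed version):

```
# List models LM Studio has downloaded
lms ls
```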
```
        Channels (Telegram / WhatsApp / Feishu)
                    |
                    v
User --> CLI / Voice / Realtime --> AgentLoop --> LLM Provider
                                     |    ^       (any OpenAI-compat API)
                                     |    |
                                     v    |
                               ToolRegistry --> file, shell, web,
                                                message, spawn, cron
```
Single-binary. No microservices. The agent loop is the core -- it takes a message, builds context (identity + memory + skills + history), calls the LLM, executes any tool calls, and returns a response. Voice mode wraps this with STT on input and streaming TTS on output.
On startup, the TUI clears the terminal, shows an ASCII splash with mode info, and renders LLM responses as styled markdown (headers, code blocks, bold/italic) via termimad. Input uses rustyline with arrow-key history.
Rust port of nanobot by HKUDS. Original Python implementation licensed under MIT.
MIT