ADA

Agentic Digital Assistant

Installation

Project dependencies are managed via uv.

$ git clone git@github.com:kluzny/ada.git # clone the repository
$ cd ada # change directory
$ uv venv # initialize a virtual environment
$ uv sync # install python packages
$ cp -pv config.json.example config.json # default configuration file

See the section on backends to determine what additional dependencies you may need to install.

Running

$ python main.py # a minimal REPL for the agent
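
If the project's virtual environment is not activated in your shell, the same entry point can be launched through uv instead (a sketch, assuming the default uv project layout created above):

$ uv run python main.py # run inside the uv-managed virtual environment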

Backends

ADA supports multiple LLM backends for flexible model deployment. You can switch between backends by changing the backend key in your config.json.

llama-cpp (Local GGUF Models)

The llama-cpp backend runs quantized GGUF models locally using llama-cpp-python. This backend is ideal for running models entirely offline with full privacy.

Features:

  • Runs models locally without internet connection (after initial download)
  • Automatic model download and caching to .ada/llms/ directory
  • Support for CPU and GPU inference (with CUDA/Metal/etc.)
  • Dynamic context window detection from GGUF metadata
  • No external service dependencies

Prerequisites:

Note: llama-cpp-python also supports GPU inference with additional compiler flags, for instance:

# building with CUDA support

$ CMAKE_ARGS="-DGGML_CUDA=on" uv pip install llama-cpp-python --verbose
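
Similarly, on Apple Silicon a Metal build can be enabled with a different flag (a sketch; confirm the exact flag against the llama-cpp-python installation docs for your version):

# building with Metal support (macOS)

$ CMAKE_ARGS="-DGGML_METAL=on" uv pip install llama-cpp-python --verbose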

Refer to the official documentation for the build flags specific to your GPU: llama-cpp-python#installation

Configuration:

{
  "backend": "llama-cpp",
  "backends": {
    "llama-cpp": {
      "model": "phi-2",
      "threads": 4,
      "verbose": false,
      "models": [
        {
          "name": "phi-2",
          "url": "https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q8_0.gguf"
        }
      ]
    }
  }
}

Configuration Options:

  • model: Name of the model to use (must match a name in the models array)
  • models: Array of model definitions with name and url fields
  • threads: Number of CPU threads to use (default: 4)
  • verbose: Enable verbose llama.cpp logging (default: false)

Finding Models:

Browse quantized GGUF models on Hugging Face by searching for models with the "GGUF" tag.
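
As a sketch of what that looks like, add the direct GGUF download URL of a model you find to the models array and point model at its name (the mistral entry below is illustrative; substitute whichever file and quantization you actually want):

{
  "backend": "llama-cpp",
  "backends": {
    "llama-cpp": {
      "model": "mistral-7b-instruct",
      "threads": 4,
      "verbose": false,
      "models": [
        {
          "name": "phi-2",
          "url": "https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q8_0.gguf"
        },
        {
          "name": "mistral-7b-instruct",
          "url": "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf"
        }
      ]
    }
  }
}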

ollama (Local Model Server)

The ollama backend connects to a running Ollama server for model inference. Ollama provides an easy-to-use model management system with a simple CLI.

Features:

  • Easy model management via ollama pull <model-name>
  • Supports the full Ollama model library, although you will want to select models with tool support
  • Automatic context window detection from model metadata
  • Can connect to local or remote Ollama servers
  • Built-in model quantization and optimization

Prerequisites:

Install and start Ollama:

# Install Ollama (see https://ollama.com)
$ curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
$ ollama pull gpt-oss:latest

# Start the Ollama server (if not managed via systemd)
$ ollama serve
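
To confirm the server is reachable and the model was pulled, Ollama's own CLI can list the local models (optional, just a sanity check):

$ ollama list # shows the models available to the server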

Configuration:

{
  "backend": "ollama",
  "backends": {
    "ollama": {
      "url": "http://localhost:11434",
      "model": "gpt-oss:latest"
    }
  }
}

Configuration Options:

  • url: Ollama server URL (default: http://localhost:11434); local or remote hosts both work, as shown in the example below
  • model: Name of the Ollama model to use (e.g., "llama3.2:latest", "mistral:latest")
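
For example, a config pointing url at a remote Ollama host looks like this (the hostname below is hypothetical; any machine running ollama serve on a reachable port works):

{
  "backend": "ollama",
  "backends": {
    "ollama": {
      "url": "http://gpu-box.local:11434",
      "model": "llama3.2:latest"
    }
  }
}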

Available Models: Browse the Ollama model library for available models. Popular options include:

  • gemma3 - Google's Gemma 3 model, built from Gemini research and sized to run on a single GPU
  • gpt-oss - OpenAI's open-weight model
  • llama3.2 - Meta's Llama 3.2 model
  • mistral - Mistral AI's base model
  • phi - Microsoft's efficient small model

Switching Backends

To switch between backends, simply update the backend field in your config.json:

{
  "backend": "ollama", // Change this to "llama-cpp" or "ollama"
  "backends": {
    // Both backend configurations can coexist
  }
}

Both backend configurations can remain in your config file; only the active backend specified by the backend field will be used.

Text-to-Speech (TTS)

ADA includes integrated text-to-speech capabilities powered by Piper TTS, a high-quality open-source neural text-to-speech system.

Features:

  • High-quality neural speech synthesis
  • Streaming audio directly to speakers (no disk files)
  • Real-time playback as audio is generated
  • Extensive voice model library with multiple languages and genders
  • Automatic model download and caching to .ada/voices/ directory
  • Fully offline operation (after initial voice model download)

Configuration:

Enable TTS by adding the tts option to your config.json:

{
  "tts": "en_US-amy-medium"
}

Available Voices:

Piper provides voices in the format <language>-<voice>-<quality>. For the complete list of available voices, see the Piper Voice Models collection.
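
For example, to switch from the default example above to a British English voice, change the tts value (en_US-amy-medium and en_GB-alba-medium are both published Piper voices; any name from the collection should work):

{
  "tts": "en_GB-alba-medium"
}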

Personas

Ada ships with some default personas that define system prompts for various modes of operation.

  • default - A standard chat assistant with no specific customizations
  • jester - The default, but trained poorly, as a joke.

Use the mode [name] command to switch between personas, or simply modes to list all of the available personas.
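
A sketch of what that looks like at the REPL (the prompt glyph and exact output will vary):

> modes # list the available personas
> mode jester # switch to the jester persona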

Memories

You can add custom context to the various personas using the memories directory. Files are organized by persona name and loaded alphabetically after the core system prompt. Memories use the system role; the exact behaviour is determined by the specific model you are using. Memories are read as plain text, but your LLM may understand text, markdown, JSON, or other file formats. Memories are hot-reloaded based on which persona is active.

# ./memories/default/001_important_stuff_to_always_place_in_context.txt
This project uses a python virtual environment that needs to be activated.
Preface all python commands with `source ./venv/bin/activate`.

# ./memories/default/002_something_that_loads_after_001.md
Always use python type hints.
Prefer using the native type hints instead of the `typing` module.

# ./memories/jester/001_use_an_uncomfortable_amount_of_puns.lol
Puns are the highest form of humor.
Use puns as much as possible.
Surround the puns with markdown for italics like *pun*.
It's ok to hallucinate in service of providing an additional *pun*portunity.

Roadmap

  • tools for file system access e.g. find, tree, cat, diff
  • audio input STT, with wake word 'ada'
  • multi-step tasks
  • command autocomplete
  • resumable/forkable conversations
  • better tool use using explicit tool_choice calls
  • support advanced templating for memories, possibly jinja
  • provide a checksum when the model is downloaded
  • logging happens in a separate frame
  • gif/vid of Ada, doing the thing

TODO

  • Move the audio to a separate thread, so that input can continue or queue.
  • Consider how to interrupt Ada so she stops speaking. Perhaps a queue or bus we can send a stop command to?
  • Does voice synthesis take so much GPU resource that we can't do inference? Does the next prompt in the queue need to wait for speech to complete?
  • Command to repeat the last thing Ada says, /repeat
  • Audit all the agent say calls to make sure Ada isn't 'talking' when she doesn't need to, e.g. help output, menus, code blocks
  • Response objects should have a .body method for all text and a speakable method for text that should be spoken, again so code blocks aren't read aloud
  • Think about how to get better phonetic answers; for instance, can we return phonemes alongside the text in the LLM output, or could they simply be inferred by an LLM pass afterwards?

Bugs

llama-cpp

  • tool calling is still busted, and very finicky about which model is used
  • auto-detecting the max context only works for a handful of models; perhaps it should have a configurable override
