No cloud. No SaaS. No data leaving your network. Just local LLMs, a Postgres database, and a Phoenix LiveView UI that updates in real time as your agents think.
ExCortex is an AI agent orchestration platform with a biological nervous system metaphor baked into every layer. You define teams of agents (clusters of neurons), wire them to your data sources (senses), and they work autonomously — reviewing code, triaging incidents, summarizing feeds, filing bugs, and then improving the system itself.
The headline feature: ExCortex includes a Dev Team that reads its own GitHub issues, writes fixes, reviews its own code, runs the test suite, and merges approved changes. It literally gets better while you sleep.
- Neuroplasticity — Self-Improvement
- Multi-Agent Clusters
- Data Sources — Senses
- Chat — Wonder & Muse
- Pipelines — Ruminations
- Memory — Engrams
- 54 Agent Tools
- Architecture
- Brain Vocabulary
- Pages
- Quickstart
- Observability
- Deployment
- Configuration
- Development
- Tech Stack
This is the thing that makes ExCortex different. The system improves itself at three levels — not just its code, but its own configuration, prompts, and trust in its agents.
Every time a step runs, each neuron's individual verdict is compared against the team consensus. Neurons that consistently contradict the group have their trust score decayed (×0.97, compounding). Over time, this surfaces which agents are reliable and which are drifting — informing roster changes and escalation decisions. This happens automatically, no human input needed.
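To see how quickly the ×0.97 decay compounds (pure arithmetic, not ExCortex code):

```elixir
# Illustrative sketch of the ×0.97 compounding decay, not the actual
# ExCortex implementation. Starting from full trust, apply the decay
# once per contradicting run.
decay = fn trust -> trust * 0.97 end

trust_after = fn runs ->
  Enum.reduce(1..runs, 1.0, fn _run, trust -> decay.(trust) end)
end

trust_after.(10)  # ≈ 0.737 after ten contradictions
trust_after.(50)  # ≈ 0.218, well below any reasonable trust floor
```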
After every step completes, an async retrospective runs in the background. A lightweight LLM reviews the step definition alongside the actual run trace — who ran, what they said, how confident they were — and proposes up to 3 concrete tuning changes:
| Proposal Type | What It Tunes |
|---|---|
| `roster_change` | Swap neurons, change team composition or consensus strategy |
| `schedule_change` | Adjust polling intervals or cron schedules |
| `prompt_change` | Tweak system prompts, instructions, or reasoning strategies |
| `other` | Timeouts, thresholds, model assignments, escalation rules |
These proposals land on the Cortex dashboard for you to approve or reject. The system suggests; you decide.
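What a proposal might look like once it lands (the field names are illustrative; this sketches the shape, not ExCortex's actual schema):

```elixir
# Hypothetical retrospective proposal. Field names are illustrative,
# not ExCortex's actual schema.
%{
  type: :prompt_change,
  target: "code_reviewer",
  rationale: "Reviewer confidence averaged 0.42 over the last 5 impulses",
  change: "Instruct the reviewer to cite file:line for every finding",
  status: :pending  # awaits approval on the Cortex dashboard
}
```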
Every synapse (step) has its own tunable knobs: escalation thresholds, reflection confidence floors, model selection, tool iteration limits, dangerous tool handling mode, context providers, and scheduling. Neurons carry their own config — rank, model, strategy, system prompt. Senses have polling intervals. All of it is hot-reloadable through the Instinct UI — no restart required.
For changes that go beyond configuration, ExCortex can modify its own source code.
The Analyst Sweep runs every 4 hours in three steps: a Code Auditor runs `mix credo` and `mix test`, a Product Analyst identifies feature opportunities, and a Backlog Manager cross-references existing GitHub issues and files 3–5 new ones labeled `self-improvement`.
The Self-Improvement Loop picks up those issues:
Issue filed → PM Triage → Planning Consensus → Code Writer → Code Reviewer → QA → UX Review → PM Merge Decision
After PM Triage selects an issue, a Planning Consensus step runs three perspectives in parallel — a Software Architect, a Devil's Advocate, and a Technical PM — each evaluating the implementation approach. If all three reject the issue, the gate blocks the Code Writer from running. This prevents low-value work from consuming compute.
The Code Writer works in an isolated git worktree — it never touches the main repo directly. It reads files, makes changes, runs `mix test` and `mix credo`, commits, and opens a real PR with structured commit trailers (`Constraint`, `Rejected`, `Confidence`, `Scope-risk`, `Not-tested`) for audit trails. The Code Reviewer and QA gate the pipeline — if tests fail, nothing merges. The PM makes the final call: low-risk changes auto-merge, and anything touching core logic creates a Proposal for you.
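A Code Writer commit might carry trailers like these (the trailer names are the ones listed above; the content is illustrative):

```text
Fix race in sense worker restart logic

Constraint: no changes outside lib/ex_cortex/senses/
Rejected: global restart debounce (too coarse)
Confidence: 0.82
Scope-risk: low
Not-tested: behavior under simultaneous worker crashes
```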
Re-seed the pipeline anytime:
```elixir
ExCortex.Neuroplasticity.Seed.seed(%{repo: "owner/repo"})
```

- **Automatic:** trust scores decay on every run → surfaces unreliable neurons
- **Suggested:** retrospective proposals after every step → prompt/roster/schedule tuning
- **Applied:** self-improvement loop every 4 hours → code changes via PR
All three layers feed the same Cortex dashboard. You see trust trends, pending proposals, and open PRs in one place. The system gets better continuously — you stay in control of what actually changes.
Clusters are teams of neurons (agents) with distinct perspectives that evaluate inputs independently, then vote on a consensus verdict. Each neuron sees the same content through a different lens — security, style, architecture, compliance — and the final verdict aggregates their confidence scores.
Pathways are cluster blueprints that seed a full team of neurons in one step. ExCortex ships with 20+ pre-built pathways:
| Pathway | What It Does |
|---|---|
| Code Review | Security auditing, style review, architecture analysis |
| Content Moderation | Safety screening, bias detection, policy compliance |
| Accessibility Review | WCAG compliance, assistive tech compatibility |
| Risk Assessment | Risk identification, compliance checks, fraud signals |
| Performance Audit | Bottleneck detection, scalability analysis |
| Dependency Audit | Vulnerability scanning, version currency |
| Incident Triage | Severity classification, response suggestions |
| Contract Review | Legal document analysis, risk flagging |
| Dev Team | The self-improvement cluster — code, review, test, merge |
Seed pathways from the Genesis page. Each pathway creates a cluster with its neurons in the database and wires them to the evaluation pipeline. Or build your own — pathways are just Elixir modules with metadata functions. There's nothing special about the built-in ones.
Neurons can also be defined as version-controlled markdown files in `priv/neurons/`. Each file uses YAML frontmatter for metadata (`id`, `name`, `category`, `lobe`, `ranks`) and the body becomes the system prompt. Markdown-defined neurons merge with the code-defined builtins at runtime — markdown wins on `id` collision, making it easy for the neuroplasticity loop to propose prompt changes as git-diffable PRs.
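A neuron file in `priv/neurons/` might look like this (frontmatter keys from the list above; the values and prompt are illustrative):

```markdown
---
id: security_auditor
name: Security Auditor
category: code_review
lobe: analysis
ranks: [alpha]
---
You are a security auditor. Review the input for injection risks,
unsafe deserialization, and secrets committed to source control.
Cite the file and line for every finding.
```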
Senses are supervised workers that watch external data and feed it into your clusters for evaluation. They poll, push, or stream — whatever the source needs.
| Sense | What It Watches |
|---|---|
| `git` | Commits in a local repository |
| `directory` | File changes in a directory tree |
| `feed` | RSS / Atom feeds |
| `webhook` | Incoming `POST /api/webhooks/:id` requests |
| `url` | Content changes at a URL (polling) |
| `websocket` | Live WebSocket streams |
| `github_issues` | GitHub issues matching a label filter |
| `obsidian` | New or changed notes in an Obsidian vault |
| `nextcloud` | Nextcloud activity feed and files |
| `email` | Email inbox monitoring |
| `media` | Video/audio files for transcription and analysis |
Each sense type has a Reflex — a source template that pre-configures common setups. Point a git sense at your repo, wire it to the Code Review cluster, and every new commit gets a multi-agent review in ~30 seconds.
The webhook sense accepts optional Bearer token authentication. The email sense handles both standard and epoch-timestamp formats.
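For instance, pushing an event into a webhook sense with Req (which ExCortex already depends on); the sense id and token below are placeholders:

```elixir
# Push an event into a webhook sense. The sense id and bearer token
# are placeholders for your own configuration.
Req.post!(
  "http://localhost:4001/api/webhooks/my-sense-id",
  json: %{event: "deploy", status: "failed", service: "billing"},
  auth: {:bearer, "my-webhook-token"}
)
```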
Two conversational interfaces for different needs:
Wonder (/wonder) — Pure LLM chat. No context retrieval, no grounding. Just you and the model, thinking out loud.
Muse (/muse) — Data-grounded RAG chat. Muse pulls context from across your entire data surface before answering — Obsidian notes, email, GitHub, dashboard signals, engrams, axioms, and more. It persists every Q&A as a Thought.
Muse has access to 30 tools across 8 categories. It automatically detects what you're asking about and pre-fetches relevant context before the LLM even runs.
| Category | Capabilities |
|---|---|
| Obsidian Vault | Search notes by title or body, read notes, list/toggle/add todos in daily notes |
| Email | Search inbox (notmuch), read messages, detect unread/newsletters |
| Knowledge Base | Query engrams (tiered memory), search reference datasets (axioms) |
| Dashboard | Pull recent signal cards — digest outputs, alerts, reports |
| GitHub | Search repos, read issues/PRs, list notifications |
| Web | Fetch URLs, search via DuckDuckGo |
| Documents | Read PDFs, OCR images, transcribe audio/video, convert formats |
| System | Run sandboxed commands, query Jaeger traces, list data sources |
Ask Muse "what are my open todos?" and it searches your Obsidian daily note for unchecked checkboxes. Ask "what's the latest tech news?" and it pulls from the most recent Tech Digest signal card. Ask "any emails from Bob?" and it runs a notmuch search. No manual tool selection — it figures out what to query based on your question.
Both Wonder and Muse support all configured LLM providers (Ollama local models, Claude via Anthropic API). Muse filters to models with verified tool-calling support.
Ruminations are multi-step pipelines where each step (synapse) is an evaluation by a team of agents with access to tools. A single run of a rumination is called a daydream, and each step execution is an impulse.
A synapse can:
- Read files, search GitHub, query your memory store
- Run sandboxed shell commands (`mix test`, `mix credo`, `mix format`)
- Write files, create commits, open pull requests
- File issues, send emails, post to Nextcloud Talk
- Trigger other ruminations (recursive pipelines)
Each tool call that modifies the outside world goes through the Proposal system — an approval record you can review, approve, or reject from the dashboard. Safe tools (read, search, fetch) execute immediately. Write and dangerous tools wait for approval.
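The gate reduces to a dispatch on risk class; a minimal sketch, not the actual implementation (tool lists abridged):

```elixir
# Minimal sketch of the approval gate, not ExCortex's actual code.
# Safe tools run immediately; anything that mutates the outside world
# becomes a pending Proposal instead.
defmodule ApprovalGateSketch do
  @safe ~w(read_file list_files fetch_url web_search)a

  def dispatch(tool, args) do
    if tool in @safe do
      {:execute, tool, args}
    else
      {:proposal, %{tool: tool, args: args, status: :pending}}
    end
  end
end

ApprovalGateSketch.dispatch(:fetch_url, %{url: "https://example.com"})
# → {:execute, :fetch_url, %{url: "https://example.com"}}

ApprovalGateSketch.dispatch(:git_commit, %{message: "fix"})
# → {:proposal, %{tool: :git_commit, args: %{message: "fix"}, status: :pending}}
```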
Ruminations support iterative execution. Set `max_iterations` on a rumination and `convergence_verdict` on its final synapse — the pipeline repeats until the last step's verdict matches the target or the iteration limit is reached. Useful for fix-test-fix cycles where you want the pipeline to keep trying until tests pass.
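A fix-test-fix loop in these terms might look like (only `max_iterations` and `convergence_verdict` come from the text; the other keys are placeholder sketch):

```elixir
# Illustrative fix-test-fix rumination config. `max_iterations` and
# `convergence_verdict` are the knobs described above; everything else
# is a placeholder.
%{
  name: "fix_until_green",
  max_iterations: 5,
  synapses: [
    %{name: "write_fix", cluster: "dev_team"},
    %{name: "run_tests", cluster: "qa", convergence_verdict: "pass"}
  ]
}
# Repeats until run_tests returns "pass" or 5 iterations elapse.
```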
Ruminations can fire automatically when specific keywords appear in system events. Set `trigger: "keyword"` and provide `keyword_patterns` — the KeywordTriggerRunner watches signals, engrams, and sense items for case-insensitive substring matches and fires the pipeline with the matched content as input.
Enable the Scratchpad middleware on a synapse to give its pipeline a persistent key/value store that survives across impulses within a daydream. Models write `SCRATCHPAD: ... END_SCRATCHPAD` blocks in their output to persist data; subsequent steps see the accumulated scratchpad prepended to their input.
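For example, a step's output might persist state like this (format per the description above; the keys are illustrative):

```text
Migration applied cleanly; two tests were flaky on the first run.

SCRATCHPAD:
files_changed: lib/ex_cortex/memory.ex
flaky_tests: test/memory_test.exs
END_SCRATCHPAD
```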
Build and manage ruminations from the pipeline builder at `/ruminations`.
ExCortex has a tiered memory system inspired by how biological memory consolidation works.
| Tier | Name | What It Stores |
|---|---|---|
| L0 | Impression | One-line summary — fast to scan, cheap to retrieve |
| L1 | Recall | Paragraph-level detail — key facts and context |
| L2 | Body | Full content — the complete artifact |
- Semantic — Facts, definitions, reference knowledge
- Episodic — Events, conversations, run outputs
- Procedural — How-to knowledge, patterns, processes
`Memory.query/2` returns L0 impressions by default — fast, scannable results. Call `load_recall/1` to expand to L1 detail, or `load_deep/1` for the full L2 body. This keeps queries fast while letting you drill down when you need depth.
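A typical drill-down, assuming engrams expose an `impression` field and `Memory.query/2` takes a query string plus options (both assumptions; only the three function names above appear in the docs):

```elixir
# Illustrative tiered lookup. The `:kind` option and the `impression`
# field are assumptions; only query/2, load_recall/1, and load_deep/1
# are named above.
engrams = Memory.query("postgres connection pool", kind: :procedural)

# L0: one-line impressions, cheap to scan
Enum.each(engrams, fn engram -> IO.puts(engram.impression) end)

# Expand the top hit to its L1 paragraph, then the full L2 body
recall = Memory.load_recall(hd(engrams))
body   = Memory.load_deep(hd(engrams))
```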
The Memory Extractor automatically creates episodic engrams from completed daydreams. The Tier Generator uses an LLM to produce L0/L1 summaries asynchronously — you get full-text L2 immediately and the summaries populate in the background.
Recall Paths track which engrams were accessed during which daydream, giving you an audit trail of what knowledge each agent run consumed.
Browse and search the full engram store at `/memory`.
Agents call tools during evaluation steps. Every tool is classified by risk level:
**Safe** (execute immediately): `read_file` · `list_files` · `fetch_url` · `web_search` · `query_lore` · `search_github` · `read_github_issue` · `search_obsidian` · `read_obsidian` · `search_email` · `read_email` · `read_pdf` · `convert_document` · `describe_image` · `read_image_text` · `transcribe_audio` · `analyze_video` · `jq_query` · `run_sandbox` · `query_jaeger` · `search_nextcloud` · `read_nextcloud` · `read_nextcloud_notes` · `query_dictionary`

**Write** (require approval): `write_file` · `edit_file` · `git_commit` · `git_push` · `open_pr` · `create_obsidian_note` · `setup_worktree` · `write_nextcloud` · `create_nextcloud_note` · `nextcloud_calendar`

**Dangerous** (require approval): `create_github_issue` · `comment_github` · `merge_pr` · `close_issue` · `git_pull` · `send_email` · `run_quest` · `restart_app` · `nextcloud_talk`
The `run_sandbox` tool only allows explicitly allowlisted commands: `mix test`, `mix credo`, `mix excessibility`, `mix format`, `mix dialyzer`, `mix deps.audit`.
```mermaid
flowchart TD
    Sources["Senses
    git · feed · webhook · email
    github · obsidian · nextcloud
    directory · url · websocket · media"]
    Sources --> Input
    subgraph Eval ["Cluster Evaluation"]
        Input --> NeuronA["Neuron A
        alpha perspective"]
        Input --> NeuronB["Neuron B
        beta perspective"]
        Input --> NeuronC["Neuron C
        strict perspective"]
        NeuronA & NeuronB & NeuronC --> Consensus["Consensus
        vote → verdict + confidence"]
    end
    Eval --> Cortex["Cortex Dashboard
    signals · proposals · history"]
    Cortex --> Memory["Engram Store
    L0 · L1 · L2 memory tiers"]
    Memory --> Eval
```
```mermaid
flowchart TD
    Agent["Agent"] --> ToolCall{"Tool call"}
    ToolCall -- "safe: read / search / fetch" --> Exec["Execute immediately"]
    ToolCall -- "write: file / commit / PR" --> Proposal["Create Proposal"]
    ToolCall -- "dangerous: issue / email / merge" --> Proposal
    Exec --> Result["Result back to agent"]
    Proposal --> Dashboard["Dashboard: awaiting approval"]
    Dashboard -- "approved" --> Exec
    Dashboard -- "rejected" --> Denied["Denied — agent notified"]
```
```mermaid
flowchart LR
    Sweep["Analyst Sweep
    every 4h"] --> Issues["GitHub Issues
    labeled self-improvement"]
    Issues --> PM["PM Triage"]
    PM --> Plan["Planning Consensus
    architect · advocate · PM"]
    Plan --> Writer["Code Writer
    read · write · test"]
    Writer --> Reviewer["Code Reviewer
    diff · credo · test"]
    Reviewer --> QA["QA
    full test suite"]
    QA --> UX["UX Review"]
    UX --> Merge{"PM Decision"}
    Merge -- "low risk" --> Auto["Auto-merge"]
    Merge -- "high risk" --> Propose["Proposal for human review"]
```
ExCortex uses a biological nervous system metaphor throughout. Here's the full map:
| Term | Meaning |
|---|---|
| Cortex | Main dashboard — the brain's control center |
| Neuron | An individual agent/role |
| Cluster | A team of neurons working together |
| Pathway | A cluster's team definition and configuration |
| Synapse | A pipeline step — the connection between neurons |
| Impulse | A single execution of a synapse |
| Rumination | A multi-step pipeline — deep, structured thinking |
| Daydream | One run of a rumination |
| Engram | A memory artifact — tiered (L0/L1/L2) |
| Signal | A dashboard card — a notification from the nervous system |
| Sense | A data source — the platform's sensory input |
| Reflex | A source template — automatic response to stimulus |
| Expression | A notification channel — how the brain communicates outward |
| Axiom | Reference data in the Lexicon — foundational knowledge |
| Wonder | Ephemeral LLM chat — free association |
| Muse | Data-grounded RAG chat — informed thinking |
| Thought | A saved query template — a crystallized idea |
| Instinct | Settings and configuration — base behaviors |
| Neuroplasticity | The self-improvement loop — the brain rewiring itself |
| Genesis | Pathway seeding — creating new clusters of neurons |
| Route | Page | Purpose |
|---|---|---|
| `/`, `/cortex` | Cortex | Live dashboard — signals, active ruminations, cluster health, recent memory |
| `/wonder` | Wonder | Pure LLM chat, no data grounding |
| `/muse` | Muse | RAG chat over engrams and axioms |
| `/thoughts` | Thoughts | Saved thought templates — browse, re-run, save to memory |
| `/neurons` | Neurons | Cluster and agent management |
| `/ruminations` | Ruminations | Pipeline builder and run history |
| `/genesis` | Genesis | Pathway seeding — install clusters from the pathway library |
| `/memory` | Memory | Engram browser with tiered drill-down |
| `/senses` | Senses | Source management, reflexes, feeds, expressions |
| `/evaluate` | Evaluate | Direct evaluation interface |
| `/instinct` | Instinct | Configuration — LLM providers, API keys, feature flags |
| `/settings` | Settings | Application settings |
| `/guide` | Guide | Documentation and onboarding |
```sh
docker compose up
```

That's it. Starts ExCortex, PostgreSQL, Ollama, Jaeger, Prometheus, and Grafana.
| Service | URL |
|---|---|
| ExCortex | http://localhost:4001 |
| Jaeger (traces) | http://localhost:16686 |
| Grafana (metrics) | http://localhost:3000 |
| Nextcloud (optional) | http://localhost:8080 |
Custom port: `PORT=4002 docker compose up`
```sh
# Reset DB and seed the Dev Team pathway
mix ecto.fresh
```

Then open Genesis and seed whichever pathways you need.
```sh
docker compose up db ollama jaeger   # just the dependencies
mix setup                            # deps, db, assets
mix phx.server                       # start ExCortex
```

Or with mise:

```sh
mise setup   # first-time: deps, db, assets
mise dev     # start services + app with live reload
mise stop    # stop background services
```

Every request, database query, and LLM call emits OpenTelemetry traces. The Docker Compose stack includes a full observability pipeline:
App → OpenTelemetry Collector → Jaeger (traces) + Prometheus (metrics) → Grafana (dashboards)
Agents can query their own traces using the query_jaeger tool. The self-improvement loop uses this to verify that a code change actually made things faster — not just that the tests pass.
| Component | Port | Purpose |
|---|---|---|
| OpenTelemetry Collector | 4317/4318 | Receives traces and metrics from the app |
| Jaeger | 16686 | Distributed trace visualization |
| Prometheus | 9090 | Metrics storage and querying |
| Grafana | 3000 | Dashboards and alerting |
ExCortex ships as a self-contained binary via Burrito. No Erlang or Elixir installation required on the target machine.
```sh
mix release.build
```

Builds for: `linux_x86`, `linux_arm`, `macos_arm`
Run it:
```sh
DATABASE_URL="ecto://user:pass@host/ex_cortex" \
SECRET_KEY_BASE="$(mix phx.gen.secret)" \
PHX_SERVER=true \
./burrito_out/ex_cortex_linux_x86 start
```

Requires PostgreSQL and Ollama running separately. API keys can be set via environment variables at launch or configured live in the Instinct UI (persisted to DB, takes effect without restart).
Docker Compose runs the supporting infrastructure:
```sh
docker compose up -d db ollama   # minimal: just database + LLM
docker compose up -d             # full: + jaeger, prometheus, grafana, nextcloud
```

Most configuration is managed live in the Instinct UI at `/instinct`. Settings are persisted to the database and take effect without restart.
Settings DB (Instinct UI) → Application env → Environment variables → Defaults
All config reads go through `Settings.resolve/2` — a single function that walks the priority chain.
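The priority walk can be sketched as a first-non-nil scan over the layers; this is an illustration, not the real `Settings.resolve/2` (which reads the settings DB and application env):

```elixir
# Sketch of the resolution chain: the first layer returning a non-nil
# value wins. The layer functions here stand in for the real DB and
# app-env lookups.
defmodule ResolveSketch do
  def resolve(key, default, layers) do
    Enum.find_value(layers, default, fn lookup -> lookup.(key) end)
  end
end

layers = [
  fn _key -> nil end,                 # Settings DB (no override stored)
  fn _key -> nil end,                 # Application env
  fn key -> System.get_env(key) end   # Environment variables
]

ResolveSketch.resolve("PORT", "4001", layers)
# → the $PORT env var if set, otherwise the "4001" default
```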
```sh
# Core
PORT=4001                               # HTTP port
DATABASE_URL=ecto://user:pass@host/db   # PostgreSQL connection
SECRET_KEY_BASE=...                     # Phoenix secret (auto-generated in dev)

# LLM Providers
OLLAMA_URL=http://localhost:11434       # Local Ollama endpoint
OLLAMA_API_KEY=                         # Optional Ollama auth
ANTHROPIC_API_KEY=...                   # Claude API access

# Observability
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

# Integrations (optional)
NEXTCLOUD_URL=http://localhost:8080
NEXTCLOUD_USER=admin
NEXTCLOUD_PASSWORD=admin
```

| Provider | Models | Use Case |
|---|---|---|
| Ollama (local) | `ministral-3:8b` | Fast, lightweight tasks |
| Ollama (local) | `devstral-small-2:24b` | Reliable tool-calling |
| Claude (Anthropic) | `claude_haiku`, `claude_sonnet`, `claude_opus` | High-capability tasks |
Fallback chain is configurable — if a model is unavailable, the system tries the next one automatically.
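Try-in-order fallback reduces to a few lines; a sketch, with `call_model` standing in for the real provider call:

```elixir
# Sketch of model fallback: try each model in order, return the first
# success. `call_model` is a stand-in for the real provider call.
defmodule FallbackSketch do
  def complete(prompt, models, call_model) do
    Enum.reduce_while(models, {:error, :all_unavailable}, fn model, acc ->
      case call_model.(model, prompt) do
        {:ok, reply} -> {:halt, {:ok, model, reply}}
        {:error, _reason} -> {:cont, acc}
      end
    end)
  end
end

# If the first model is down, the call falls through to the next.
call = fn
  "ministral-3:8b", _prompt -> {:error, :unavailable}
  model, _prompt -> {:ok, "answered by #{model}"}
end

FallbackSketch.complete("Summarize today's signals",
  ["ministral-3:8b", "devstral-small-2:24b"], call)
# → {:ok, "devstral-small-2:24b", "answered by devstral-small-2:24b"}
```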
The GitHub tools require the `gh` CLI authenticated on the host and a default repo set in Settings.
```sh
mix test           # run the test suite (auto-creates test DB)
mix credo          # static analysis
mix format         # code formatting (Styler rewrites aggressively)
mix excessibility  # LiveView accessibility snapshot tests
mix dialyzer       # type checking
```

```sh
mix setup          # first-time: deps.get, ecto.setup, assets
mix dev            # start Phoenix server
mix lint           # compile --warnings-as-errors + format check + credo
mix precommit      # lint + test
mix ci             # full quality gate
mix release.build  # compile + assets + Burrito binary
mix ecto.fresh     # reset DB + seed Dev Team pathway
```

- Warnings are errors in the test environment
- Styler aggressively rewrites code formatting — don't fight it
- Excessibility generates HTML snapshots for accessibility testing — these always show as modified in `git status` (not a real problem)
- Credo has ~40 pre-existing refactoring opportunities in the baseline
See `CLAUDE.md` for full contributor conventions.
| Dependency | Purpose |
|---|---|
| Phoenix 1.8 | Web framework |
| Phoenix LiveView 1.1 | Real-time UI |
| Ecto + PostgreSQL | Database |
| Oban | Background job processing |
| Req + ReqLLM | HTTP client + LLM provider abstraction |
| OpenTelemetry | Distributed tracing and metrics |
| Burrito | Standalone binary releases |
| Fresh | WebSocket client |
| Owl | TUI rendering |
| Dependency | Purpose |
|---|---|
| Tailwind CSS v4 | Styling |
| SaladUI | Component library |
| Heroicons | Icons |
| MDEx | Markdown rendering |
| Package | Purpose |
|---|---|
| `ex_cellence` | Core evaluation library — charters, consensus, verdicts |
| `ex_cellence_dashboard` | Read-only visualization components |
| `ex_cellence_ui` | Form and input components |
Built with Elixir, Phoenix, and an unreasonable number of brain metaphors.