Summary
Title: perf(backend): clarify backend contract and make DuckDB native-fast
Body: use this plan as the issue body.
Implementation Ledger
Record the created issue number in a new execution ledger: docs/process/issue--backend-performance-execution.md.
Maintain the ledger throughout the branch.
Track phases, status, decisions, changed contract surfaces, benchmark commands, benchmark artifact paths, unresolved risks, and final merge evidence.
Update it after each phase, not only at the end.
Merge only after the final benchmark campaign, docs alignment, codira audit, full validation, and issue acceptance checklist are complete.
Key Changes
Redesign the backend contract before optimizing implementations.
Replace hidden SQLite-shaped assumptions with an explicit index-session contract in src/codira/contracts.py.
Define separate expectations for write sessions and frequent read/query operations.
Core expectations:
one active backend per repository instance;
deterministic query-equivalent results across first-party backends;
full index may rebuild or replace storage wholesale;
incremental index receives changed, deleted, and reused path sets;
failed files must not leave visible partial rows, but per-file DB rollback/savepoints are not required;
ctx, calls, sym, symlist, and audit must use a cheap read path and must not trigger schema repair, derived-index rebuilds, or writer setup.
Remove compatibility requirements for previous backend storage versions; bump schema/contract version and rebuild/fail fast as needed.
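The contract described above could be written down as typed protocols. The following is only a sketch under stated assumptions: `begin_index_session` is named in this plan, but `IndexPlan`, `IndexWriteSession`, `Backend`, `open_read_only`, and every method signature are hypothetical and are not taken from src/codira/contracts.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class IndexPlan:
    """Path sets handed to a backend for one index run (hypothetical shape)."""

    full: bool  # True: the backend may rebuild or replace storage wholesale
    changed: frozenset[str] = frozenset()
    deleted: frozenset[str] = frozenset()
    reused: frozenset[str] = frozenset()


class IndexWriteSession(Protocol):
    """Write side of the contract; savepoints are deliberately not part of it."""

    def persist_files(self, batch: list[dict]) -> None: ...
    def rebuild_derived_indexes(self) -> None: ...
    def commit(self) -> None: ...
    def abort(self) -> None: ...
    def close(self) -> None: ...


class Backend(Protocol):
    """One active backend per repository instance."""

    def begin_index_session(self, plan: IndexPlan) -> IndexWriteSession: ...

    # Hypothetical cheap read path: no schema repair, no derived-index
    # rebuilds, and no writer setup may happen behind this call.
    def open_read_only(self) -> object: ...
```

Keeping the read path as a separate entry point makes it possible to test that ctx, calls, sym, symlist, and audit never reach writer setup.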
Refactor core indexing around the new session flow.
Keep scanning, analyzer selection, and index planning in core.
Move backend mutation into begin_index_session(...) and a write-session object.
The session owns loading previous embeddings, preparing full/incremental storage, persisting analyzed files in batches, rebuilding derived indexes, writing runtime inventory, commit, abort, and close.
Preserve the current index report semantics: successful files count as indexed, failed files are reported, and failed files are excluded from committed backend state.
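As an illustration of the session flow and the report semantics above, here is a minimal sketch with an in-memory session. `InMemorySession`, `run_index`, and the report shape are hypothetical names chosen for this example, not the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class InMemorySession:
    """Hypothetical write session: rows stay invisible until commit."""

    committed: dict = field(default_factory=dict)
    _pending: dict = field(default_factory=dict)

    def persist_files(self, batch):
        self._pending.update(batch)

    def commit(self):
        self.committed.update(self._pending)
        self._pending.clear()

    def abort(self):
        self._pending.clear()

    def close(self):
        pass


def run_index(paths, analyze, session, batch_size=2):
    """Analyze in core, hand only fully analyzed files to the session."""
    indexed, failed, batch = [], [], {}
    try:
        for path in paths:
            try:
                rows = analyze(path)
            except Exception as exc:
                # A failed file is reported and never reaches the backend,
                # so no per-file rollback or savepoint is needed.
                failed.append((path, str(exc)))
                continue
            batch[path] = rows
            indexed.append(path)
            if len(batch) >= batch_size:
                session.persist_files(batch)
                batch = {}
        if batch:
            session.persist_files(batch)
        session.commit()
    except BaseException:
        session.abort()
        raise
    finally:
        session.close()
    return {"indexed": indexed, "failed": failed}
```

Because analysis finishes before a file enters a batch, failure isolation comes from the flow itself rather than from database savepoints.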
Rebuild DuckDB persistence as a native writer.
Add a DuckDB writer module that builds typed Arrow batches per logical table and bulk-loads them into DuckDB staging tables.
Remove the hot write path dependency on DB-API-style execute, executemany, cursor adapters, and per-row lastrowid.
Allocate internal IDs in batches, deterministically for full rebuilds and by bounded range allocation for incrementals.
Full index: write to a temporary DuckDB database, create indexes after bulk load, checkpoint, measure size, then atomically replace the active index.
Incremental index: write changed files into staging tables, delete changed/deleted paths, swap staged rows in one run-scoped transaction, then rebuild only required derived state.
Keep embeddings as typed binary Arrow data for now; do not introduce vector search or ANN behavior.
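Two of the writer steps above, batch ID allocation and atomic replacement of the active index, can be sketched with the standard library alone. All names here (`assign_full_rebuild_ids`, `RangeAllocator`, `replace_index_atomically`) are hypothetical, and the `build` callback stands in for the real Arrow bulk load, index creation, and checkpoint, which this sketch does not attempt to show.

```python
from __future__ import annotations

import os
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable


def assign_full_rebuild_ids(keys: Iterable[str]) -> dict[str, int]:
    """Assign IDs 1..n in sorted key order so reruns yield identical IDs."""
    return {key: i for i, key in enumerate(sorted(keys), start=1)}


@dataclass
class RangeAllocator:
    """Reserve a bounded block of IDs per batch instead of one ID per row."""

    next_id: int

    def reserve(self, count: int) -> range:
        if count < 0:
            raise ValueError("count must be non-negative")
        start = self.next_id
        self.next_id += count
        return range(start, start + count)


def replace_index_atomically(active: Path, build: Callable[[Path], None]) -> int:
    """Build into a temporary file beside `active`, then swap it in.

    Returns the size in bytes of the newly built file. If `build` raises,
    the temporary directory is removed and `active` is left untouched.
    """
    # Staging inside the same parent directory keeps os.replace on one
    # filesystem, which is what makes the rename atomic on POSIX.
    with tempfile.TemporaryDirectory(dir=active.parent) as tmp_dir:
        staged = Path(tmp_dir) / active.name
        build(staged)
        size = staged.stat().st_size
        os.replace(staged, active)
    return size
```

Measuring size before the swap matches the order given in the plan (checkpoint, measure size, then replace), so the size figure describes exactly the file that becomes active.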
Fix DuckDB warm-readiness and query paths.
Add a read-only/cheap-open path for frequent commands.
Move schema repair and migration checks out of normal query opens.
Ensure that an unchanged codira index run checks metadata, file hashes, analyzer inventory, and runtime inventory before any mutation setup.
Re-measure ctx, sym, calls, symlist, and audit only after full and warm index are fixed, then optimize any remaining query-specific regressions.
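The warm-readiness rule above amounts to a pure comparison that runs before any writer is created. A minimal sketch, assuming a hypothetical `IndexSnapshot` record that holds the four inputs the plan lists:

```python
from dataclasses import dataclass


@dataclass
class IndexSnapshot:
    """Hypothetical summary of stored or current index inputs."""

    schema_version: int
    file_hashes: dict  # path -> content hash
    analyzer_inventory: tuple
    runtime_inventory: tuple


def classify_paths(stored_hashes, current_hashes):
    """Split paths into the changed, deleted, and reused sets."""
    changed = {p for p, h in current_hashes.items() if stored_hashes.get(p) != h}
    deleted = set(stored_hashes) - set(current_hashes)
    reused = set(current_hashes) - changed
    return changed, deleted, reused


def needs_write_session(stored, current):
    """Return True only when some input differs; otherwise skip writer setup."""
    if stored.schema_version != current.schema_version:
        return True
    if stored.analyzer_inventory != current.analyzer_inventory:
        return True
    if stored.runtime_inventory != current.runtime_inventory:
        return True
    changed, deleted, _ = classify_paths(stored.file_hashes, current.file_hashes)
    return bool(changed or deleted)
```

A test can then assert that a warm run where `needs_write_session` returns False never calls `begin_index_session`.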
Optimize SQLite as the control backend.
Port SQLite to the new index-session contract without changing operator behavior.
Keep SQLite savepoints only where they are still useful for incremental row-oriented writes; do not expose them as a contract requirement.
Optimize the observed regressions in ctx, warm index, symlist, audit, sym, and calls.
Focus on backend factory caching, connection reuse, cheaper readiness checks, fewer repeated backend calls from query producers, and batched reads for context assembly.
Add SQLite regression budgets so future backend-agnostic core work cannot silently slow critical commands.
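For the SQLite read path, two of the ideas above (cheap read-only opens and batched reads for context assembly) can be sketched with the standard `sqlite3` module. The `symbols` table, its columns, and the function names are hypothetical; only `sqlite3.connect(..., uri=True)` and the `mode=ro` URI parameter are real SQLite features.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing database so that writes and repairs cannot happen."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def fetch_symbol_paths(conn, names):
    """Resolve many symbol names in one query instead of one query each."""
    names = list(names)
    if not names:
        return {}
    # Very large name lists would need chunking to respect SQLite's
    # bound-parameter limit; that is omitted here.
    placeholders = ", ".join("?" for _ in names)
    sql = f"SELECT name, path FROM symbols WHERE name IN ({placeholders})"
    return dict(conn.execute(sql, names).fetchall())


def demo():
    """Build a throwaway index, then query it through the read-only path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = Path(tmp_dir) / "index.sqlite"
        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE symbols (name TEXT PRIMARY KEY, path TEXT)")
        writer.executemany(
            "INSERT INTO symbols VALUES (?, ?)",
            [("foo", "a.py"), ("bar", "b.py")],
        )
        writer.commit()
        writer.close()

        reader = open_read_only(db_path)
        try:
            found = fetch_symbol_paths(reader, ["foo", "bar", "missing"])
            try:
                reader.execute("INSERT INTO symbols VALUES ('x', 'x.py')")
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True
        finally:
            reader.close()
    return found, write_blocked
```

Opening with `mode=ro` turns any accidental repair or rebuild on a query path into an immediate error, which is easier to catch in tests than a silent slowdown.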
Documentation And Tests
Update docs/architecture/storage-backends.md, docs/plugins/backends.md, docs/process/performance-benchmarking.md, and the new execution ledger.
Add or update contract tests using SQLite, DuckDB, and the in-memory validation backend.
Required scenarios:
full index with successful files and one failing file leaves no partial rows for the failed file;
incremental change/delete/reuse behavior is identical across first-party backends;
unchanged warm index does not enter writer setup;
frequent query commands are read-only and do not perform repair/rebuild work;
DuckDB bulk writer does not use per-row executemany/lastrowid paths;
SQLite and DuckDB query outputs match for ctx, sym, calls, symlist, and audit.
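The cross-backend equivalence scenarios above could share one comparison helper. This is a sketch only: `find_disagreements` and the `(command, argument)` calling convention are assumptions for illustration, and real tests would call each backend's actual query entry points.

```python
from typing import Callable, Mapping, Sequence

QueryFn = Callable[[str, str], Sequence[tuple]]


def find_disagreements(backends: Mapping[str, QueryFn], cases):
    """Run each (command, argument) case on every backend and compare.

    The first backend in the mapping is the reference. Outputs are compared
    exactly, including order, since the contract asks for deterministic
    results. Returns (command, argument, reference_name, other_name) tuples.
    """
    if not backends:
        raise ValueError("at least one backend is required")
    mismatches = []
    for command, arg in cases:
        outputs = {name: list(query(command, arg)) for name, query in backends.items()}
        reference_name, reference = next(iter(outputs.items()))
        for name, output in outputs.items():
            if output != reference:
                mismatches.append((command, arg, reference_name, name))
    return mismatches
```

Passing SQLite first makes it the control backend, so every reported mismatch names the backend that diverged from it.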
Run docstring enforcement after edits:
uv run codira index
uv run codira audit --json
fix all reported docstring issues using NumPy style
rerun uv run codira index and uv run codira audit --json
Performance And Merge Gates
Establish same-branch pre-change baselines if the branch begins before implementation; otherwise compare against the saved .artifacts/analysis/2026-05-19-measurement-campaign-analysis.md and new post-change artifacts.
Run paired SQLite and DuckDB campaigns on the short manifest first, then the broader benchmark manifest before merge.
Required acceptance:
DuckDB full index mean within 25% of SQLite or within 5 seconds, whichever is more forgiving;
DuckDB warm index, ctx, sym, calls, symlist, and audit within 10% of SQLite or within 250 ms, whichever is more forgiving;
DuckDB index size no more than 2x SQLite on each short-campaign repo, with the stretch goal of 1.5x;
SQLite critical-command regression no more than 3% or 100 ms versus same-branch baseline;
no failed pre-commit run --all-files;
no failed pytest -q.
If DuckDB cannot meet the strict gates after the native Arrow writer and cheap-readiness work, stop the branch before merge and update the GitHub issue with measured blocker evidence.
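The "whichever is more forgiving" gates above reduce to taking the larger of a relative and an absolute allowance. A minimal sketch, with hypothetical function names and all timings in seconds:

```python
def within_gate(candidate, baseline, rel_tol, abs_tol):
    """Pass when candidate is within rel_tol of baseline or within abs_tol.

    The larger of the two allowances applies, which is the more forgiving
    reading used by the DuckDB gates.
    """
    allowed = baseline + max(baseline * rel_tol, abs_tol)
    return candidate <= allowed


def duckdb_full_index_ok(duckdb_mean_s, sqlite_mean_s):
    """Full index: within 25% of SQLite or within 5 seconds."""
    return within_gate(duckdb_mean_s, sqlite_mean_s, rel_tol=0.25, abs_tol=5.0)


def duckdb_command_ok(duckdb_mean_s, sqlite_mean_s):
    """Warm index and query commands: within 10% or within 250 ms."""
    return within_gate(duckdb_mean_s, sqlite_mean_s, rel_tol=0.10, abs_tol=0.250)


def duckdb_size_ok(duckdb_bytes, sqlite_bytes, max_ratio=2.0):
    """Index size: no more than max_ratio times the SQLite size."""
    return duckdb_bytes <= max_ratio * sqlite_bytes
```

For example, with a 10-second SQLite full index the absolute allowance wins and DuckDB may take up to 15 seconds; with a 40-second baseline the relative allowance wins and the limit is 50 seconds. The SQLite 3% or 100 ms budget is not encoded here because the plan does not say which of its two bounds applies.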
Backend Contract And Native Backend Performance Plan