Pandoc is a powerful tool for parsing, combining and processing text-based documents. Pandoc also includes a filter system that allows for the transformation of documents and sub-elements.
Polydoc brings Pandoc’s filtering capabilities to the JVM/Clojure ecosystem, providing:
- Advanced Pandoc filters compiled with GraalVM for reduced latency
- JVM-native documentation tooling (no Python dependencies required, JavaScript via GraalJS)
- Support for multiple document formats via Pandoc
- SQLite-powered full-text search with FTS5
- Book building system with automatic indexing
- Interactive documentation viewing (coming soon)
Documentation is a fundamental part of software engineering. There are many different tools for managing documentation, but Polydoc offers several advantages:
- JVM-Native: If you’re working on the JVM, you don’t need to bring Python or Go into your stack just for documentation. JavaScript execution uses GraalJS (included with GraalVM).
- Pandoc Integration: Leverage Pandoc’s powerful document transformation capabilities.
- Advanced Filters: Execute code (Clojure, SQLite, JavaScript), render diagrams (PlantUML), and compose documents (include filter).
- Full-Text Search: Built-in SQLite FTS5 search index automatically maintained.
- Book Building: Combine multiple documents into searchable books with metadata.
- clojure-exec: Execute Clojure code blocks and show results
- sqlite-exec: Run SQLite queries and format results as tables
- javascript-exec: Execute JavaScript code blocks with GraalJS (GraalVM’s JavaScript engine)
- plantuml: Render PlantUML diagrams to images
- include: Compose documents from multiple source files
- Load configuration from `polydoc.yml` (Pandoc-compatible metadata format)
- Process multiple Markdown/Org files through filters
- Extract sections and headers automatically
- Store content in SQLite database
- Automatic FTS5 full-text search index (via database triggers)
- Generate HTML output
- FTS5 full-text search across all book content
- Boolean operators: AND, OR, NOT
- Phrase search: `"exact phrase"`
- Field-specific search: `title:introduction`
- Result highlighting with context snippets
- Result ranking by relevance
- HTTP-based documentation browser with http-kit
- Section navigation (Previous/Next buttons)
- Collapsible table of contents
- Full-text search interface
- Pico CSS styling for clean UI
- Browser automation testing with Etaoin/Firefox
- Python and Shell execution filters
- PDF and EPUB output formats
- GraalVM native compilation
- Viewer enhancements (themes, bookmarks, history)
- GraalVM 21 or later (includes GraalJS for JavaScript filter)
- Clojure CLI tools
- Pandoc 2.0 or later
- PlantUML JAR (for PlantUML filter)
Clone the repository and use directly:

```bash
git clone <repository-url>
cd polydoc
clojure -M:main --help
```

```bash
polydoc filter   # Execute individual Pandoc filters
polydoc book     # Build complete books from polydoc.yml
polydoc search   # Search documentation with full-text search
polydoc view     # Interactive viewer (coming soon)
```

Execute individual Pandoc filters on document AST:

```bash
# Process markdown through Clojure execution filter
pandoc input.md -t json | clojure -M:main filter -t clojure-exec | pandoc -f json -o output.html
```

Example markdown:

```{.clojure .exec}
(+ 1 2 3)
```

Output shows code and result:

```clojure
(+ 1 2 3)
;; => 6
```

```bash
pandoc input.md -t json | clojure -M:main filter -t sqlite-exec | pandoc -f json -o output.html
```

Example markdown:

```{.sqlite .exec database="mydata.db"}
SELECT name, age FROM users ORDER BY age DESC LIMIT 5;
```

Output shows query and results as a formatted table.

```bash
pandoc input.md -t json | clojure -M:main filter -t javascript-exec | pandoc -f json -o output.html
```

Example markdown:

```{.javascript .exec}
const sum = [1, 2, 3, 4, 5].reduce((a, b) => a + b, 0);
console.log("Sum:", sum);
```

```bash
pandoc input.md -t json | clojure -M:main filter -t plantuml | pandoc -f json -o output.html
```

Example markdown:

```plantuml
@startuml
Alice -> Bob: Hello
Bob -> Alice: Hi there!
@enduml
```

Output replaces code block with rendered diagram image.

```bash
pandoc input.md -t json | clojure -M:main filter -t include | pandoc -f json -o output.html
```

Example markdown:

```{.include file="chapter1.md"}
```

Includes content from external file inline.

Build complete books with automatic indexing and search:
Create a polydoc.yml file with Pandoc-compatible metadata:

```yaml
---
# Standard Pandoc metadata
title: "My Documentation"
author: "Your Name"
date: "2025-11-26"
lang: "en-US"
description: "Comprehensive documentation"

# Table of contents
toc: true
toc-depth: 3

# Polydoc-specific: Book configuration
book:
  id: "my-docs"
  version: "1.0.0"
  database: "docs.db"

# Filters to apply during build
filters:
  - clojure-exec
  - sqlite-exec
  - plantuml
  - include

# Document sections (in order)
sections:
  - docs/introduction.md
  - docs/tutorial.md
  - docs/reference.md
  # Extended format with per-file options
  - file: docs/advanced.md
    title: "Advanced Topics"
    filters:
      - clojure-exec
      - plantuml
---
```

```bash
clojure -M:main book -c polydoc.yml -o output/
```

This will:

- Load configuration from `polydoc.yml`
- Process each section file through configured filters
- Extract headers and content
- Insert into SQLite database with FTS5 search index
- Generate HTML output in `output/` directory

- Database: `docs.db` (or path from config)
  - `books` table: Book metadata
  - `sections` table: All extracted sections with content
  - `sections_fts` table: FTS5 full-text search index (auto-synced via triggers)
- HTML Output: `output/my-docs.html`
  - Combined document from all sections
  - Processed through all configured filters

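The `sections` list in the configuration accepts both bare file paths and maps with per-file options. One way a loader might bring the two shapes into a single form is sketched below. `normalize-section` is a hypothetical helper, not part of Polydoc's documented API; the sketch assumes the YAML has been parsed into Clojure data with keyword keys, and it assumes a per-file `filters` list replaces the build-level list rather than extending it, which the source does not confirm.

```clojure
(ns example.sections-config-sketch)

(defn normalize-section
  "Turns a section entry into a map with :file, :title and :filters.
  A bare string is treated as a file path that uses the default filters."
  [default-filters entry]
  (if (string? entry)
    {:file entry :title nil :filters default-filters}
    {:file    (:file entry)
     :title   (:title entry)
     ;; Assumption: a per-file list replaces the defaults entirely.
     :filters (or (:filters entry) default-filters)}))

;; Hand-written stand-in for a parsed polydoc.yml.
(def config
  {:filters  ["clojure-exec" "sqlite-exec" "plantuml" "include"]
   :sections ["docs/introduction.md"
              {:file    "docs/advanced.md"
               :title   "Advanced Topics"
               :filters ["clojure-exec" "plantuml"]}]})

(def normalized
  (mapv (partial normalize-section (:filters config))
        (:sections config)))

(map :filters normalized)
;; => (["clojure-exec" "sqlite-exec" "plantuml" "include"] ["clojure-exec" "plantuml"])
```

Normalizing up front keeps the rest of a build loop free of string-or-map checks.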
Search your documentation using FTS5 full-text search:

```bash
clojure -M:main search -d docs.db -q "pandoc filter"
```

```bash
# Boolean AND (default)
clojure -M:main search -d docs.db -q "clojure AND filter"
# Boolean OR
clojure -M:main search -d docs.db -q "clojure OR python"
# Exclude terms with NOT
clojure -M:main search -d docs.db -q "filter NOT javascript"
# Exact phrase search
clojure -M:main search -d docs.db -q '"book building system"'
# Field-specific search
clojure -M:main search -d docs.db -q "title:introduction"
```

```bash
# Show up to 20 results (default: 10)
clojure -M:main search -d docs.db -q "documentation" -l 20
```

```bash
# Search within specific book only
clojure -M:main search -d docs.db -q "query" -b 1
```

This tool is written in JVM Clojure and designed to be compiled to native code via GraalVM.

The Clojure REPL is the primary interface for development. There are two main namespaces: user and dev.
When you connect to a REPL, you’ll be in the user namespace. Run (dev) to load the dev namespace. The dev namespace provides functions for development workflow:

```clojure
(refresh)  ;; Refresh all namespaces (uses clj-reload)
(lint)     ;; Lint the project with clj-kondo
(run-all)  ;; Run all tests
```

```bash
# Start nREPL server on port 7889
clojure -M:jvm-base:dev:nrepl
# In another terminal, connect with your editor
# Or use clj-nrepl-eval for command-line evaluation:
clj-nrepl-eval -p 7889 "(+ 1 2 3)"
```

```bash
# Run all tests via Kaocha
clojure -M:dev:test
```

```clojure
;; Run via REPL
(require 'clojure.test)
(clojure.test/run-all-tests #"polydoc.*")
```

Current status: 106 tests passing, 403 assertions, 0 failures ✅

Polydoc uses Etaoin for browser automation testing with Firefox/GeckoDriver. Browser tests verify the interactive viewer functionality.
**Prerequisites for browser tests:**

```bash
# macOS (via Homebrew)
brew install --cask firefox
brew install geckodriver
# Ubuntu/Debian
sudo apt-get install firefox
# GeckoDriver - download from GitHub releases
wget https://github.com/mozilla/geckodriver/releases/latest/download/geckodriver-linux64.tar.gz
tar -xzf geckodriver-linux64.tar.gz
sudo mv geckodriver /usr/local/bin/
```

**Running browser tests:**

```bash
# Browser tests run as part of the full test suite
clojure -M:dev:test
# Run only viewer tests
clojure -M:dev:test --focus polydoc.viewer.server-test
```

```clojure
;; In REPL
(require '[clojure.test :as test])
(test/run-tests 'polydoc.viewer.server-test)
```

Browser tests include:

- Page loading and navigation
- Section browsing (Previous/Next)
- Table of contents interaction
- Full-text search functionality
- Responsive layout verification

```bash
# Via command line
clojure -M:lint -m clj-kondo.main --lint src test
```

```clojure
;; Or in REPL (from dev namespace)
(lint)
```

The project uses GitHub Actions for CI/CD:

```bash
# Run CI tasks locally with Babashka
bb ci
# This runs:
# 1. Clean build artifacts
# 2. Format checking
# 3. Code linting
# 4. Full test suite (including browser tests)
```

GitHub Actions automatically:

- Sets up GraalVM, Clojure, Pandoc, PlantUML
- Installs Firefox and GeckoDriver for browser tests
- Caches dependencies for faster builds
- Runs all tests with coverage reporting
- Uploads test results as artifacts
- cli-matic: Command-line interface framework
- Pandoc: Document conversion and filtering
- clojure.data.json: JSON parsing for Pandoc AST
- next.jdbc: JDBC wrapper for database access
- honey.sql: SQL DSL for query generation
- SQLite: Database with FTS5 full-text search
- Malli: Schema validation for configuration
- clj-yaml: YAML configuration parsing
- http-kit: HTTP server for interactive viewer
- hiccup: HTML generation
- etaoin: Browser automation testing (Firefox/GeckoDriver)
- clj-kondo: Code linting
- kaocha: Test runner with coverage

```
polydoc.yml → metadata/load-metadata
        ↓
builder/initialize-database
        ↓
For each section file:
  1. markdown → Pandoc AST (JSON)
  2. Apply filters (clojure-exec, etc.)
  3. Extract sections (headers + content)
  4. Insert into database
  5. FTS5 triggers auto-update search index
        ↓
Generate HTML output
```

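Step 3 of the pipeline, extracting sections from headers and content, can be illustrated with a small sketch. It assumes the Pandoc JSON AST has already been parsed into Clojure maps with string keys, where a header is `{"t" "Header", "c" [level attr inlines]}`. The names `extract-sections` and `inline-text` are hypothetical, and dropping content that precedes the first header is a simplification of this sketch, not documented Polydoc behaviour.

```clojure
(ns example.section-sketch
  (:require [clojure.string :as str]))

(defn header? [block]
  (= "Header" (get block "t")))

(defn inline-text
  "Flattens Str and Space inlines to plain text.
  Other inline types are ignored in this sketch."
  [inlines]
  (str/join
   (map (fn [{t "t" c "c"}]
          (case t
            "Str"   c
            "Space" " "
            ""))
        inlines)))

(defn extract-sections
  "Groups a flat sequence of blocks into sections.
  Each Header starts a new section; later blocks attach to the latest one."
  [blocks]
  (reduce
   (fn [sections block]
     (cond
       (header? block)
       (let [[level _attr inlines] (get block "c")]
         (conj sections {:heading-level level
                         :heading-text  (inline-text inlines)
                         :blocks        []}))

       (seq sections)
       (update-in sections [(dec (count sections)) :blocks] conj block)

       ;; Content before the first header is skipped in this sketch.
       :else sections))
   []
   blocks))

(def blocks
  [{"t" "Header" "c" [1 ["intro" [] []]
                      [{"t" "Str" "c" "Getting"} {"t" "Space"} {"t" "Str" "c" "Started"}]]}
   {"t" "Para" "c" [{"t" "Str" "c" "Hello"}]}
   {"t" "Header" "c" [2 ["install" [] []] [{"t" "Str" "c" "Install"}]]}])

(map (juxt :heading-level :heading-text (comp count :blocks))
     (extract-sections blocks))
;; => ([1 "Getting Started" 1] [2 "Install" 0])
```

The `:heading-level` and `:heading-text` keys mirror the `heading_level` and `heading_text` fields of the sections table, so each map corresponds roughly to one row.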
Books Table:
- Stores book-level metadata from `polydoc.yml`
- Fields: book_id, title, author, description, version, metadata_json
Sections Table:
- One row per section/header in documents
- Fields: section_id, book_id, heading_text, heading_level, content_markdown, content_html, content_plain, content_hash
- Linked to book via foreign key
Sections_FTS Table:
- FTS5 virtual table for full-text search
- Auto-synced with sections table via triggers
- Indexes: heading_text, content_plain
All filters follow the same pattern:

```clojure
(ns polydoc.filters.my-filter
  (:require [polydoc.filters.core :as filters]))

;; matches-filter? and transform-block stand in for filter-specific logic.
(defn process-code-block [block]
  (if (matches-filter? block)
    (transform-block block)
    block))

(defn my-filter [ast]
  (filters/walk-pandoc-ast ast process-code-block))

(defn main [{:keys [input output]}]
  (filters/run-filter input output my-filter))
```

See the `examples/` directory for:
- `clojure-exec-demo.md` - Clojure code execution
- `sqlite-exec-demo.md` - SQLite query examples
- `javascript-exec-demo.md` - JavaScript execution
- `plantuml-demo.md` - Diagram rendering
- `include-demo.md` - Document composition

Each example includes:
- Input markdown
- Expected output
- Test JSON for validation
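
To make the filter pattern more concrete, here is a minimal, self-contained sketch of a clojure-exec style transform that uses only `clojure.walk` from the standard library. It is not Polydoc's implementation: `exec-block?`, `run-block` and `exec-filter` are hypothetical names, and the sketch assumes the Pandoc JSON AST has already been parsed into maps with string keys, where a code block is `{"t" "CodeBlock", "c" [[id classes attrs] code]}`.

```clojure
(ns example.exec-filter-sketch
  (:require [clojure.walk :as walk]))

(defn exec-block?
  "True for a CodeBlock node whose classes include both
  \"clojure\" and \"exec\"."
  [node]
  (and (map? node)
       (= "CodeBlock" (get node "t"))
       (let [[[_id classes _kvs] _code] (get node "c")]
         (every? (set classes) ["clojure" "exec"]))))

(defn run-block
  "Evaluates the block's code and appends the last result as a comment.
  load-string runs arbitrary code, so use it on trusted input only."
  [node]
  (let [[attr code] (get node "c")
        result      (load-string code)]
    (assoc node "c" [attr (str code "\n;; => " (pr-str result))])))

(defn exec-filter
  "Walks the whole AST and rewrites matching code blocks."
  [ast]
  (walk/postwalk #(if (exec-block? %) (run-block %) %) ast))

;; Hand-written AST for a document with a single executable block.
(def ast
  {"pandoc-api-version" [1 23 1]
   "meta"   {}
   "blocks" [{"t" "CodeBlock"
              "c" [["" ["clojure" "exec"] []] "(+ 1 2 3)"]}]})

(get-in (exec-filter ast) ["blocks" 0 "c" 1])
;; => "(+ 1 2 3)\n;; => 6"
```

Blocks without both classes pass through unchanged, which matches the "transform or return as-is" shape of `process-code-block`.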
Phase 1: Core Infrastructure ✅
- CLI framework with cli-matic
- Filter utilities and base infrastructure
- Clojure execution filter
- Test framework and examples
Phase 2: Additional Filters ✅ (5/8)
- SQLite execution filter
- JavaScript execution filter
- PlantUML rendering filter
- Include filter for document composition
Phase 3: Book Building ✅
- Database schema with FTS5
- Metadata loading and validation
- Section extraction and indexing
- Book builder orchestration
Phase 4: Search System ✅
- FTS5 full-text search API
- Advanced search operators
- CLI search command with highlighting
Phase 5: Interactive Viewer ✅
- HTTP server with http-kit
- HTML templates with Pico CSS
- Section navigation (Previous/Next)
- Table of contents with current highlighting
- Search interface with result highlighting
- Browser automation tests with Etaoin/Firefox
Phase 6: Viewer Enhancements
- Theme switching (light/dark mode)
- Bookmarks and history
- Keyboard shortcuts
- Export functionality
Phase 6: Testing & Quality
- Expand test coverage
- Property-based tests
- Performance optimization
- Comprehensive documentation
Phase 7: GraalVM Compilation
- Native image configuration
- Resource bundling
- Platform testing
- CI/CD setup
Phase 8: Polish & Release
- Error handling improvements
- Progress indicators
- Homebrew formula
- Release documentation
Contributions are welcome! Please:
- Read `AGENTS.md` for development guidelines
- Follow the REPL-driven development workflow
- Ensure all tests pass before submitting
- Add tests for new features
- Update documentation
[License information to be added]