Polydoc - JVM-Native Pandoc Documentation System

Pandoc is a tool for parsing, combining, and converting text-based documents. Its filter system lets external programs transform a document, or individual elements within it, between parsing and output.

Polydoc brings Pandoc’s filtering capabilities to the JVM/Clojure ecosystem, providing:

Advanced Pandoc filters, designed for GraalVM native compilation to reduce startup latency (native compilation is planned)
JVM-native documentation tooling (no Python dependencies required, JavaScript via GraalJS)
Support for multiple document formats via Pandoc
SQLite-powered full-text search with FTS5
Book building system with automatic indexing
Interactive documentation viewer

Rationale

Documentation is a fundamental part of software engineering. There are many tools for managing documentation, but Polydoc offers several advantages:

JVM-Native: If you’re working on the JVM, you don’t need to bring Python or Go into your stack just for documentation. JavaScript execution uses GraalJS (included with GraalVM).
Pandoc Integration: Leverage Pandoc’s powerful document transformation capabilities.
Advanced Filters: Execute code (Clojure, SQLite, JavaScript), render diagrams (PlantUML), and compose documents (include filter).
Full-Text Search: Built-in SQLite FTS5 search index automatically maintained.
Book Building: Combine multiple documents into searchable books with metadata.

Features

Implemented ✅

Pandoc Filters

clojure-exec: Execute Clojure code blocks and show results
sqlite-exec: Run SQLite queries and format results as tables
javascript-exec: Execute JavaScript code blocks with GraalJS (GraalVM’s JavaScript engine)
plantuml: Render PlantUML diagrams to images
include: Compose documents from multiple source files

Book Building System

Load configuration from polydoc.yml (Pandoc-compatible metadata format)
Process multiple markdown/org files through filters
Extract sections and headers automatically
Store content in SQLite database
Automatic FTS5 full-text search index (via database triggers)
Generate HTML output

Search System

FTS5 full-text search across all book content
Boolean operators: AND, OR, NOT
Phrase search: "exact phrase"
Field-specific search: title:introduction
Result highlighting with context snippets
Result ranking by relevance

Interactive Viewer

HTTP-based documentation browser with http-kit
Section navigation (Previous/Next buttons)
Collapsible table of contents
Full-text search interface
Pico CSS styling for clean UI
Browser automation testing with Etaoin/Firefox

Coming Soon ⏳

Python and Shell execution filters
PDF and EPUB output formats
GraalVM native compilation
Viewer enhancements (themes, bookmarks, history)

Installation

Prerequisites

GraalVM 21 or later (includes GraalJS for JavaScript filter)
Clojure CLI tools
Pandoc 2.0 or later
PlantUML JAR (for PlantUML filter)

Using Clojure CLI

Clone the repository and use directly:

git clone <repository-url>
cd polydoc
clojure -M:main --help

Usage

Command Overview

polydoc filter  # Execute individual Pandoc filters
polydoc book    # Build complete books from polydoc.yml
polydoc search  # Search documentation with full-text search
polydoc view    # Interactive viewer (coming soon)

Filters

Run an individual Pandoc filter on a document's JSON AST:

Clojure Execution Filter

# Process markdown through Clojure execution filter
pandoc input.md -t json | clojure -M:main filter -t clojure-exec | pandoc -f json -o output.html

Example markdown:

```{.clojure .exec}
(+ 1 2 3)
```

Output shows code and result:

```clojure
(+ 1 2 3)
;; => 6
```
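As a rough sketch of the idea (not Polydoc's actual implementation), a code string can be read, evaluated, and annotated with its result. The function name `exec-and-annotate` is hypothetical, and this version evaluates only the first form in the block:

```clojure
(defn exec-and-annotate
  "Hypothetical helper: evaluates the first form in `code` and returns
  the code followed by a `;; => result` comment line."
  [code]
  ;; read-string reads a single form, so later forms are ignored here.
  (let [result (eval (read-string code))]
    (str code "\n;; => " (pr-str result))))

(println (exec-and-annotate "(+ 1 2 3)"))
;; prints:
;; (+ 1 2 3)
;; ;; => 6
```

A real filter would also need to handle multiple forms, captured stdout, and exceptions; those are left out of this sketch.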

SQLite Execution Filter

pandoc input.md -t json | clojure -M:main filter -t sqlite-exec | pandoc -f json -o output.html

Example markdown:

```{.sqlite .exec database="mydata.db"}
SELECT name, age FROM users ORDER BY age DESC LIMIT 5;
```

Output shows query and results as a formatted table.

JavaScript Execution Filter

pandoc input.md -t json | clojure -M:main filter -t javascript-exec | pandoc -f json -o output.html

Example markdown:

```{.javascript .exec}
const sum = [1, 2, 3, 4, 5].reduce((a, b) => a + b, 0);
console.log("Sum:", sum);
```

PlantUML Rendering Filter

pandoc input.md -t json | clojure -M:main filter -t plantuml | pandoc -f json -o output.html

Example markdown:

```plantuml
@startuml
Alice -> Bob: Hello
Bob -> Alice: Hi there!
@enduml
```

The output replaces the code block with the rendered diagram image.

Include Filter

pandoc input.md -t json | clojure -M:main filter -t include | pandoc -f json -o output.html

Example markdown:

```{.include file="chapter1.md"}
```

The content of the external file is included inline.

Book Building

Build complete books with automatic indexing and search:

Configuration: polydoc.yml

Create a polydoc.yml file with Pandoc-compatible metadata:

---
# Standard Pandoc metadata
title: "My Documentation"
author: "Your Name"
date: "2025-11-26"
lang: "en-US"
description: "Comprehensive documentation"

# Table of contents
toc: true
toc-depth: 3

# Polydoc-specific: Book configuration
book:
  id: "my-docs"
  version: "1.0.0"
  database: "docs.db"
  
  # Filters to apply during build
  filters:
    - clojure-exec
    - sqlite-exec
    - plantuml
    - include

# Document sections (in order)
sections:
  - docs/introduction.md
  - docs/tutorial.md
  - docs/reference.md
  
  # Extended format with per-file options
  - file: docs/advanced.md
    title: "Advanced Topics"
    filters:
      - clojure-exec
      - plantuml
---
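The sections list above mixes two entry shapes: a bare path and a map with per-file options. One way to handle both is to normalize every entry into a map first. The sketch below assumes the YAML has already been parsed into Clojure data with keyword keys; `normalize-section` is a hypothetical name, and the choice that per-file filters replace (rather than extend) the book-level filters is an assumption, not documented behavior.

```clojure
(defn normalize-section
  "Hypothetical helper: turns a section entry (a path string, or a map
  with :file and optional overrides) into a uniform map."
  [default-filters entry]
  (if (string? entry)
    {:file entry :filters default-filters}
    ;; Assumption: a per-file :filters list replaces the book-level list.
    (merge {:filters default-filters} entry)))

(def book-filters ["clojure-exec" "sqlite-exec" "plantuml" "include"])

(def sections
  ["docs/introduction.md"
   {:file "docs/advanced.md"
    :title "Advanced Topics"
    :filters ["clojure-exec" "plantuml"]}])

(def normalized
  (mapv #(normalize-section book-filters %) sections))

(:filters (first normalized))  ;; => the four book-level filters
(:filters (second normalized)) ;; => ["clojure-exec" "plantuml"]
```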

Build Command

clojure -M:main book -c polydoc.yml -o output/

This will:

Load configuration from polydoc.yml
Process each section file through configured filters
Extract headers and content
Insert into SQLite database with FTS5 search index
Generate HTML output in output/ directory

What Gets Created

Database: docs.db (or path from config)
- books table: Book metadata
- sections table: All extracted sections with content
- sections_fts table: FTS5 full-text search index (auto-synced via triggers)
HTML Output: output/my-docs.html
- Combined document from all sections
- Processed through all configured filters

Search

Search your documentation using FTS5 full-text search:

Basic Search

clojure -M:main search -d docs.db -q "pandoc filter"

Search with Operators

# Boolean AND (default)
clojure -M:main search -d docs.db -q "clojure AND filter"

# Boolean OR
clojure -M:main search -d docs.db -q "clojure OR python"

# Exclude terms with NOT
clojure -M:main search -d docs.db -q "filter NOT javascript"

# Exact phrase search
clojure -M:main search -d docs.db -q '"book building system"'

# Field-specific search
clojure -M:main search -d docs.db -q "title:introduction"
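When a query string comes from user input, words such as AND or NOT, or a stray quote, are interpreted as FTS5 syntax. A minimal sketch of one way to force literal phrase matching is shown below; `fts5-phrase` is a hypothetical helper, not part of Polydoc's documented API. It relies on the FTS5 rule that a double-quoted string is a phrase and that an embedded double quote is escaped by doubling it.

```clojure
(require '[clojure.string :as str])

(defn fts5-phrase
  "Hypothetical helper: wraps `text` in double quotes so FTS5 treats it
  as a single phrase; embedded double quotes are doubled."
  [text]
  (str "\"" (str/replace text "\"" "\"\"") "\""))

(println (fts5-phrase "book building system"))
;; prints: "book building system"

(println (fts5-phrase "filter NOT javascript"))
;; prints: "filter NOT javascript"  (NOT is now part of the phrase)
```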

Limit Results

# Show up to 20 results (default: 10)
clojure -M:main search -d docs.db -q "documentation" -l 20

Filter by Book

# Search within specific book only
clojure -M:main search -d docs.db -q "query" -b 1

Development

This tool is written in JVM Clojure and designed to be compiled to native code via GraalVM.

Development Environment

The Clojure REPL is the primary interface for development. There are two main namespaces: user and dev.

When you connect to a REPL, you’ll be in the user namespace. Run (dev) to load the dev namespace. The dev namespace provides functions for development workflow:

(refresh) ;; Refresh all namespaces (uses clj-reload)
(lint)    ;; Lint the project with clj-kondo
(run-all) ;; Run all tests

Starting the REPL

# Start nREPL server on port 7889
clojure -M:jvm-base:dev:nrepl

# In another terminal, connect with your editor
# Or use clj-nrepl-eval for command-line evaluation:
clj-nrepl-eval -p 7889 "(+ 1 2 3)"

Running Tests

# Run all tests via Kaocha
clojure -M:dev:test

# Run via REPL
(require 'clojure.test)
(clojure.test/run-all-tests #"polydoc.*")

Current status: 106 tests passing, 403 assertions, 0 failures ✅

Browser Testing

Polydoc uses Etaoin for browser automation testing with Firefox/GeckoDriver. Browser tests verify the interactive viewer functionality.

Prerequisites for browser tests:

# macOS (via Homebrew)
brew install --cask firefox
brew install geckodriver

# Ubuntu/Debian
sudo apt-get install firefox
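# GeckoDriver: release assets are named by version, so pick one from
# https://github.com/mozilla/geckodriver/releases (v0.36.0 is only an example)
GECKODRIVER_VERSION=v0.36.0
wget "https://github.com/mozilla/geckodriver/releases/download/${GECKODRIVER_VERSION}/geckodriver-${GECKODRIVER_VERSION}-linux64.tar.gz"
tar -xzf "geckodriver-${GECKODRIVER_VERSION}-linux64.tar.gz"
sudo mv geckodriver /usr/local/bin/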

Running browser tests:

# Browser tests run as part of the full test suite
clojure -M:dev:test

# Run only viewer tests
clojure -M:dev:test --focus polydoc.viewer.server-test

# In REPL
(require '[clojure.test :as test])
(test/run-tests 'polydoc.viewer.server-test)

Browser tests include:

Page loading and navigation
Section browsing (Previous/Next)
Table of contents interaction
Full-text search functionality
Responsive layout verification

Linting

# Via command line
clojure -M:lint -m clj-kondo.main --lint src test

# Or in REPL (from dev namespace)
(lint)

Continuous Integration

The project uses GitHub Actions for CI/CD:

# Run CI tasks locally with Babashka
bb ci

# This runs:
# 1. Clean build artifacts
# 2. Format checking
# 3. Code linting
# 4. Full test suite (including browser tests)

GitHub Actions automatically:

Sets up GraalVM, Clojure, Pandoc, PlantUML
Installs Firefox and GeckoDriver for browser tests
Caches dependencies for faster builds
Runs all tests with coverage reporting
Uploads test results as artifacts

Libraries Used

cli-matic: Command-line interface framework
Pandoc: Document conversion and filtering
clojure.data.json: JSON parsing for Pandoc AST
next.jdbc: JDBC wrapper for database access
honey.sql: SQL DSL for query generation
SQLite: Database with FTS5 full-text search
Malli: Schema validation for configuration
clj-yaml: YAML configuration parsing
http-kit: HTTP server for interactive viewer
hiccup: HTML generation
etaoin: Browser automation testing (Firefox/GeckoDriver)
clj-kondo: Code linting
kaocha: Test runner with coverage

Architecture

Data Flow: Book Building

polydoc.yml → metadata/load-metadata
              ↓
           builder/initialize-database
              ↓
    For each section file:
      1. markdown → Pandoc AST (JSON)
      2. Apply filters (clojure-exec, etc.)
      3. Extract sections (headers + content)
      4. Insert into database
      5. FTS5 triggers auto-update search index
              ↓
           Generate HTML output
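Step 3 of the flow above (extracting sections) can be pictured as splitting the flat list of top-level Pandoc blocks at each Header. The following is a minimal sketch under that assumption, using string keys as produced by clojure.data.json by default. `split-sections` is a hypothetical name, and Polydoc's own extraction may differ, for example in how it handles nested heading levels.

```clojure
(defn header? [block]
  (= "Header" (get block "t")))

(defn split-sections
  "Groups a flat sequence of Pandoc blocks into sections. Each Header
  starts a new section; blocks before the first Header form a section
  whose :header is nil."
  [blocks]
  (reduce (fn [sections block]
            (cond
              (header? block)   (conj sections {:header block :blocks []})
              (empty? sections) [{:header nil :blocks [block]}]
              :else (update-in sections
                               [(dec (count sections)) :blocks]
                               conj block)))
          []
          blocks))

;; Header content is [level [id classes key-vals] inlines].
(def blocks
  [{"t" "Header" "c" [1 ["intro" [] []] [{"t" "Str" "c" "Intro"}]]}
   {"t" "Para"   "c" [{"t" "Str" "c" "Hello"}]}
   {"t" "Header" "c" [2 ["setup" [] []] [{"t" "Str" "c" "Setup"}]]}
   {"t" "Para"   "c" [{"t" "Str" "c" "Steps"}]}])

;; Heading level and number of body blocks per section:
(map (fn [{:keys [header blocks]}]
       [(get-in header ["c" 0]) (count blocks)])
     (split-sections blocks))
;; => ([1 1] [2 1])
```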

Database Schema

Books Table:

Stores book-level metadata from polydoc.yml
Fields: book_id, title, author, description, version, metadata_json

Sections Table:

One row per section/header in documents
Fields: section_id, book_id, heading_text, heading_level, content_markdown, content_html, content_plain, content_hash
Linked to book via foreign key

Sections_FTS Table:

FTS5 virtual table for full-text search
Auto-synced with sections table via triggers
Indexes: heading_text, content_plain
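To make the trigger-based sync concrete, here is a sketch of what such a schema could look like, written as Clojure data so the statements can be run one at a time. The DDL is an assumption based on SQLite's external-content FTS5 pattern; the trigger names are invented, it presumes section_id is an INTEGER PRIMARY KEY, and the project's real migrations (under resources/migrations) may differ.

```clojure
;; Assumed DDL, for illustration only; not copied from Polydoc's migrations.
(def fts-ddl
  ["CREATE VIRTUAL TABLE sections_fts USING fts5(
      heading_text, content_plain,
      content='sections', content_rowid='section_id')"
   ;; Keep the index in sync on insert.
   "CREATE TRIGGER sections_ai AFTER INSERT ON sections BEGIN
      INSERT INTO sections_fts(rowid, heading_text, content_plain)
      VALUES (new.section_id, new.heading_text, new.content_plain);
    END"
   ;; External-content tables remove rows via the special 'delete' command.
   "CREATE TRIGGER sections_ad AFTER DELETE ON sections BEGIN
      INSERT INTO sections_fts(sections_fts, rowid, heading_text, content_plain)
      VALUES ('delete', old.section_id, old.heading_text, old.content_plain);
    END"
   ;; An update is a delete of the old row followed by an insert of the new one.
   "CREATE TRIGGER sections_au AFTER UPDATE ON sections BEGIN
      INSERT INTO sections_fts(sections_fts, rowid, heading_text, content_plain)
      VALUES ('delete', old.section_id, old.heading_text, old.content_plain);
      INSERT INTO sections_fts(rowid, heading_text, content_plain)
      VALUES (new.section_id, new.heading_text, new.content_plain);
    END"])
```

With next.jdbc, each statement could then be executed against a datasource in turn, for example with `(doseq [stmt fts-ddl] (jdbc/execute! ds [stmt]))`, where `ds` and the `jdbc` alias are placeholders.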

Filter Architecture

All filters follow the same pattern:

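(ns polydoc.filters.my-filter
  (:require [polydoc.filters.core :as filters]))

;; matches-filter? and transform-block stand in for the filter-specific
;; predicate and transformation; each filter supplies its own.
(declare matches-filter? transform-block)

(defn process-code-block
  "Transforms a block this filter handles; returns any other block unchanged."
  [block]
  (if (matches-filter? block)
    (transform-block block)
    block))

(defn my-filter
  "Walks the Pandoc AST with process-code-block."
  [ast]
  (filters/walk-pandoc-ast ast process-code-block))

(defn main
  "Entry point: runs my-filter from input to output."
  [{:keys [input output]}]
  (filters/run-filter input output my-filter))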
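The pattern above depends on helpers from polydoc.filters.core that are not shown here. As a self-contained illustration of the same idea, the sketch below walks a Pandoc JSON AST with clojure.walk instead. The `shout` class and all function names are made up for the example, and the AST uses string keys as clojure.data.json produces by default.

```clojure
(require '[clojure.string :as str]
         '[clojure.walk :as walk])

(defn code-block-with-class?
  "True when `node` is a Pandoc CodeBlock carrying `class-name`."
  [node class-name]
  (and (map? node)
       (= "CodeBlock" (get node "t"))
       ;; CodeBlock content is [[id classes key-vals] code].
       (some #{class-name} (get-in node ["c" 0 1]))))

(defn upcase-code
  "Toy transformation: upper-cases the code of matching blocks and
  returns every other node unchanged."
  [node]
  (if (code-block-with-class? node "shout")
    (update-in node ["c" 1] str/upper-case)
    node))

(defn shout-filter
  "Applies upcase-code to every node in the AST."
  [ast]
  (walk/postwalk upcase-code ast))

(def ast
  {"pandoc-api-version" [1 23]
   "meta" {}
   "blocks" [{"t" "CodeBlock" "c" [["" ["shout"] []] "hello"]}
             {"t" "CodeBlock" "c" [["" ["clojure"] []] "(+ 1 2)"]}]})

(get-in (shout-filter ast) ["blocks" 0 "c" 1]) ;; => "HELLO"
(get-in (shout-filter ast) ["blocks" 1 "c" 1]) ;; => "(+ 1 2)"
```

Returning non-matching nodes untouched is what lets several filters be chained over the same document without interfering with each other.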

Examples

See the examples/ directory for:

clojure-exec-demo.md - Clojure code execution
sqlite-exec-demo.md - SQLite query examples
javascript-exec-demo.md - JavaScript execution
plantuml-demo.md - Diagram rendering
include-demo.md - Document composition

Each example includes:

Input markdown
Expected output
Test JSON for validation

Project Status

Completed (52% - 25/48 tasks)

Phase 1: Core Infrastructure ✅

CLI framework with cli-matic
Filter utilities and base infrastructure
Clojure execution filter
Test framework and examples

Phase 2: Additional Filters ✅ (5/8)

SQLite execution filter
JavaScript execution filter
PlantUML rendering filter
Include filter for document composition

Phase 3: Book Building ✅

Database schema with FTS5
Metadata loading and validation
Section extraction and indexing
Book builder orchestration

Phase 4: Search System ✅

FTS5 full-text search API
Advanced search operators
CLI search command with highlighting

Phase 5: Interactive Viewer ✅

HTTP server with http-kit
HTML templates with Pico CSS
Section navigation (Previous/Next)
Table of contents with current highlighting
Search interface with result highlighting
Browser automation tests with Etaoin/Firefox

In Progress

Phase 6: Viewer Enhancements

Theme switching (light/dark mode)
Bookmarks and history
Keyboard shortcuts
Export functionality

Planned

Phase 6: Testing & Quality

Expand test coverage
Property-based tests
Performance optimization
Comprehensive documentation

Phase 7: GraalVM Compilation

Native image configuration
Resource bundling
Platform testing
CI/CD setup

Phase 8: Polish & Release

Error handling improvements
Progress indicators
Homebrew formula
Release documentation

Contributing

Contributions are welcome! Please:

Read AGENTS.md for development guidelines
Follow the REPL-driven development workflow
Ensure all tests pass before submitting
Add tests for new features
Update documentation

License

[License information to be added]
