iwillig/polydoc


Polydoc - JVM-Native Pandoc Documentation System

Pandoc is a powerful tool for parsing, combining and processing text-based documents. Pandoc also includes a filter system that allows for the transformation of documents and sub-elements.

Polydoc brings Pandoc’s filtering capabilities to the JVM/Clojure ecosystem, providing:

  • Advanced Pandoc filters, designed for GraalVM native compilation to reduce startup latency
  • JVM-native documentation tooling (no Python dependencies required, JavaScript via GraalJS)
  • Support for multiple document formats via Pandoc
  • SQLite-powered full-text search with FTS5
  • Book building system with automatic indexing
  • Interactive documentation viewer

Rationale

Documentation is a fundamental part of software engineering. There are many tools for managing documentation, but Polydoc offers several advantages:

  • JVM-Native: If you’re working on the JVM, you don’t need to bring Python or Go into your stack just for documentation. JavaScript execution uses GraalJS (included with GraalVM).
  • Pandoc Integration: Leverage Pandoc’s powerful document transformation capabilities.
  • Advanced Filters: Execute code (Clojure, SQLite, JavaScript), render diagrams (PlantUML), and compose documents (include filter).
  • Full-Text Search: Built-in SQLite FTS5 search index automatically maintained.
  • Book Building: Combine multiple documents into searchable books with metadata.

Features

Implemented ✅

Pandoc Filters

  • clojure-exec: Execute Clojure code blocks and show results
  • sqlite-exec: Run SQLite queries and format results as tables
  • javascript-exec: Execute JavaScript code blocks with GraalJS (GraalVM’s JavaScript engine)
  • plantuml: Render PlantUML diagrams to images
  • include: Compose documents from multiple source files

Book Building System

  • Load configuration from polydoc.yml (Pandoc-compatible metadata format)
  • Process multiple markdown/org files through filters
  • Extract sections and headers automatically
  • Store content in SQLite database
  • Automatic FTS5 full-text search index (via database triggers)
  • Generate HTML output

Search System

  • FTS5 full-text search across all book content
  • Boolean operators: AND, OR, NOT
  • Phrase search: "exact phrase"
  • Field-specific search: title:introduction
  • Result highlighting with context snippets
  • Result ranking by relevance

Interactive Viewer

  • HTTP-based documentation browser with http-kit
  • Section navigation (Previous/Next buttons)
  • Collapsible table of contents
  • Full-text search interface
  • Pico CSS styling for clean UI
  • Browser automation testing with Etaoin/Firefox

Coming Soon ⏳

  • Python and Shell execution filters
  • PDF and EPUB output formats
  • GraalVM native compilation
  • Viewer enhancements (themes, bookmarks, history)

Installation

Prerequisites

  • GraalVM 21 or later (includes GraalJS for JavaScript filter)
  • Clojure CLI tools
  • Pandoc 2.0 or later
  • PlantUML JAR (for PlantUML filter)

Using Clojure CLI

Clone the repository and run Polydoc directly with the Clojure CLI:

git clone <repository-url>
cd polydoc
clojure -M:main --help

Usage

Command Overview

polydoc filter  # Execute individual Pandoc filters
polydoc book    # Build complete books from polydoc.yml
polydoc search  # Search documentation with full-text search
polydoc view    # Browse documentation in the interactive viewer

Filters

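Each filter reads a Pandoc JSON AST on standard input and writes the transformed AST to standard output, so it can sit between two pandoc invocations: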

Clojure Execution Filter

# Process markdown through Clojure execution filter
pandoc input.md -t json | clojure -M:main filter -t clojure-exec | pandoc -f json -o output.html

Example markdown:

```{.clojure .exec}
(+ 1 2 3)
```

Output shows code and result:

```clojure
(+ 1 2 3)
;; => 6
```

SQLite Execution Filter

pandoc input.md -t json | clojure -M:main filter -t sqlite-exec | pandoc -f json -o output.html

Example markdown:

```{.sqlite .exec database="mydata.db"}
SELECT name, age FROM users ORDER BY age DESC LIMIT 5;
```

Output shows query and results as a formatted table.

JavaScript Execution Filter

pandoc input.md -t json | clojure -M:main filter -t javascript-exec | pandoc -f json -o output.html

Example markdown:

```{.javascript .exec}
const sum = [1, 2, 3, 4, 5].reduce((a, b) => a + b, 0);
console.log("Sum:", sum);
```

PlantUML Rendering Filter

pandoc input.md -t json | clojure -M:main filter -t plantuml | pandoc -f json -o output.html

Example markdown:

```plantuml
@startuml
Alice -> Bob: Hello
Bob -> Alice: Hi there!
@enduml
```

The output replaces the code block with the rendered diagram image.

Include Filter

pandoc input.md -t json | clojure -M:main filter -t include | pandoc -f json -o output.html

Example markdown:

```{.include file="chapter1.md"}
```

The filter inlines the content of the external file at that position.

Book Building

Build complete books with automatic indexing and search:

Configuration: polydoc.yml

Create a polydoc.yml file with Pandoc-compatible metadata:

---
# Standard Pandoc metadata
title: "My Documentation"
author: "Your Name"
date: "2025-11-26"
lang: "en-US"
description: "Comprehensive documentation"

# Table of contents
toc: true
toc-depth: 3

# Polydoc-specific: Book configuration
book:
  id: "my-docs"
  version: "1.0.0"
  database: "docs.db"
  
  # Filters to apply during build
  filters:
    - clojure-exec
    - sqlite-exec
    - plantuml
    - include

# Document sections (in order)
sections:
  - docs/introduction.md
  - docs/tutorial.md
  - docs/reference.md
  
  # Extended format with per-file options
  - file: docs/advanced.md
    title: "Advanced Topics"
    filters:
      - clojure-exec
      - plantuml
---
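
The sections list above mixes two entry shapes: a bare path string and a map with per-file options. A minimal sketch of how such entries could be normalized into one shape is shown below. It is not Polydoc's actual loader: the namespace and function names are hypothetical, keyword keys are assumed for the parsed YAML, and the rule that per-file filters replace (rather than extend) the book-level list is an assumption.

```clojure
(ns example.sections)

(defn normalize-section
  "Turns a section entry (a path string or a map with :file) into a uniform map.
   Entries without their own :filters inherit the book-level list (assumed rule)."
  [book-filters entry]
  (let [section (if (string? entry) {:file entry} entry)]
    (merge {:filters book-filters} section)))

(defn normalize-sections
  "Normalizes every entry under :sections, in document order."
  [config]
  (let [book-filters (get-in config [:book :filters] [])]
    (mapv #(normalize-section book-filters %) (:sections config))))

;; Usage with a hand-written config map standing in for the parsed polydoc.yml:
(normalize-sections
 {:book     {:filters ["clojure-exec" "include"]}
  :sections ["docs/introduction.md"
             {:file "docs/advanced.md"
              :title "Advanced Topics"
              :filters ["plantuml"]}]})
```

Normalizing up front keeps the rest of a build pipeline free of string-or-map checks.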

Build Command

clojure -M:main book -c polydoc.yml -o output/

This will:

  1. Load configuration from polydoc.yml
  2. Process each section file through configured filters
  3. Extract headers and content
  4. Insert into SQLite database with FTS5 search index
  5. Generate HTML output in output/ directory

What Gets Created

  • Database: docs.db (or path from config)
    • books table: Book metadata
    • sections table: All extracted sections with content
    • sections_fts table: FTS5 full-text search index (auto-synced via triggers)
  • HTML Output: output/my-docs.html
    • Combined document from all sections
    • Processed through all configured filters

Search

Search your documentation using FTS5 full-text search:

Basic Search

clojure -M:main search -d docs.db -q "pandoc filter"

Search with Operators

# Boolean AND (default)
clojure -M:main search -d docs.db -q "clojure AND filter"

# Boolean OR
clojure -M:main search -d docs.db -q "clojure OR python"

# Exclude terms with NOT
clojure -M:main search -d docs.db -q "filter NOT javascript"

# Exact phrase search
clojure -M:main search -d docs.db -q '"book building system"'

# Field-specific search
clojure -M:main search -d docs.db -q "title:introduction"
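
When queries are assembled from user input rather than typed by hand, quoting matters. The sketch below shows one way to build FTS5 query strings in Clojure. The helper names are hypothetical and not part of Polydoc; the only FTS5 rule relied on is that a double quote inside a quoted string is escaped by doubling it.

```clojure
(ns example.query
  (:require [clojure.string :as str]))

(defn phrase
  "Wraps text in double quotes so FTS5 treats it as a single phrase.
   Embedded double quotes are escaped by doubling them."
  [text]
  (str "\"" (str/replace text "\"" "\"\"") "\""))

(defn all-of
  "Joins terms so that every one of them must match."
  [& terms]
  (str/join " AND " terms))

(defn any-of
  "Joins terms so that at least one of them must match."
  [& terms]
  (str/join " OR " terms))

;; Usage: the result is a string suitable for the -q option shown above.
(all-of "clojure" (phrase "pandoc filter"))
```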

Limit Results

# Show up to 20 results (default: 10)
clojure -M:main search -d docs.db -q "documentation" -l 20

Filter by Book

# Search within specific book only
clojure -M:main search -d docs.db -q "query" -b 1

Development

This tool is written in JVM Clojure and designed to be compiled to native code via GraalVM.

Development Environment

The Clojure REPL is the primary interface for development. There are two main namespaces: user and dev.

When you connect to a REPL, you’ll be in the user namespace. Run (dev) to load the dev namespace. The dev namespace provides functions for development workflow:

(refresh) ;; Refresh all namespaces (uses clj-reload)
(lint)    ;; Lint the project with clj-kondo
(run-all) ;; Run all tests

Starting the REPL

# Start nREPL server on port 7889
clojure -M:jvm-base:dev:nrepl

# In another terminal, connect with your editor
# Or use clj-nrepl-eval for command-line evaluation:
clj-nrepl-eval -p 7889 "(+ 1 2 3)"

Running Tests

# Run all tests via Kaocha
clojure -M:dev:test

# Run via REPL
(require 'clojure.test)
(clojure.test/run-all-tests #"polydoc.*")

Current status: 106 tests passing, 403 assertions, 0 failures

Browser Testing

Polydoc uses Etaoin for browser automation testing with Firefox/GeckoDriver. Browser tests verify the interactive viewer functionality.

Prerequisites for browser tests:

# macOS (via Homebrew)
brew install --cask firefox
brew install geckodriver

# Ubuntu/Debian
sudo apt-get install firefox
# GeckoDriver - download from GitHub releases
wget https://github.com/mozilla/geckodriver/releases/latest/download/geckodriver-linux64.tar.gz
tar -xzf geckodriver-linux64.tar.gz
sudo mv geckodriver /usr/local/bin/

Running browser tests:

# Browser tests run as part of the full test suite
clojure -M:dev:test

# Run only viewer tests
clojure -M:dev:test --focus polydoc.viewer.server-test

# In REPL
(require '[clojure.test :as test])
(test/run-tests 'polydoc.viewer.server-test)

Browser tests include:

  • Page loading and navigation
  • Section browsing (Previous/Next)
  • Table of contents interaction
  • Full-text search functionality
  • Responsive layout verification

Linting

# Via command line
clojure -M:lint -m clj-kondo.main --lint src test

# Or in REPL (from dev namespace)
(lint)

Continuous Integration

The project uses GitHub Actions for CI/CD. The same tasks can be run locally with Babashka:

# Run CI tasks locally with Babashka
bb ci

# This runs:
# 1. Clean build artifacts
# 2. Format checking
# 3. Code linting
# 4. Full test suite (including browser tests)

GitHub Actions automatically:

  • Sets up GraalVM, Clojure, Pandoc, PlantUML
  • Installs Firefox and GeckoDriver for browser tests
  • Caches dependencies for faster builds
  • Runs all tests with coverage reporting
  • Uploads test results as artifacts

Libraries Used

  • cli-matic: Command-line interface framework
  • Pandoc: Document conversion and filtering
  • clojure.data.json: JSON parsing for Pandoc AST
  • next.jdbc: JDBC wrapper for database access
  • honey.sql: SQL DSL for query generation
  • SQLite: Database with FTS5 full-text search
  • Malli: Schema validation for configuration
  • clj-yaml: YAML configuration parsing
  • http-kit: HTTP server for interactive viewer
  • hiccup: HTML generation
  • etaoin: Browser automation testing (Firefox/GeckoDriver)
  • clj-kondo: Code linting
  • kaocha: Test runner with coverage

Architecture

Data Flow: Book Building

polydoc.yml → metadata/load-metadata
              ↓
           builder/initialize-database
              ↓
    For each section file:
      1. markdown → Pandoc AST (JSON)
      2. Apply filters (clojure-exec, etc.)
      3. Extract sections (headers + content)
      4. Insert into database
      5. FTS5 triggers auto-update search index
              ↓
           Generate HTML output
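
Step 3 of the flow above (extracting sections) can be sketched as a single pass over the top-level blocks of a Pandoc AST. This is an illustration under stated assumptions, not Polydoc's extractor: the namespace, function names, and result keys are hypothetical, the AST is assumed to be parsed with string keys, only Str and Space inlines are rendered to text, and blocks before the first header are dropped.

```clojure
(ns example.extract
  (:require [clojure.string :as str]))

(defn header?
  "True for a Pandoc Header block: {\"t\" \"Header\", \"c\" [level attr inlines]}."
  [block]
  (= "Header" (get block "t")))

(defn inline-text
  "Flattens Str and Space inlines to plain text; other inline types are skipped."
  [inlines]
  (str/join (map (fn [{t "t" c "c"}]
                   (case t
                     "Str" c
                     "Space" " "
                     ""))
                 inlines)))

(defn extract-sections
  "Groups blocks into sections, starting a new section at each Header."
  [blocks]
  (reduce (fn [sections block]
            (cond
              ;; A header opens a new, initially empty section.
              (header? block)
              (let [[level _attr inlines] (get block "c")]
                (conj sections {:heading-level level
                                :heading-text  (inline-text inlines)
                                :blocks        []}))

              ;; Any other block is appended to the most recent section.
              (seq sections)
              (update-in sections [(dec (count sections)) :blocks] conj block)

              ;; Content before the first header is ignored in this sketch.
              :else sections))
          []
          blocks))
```

Each resulting map lines up loosely with a row of the sections table: a heading level, heading text, and the blocks that make up the content.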

Database Schema

Books Table:

  • Stores book-level metadata from polydoc.yml
  • Fields: book_id, title, author, description, version, metadata_json

Sections Table:

  • One row per section/header in documents
  • Fields: section_id, book_id, heading_text, heading_level, content_markdown, content_html, content_plain, content_hash
  • Linked to book via foreign key

Sections_FTS Table:

  • FTS5 virtual table for full-text search
  • Auto-synced with sections table via triggers
  • Indexes: heading_text, content_plain

Filter Architecture

All filters follow the same pattern:

(ns polydoc.filters.my-filter
  (:require [polydoc.filters.core :as filters]))

(defn process-code-block [block]
  (if (matches-filter? block)
    (transform-block block)
    block))

(defn my-filter [ast]
  (filters/walk-pandoc-ast ast process-code-block))

(defn main [{:keys [input output]}]
  (filters/run-filter input output my-filter))
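
The pattern above leaves matches-filter? and transform-block undefined. The following self-contained sketch fills them in for a made-up filter so the shape is easier to see. Everything here is hypothetical: the .demo class, the upper-casing transformation, and the use of clojure.walk/postwalk as a stand-in for filters/walk-pandoc-ast, whose real implementation is not shown in this README. The AST is assumed to be parsed with string keys.

```clojure
(ns example.demo-filter
  (:require [clojure.string :as str]
            [clojure.walk :as walk]))

;; A Pandoc CodeBlock node has the shape:
;; {"t" "CodeBlock", "c" [[id classes key-vals] code]}

(defn code-block-classes
  "Returns the set of classes on a CodeBlock node, or nil for any other value."
  [node]
  (when (and (map? node) (= "CodeBlock" (get node "t")))
    (let [[[_id classes _kvs] _code] (get node "c")]
      (set classes))))

(defn matches-filter?
  "True when the node is a CodeBlock tagged with both demo and exec."
  [node]
  (let [classes (code-block-classes node)]
    (boolean (and classes (every? classes ["demo" "exec"])))))

(defn transform-block
  "Stand-in transformation: appends an upper-cased copy of the code as a result line."
  [node]
  (update-in node ["c" 1]
             (fn [code] (str code "\n;; => " (str/upper-case code)))))

(defn process-node
  "Transforms matching code blocks and returns every other node unchanged."
  [node]
  (if (matches-filter? node)
    (transform-block node)
    node))

(defn demo-filter
  "Walks the whole AST bottom-up, applying process-node to each value."
  [ast]
  (walk/postwalk process-node ast))
```

Because non-matching nodes are returned untouched, several filters of this shape can be chained without interfering with each other's code blocks.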

Examples

See the examples/ directory for:

  • clojure-exec-demo.md - Clojure code execution
  • sqlite-exec-demo.md - SQLite query examples
  • javascript-exec-demo.md - JavaScript execution
  • plantuml-demo.md - Diagram rendering
  • include-demo.md - Document composition

Each example includes:

  • Input markdown
  • Expected output
  • Test JSON for validation

Project Status

Completed (52% - 25/48 tasks)

Phase 1: Core Infrastructure

  • CLI framework with cli-matic
  • Filter utilities and base infrastructure
  • Clojure execution filter
  • Test framework and examples

Phase 2: Additional Filters ✅ (5/8)

  • SQLite execution filter
  • JavaScript execution filter
  • PlantUML rendering filter
  • Include filter for document composition

Phase 3: Book Building

  • Database schema with FTS5
  • Metadata loading and validation
  • Section extraction and indexing
  • Book builder orchestration

Phase 4: Search System

  • FTS5 full-text search API
  • Advanced search operators
  • CLI search command with highlighting

Phase 5: Interactive Viewer

  • HTTP server with http-kit
  • HTML templates with Pico CSS
  • Section navigation (Previous/Next)
  • Table of contents with current highlighting
  • Search interface with result highlighting
  • Browser automation tests with Etaoin/Firefox

In Progress

Phase 6: Viewer Enhancements

  • Theme switching (light/dark mode)
  • Bookmarks and history
  • Keyboard shortcuts
  • Export functionality

Planned

Phase 6: Testing & Quality

  • Expand test coverage
  • Property-based tests
  • Performance optimization
  • Comprehensive documentation

Phase 7: GraalVM Compilation

  • Native image configuration
  • Resource bundling
  • Platform testing
  • CI/CD setup

Phase 8: Polish & Release

  • Error handling improvements
  • Progress indicators
  • Homebrew formula
  • Release documentation

Contributing

Contributions are welcome! Please:

  1. Read AGENTS.md for development guidelines
  2. Follow the REPL-driven development workflow
  3. Ensure all tests pass before submitting
  4. Add tests for new features
  5. Update documentation

License

[License information to be added]
