kiku edited this page Nov 1, 2025 · 3 revisions

Welcome to DocStripper Wiki

Welcome to the DocStripper documentation wiki! This wiki contains comprehensive guides, tutorials, and reference materials for using and contributing to DocStripper.

📚 Documentation Pages

🚀 Quick Links

📖 What is DocStripper?

DocStripper is an AI-powered batch document cleaner that automatically removes noise from text documents. Both the web and CLI versions support .txt, .docx, and .pdf files. It removes the following kinds of noise:

  • Page numbers - Lines with only digits (1, 2, 3...), Roman numerals (I, II, III), or letters (A, B, C)
  • Headers/Footers - Common patterns like "Page X of Y", "Confidential", "DRAFT", "INTERNAL USE ONLY"
  • Repeating Headers/Footers - Headers/footers that appear on ≥70% of pages (detected automatically)
  • Duplicate lines - Consecutive identical lines
  • Empty lines - Whitespace-only lines
  • Punctuation lines - Lines with only symbols (---, ***, ===) or single bullets (•, *, ·)
  • Hyphenation - Safe dehyphenation: "auto-\nmatic" → "automatic" (only lowercase continuations)
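The rules above can be sketched in Python as a small rule-based filter. This is a minimal illustration under stated assumptions, not DocStripper's actual code: the regular expressions, the helper names (`is_noise_line`, `find_repeating_lines`, `dehyphenate`, `clean_pages`), and the choice to inspect only the first and last two non-empty lines of each page when looking for repeating headers/footers are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns approximating the rules listed above.
PAGE_NUMBER = re.compile(r"^\s*(\d+|[IVXLCDM]+|[A-Z])\s*$")  # e.g. 12, XIV, B
HEADER_FOOTER = re.compile(
    r"^\s*(page\s+\d+\s+of\s+\d+|confidential|draft|internal use only)\s*$",
    re.IGNORECASE,
)
PUNCTUATION_ONLY = re.compile(r"^\s*([-*=_~]{3,}|[•*·])\s*$")  # ---, ***, ===, lone bullet


def is_noise_line(line: str) -> bool:
    """True for empty lines, page numbers, boilerplate headers/footers and symbol-only lines."""
    if not line.strip():
        return True
    return any(p.match(line) for p in (PAGE_NUMBER, HEADER_FOOTER, PUNCTUATION_ONLY))


def find_repeating_lines(pages: list[list[str]], threshold: float = 0.7, edge: int = 2) -> set[str]:
    """Lines seen near the top or bottom of at least `threshold` of all pages."""
    if len(pages) < 2:  # assumption: a single page has no "repeating" header
        return set()
    counts = Counter()
    for page in pages:
        nonempty = [line.strip() for line in page if line.strip()]
        # Count each candidate once per page (set), looking only at the page edges.
        counts.update(set(nonempty[:edge] + nonempty[-edge:]))
    min_pages = threshold * len(pages)
    return {line for line, n in counts.items() if n >= min_pages}


def dehyphenate(text: str) -> str:
    """Join "auto-" + newline + "matic" into "automatic", only before a lowercase letter."""
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", text)


def clean_pages(pages: list[list[str]]) -> str:
    """Apply the line rules page by page, drop consecutive duplicates, then dehyphenate."""
    repeating = find_repeating_lines(pages)
    kept: list[str] = []
    for page in pages:
        for line in page:
            if is_noise_line(line) or line.strip() in repeating:
                continue
            if kept and line == kept[-1]:  # consecutive duplicate of the last kept line
                continue
            kept.append(line)
    return dehyphenate("\n".join(kept))


if __name__ == "__main__":
    pages = [
        ["ACME Corp Report", "The auto-", "matic cleaner works.", "1"],
        ["ACME Corp Report", "It is fast.", "It is fast.", "---", "2"],
        ["ACME Corp Report", "Page 3 of 3", "Done.", "III"],
    ]
    print(clean_pages(pages))
    # The automatic cleaner works.
    # It is fast.
    # Done.
```

A purely lexical rule like `dehyphenate` still joins genuine compounds that happen to break at their hyphen (for example, "well-\nknown" becomes "wellknown"); the lowercase check only prevents joins before capitalized words such as proper nouns.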

Features

  • πŸ€– Smart Clean (Beta) - AI-powered cleaning using on-device LLM
    • Mode-aware: Conservative/Aggressive modes influence LLM prompts
    • Post-processing: Applies dehyphenation, merge lines, and whitespace normalization after LLM processing
  • ⚑ Fast Clean - Instant rule-based cleaning
  • πŸ›‘οΈ Conservative Mode - Safe defaults that preserve lists and tables
  • ⚑ Aggressive Mode - More aggressive cleaning with merge and whitespace normalization
  • πŸ”’ 100% Private - All processing happens in your browser
  • 🌐 Web App - No installation required
  • πŸ–₯️ CLI Tool - Command-line interface for batch processing
  • πŸ“± Mobile Responsive - Optimized for mobile devices
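Smart Clean's post-processing and Aggressive Mode both rely on line merging and whitespace normalization. The sketch below shows what such a pass could look like, in the order listed for Smart Clean post-processing (dehyphenation, line merging, whitespace normalization). The merge heuristic (join a line onto the previous one when the previous line lacks sentence-ending punctuation and the next line starts lowercase) and all function names are assumptions for illustration; this page does not document DocStripper's real rules.

```python
import re

# Hypothetical heuristics; the exact rules DocStripper uses are not documented on this page.
SENTENCE_END = (".", "!", "?", ":", ";")


def dehyphenate(text: str) -> str:
    """Join words split across lines ("auto-" + newline + "matic"), only before lowercase."""
    return re.sub(r"(\w)-\n([a-z])", r"\1\2", text)


def merge_lines(lines: list[str]) -> list[str]:
    """Append a line to the previous one when the break looks like a mid-sentence wrap."""
    merged: list[str] = []
    for line in lines:
        stripped = line.strip()
        prev = merged[-1].rstrip() if merged else ""
        # Never merge across blank lines, after sentence-ending punctuation,
        # or into a line that starts with an uppercase letter, digit or symbol.
        if prev and stripped and not prev.endswith(SENTENCE_END) and stripped[0].islower():
            merged[-1] = prev + " " + stripped
        else:
            merged.append(line)
    return merged


def normalize_whitespace(text: str) -> str:
    """Collapse space/tab runs, trim around line breaks, keep at most one blank line."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r" *\n *", "\n", text)  # also removes indentation
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def aggressive_postprocess(text: str) -> str:
    """Dehyphenate, merge wrapped lines, then normalize whitespace."""
    merged = merge_lines(dehyphenate(text).split("\n"))
    return normalize_whitespace("\n".join(merged))


if __name__ == "__main__":
    raw = "The auto-\nmatic cleaner   removes\nnoise from documents.\n\n\n\nNext   paragraph."
    print(aggressive_postprocess(raw))
    # The automatic cleaner removes noise from documents.
    #
    # Next paragraph.
```

Because merging and trimming discard line breaks and indentation, a pass like this suits the Aggressive setting; Conservative Mode is described above as preserving lists and tables, which such a pass could flatten.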

Getting Started

  1. Try it online: Visit https://kiku-jw.github.io/DocStripper/
  2. Read the Installation Guide: Installation
  3. Learn how to use it: Usage Guide
  4. Browse examples: See the Usage Guide

Need Help?

  • Check the FAQ for common questions
  • Join Discussions
  • Open an Issue for bugs or feature requests

Made with ❤️ for clean documents
