Home
kiku edited this page Nov 1, 2025
Welcome to the DocStripper documentation wiki! This wiki contains comprehensive guides, tutorials, and reference materials for using and contributing to DocStripper.
- Installation Guide - How to install and set up DocStripper
- Usage Guide - How to use DocStripper web app and CLI tool
- API Documentation - API reference for developers
- Contributing Guide - How to contribute to DocStripper
- FAQ - Frequently asked questions
- Web Application: https://kiku-jw.github.io/DocStripper/
- GitHub Repository: https://github.com/kiku-jw/DocStripper
- Issues: https://github.com/kiku-jw/DocStripper/issues
- Discussions: https://github.com/kiku-jw/DocStripper/discussions
DocStripper is an AI-powered batch document cleaner that automatically removes noise from text documents. It supports .txt, .docx, and .pdf files in both the web and CLI versions, and removes the following:
- Page numbers - Lines with only digits (1, 2, 3...), Roman numerals (I, II, III), or letters (A, B, C)
- Headers/Footers - Common patterns like "Page X of Y", "Confidential", "DRAFT", "INTERNAL USE ONLY"
- Repeating Headers/Footers - Headers/footers that appear on ≥70% of pages (detected automatically)
- Duplicate lines - Consecutive identical lines
- Empty lines - Whitespace-only lines
- Punctuation lines - Lines with only symbols (---, ***, ===) or single bullets (•, *, ·)
- Hyphenation - Safe dehyphenation: "auto-\nmatic" → "automatic" (only lowercase continuations)
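
The rule-based removals listed above can be sketched in a few lines of Python. This is an illustrative sketch only, not DocStripper's actual code: the regex patterns and the function names (`dehyphenate`, `is_noise`, `clean_text`, `repeating_lines`) are assumptions made for this example, and the real tool's rules may differ (for instance, the naive Roman-numeral pattern here would also match all-caps words such as "MIX").

```python
import re
from collections import Counter

# Hypothetical patterns for illustration; DocStripper's real rules may differ.
PAGE_NUMBER = re.compile(r"^\s*(\d+|[IVXLCDM]+|[A-Z])\s*$")
HEADER_FOOTER = re.compile(
    r"^\s*(page \d+ of \d+|confidential|draft|internal use only)\s*$",
    re.IGNORECASE,
)
PUNCT_ONLY = re.compile(r"^\s*[-*=•·]+\s*$")
# Join "auto-\nmatic" only when the continuation starts with a lowercase letter.
HYPHEN_BREAK = re.compile(r"([A-Za-z])-\n([a-z])")


def dehyphenate(text: str) -> str:
    """Rejoin words split across a line break by a trailing hyphen."""
    return HYPHEN_BREAK.sub(r"\1\2", text)


def is_noise(line: str) -> bool:
    """True for empty, page-number, header/footer, or punctuation-only lines."""
    return (
        not line.strip()
        or PAGE_NUMBER.match(line) is not None
        or HEADER_FOOTER.match(line) is not None
        or PUNCT_ONLY.match(line) is not None
    )


def clean_text(text: str) -> str:
    """Apply dehyphenation, then drop noise lines and consecutive duplicates."""
    kept = []
    prev = None  # previous raw line, so only truly consecutive repeats are dropped
    for line in dehyphenate(text).split("\n"):
        if not is_noise(line) and line != prev:
            kept.append(line)
        prev = line
    return "\n".join(kept)


def repeating_lines(pages: list[str], threshold: float = 0.7) -> set[str]:
    """Return lines that occur on at least `threshold` of the pages."""
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({ln.strip() for ln in page.split("\n") if ln.strip()})
    return {ln for ln, n in counts.items() if n / len(pages) >= threshold}
```

In this sketch, a caller would first pass the per-page text to `repeating_lines` to find header/footer candidates, remove those lines, and then run `clean_text` on the remainder.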
- 🤖 Smart Clean (Beta) - AI-powered cleaning using on-device LLM
  - Mode-aware: Conservative/Aggressive modes influence LLM prompts
  - Post-processing: Applies dehyphenation, line merging, and whitespace normalization after LLM processing
- ⚡ Fast Clean - Instant rule-based cleaning
- 🛡️ Conservative Mode - Safe defaults that preserve lists and tables
- ⚡ Aggressive Mode - More aggressive cleaning with merge and whitespace normalization
- 🔒 100% Private - All processing happens in your browser
- 🌐 Web App - No installation required
- 🖥️ CLI Tool - Command-line interface for batch processing
- 📱 Mobile Responsive - Optimized for mobile devices
- Try it online: Visit https://kiku-jw.github.io/DocStripper/
- Read the Installation Guide
- Learn how to use it: Usage Guide
- Check out examples: See the Usage Guide for examples
- Check the FAQ for common questions
- Join Discussions
- Open an Issue for bugs or feature requests
Made with ❤️ for clean documents