
feat(detect): Filter CM/ECF header stamps #207

Open
mlissner wants to merge 3 commits into fix-zero-height-pixmap-segfault-20260402 from filter-cmecf-header-stamps-20260407

Conversation

Member

@mlissner mlissner commented Apr 7, 2026

Summary

  • Filter CM/ECF header stamps (case number, doc number, filing date, page number) that courts add to every page of a filing
  • Uses a triple gate to avoid false positives; a span is filtered only when all three checks match:
    1. Position: text must be in the top ~43 points of the page
    2. Font: LiberationSans (the standard CM/ECF stamp font), or any font when y < 20 (the ca5 exception)
    3. Content: text must match the (Doc|Document|DktEntry)...Filed...Page regex
  • Extracts get_content_spans() to separate header filtering from intersection detection
  • Updates get_intersecting_chars() to accept pre-filtered spans instead of fetching them internally
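
The triple gate described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `is_header_stamp`, the constant names, and the exact regex are assumptions reconstructed from the summary, and the real thresholds and pattern may differ.

```python
import re

# Assumed values taken from the PR summary; the real constants may differ.
HEADER_MAX_Y = 43.0      # "top ~43 points of the page"
ANY_FONT_MAX_Y = 20.0    # ca5 exception: any font is accepted this close to the top
STAMP_FONT = "LiberationSans"

# Hypothetical pattern: the summary only gives (Doc|Document|DktEntry)...Filed...Page.
STAMP_RE = re.compile(r"\b(?:Doc|Document|DktEntry)\b.*\bFiled\b.*\bPage\b")


def is_header_stamp(text: str, font: str, y: float) -> bool:
    """Return True only when position, font, and content all match."""
    position_ok = y < HEADER_MAX_Y
    font_ok = STAMP_FONT in font or y < ANY_FONT_MAX_Y
    content_ok = STAMP_RE.search(text) is not None
    return position_ok and font_ok and content_ok


stamp = "Case 1:20-cv-00001 Document 12 Filed 01/02/20 Page 1 of 5"
print(is_header_stamp(stamp, "LiberationSans", 30.0))   # stamp font, top of page
print(is_header_stamp(stamp, "TimesNewRoman", 30.0))    # wrong font, below y = 20
print(is_header_stamp(stamp, "TimesNewRoman", 10.0))    # any font allowed at y < 20
```

Requiring all three checks means body text that merely mentions "Document" or "Filed" near the top of a page is kept, as is stamp-like text that appears lower on the page.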

Test plan

🤖 Generated with Claude Code

mlissner and others added 3 commits April 7, 2026 02:26
Extract span retrieval into get_content_spans() which filters out
court header stamps before the intersection stage. Stamps are
identified by a triple gate: position (top of page), font
(LiberationSans, or any font at y<20 for ca5), and content
(Doc/Document/DktEntry + Filed + Page regex).

Also updates get_intersecting_chars() to accept pre-filtered spans
instead of fetching them internally, separating concerns.
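
A minimal sketch of the separation of concerns described above, under stated assumptions: the signatures below are hypothetical (the PR text does not show them), spans are modeled as plain dicts with a `bbox` and a list of `chars`, and the header test is passed in as a predicate so the example stays self-contained.

```python
from typing import Callable, Iterable

# Hypothetical span shape: {"bbox": (x0, y0, x1, y1), "chars": [{"c": str, "bbox": (...)}]}
Span = dict
Rect = tuple[float, float, float, float]


def get_content_spans(spans: Iterable[Span], is_header: Callable[[Span], bool]) -> list[Span]:
    """Stage 1: drop header stamps so later stages never see them."""
    return [s for s in spans if not is_header(s)]


def rects_intersect(a: Rect, b: Rect) -> bool:
    """True when two (x0, y0, x1, y1) rectangles overlap with non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def get_intersecting_chars(spans: Iterable[Span], region: Rect) -> list[str]:
    """Stage 2: takes pre-filtered spans; it no longer fetches spans itself."""
    return [
        ch["c"]
        for span in spans
        for ch in span["chars"]
        if rects_intersect(ch["bbox"], region)
    ]


spans = [
    {"bbox": (50, 5, 500, 15), "chars": [{"c": "D", "bbox": (50, 5, 58, 15)}]},
    {"bbox": (50, 100, 66, 112), "chars": [
        {"c": "A", "bbox": (50, 100, 58, 112)},
        {"c": "B", "bbox": (58, 100, 66, 112)},
    ]},
]
content = get_content_spans(spans, is_header=lambda s: s["bbox"][1] < 20)
print(get_intersecting_chars(content, (40, 95, 57, 115)))
```

Because the intersection stage only receives what the filter stage returns, header filtering can be tested or changed without touching the intersection logic.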

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mlissner mlissner marked this pull request as ready for review April 7, 2026 09:27