feat: Add header-based column detection for PACER docket descriptions (#239) by Rithboss · Pull Request #1682 · freelawproject/juriscraper

Rithboss · 2025-12-02T01:34:41Z

Fixes #239

This PR adds header-based column detection for PACER docket descriptions, making the parser more robust to variations in table structure across different courts.

Problem

Different courts use different table layouts for docket entries:

Some use 3 columns: Document Number, Date, Description
Others use 2 columns: Date, Description

The existing code hardcoded the assumption that descriptions would always be in column index 2, which fails for 2-column layouts.
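
To make the failure mode concrete, here is a minimal sketch that models each table row as a plain list of cell strings. The sample rows, the `DESCRIPTION_INDEX` constant, and the `description_hardcoded` helper are illustrative assumptions, not code from juriscraper. In this simplified model the failure surfaces as an `IndexError`; the real parser may fail differently.

```python
# Hypothetical rows, modeled as lists of cell text (not real PACER data).
three_col_row = ["12", "01/02/2024", "MOTION to Dismiss"]
two_col_row = ["01/02/2024", "MOTION to Dismiss"]

DESCRIPTION_INDEX = 2  # the hardcoded assumption described above


def description_hardcoded(cells):
    """Return the description using a fixed column index."""
    return cells[DESCRIPTION_INDEX]


# Works when the row has three columns.
print(description_hardcoded(three_col_row))

try:
    description_hardcoded(two_col_row)
except IndexError as exc:
    # A 2-column row has no index 2, so the fixed lookup fails.
    print(f"2-column layout failed: {exc}")
```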

Solution

Added _detect_description_column_index() method to DocketReport that examines table headers
Added _detect_description_column_index_history() method to DocketHistoryReport
Updated parsing logic to use detected column index instead of hardcoding
Added support for both 2-column and 3-column layouts
Implemented caching to avoid repeated detection
Added bounds checking with fallback behavior
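
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class name `DescriptionColumnLocator`, the header labels in `DESCRIPTION_HEADERS`, and the default index of 2 are all hypothetical, and headers and cells are modeled as plain string lists rather than parsed HTML elements.

```python
from typing import List, Optional

# Hypothetical header labels treated as the description column.
DESCRIPTION_HEADERS = ("description", "docket text")
DEFAULT_INDEX = 2  # assumed default: the previously hardcoded position


class DescriptionColumnLocator:
    """Illustrative stand-in for header-based detection; not juriscraper's API."""

    def __init__(self, headers: List[str]) -> None:
        self._headers = headers
        self._cached_index: Optional[int] = None
        self.detect_calls = 0  # counts real header scans, to show caching

    def detect_index(self) -> int:
        # Return the cached result so the headers are scanned only once.
        if self._cached_index is not None:
            return self._cached_index
        self.detect_calls += 1
        index = DEFAULT_INDEX
        for i, header in enumerate(self._headers):
            # Case-insensitive match against the known labels.
            if header.strip().lower() in DESCRIPTION_HEADERS:
                index = i
                break
        self._cached_index = index
        return index

    def description(self, cells: List[str]) -> str:
        if not cells:
            return ""
        index = self.detect_index()
        # Bounds check: fall back to the last cell if the index is too large.
        if index >= len(cells):
            index = len(cells) - 1
        return cells[index]


two_col = DescriptionColumnLocator(["Date Filed", "Docket Text"])
print(two_col.description(["01/02/2024", "ORDER granting motion"]))
```

Caching per report instance keeps the header scan out of the per-row loop, and the last-cell fallback reflects the common case where the description is the final column.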

Testing

Added 11 comprehensive unit and integration tests in tests/test_pacer_column_detection.py
All tests pass (11/11)
Tests cover 2-column and 3-column layouts, header variants, case-insensitive matching, fallback behavior, caching, and integration with actual docket entry parsing
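
As a rough idea of what tests in this area can look like, here is a self-contained sketch using `unittest`. The helper `find_description_index` and the header strings are hypothetical and exist only for this example; the actual tests in tests/test_pacer_column_detection.py are not reproduced here and may be structured differently.

```python
import unittest


def find_description_index(headers, default=2):
    """Hypothetical helper: index of the first header containing 'description'."""
    for i, header in enumerate(headers):
        if "description" in header.strip().lower():
            return i
    return default


class ColumnDetectionSketchTest(unittest.TestCase):
    def test_three_column_layout(self):
        headers = ["Doc. No.", "Date", "Description"]
        self.assertEqual(find_description_index(headers), 2)

    def test_two_column_layout(self):
        self.assertEqual(find_description_index(["Date", "Description"]), 1)

    def test_case_insensitive_match(self):
        self.assertEqual(find_description_index(["DATE", "DESCRIPTION"]), 1)

    def test_fallback_when_no_header_matches(self):
        # With no matching header, the assumed default index is returned.
        self.assertEqual(find_description_index(["Date", "Notes"]), 2)
```

Saved as a module, a sketch like this can be run with `python -m unittest`.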

Changes

Modified: juriscraper/pacer/docket_report.py
Modified: juriscraper/pacer/docket_history_report.py
Modified: CHANGES.md
Added: tests/test_pacer_column_detection.py

- Implement _detect_description_column_index() method in DocketReport
- Add _detect_description_column_index_history() method in DocketHistoryReport
- Update docket entry parsing to use detected column index instead of hardcoding
- Support both 2-column and 3-column table layouts
- Add comprehensive unit tests for column detection
- Add bounds checking and fallback to last cell if index is out of bounds
- Cache the detected column index to avoid repeated detection
- Add logging for debugging column detection

This addresses issue freelawproject#239 by making the parser more robust to variations in table structure across different courts.

… support

- Add unit tests for client code functionality in PacerSession
- Test client code inclusion in login requests
- Test that client code is optional and properly omitted when not provided
- Add documentation for using client code with examples
- Update CHANGES.md to document the feature

This addresses issue freelawproject#192 by providing comprehensive test coverage and documentation for the existing client code support in PacerSession.

CLAassistant · 2025-12-02T01:34:49Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Rithboss added 2 commits December 1, 2025 19:51

[pre-commit.ci] auto fixes from pre-commit.com hooks

63948c9

for more information, see https://pre-commit.ci

Rithboss wants to merge 3 commits into freelawproject:main from Rithboss:feature/issue-239-column-detection
