feat: Add header-based column detection for PACER docket descriptions (#239) #1682

Open
Rithboss wants to merge 3 commits into freelawproject:main from Rithboss:feature/issue-239-column-detection

Rithboss commented Dec 2, 2025

Fixes #239

This PR adds header-based column detection for PACER docket descriptions, making the parser more robust to variations in table structure across different courts.

Problem

Different courts use different table layouts for docket entries:

  • Some use 3 columns: Document Number, Date, Description
  • Others use 2 columns: Date, Description

The existing code hardcodes the assumption that the description is always at column index 2. In a 2-column layout the description sits at index 1 and there is no index 2, so the lookup fails.
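To make the failure mode concrete, here is a minimal sketch that uses plain Python lists as stand-ins for the cells of one docket row. The sample rows, `HARDCODED_INDEX`, and `description_hardcoded` are hypothetical illustrations, not code from juriscraper.

```python
# Plain lists stand in for the cells of a single docket row.
# These rows and names are illustrative assumptions, not juriscraper code.
three_col_row = ["12", "01/02/2020", "MOTION to Dismiss"]  # number, date, description
two_col_row = ["01/02/2020", "MOTION to Dismiss"]          # date, description

HARDCODED_INDEX = 2  # the fixed-position assumption described above


def description_hardcoded(cells):
    """Return the cell at the fixed index, or None if the row is too short."""
    if HARDCODED_INDEX < len(cells):
        return cells[HARDCODED_INDEX]
    return None


found_in_three = description_hardcoded(three_col_row)
found_in_two = description_hardcoded(two_col_row)
```

With the 3-column row the fixed index lands on the description; with the 2-column row there is no cell at that position, so nothing is found.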

Solution

  • Added _detect_description_column_index() method to DocketReport that examines table headers
  • Added _detect_description_column_index_history() method to DocketHistoryReport
  • Updated parsing logic to use detected column index instead of hardcoding
  • Added support for both 2-column and 3-column layouts
  • Implemented caching to avoid repeated detection
  • Added bounds checking with fallback behavior
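
The approach in the list above could be sketched roughly as follows. This is not the PR's implementation: the `ColumnDetector` class, the `HEADER_KEYWORDS` tuple, and the choice to default to index 2 when no header matches are assumptions made for illustration, and plain strings stand in for parsed header and row cells.

```python
# A simplified sketch of header-based detection with caching and a
# bounds-checked fallback. All names here are hypothetical.

# Assumed header variants; the variants the PR actually matches are not shown.
HEADER_KEYWORDS = ("description", "docket text")


class ColumnDetector:
    DEFAULT_INDEX = 2  # assumed default when no header matches

    def __init__(self):
        self._cached_index = None  # filled in on first detection

    def detect(self, headers):
        """Return the index of the description column, caching the result."""
        if self._cached_index is not None:
            return self._cached_index
        index = self.DEFAULT_INDEX
        for i, header in enumerate(headers):
            text = header.strip().lower()  # case-insensitive matching
            if any(keyword in text for keyword in HEADER_KEYWORDS):
                index = i
                break
        self._cached_index = index
        return index

    def description(self, headers, cells):
        """Return the description cell, using the last cell if out of bounds."""
        if not cells:
            return ""
        index = self.detect(headers)
        if index >= len(cells):
            index = len(cells) - 1  # bounds check: fall back to the last cell
        return cells[index]


detector = ColumnDetector()
two_col_text = detector.description(
    ["Date", "Description"], ["01/02/2020", "MOTION to Dismiss"]
)
```

Caching per detector instance assumes that one report has a single table layout; a detector reused across reports with different layouts would need its cache reset.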

Testing

  • Added 11 comprehensive unit and integration tests in tests/test_pacer_column_detection.py
  • All tests pass (11/11)
  • Tests cover 2-column and 3-column layouts, header variants, case-insensitive matching, fallback behavior, caching, and integration with actual docket entry parsing

Changes

  • Modified: juriscraper/pacer/docket_report.py
  • Modified: juriscraper/pacer/docket_history_report.py
  • Modified: CHANGES.md
  • Added: tests/test_pacer_column_detection.py

- Implement _detect_description_column_index() method in DocketReport
- Add _detect_description_column_index_history() method in DocketHistoryReport
- Update docket entry parsing to use detected column index instead of hardcoding
- Support both 2-column and 3-column table layouts
- Add comprehensive unit tests for column detection
- Add bounds checking and fallback to last cell if index is out of bounds
- Cache the detected column index to avoid repeated detection
- Add logging for debugging column detection

This addresses issue freelawproject#239 by making the parser more robust to variations
in table structure across different courts.
… support

- Add unit tests for client code functionality in PacerSession
- Test client code inclusion in login requests
- Test that client code is optional and properly omitted when not provided
- Add documentation for using client code with examples
- Update CHANGES.md to document the feature

This addresses issue freelawproject#192 by providing comprehensive test coverage
and documentation for the existing client code support in PacerSession.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Development

Successfully merging this pull request may close these issues.

Docket descriptions aren't always in column 2 of a PACER docket report
