Add get_scrubber_for_format API to DateScrubber #208

Open
wants to merge 4 commits into
base: main

Conversation

nitsanavni (Contributor) commented May 27, 2025

Summary

• Added new get_scrubber_for_format() API that accepts datetime format strings directly
• More intuitive than requiring example dates - users can specify patterns like %Y%m%d_%H%M%S
• Fully tested with approval tests following the same pattern as existing tests
• Works seamlessly with the Options pattern

Key Benefits

Intuitive: Direct format specification vs providing example dates
Familiar: Uses standard Python datetime format codes
Flexible: Any datetime format pattern supported by the internal engine
Consistent: Integrates with existing Options().with_scrubber() pattern

API Usage

# New intuitive API - specify the format directly
scrubber = DateScrubber.get_scrubber_for_format('%Y%m%d_%H%M%S')

# Works with Options pattern
verify(
    "Event at 20250527_125703",
    options=Options().with_scrubber(DateScrubber.get_scrubber_for_format('%Y%m%d_%H%M%S'))
)

# vs existing API - requires example date
scrubber = DateScrubber.get_scrubber_for('20250527_125703')

Test plan

  • New API creates working scrubbers for various datetime formats
  • Scrubbers correctly replace dates with <date0> placeholders
  • API works with Options pattern like existing scrubbers
  • All tests use approval pattern for consistency
  • All existing tests still pass

Builds on #207

🤖 Generated with Claude Code

Summary by Sourcery

Add an API to generate date scrubbers from datetime format strings, refactor internal format handling to use explicit format definitions with automatic regex conversion, and extend tests to cover the new usage and updated patterns.

New Features:

  • Add get_scrubber_for_format method to create scrubbers from datetime format strings directly
  • Allow users to specify standard Python datetime patterns (e.g. '%Y%m%d_%H%M%S') instead of example dates

Enhancements:

  • Consolidate supported formats into an internal _get_internal_formats list with parsing and display examples
  • Automate conversion of datetime format specifiers into regex patterns
  • Refactor get_supported_formats to derive regex patterns from datetime formats for external API compatibility

Tests:

  • Add approval tests for the new get_scrubber_for_format API and its integration with the Options pattern
  • Update existing tests to validate parsing examples via _get_internal_formats and refresh the approved regex table
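As a rough illustration of the "automate conversion of datetime format specifiers into regex patterns" item, here is a minimal sketch. The names `DIRECTIVE_TO_REGEX` and `format_to_regex` are hypothetical and the directive table is a small assumed subset; the PR's actual mapping is not shown in this conversation.

```python
import re

# Hypothetical mapping from strptime directives to regex fragments.
# This is an assumed subset, not the table used in the PR.
DIRECTIVE_TO_REGEX = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def format_to_regex(date_format: str) -> str:
    """Translate a strptime-style format into a regex, escaping literal characters."""
    parts = []
    i = 0
    while i < len(date_format):
        token = date_format[i:i + 2]
        if token in DIRECTIVE_TO_REGEX:
            # Known two-character directive: emit its regex fragment.
            parts.append(DIRECTIVE_TO_REGEX[token])
            i += 2
        else:
            # Anything else is treated as a literal and escaped.
            parts.append(re.escape(date_format[i]))
            i += 1
    return "".join(parts)


pattern = format_to_regex("%Y%m%d_%H%M%S")
print(re.sub(pattern, "<date0>", "Event at 20250527_125703"))  # Event at <date0>
```

Walking the string token by token keeps literal characters such as `-`, `T`, or `.` from being interpreted as regex syntax.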

nitsanavni and others added 2 commits May 27, 2025 15:54
- Uses datetime.strptime() for robust date parsing instead of complex regex patterns
- Internal implementation now uses readable format strings like %Y%m%d_%H%M%S
- External API maintains backward compatibility with regex patterns
- Added support for YYYYMMDD_HHMMSS format (20250527_125703) from issue #124
- Easier to maintain: adding new date formats now requires only datetime format strings
- All existing functionality preserved, all tests pass

Related to #124

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Added new API method that accepts datetime format strings directly
- Users can now create scrubbers with patterns like '%Y%m%d_%H%M%S'
- More intuitive than having to provide example dates
- Full test coverage with approval tests
- Works seamlessly with Options pattern

Example usage:
  scrubber = DateScrubber.get_scrubber_for_format('%Y%m%d_%H%M%S')
  verify(text, options=Options().with_scrubber(scrubber))

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

sourcery-ai bot commented May 27, 2025

Reviewer's Guide

This PR refactors DateScrubber to centralize its format definitions and to build scrubbers from explicit datetime format strings. It adds a new get_scrubber_for_format API, converts format strings to regex under the hood, updates get_scrubber_for to identify the format by datetime parsing against the internal formats, and expands test coverage to validate the new API.

Sequence Diagram for new DateScrubber.get_scrubber_for_format() API

sequenceDiagram
    actor User
    User->>+DateScrubber: get_scrubber_for_format(dateFormat)
    Note left of User: Developer calls new API with format string
    DateScrubber->>+DateScrubber: __init__(dateFormat)
    Note right of DateScrubber: Instantiates DateScrubber with format string
    DateScrubber->>DateScrubber: _convert_format_to_regex(dateFormat)
    DateScrubber-->>DateScrubber: regexPattern (sets self.date_regex)
    DateScrubber-->>-DateScrubber: dateScrubberInstance
    DateScrubber->>User: scrubber (dateScrubberInstance.scrub)
    Note left of User: Returns the scrub method of the configured instance

Sequence Diagram: Using DateScrubber with Options Pattern

sequenceDiagram
    actor User
    User->>+DateScrubber: get_scrubber_for_format("%Y%m%d_%H%M%S")
    DateScrubber-->>-User: scrubberFunc

    User->>+Options: __init__()
    Options-->>-User: optionsInstance

    User->>+optionsInstance: with_scrubber(scrubberFunc)
    optionsInstance-->>-User: optionsInstance (configured)

    User->>+verify: verify("Event at 20250527_125703", optionsInstance)
    Note over verify,User: verify is a placeholder for the approval testing function
    verify->>+scrubberFunc: scrub("Event at 20250527_125703")
    Note right of scrubberFunc: scrubberFunc is DateScrubber.scrub method
    scrubberFunc->>scrubberFunc: Uses self.date_regex (derived from "%Y%m%d_%H%M%S")
    scrubberFunc-->>-verify: "Event at <date0>"
    verify-->>-User: VerificationResult

File-Level Changes

Change: Centralize and extend internal date formats
Details:
  • Introduce _get_internal_formats returning (format, parsing_examples, display_examples)
  • Expand list of supported datetime format patterns including ISO, microseconds, and custom patterns
  • Adjust get_supported_formats to build regex patterns and examples from internal formats
Files: date_scrubber.py

Change: Refactor DateScrubber to use format strings and regex conversion
Details:
  • Change __init__ to accept a datetime format string
  • Implement _convert_format_to_regex mapping format codes to regex and escaping literals
  • Store date_format and generated date_regex instead of raw regex input
Files: date_scrubber.py

Change: Add get_scrubber_for_format API
Details:
  • Define static method get_scrubber_for_format that returns a scrub function for a given format string
  • Ensure seamless integration with existing Options().with_scrubber() pattern
Files: date_scrubber.py

Change: Update get_scrubber_for to leverage datetime parsing
Details:
  • Loop through internal formats and use datetime.strptime to identify matching format
  • Construct scrubber based on matched format instead of regex trial scrubbing
  • Enhance error message to list supported formats
Files: date_scrubber.py

Change: Revise and extend test suite for new API
Details:
  • Modify existing tests to use _get_internal_formats and updated scrubber signature
  • Update approved table snapshot to reflect new regex patterns
  • Add test_get_scrubber_for_format and test_get_scrubber_for_format_with_options for direct-format API
Files: test_date_scrubber.py, *.approved.md, *.approved.txt
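The "loop through internal formats and use datetime.strptime" change listed above could look roughly like the sketch below. `CANDIDATE_FORMATS`, `detect_format`, and `require_format` are hypothetical names, and the format list is an assumed subset rather than the PR's internal list.

```python
from datetime import datetime
from typing import Optional

# Hypothetical subset of candidate formats; the PR's internal list is larger.
CANDIDATE_FORMATS = [
    "%Y%m%d_%H%M%S",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%d",
]


def detect_format(example: str) -> Optional[str]:
    """Return the first candidate format that parses the example, or None."""
    for candidate in CANDIDATE_FORMATS:
        try:
            datetime.strptime(example, candidate)
        except ValueError:
            # strptime raises ValueError when the example does not fit the format.
            continue
        return candidate
    return None


def require_format(example: str) -> str:
    """Like detect_format, but raise an error that lists the supported formats."""
    found = detect_format(example)
    if found is None:
        supported = ", ".join(CANDIDATE_FORMATS)
        raise ValueError(f"No format matches {example!r}. Supported formats: {supported}")
    return found


print(detect_format("20250527_125703"))  # %Y%m%d_%H%M%S
```

Because `strptime` requires the whole string to match, an example that fits no candidate falls through to the error path instead of producing a scrubber that silently matches nothing.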

@sourcery-ai sourcery-ai bot left a comment

Hey @nitsanavni - I've reviewed your changes - here's some feedback:

  • Tests should not rely on the private _get_internal_formats; either update them to use the public API (get_supported_formats) or make _get_internal_formats part of the public contract to avoid coupling to internals.
  • Add explicit validation for unsupported or malformed datetime format specifiers in get_scrubber_for_format so users get clear errors instead of ending up with a non-matching regex.
  • Consider anchoring the generated regex patterns (e.g. with word boundaries) to avoid accidental partial matches inside larger strings.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟡 Testing: 1 issue found
  • 🟢 Documentation: all looks good
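The second and third feedback points (explicit validation of format specifiers, and anchoring the generated regex) can be sketched together. This is only one possible approach under stated assumptions: `SUPPORTED_DIRECTIVES` and `build_anchored_regex` are hypothetical names, and the directive set is an assumed subset.

```python
import re

# Hypothetical set of supported directives; an assumption for illustration.
SUPPORTED_DIRECTIVES = {
    "%Y": r"\d{4}",
    "%m": r"\d{2}",
    "%d": r"\d{2}",
    "%H": r"\d{2}",
    "%M": r"\d{2}",
    "%S": r"\d{2}",
}


def build_anchored_regex(date_format: str) -> str:
    """Convert a format to a word-boundary-anchored regex, rejecting unknown directives."""
    parts = []
    i = 0
    while i < len(date_format):
        if date_format[i] == "%":
            token = date_format[i:i + 2]
            if token not in SUPPORTED_DIRECTIVES:
                # Covers both unknown codes ("%Q") and a dangling trailing "%".
                raise ValueError(
                    f"Unsupported or malformed directive {token!r} in {date_format!r}"
                )
            parts.append(SUPPORTED_DIRECTIVES[token])
            i += 2
        else:
            parts.append(re.escape(date_format[i]))
            i += 1
    # \b keeps the pattern from matching inside a longer run of word characters.
    return r"\b" + "".join(parts) + r"\b"


print(re.search(build_anchored_regex("%Y%m%d"), "id120250527"))  # None
```

One caveat with this design: `\b` only behaves as intended when the format starts and ends with a word character (digit, letter, or underscore), so formats that begin or end with punctuation would need a different anchoring rule.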

Comment on lines +41 to +50
def test_get_scrubber_for_format() -> None:
    """Test the new API that accepts datetime format strings directly."""
    # Test common datetime format patterns
    test_cases = [
        ("%Y%m%d_%H%M%S", "20250527_125703", "Log: 20250527_125703 - System started"),
        (
            "%Y-%m-%dT%H:%M:%SZ",
            "2021-01-01T12:34:56Z",
            "Timestamp: 2021-01-01T12:34:56Z end",
        ),

suggestion (testing): Consider testing scrubbing of multiple date instances in a single string.

Add a test case with multiple dates in the input string to verify that all instances are scrubbed correctly (e.g., as <date0>, <date1>, etc.).
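A test along the lines of this suggestion might check something like the following. The sketch uses a hypothetical standalone helper, `scrub_dates`, rather than the library's scrubber, and it assumes that distinct date values receive increasing indices while a repeated value reuses its index; that numbering rule is an assumption, not something confirmed in this thread.

```python
import re
from typing import Dict


def scrub_dates(text: str, date_regex: str) -> str:
    """Replace each regex match with a numbered <dateN> placeholder."""
    seen: Dict[str, int] = {}

    def replace(match: "re.Match[str]") -> str:
        # Assumed rule: the same date text always maps to the same index.
        index = seen.setdefault(match.group(0), len(seen))
        return f"<date{index}>"

    return re.sub(date_regex, replace, text)


def test_multiple_dates_in_one_string() -> None:
    text = "start 20250527_125703 end 20250528_010203 again 20250527_125703"
    result = scrub_dates(text, r"\d{8}_\d{6}")
    assert result == "start <date0> end <date1> again <date0>"


test_multiple_dates_in_one_string()
```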


# Replace format codes with regex patterns first
regex_pattern = date_format
for format_code, regex in format_to_regex.items():

@sourcery-ai sourcery-ai bot May 27, 2025

suggestion (code-quality): Remove unnecessary calls to .items() when the values are not used (🔧 Fixed)

Comment on lines 63 to 67
results.append(f"Format: {format_pattern}")
results.append(f"Input: {test_string}")
results.append(f"Output: {result}")
results.append("") # Empty line for readability

@sourcery-ai sourcery-ai bot May 27, 2025

suggestion (code-quality): Merge consecutive list appends into a single extend (🔧 Fixed)

nitsanavni and others added 2 commits May 27, 2025 20:41
- Reverted __init__ to take regex patterns (preserves backward compatibility)
- Added from_format() class method for datetime format strings
- get_scrubber_for_format() now uses the factory method
- Original regex-based API unchanged: DateScrubber(regex_pattern)
- New datetime API via factory: DateScrubber.from_format('%Y%m%d_%H%M%S')
- All tests pass, both APIs work correctly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Remove unnecessary .items() calls when only iterating over dict keys
- Replace multiple append() calls with single extend() for better performance

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
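
The design described in the first of these two commits (a regex-taking `__init__` plus a `from_format()` factory that `get_scrubber_for_format()` delegates to) can be sketched as follows. `FormatScrubber` is a hypothetical stand-in for DateScrubber, the directive table is an assumed subset, and the placeholder numbering is an assumption; none of this is the merged implementation.

```python
import re
from typing import Callable, Dict


class FormatScrubber:
    """Hypothetical stand-in for DateScrubber, illustrating the factory layout."""

    # Assumed subset of directives, for illustration only.
    _DIRECTIVES = {
        "%Y": r"\d{4}",
        "%m": r"\d{2}",
        "%d": r"\d{2}",
        "%H": r"\d{2}",
        "%M": r"\d{2}",
        "%S": r"\d{2}",
    }

    def __init__(self, date_regex: str) -> None:
        # The constructor still takes a regex, so existing callers keep working.
        self.date_regex = date_regex

    @classmethod
    def from_format(cls, date_format: str) -> "FormatScrubber":
        # Escape literals first, then swap each (escaped) directive for its regex.
        regex = re.escape(date_format)
        for code, fragment in cls._DIRECTIVES.items():
            regex = regex.replace(re.escape(code), fragment)
        return cls(regex)

    @staticmethod
    def get_scrubber_for_format(date_format: str) -> Callable[[str], str]:
        # Returns the bound scrub method, usable wherever a scrubber function is expected.
        return FormatScrubber.from_format(date_format).scrub

    def scrub(self, text: str) -> str:
        seen: Dict[str, int] = {}
        return re.sub(
            self.date_regex,
            lambda m: f"<date{seen.setdefault(m.group(0), len(seen))}>",
            text,
        )


scrubber = FormatScrubber.get_scrubber_for_format("%Y%m%d_%H%M%S")
print(scrubber("Event at 20250527_125703"))  # Event at <date0>
```

Keeping the format-string path in a class method means both entry points end up constructing the same regex-based object, which is what lets the original `DateScrubber(regex_pattern)` call stay unchanged.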