Refactor DateScrubber to use datetime format strings internally #207

Status: Open (1 commit to merge into base branch main)

Conversation

@nitsanavni (Contributor) commented May 27, 2025

Summary

• Refactored DateScrubber to use Python's datetime.strptime() internally for robust date parsing
• Internal implementation now uses readable datetime format strings like %Y%m%d_%H%M%S instead of complex regex patterns
• External API maintains full backward compatibility - still returns regex patterns
• Added support for YYYYMMDD_HHMMSS format (20250527_125703) from issue #124

Key Benefits

• Maintainable: Adding new date formats now requires only datetime format strings
• Robust: Leverages Python's built-in datetime parsing for validation
• Backward Compatible: External API unchanged, all existing tests pass
• Developer Friendly: Self-documenting format strings vs complex regex

Test plan

  • All existing date scrubber tests pass
  • New format 20250527_125703 works correctly
  • External API still returns regex patterns for compatibility
  • Error messages show regex patterns as expected
  • Code formatted with black

Example

Before: [12]\d{3}[01]\d[0-3]\d_[0-2]\d[0-5]\d[0-5]\d
After: %Y%m%d_%H%M%S (internally) → still generates regex for external API
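
To illustrate the robustness point, here is a minimal sketch comparing the two lines above. The regex and the format string are quoted from the example; the helper `parses_as` and the sample inputs are hypothetical and are not part of the PR.

```python
import re
from datetime import datetime

# The hand-written pattern quoted in the "Before" line above.
OLD_REGEX = r"[12]\d{3}[01]\d[0-3]\d_[0-2]\d[0-5]\d[0-5]\d"
# The format string quoted in the "After" line above.
NEW_FORMAT = "%Y%m%d_%H%M%S"


def parses_as(example: str, date_format: str) -> bool:
    """Return True if datetime.strptime accepts the example (hypothetical helper)."""
    try:
        datetime.strptime(example, date_format)
    except ValueError:
        return False
    return True


valid = "20250527_125703"    # example from the PR description
invalid = "20250230_125703"  # February 30th: right shape, not a real date

# The regex only checks the shape of the digits, so it accepts both strings.
regex_accepts_valid = re.fullmatch(OLD_REGEX, valid) is not None
regex_accepts_invalid = re.fullmatch(OLD_REGEX, invalid) is not None

# strptime also checks that the calendar date exists, so it rejects the second one.
strptime_accepts_valid = parses_as(valid, NEW_FORMAT)
strptime_accepts_invalid = parses_as(invalid, NEW_FORMAT)
```

The difference matters only when an example is being matched to a format; the regex generated for scrubbing still checks shape alone.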

Related to #124

🤖 Generated with Claude Code

Summary by Sourcery

Refactor DateScrubber to leverage datetime.strptime with format strings for robust parsing, introduce a conversion utility to maintain regex-based scrubbing and backward compatibility, and add support for the YYYYMMDD_HHMMSS format.

New Features:

  • Add support for YYYYMMDD_HHMMSS (%Y%m%d_%H%M%S) date format parsing

Enhancements:

  • Refactor DateScrubber to use datetime format strings internally and generate regex patterns on demand
  • Implement _convert_format_to_regex to map datetime directives to regex for scrubbing
  • Update get_supported_formats to preserve external API by converting internal formats into regex patterns
  • Enhance get_scrubber_for to attempt datetime.strptime parsing before fallback

Tests:

  • Revise tests to validate parsing via datetime formats and update approved regex table to match new patterns

- Uses datetime.strptime() for robust date parsing instead of complex regex patterns
- Internal implementation now uses readable format strings like %Y%m%d_%H%M%S
- External API maintains backward compatibility with regex patterns
- Added support for YYYYMMDD_HHMMSS format (20250527_125703) from issue #124
- Easier to maintain: adding new date formats now requires only datetime format strings
- All existing functionality preserved, all tests pass

Related to #124

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

sourcery-ai bot commented May 27, 2025

Reviewer's Guide

Refactored DateScrubber to leverage datetime format strings and parsing internally while preserving the external regex-based API, and added support for the new YYYYMMDD_HHMMSS format.

Sequence Diagram for DateScrubber.get_scrubber_for()

sequenceDiagram
    participant Caller
    participant DS_Static as DateScrubber (Static)
    participant DT as datetime
    participant DS_Instance as DateScrubber (Instance)

    Caller->>DS_Static: get_scrubber_for(example)
    DS_Static->>DS_Static: _get_internal_formats()
    activate DS_Static
    DS_Static-->>DS_Static: internal_formats_list
    deactivate DS_Static

    loop For each format in internal_formats_list
        DS_Static->>DT: strptime(example, date_format)
        alt strptime success
            DT-->>DS_Static: parsed_datetime_object
            DS_Static->>DS_Instance: new DateScrubber(date_format)
            activate DS_Instance
            DS_Instance-->>DS_Static: scrubber_instance
            deactivate DS_Instance
            DS_Static-->>Caller: scrubber_instance.scrub (callable)
            Note right of Caller: Returns the scrub method
        else strptime failure (ValueError)
            DT-->>DS_Static: ValueError
        end
    end
    Note over DS_Static: If loop completes, no match was found.
    DS_Static->>DS_Static: get_supported_formats() 
    activate DS_Static
    DS_Static-->>DS_Static: supported_formats_for_error_msg
    deactivate DS_Static
    DS_Static->>Caller: raise Exception("No match found...")
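
The flow in the diagram above can be sketched roughly as follows. This is a self-contained approximation, not the PR's code: `INTERNAL_FORMATS`, its hand-written regexes, and the `<date>` placeholder are assumptions made for illustration.

```python
import re
from datetime import datetime
from typing import Callable, List, Tuple

# Hypothetical stand-in for the internal format list: (datetime format, equivalent regex).
# The real entries and the real conversion live in date_scrubber.py and may differ.
INTERNAL_FORMATS: List[Tuple[str, str]] = [
    ("%Y-%m-%d", r"\d{4}-\d{2}-\d{2}"),
    ("%Y%m%d_%H%M%S", r"\d{8}_\d{6}"),
]


def get_scrubber_for(example: str) -> Callable[[str], str]:
    """Return a scrub function for the first format that parses the example."""
    for date_format, date_regex in INTERNAL_FORMATS:
        try:
            datetime.strptime(example, date_format)
        except ValueError:
            continue  # this format does not fit the example; try the next one
        pattern = re.compile(date_regex)
        # Assumed placeholder text; the library's actual replacement may differ.
        return lambda text: pattern.sub("<date>", text)
    # The loop finished without a match, as in the last step of the diagram.
    supported = ", ".join(fmt for fmt, _ in INTERNAL_FORMATS)
    raise Exception(f"No match found for {example!r}. Supported formats: {supported}")
```

For example, `get_scrubber_for("20250527_125703")` skips the first format (strptime raises ValueError) and returns a scrubber built from the second.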

Sequence Diagram for DateScrubber.get_supported_formats()

sequenceDiagram
    participant Caller
    participant DS_Static as DateScrubber (Static)
    participant DS_Instance as DateScrubber (Instance)

    Caller->>DS_Static: get_supported_formats()
    DS_Static->>DS_Static: _get_internal_formats()
    activate DS_Static
    DS_Static-->>DS_Static: internal_formats_list
    deactivate DS_Static

    loop For each (date_format, _, display_examples) in internal_formats_list
        DS_Static->>DS_Instance: new DateScrubber(date_format)
        activate DS_Instance
        Note over DS_Instance: __init__ calls _convert_format_to_regex(date_format)
        DS_Instance-->>DS_Static: scrubber_instance (self.date_regex is now set)
        deactivate DS_Instance
        DS_Static->>DS_Instance: Get date_regex from scrubber_instance
        DS_Instance-->>DS_Static: regex_pattern
        DS_Static->>DS_Static: Add (regex_pattern, display_examples) to formats list
    end
    DS_Static-->>Caller: formats_list
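
A trimmed-down model of this second diagram, assuming a naive replace-based conversion and a single internal format. The class name `DateScrubberSketch` and the directive table are hypothetical; only the method names and the tuple shape come from the diagram and the change list.

```python
from typing import List, Tuple


class DateScrubberSketch:
    """Hypothetical model of the get_supported_formats() flow; not the PR's class."""

    # Assumed subset of directive mappings; the PR's actual table is not shown here.
    _DIRECTIVES = {
        "%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}",
        "%H": r"\d{2}", "%M": r"\d{2}", "%S": r"\d{2}",
    }

    def __init__(self, date_format: str) -> None:
        # As in the diagram: the constructor derives the regex from the format string.
        self.date_regex = self._convert_format_to_regex(date_format)

    @classmethod
    def _convert_format_to_regex(cls, date_format: str) -> str:
        pattern = date_format
        for directive, regex in cls._DIRECTIVES.items():
            pattern = pattern.replace(directive, regex)
        return pattern

    @staticmethod
    def _get_internal_formats() -> List[Tuple[str, List[str], List[str]]]:
        # Each entry: (datetime format, parsing examples, display examples).
        return [("%Y%m%d_%H%M%S", ["20250527_125703"], ["20250527_125703"])]

    @staticmethod
    def get_supported_formats() -> List[Tuple[str, List[str]]]:
        formats = []
        for date_format, _, display_examples in DateScrubberSketch._get_internal_formats():
            scrubber = DateScrubberSketch(date_format)
            formats.append((scrubber.date_regex, display_examples))
        return formats


supported_formats = DateScrubberSketch.get_supported_formats()
```

Callers of the external API see only regex patterns and examples; the datetime format strings stay internal.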

File-Level Changes

Change: Internal date parsing now uses datetime format strings instead of hard-coded regex patterns
Details:
  • Introduced _get_internal_formats returning tuples of datetime formats with parsing/display examples
  • Replaced inline regex list with datetime-based format list
  • Updated constructor to accept a format string and set date_regex via conversion
  • Added _convert_format_to_regex mapping datetime codes to regex using placeholders
Files: approvaltests/scrubbers/date_scrubber.py

Change: External API remains regex-based via new conversion layer
Details:
  • Added static get_supported_formats to iterate internal formats and emit regex patterns with examples
  • Updated get_scrubber_for to try datetime.strptime on examples before falling back
Files: approvaltests/scrubbers/date_scrubber.py

Change: Added support for YYYYMMDD_HHMMSS date format
Details:
  • Appended "%Y%m%d_%H%M%S" entry with parsing and display examples to internal formats list
Files: approvaltests/scrubbers/date_scrubber.py

Change: Updated tests to align with datetime-based formats and new regex outputs
Details:
  • test_supported_formats now iterates over _get_internal_formats and checks parsing_examples
  • Adjusted approved markdown table to reflect new regex patterns and added new format row
  • Modified unsupported format test to use a datetime format string instead of raw regex
Files:
  • tests/scrubbers/test_date_scrubber.py
  • tests/scrubbers/test_date_scrubber.test_supported_formats_as_table.approved.md
  • tests/scrubbers/test_date_scrubber.test_unsupported_format.approved.txt

@sourcery-ai sourcery-ai bot left a comment

Hey @nitsanavni - I've reviewed your changes - here's some feedback:

  • The %z mapping currently only matches offsets like +0300 but your display examples use +03:00; consider updating the regex to accept the colon-separated form as well.
  • Using simple string replacements in _convert_format_to_regex can mis-replace overlapping tokens (e.g. %m vs %M); consider a more robust tokenization or regex-based substitution approach.
  • You rebuild the regex on every DateScrubber instantiation—caching the compiled patterns per format could significantly reduce redundant work.
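
One way the three points above could be addressed together is a single-pass, cached conversion. This is a sketch under stated assumptions, not the PR's `_convert_format_to_regex`: the function name `compile_format`, the directive table, and the `%z` pattern are all hypothetical.

```python
import re
from functools import lru_cache

# Assumed directive table; only a handful of directives are covered here.
DIRECTIVE_TO_REGEX = {
    "Y": r"\d{4}",
    "m": r"\d{2}",
    "d": r"\d{2}",
    "H": r"\d{2}",
    "M": r"\d{2}",
    "S": r"\d{2}",
    "z": r"[+-]\d{2}:?\d{2}",  # optional colon: accepts both +0300 and +03:00
}

# A token is either a directive (% plus one character) or a run of literal text.
_TOKEN = re.compile(r"%(?P<directive>.)|(?P<literal>[^%]+)")


@lru_cache(maxsize=None)
def compile_format(date_format: str) -> "re.Pattern[str]":
    """Convert a datetime format string to a compiled regex, once per format."""
    parts = []
    for match in _TOKEN.finditer(date_format):
        directive = match.group("directive")
        if directive is None:
            # Literal text is escaped so characters like "." are not treated as regex syntax.
            parts.append(re.escape(match.group("literal")))
        elif directive == "%":
            parts.append("%")  # "%%" stands for a literal percent sign
        else:
            # Raises KeyError for directives missing from the assumed table.
            parts.append(DIRECTIVE_TO_REGEX[directive])
    return re.compile("".join(parts))


iso_with_offset = compile_format("%Y-%m-%dT%H:%M:%S%z")
```

Because each token is consumed exactly once, the output of one substitution is never rescanned by a later one, and `lru_cache` returns the same compiled pattern for repeated formats.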
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

escaped_placeholder = re.escape(placeholder)
regex_pattern = regex_pattern.replace(escaped_placeholder, regex)

return regex_pattern

suggestion (bug_risk): Regex patterns are not anchored, may match substrings unexpectedly

Anchor the pattern with ^ and $, or use word boundaries, to ensure only complete date tokens are matched.

Suggested change
- return regex_pattern
+ # Anchor the pattern to match the entire string
+ return f"^{regex_pattern}$"
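
Before applying this suggestion, it may be worth checking how the pattern is used. Assuming the scrubber searches for dates inside longer text, the sketch below compares the options; `date_regex` is a simplified stand-in for the generated pattern and `log_line` is a made-up input.

```python
import re

date_regex = r"\d{8}_\d{6}"  # simplified stand-in for the generated pattern
log_line = "backup_20250527_125703.tar.gz created"

# Unanchored: finds the date wherever it appears in the text.
unanchored = re.sub(date_regex, "<date>", log_line)

# Anchored with ^ and $: only matches when the whole string is a date.
anchored = re.sub(f"^{date_regex}$", "<date>", log_line)

# Word boundaries: "_" counts as a word character, so there is no boundary
# between "backup_" and the first digit, and nothing is replaced here.
word_bounded = re.sub(rf"\b{date_regex}\b", "<date>", log_line)

# Digit lookarounds: reject matches that are part of a longer digit run,
# while still allowing the date to sit inside surrounding text.
digit_bounded = re.sub(rf"(?<!\d){date_regex}(?!\d)", "<date>", log_line)
```

With this input, only the unanchored and digit-bounded variants scrub the date, which suggests full anchoring would suit validation of a single example better than scrubbing.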


# Replace format codes with regex patterns first
regex_pattern = date_format
for format_code, regex in format_to_regex.items():

suggestion (code-quality): Remove unnecessary calls to dict.items when the values are not used (remove-dict-items)

Suggested change
- for format_code, regex in format_to_regex.items():
+ for format_code in format_to_regex:
