-
Notifications
You must be signed in to change notification settings - Fork 53
Refactor DateScrubber to use datetime format strings internally #207
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Uses datetime.strptime() for robust date parsing instead of complex regex patterns - Internal implementation now uses readable format strings like %Y%m%d_%H%M%S - External API maintains backward compatibility with regex patterns - Added support for YYYYMMDD_HHMMSS format (20250527_125703) from issue #124 - Easier to maintain: adding new date formats now requires only datetime format strings - All existing functionality preserved, all tests pass Related to #124 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Reviewer's GuideRefactored DateScrubber to leverage datetime format strings and parsing internally while preserving the external regex-based API, and added support for the new YYYYMMDD_HHMMSS format. Sequence Diagram for DateScrubber.get_scrubber_for()sequenceDiagram
participant Caller
participant DS_Static as DateScrubber (Static)
participant DT as datetime
participant DS_Instance as DateScrubber (Instance)
Caller->>DS_Static: get_scrubber_for(example)
DS_Static->>DS_Static: _get_internal_formats()
activate DS_Static
DS_Static-->>DS_Static: internal_formats_list
deactivate DS_Static
loop For each format in internal_formats_list
DS_Static->>DT: strptime(example, date_format)
alt strptime success
DT-->>DS_Static: parsed_datetime_object
DS_Static->>DS_Instance: new DateScrubber(date_format)
activate DS_Instance
DS_Instance-->>DS_Static: scrubber_instance
deactivate DS_Instance
DS_Static-->>Caller: scrubber_instance.scrub (callable)
Note right of Caller: Returns the scrub method
else strptime failure (ValueError)
DT-->>DS_Static: ValueError
end
end
Note over DS_Static: If loop completes, no match was found.
DS_Static->>DS_Static: get_supported_formats()
activate DS_Static
DS_Static-->>DS_Static: supported_formats_for_error_msg
deactivate DS_Static
DS_Static->>Caller: raise Exception("No match found...")
Sequence Diagram for DateScrubber.get_supported_formats()sequenceDiagram
participant Caller
participant DS_Static as DateScrubber (Static)
participant DS_Instance as DateScrubber (Instance)
Caller->>DS_Static: get_supported_formats()
DS_Static->>DS_Static: _get_internal_formats()
activate DS_Static
DS_Static-->>DS_Static: internal_formats_list
deactivate DS_Static
loop For each (date_format, _, display_examples) in internal_formats_list
DS_Static->>DS_Instance: new DateScrubber(date_format)
activate DS_Instance
Note over DS_Instance: __init__ calls _convert_format_to_regex(date_format)
DS_Instance-->>DS_Static: scrubber_instance (self.date_regex is now set)
deactivate DS_Instance
DS_Static->>DS_Instance: Get date_regex from scrubber_instance
DS_Instance-->>DS_Static: regex_pattern
DS_Static->>DS_Static: Add (regex_pattern, display_examples) to formats list
end
DS_Static-->>Caller: formats_list
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nitsanavni - I've reviewed your changes - here's some feedback:
- The
%z
mapping currently only matches offsets like+0300
but your display examples use+03:00
; consider updating the regex to accept the colon-separated form as well. - Using simple string replacements in
_convert_format_to_regex
can mis-replace overlapping tokens (e.g.%m
vs%M
); consider a more robust tokenization or regex-based substitution approach. - You rebuild the regex on every
DateScrubber
instantiation—caching the compiled patterns per format could significantly reduce redundant work.
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Testing: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
escaped_placeholder = re.escape(placeholder) | ||
regex_pattern = regex_pattern.replace(escaped_placeholder, regex) | ||
|
||
return regex_pattern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): Regex patterns are not anchored, may match substrings unexpectedly
Anchor the pattern with ^
and $
, or use word boundaries, to ensure only complete date tokens are matched.
return regex_pattern | |
# Anchor the pattern to match the entire string | |
return f"^{regex_pattern}$" |
|
||
# Replace format codes with regex patterns first | ||
regex_pattern = date_format | ||
for format_code, regex in format_to_regex.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Remove unnecessary calls to dict.items
when the values are not used (remove-dict-items
)
for format_code, regex in format_to_regex.items(): | |
for format_code in format_to_regex: |
Summary
• Refactored DateScrubber to use Python's
datetime.strptime()
internally for robust date parsing• Internal implementation now uses readable datetime format strings like
%Y%m%d_%H%M%S
instead of complex regex patterns• External API maintains full backward compatibility - still returns regex patterns
• Added support for YYYYMMDD_HHMMSS format (
20250527_125703
) from issue #124Key Benefits
• Maintainable: Adding new date formats now requires only datetime format strings
• Robust: Leverages Python's built-in datetime parsing for validation
• Backward Compatible: External API unchanged, all existing tests pass
• Developer Friendly: Self-documenting format strings vs complex regex
Test plan
20250527_125703
works correctlyExample
Before:
[12]\d{3}[01]\d[0-3]\d_[0-2]\d[0-5]\d[0-5]\d
After:
%Y%m%d_%H%M%S
(internally) → still generates regex for external APIRelated to #124
🤖 Generated with Claude Code
Summary by Sourcery
Refactor DateScrubber to leverage datetime.strptime with format strings for robust parsing, introduce a conversion utility to maintain regex-based scrubbing and backward compatibility, and add support for the YYYYMMDD_HHMMSS format.
New Features:
Enhancements:
Tests: