-
Notifications
You must be signed in to change notification settings - Fork 53
Add get_scrubber_for_format API to DateScrubber #208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Uses datetime.strptime() for robust date parsing instead of complex regex patterns - Internal implementation now uses readable format strings like %Y%m%d_%H%M%S - External API maintains backward compatibility with regex patterns - Added support for YYYYMMDD_HHMMSS format (20250527_125703) from issue #124 - Easier to maintain: adding new date formats now requires only datetime format strings - All existing functionality preserved, all tests pass Related to #124 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Added new API method that accepts datetime format strings directly - Users can now create scrubbers with patterns like '%Y%m%d_%H%M%S' - More intuitive than having to provide example dates - Full test coverage with approval tests - Works seamlessly with Options pattern Example usage: scrubber = DateScrubber.get_scrubber_for_format('%Y%m%d_%H%M%S') verify(text, options=Options().with_scrubber(scrubber)) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Reviewer's GuideThis PR refactors DateScrubber to centralize format definitions and enable scrubbing via explicit datetime format strings, adds a new get_scrubber_for_format API, converts format strings to regex under the hood, updates get_scrubber_for to use datetime parsing against internal formats, and expands test coverage to validate the new API. Sequence Diagram for new DateScrubber.get_scrubber_for_format() APIsequenceDiagram
actor User
User->>+DateScrubber: get_scrubber_for_format(dateFormat)
Note left of User: Developer calls new API with format string
DateScrubber->>+DateScrubber: __init__(dateFormat)
Note right of DateScrubber: Instantiates DateScrubber with format string
DateScrubber->>DateScrubber: _convert_format_to_regex(dateFormat)
DateScrubber-->>DateScrubber: regexPattern (sets self.date_regex)
DateScrubber-->>-DateScrubber: dateScrubberInstance
DateScrubber->>User: scrubber (dateScrubberInstance.scrub)
Note left of User: Returns the scrub method of the configured instance
Sequence Diagram: Using DateScrubber with Options PatternsequenceDiagram
actor User
User->>+DateScrubber: get_scrubber_for_format("%Y%m%d_%H%M%S")
DateScrubber-->>-User: scrubberFunc
User->>+Options: __init__()
Options-->>-User: optionsInstance
User->>+optionsInstance: with_scrubber(scrubberFunc)
optionsInstance-->>-User: optionsInstance (configured)
User->>+verify: verify("Event at 20250527_125703", optionsInstance)
Note over verify,User: verify is a placeholder for the approval testing function
verify->>+scrubberFunc: scrub("Event at 20250527_125703")
Note right of scrubberFunc: scrubberFunc is DateScrubber.scrub method
scrubberFunc->>scrubberFunc: Uses self.date_regex (derived from "%Y%m%d_%H%M%S")
scrubberFunc-->>-verify: "Event at <date0>"
verify-->>-User: VerificationResult
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @nitsanavni - I've reviewed your changes - here's some feedback:
- Tests should not rely on the private
_get_internal_formats
; either update them to use the public API (get_supported_formats
) or make_get_internal_formats
part of the public contract to avoid coupling to internals. - Add explicit validation for unsupported or malformed datetime format specifiers in
get_scrubber_for_format
so users get clear errors instead of ending up with a non-matching regex. - Consider anchoring the generated regex patterns (e.g. with word boundaries) to avoid accidental partial matches inside larger strings.
Here's what I looked at during the review
- 🟢 General issues: all looks good
- 🟡 Testing: 1 issue found
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
def test_get_scrubber_for_format() -> None: | ||
"""Test the new API that accepts datetime format strings directly.""" | ||
# Test common datetime format patterns | ||
test_cases = [ | ||
("%Y%m%d_%H%M%S", "20250527_125703", "Log: 20250527_125703 - System started"), | ||
( | ||
"%Y-%m-%dT%H:%M:%SZ", | ||
"2021-01-01T12:34:56Z", | ||
"Timestamp: 2021-01-01T12:34:56Z end", | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider testing scrubbing of multiple date instances in a single string.
Add a test case with multiple dates in the input string to verify that all instances are scrubbed correctly (e.g., as , , etc.).
|
||
# Replace format codes with regex patterns first | ||
regex_pattern = date_format | ||
for format_code, regex in format_to_regex.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Remove unnecessary calls to \ when the values are not used (🔧 Fixed)
results.append(f"Format: {format_pattern}") | ||
results.append(f"Input: {test_string}") | ||
results.append(f"Output: {result}") | ||
results.append("") # Empty line for readability | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Merge consecutive list appends into a single extend (🔧 Fixed)
- Reverted __init__ to take regex patterns (preserves backward compatibility) - Added from_format() class method for datetime format strings - get_scrubber_for_format() now uses the factory method - Original regex-based API unchanged: DateScrubber(regex_pattern) - New datetime API via factory: DateScrubber.from_format('%Y%m%d_%H%M%S') - All tests pass, both APIs work correctly 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Remove unnecessary .items() calls when only iterating over dict keys - Replace multiple append() calls with single extend() for better performance 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
Summary
• Added new
get_scrubber_for_format()
API that accepts datetime format strings directly• More intuitive than requiring example dates - users can specify patterns like
%Y%m%d_%H%M%S
• Fully tested with approval tests following the same pattern as existing tests
• Works seamlessly with the Options pattern
Key Benefits
• Intuitive: Direct format specification vs providing example dates
• Familiar: Uses standard Python datetime format codes
• Flexible: Any datetime format pattern supported by the internal engine
• Consistent: Integrates with existing Options().with_scrubber() pattern
API Usage
Test plan
<date0>
placeholdersBuilds on #207
🤖 Generated with Claude Code
Summary by Sourcery
Add an API to generate date scrubbers from datetime format strings, refactor internal format handling to use explicit format definitions with automatic regex conversion, and extend tests to cover the new usage and updated patterns.
New Features:
Enhancements:
_get_internal_formats
list with parsing and display examplesget_supported_formats
to derive regex patterns from datetime formats for external API compatibilityTests:
get_scrubber_for_format
API and its integration with the Options pattern_get_internal_formats
and refresh the approved regex table