[NA] [P SDK] Updating Anonymizer documentation and refactoring #4072
…document advanced anonymization scenarios
- Enhanced `RecursiveAnonymizer` to support anonymization with field path tracking for nested structures, including arrays and dictionaries.
- Added `field_name` and `object_type` context to anonymizers for more granular data processing.
- Refactored `encode_and_anonymize` to integrate `RecursiveAnonymizer` and handle nested anonymization.
- Introduced examples demonstrating advanced use cases like Microsoft Presidio integration and recursive rule support.
- Updated documentation to include nested anonymization strategies, third-party PII tool integration, and context-aware data redaction.
- Added unit and E2E tests covering multi-level dictionaries and lists.
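To make the `field_name` and `object_type` context concrete, here is a minimal sketch of a context-aware text anonymizer. It is not the SDK's implementation: the `SENSITIVE_FIELDS` mapping, the email regex, and the placeholder strings are all assumptions chosen for illustration. Only the idea of receiving `field_name` and `object_type` as keyword arguments comes from the commit message above.

```python
import re
from typing import Any

# Simple illustrative email pattern; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical policy: field paths to redact wholesale, per object type.
SENSITIVE_FIELDS = {
    "trace": {"metadata.user.email"},
    "span": {"input.api_key"},
}


def anonymize_text(text: str, **kwargs: Any) -> str:
    """Redact text, using optional context passed as keyword arguments."""
    field_name = kwargs.get("field_name")
    object_type = kwargs.get("object_type")

    # Fields listed for this object type are replaced entirely.
    if field_name in SENSITIVE_FIELDS.get(object_type, set()):
        return "[REDACTED]"

    # Everything else only has email addresses masked.
    return EMAIL_RE.sub("[EMAIL]", text)
```

Because the context arrives through `**kwargs`, the same function still works when called with no context at all; it then falls back to pattern-based masking only.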
📋 PR Linter Failed ❌ Incomplete Details Section. The …
Pull Request Overview
This PR extends the anonymization functionality in the Opik Python SDK by enhancing RecursiveAnonymizer to support nested data structures with field path tracking, and adds comprehensive documentation for advanced anonymization scenarios including third-party PII tool integration.
Key Changes:
- Enhanced `RecursiveAnonymizer` to track field paths during nested traversal (e.g., `"metadata.user.email"`)
- Refactored anonymization workflow to pass `object_type` context to anonymizers
- Added `RecursiveAnonymizer` to public API exports
- Added documentation examples for Microsoft Presidio integration and nested field anonymization
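The field path tracking described in these changes can be sketched roughly as follows. This is an illustrative stand-alone function, not the SDK's `RecursiveAnonymizer`: the name `anonymize_recursive`, the helper `_join`, and the choice to represent list positions as numeric path segments (e.g. `metadata.tags.0`) are assumptions.

```python
from typing import Any, Callable, List


def _join(prefix: str, segment: str) -> str:
    """Build a dotted path, avoiding a leading dot at the root."""
    return f"{prefix}.{segment}" if prefix else segment


def anonymize_recursive(
    data: Any,
    anonymize_text: Callable[..., str],
    field_name: str = "",
    **kwargs: Any,
) -> Any:
    """Walk dicts and lists, passing each string's path to the anonymizer."""
    if isinstance(data, str):
        return anonymize_text(data, field_name=field_name, **kwargs)
    if isinstance(data, dict):
        return {
            key: anonymize_recursive(
                value, anonymize_text, field_name=_join(field_name, str(key)), **kwargs
            )
            for key, value in data.items()
        }
    if isinstance(data, (list, tuple)):
        # Assumption: list positions become numeric path segments.
        return [
            anonymize_recursive(
                item, anonymize_text, field_name=_join(field_name, str(index)), **kwargs
            )
            for index, item in enumerate(data)
        ]
    # Non-string leaves (numbers, None, booleans) pass through unchanged.
    return data


# Usage: record which paths the anonymizer is called with.
seen: List[str] = []


def record(text: str, **kwargs: Any) -> str:
    seen.append(kwargs["field_name"])
    return "***"


result = anonymize_recursive(
    {"metadata": {"user": {"email": "a@b.c"}, "tags": ["x", 1]}},
    record,
    object_type="trace",
)
```

Extra keyword arguments such as `object_type` are forwarded unchanged at every level, so the leaf anonymizer sees both the full path and the object-level context.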
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `test_jsonable_encoder.py` | Removed obsolete anonymization tests that were moved to `test_encoder_helpers.py` |
| `test_encoder_helpers.py` | New test file containing all anonymization tests with enhanced field path tracking validation |
| `test_recursive_anonymizer.py` | Comprehensive test suite for recursive anonymizer field path tracking and nested structure handling |
| `test_anonymization.py` | Fixed comment formatting from multi-line string to proper comment syntax |
| `online_message_processor.py` | Added `object_type` parameter to `encode_and_anonymize` calls |
| `encoder_helpers.py` | Refactored to extract `anonymize_encoded_obj` helper and pass `object_type` to anonymizers |
| `batchers.py` | Added `object_type` parameter to `encode_and_anonymize` calls |
| `jsonable_encoder.py` | Removed `encode_and_anonymize` function (moved to `encoder_helpers.py`) |
| `rules_anonymizer.py` | Updated `anonymize_text` signature to accept `**kwargs` |
| `recursive_anonymizer.py` | Enhanced to build field paths and propagate kwargs through recursive calls |
| `factory.py` | Added mixed-list type hint for anonymizer rules |
| `__init__.py` | Exported `RecursiveAnonymizer` to public API |
| `anonymizers.mdx` | Added documentation for nested anonymization and Microsoft Presidio integration |
SDK E2E Tests Results: 0 tests, 0 ✅, 0s ⏱️. Results for commit 0cb6d46. ♻️ This comment has been updated with latest results.
🌿 Preview your docs: https://opik-preview-bbd66064-736a-445f-8dbe-3ca28a0724bb.docs.buildwithfern.com/docs/opik No broken links found |
…for span/trace handling
- Updated `encode_and_anonymize` and `anonymize_encoded_obj` to use `Literal["span", "trace"]` for stricter typing of `object_type`.
- Refactored method calls to replace dynamic `object_type` with explicit string literals ("span" or "trace").
- Adjusted documentation and corresponding tests to align with the updated `object_type` usage.
- Improved code clarity and type safety by removing reliance on `Any` or type inference.
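As a rough illustration of the stricter typing described above, the sketch below uses `typing.Literal` to restrict `object_type` to `"span"` or `"trace"`. The function `anonymize_payload` and its digit-masking policy are hypothetical stand-ins, not the SDK's `encode_and_anonymize` or `anonymize_encoded_obj`. Since type hints are not enforced at runtime, the sketch also adds an explicit check via `typing.get_args`; whether the SDK does the same is not stated in the commit.

```python
import re
from typing import Literal, get_args

# The only object types a caller may pass.
ObjectType = Literal["span", "trace"]


def anonymize_payload(payload: str, object_type: ObjectType) -> str:
    """Hypothetical anonymizer entry point with a constrained object_type."""
    # Static checkers flag bad literals; this guard covers untyped callers.
    if object_type not in get_args(ObjectType):
        raise ValueError(
            f"object_type must be one of {get_args(ObjectType)}, got {object_type!r}"
        )
    # Placeholder policy for illustration: mask every digit.
    return re.sub(r"\d", "#", payload)
```

With this signature, a call such as `anonymize_payload("x", "dataset")` is reported by a static type checker and also raises `ValueError` at runtime.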
🌿 Preview your docs: https://opik-preview-fa4772af-abcb-4598-af8c-d3cbc162663c.docs.buildwithfern.com/docs/opik No broken links found
🌿 Preview your docs: https://opik-preview-a9fa4c39-fc3c-4829-8cb1-8a221b3e5730.docs.buildwithfern.com/docs/opik The following broken links were found: Page:
🌿 Preview your docs: https://opik-preview-8189591b-5a60-421b-9e28-48f38f05f607.docs.buildwithfern.com/docs/opik No broken links found
🌿 Preview your docs: https://opik-preview-406ef69c-8b6c-459e-9f0d-feba95eaf3eb.docs.buildwithfern.com/docs/opik No broken links found
🌿 Preview your docs: https://opik-preview-3b9bd58b-cccb-454e-9a69-04173dc8329a.docs.buildwithfern.com/docs/opik No broken links found
🌿 Preview your docs: https://opik-preview-a3fa5185-3fbc-46a4-91c0-ee4454403911.docs.buildwithfern.com/docs/opik No broken links found
…b runner limitations
- Commented out `--guardrails` command in SDK E2E test workflow to prevent storage issues.
- Added module-level skip in `test_guardrails.py` with a reason for disabled guardrails.
🌿 Preview your docs: https://opik-preview-88ca3091-b108-47a9-9e14-24b17cf21b51.docs.buildwithfern.com/docs/opik No broken links found |
Details
This PR extends the anonymization functionality in the Opik Python SDK by enhancing `RecursiveAnonymizer` to support nested data structures with field path tracking, and adds comprehensive documentation for advanced anonymization scenarios including third-party PII tool integration.
Key Changes:
- Enhanced `RecursiveAnonymizer` to track field paths during nested traversal (e.g., `"metadata.user.email"`)
- Refactored anonymization workflow to pass `object_type` context to anonymizers
- Added `RecursiveAnonymizer` to public API exports
Change checklist
Issues
Testing
Added extra tests
Documentation
Updated `Anonymizers` section of the documentation