@yaricom yaricom commented Nov 14, 2025

Details

This PR extends anonymization in the Opik Python SDK. RecursiveAnonymizer now supports nested data structures and tracks the field path of each value it visits. The documentation gains advanced anonymization scenarios, including third-party PII tool integration.

Key Changes:

  • Enhanced RecursiveAnonymizer to track field paths during nested traversal (e.g., "metadata.user.email")
  • Refactored anonymization workflow to pass object_type context to anonymizers
  • Added RecursiveAnonymizer to public API exports
  • Added documentation examples for Microsoft Presidio integration and nested field anonymization
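
To illustrate the field path tracking described above, here is a minimal, self-contained sketch of a recursive walk that builds dotted paths such as "metadata.user.email". It does not import Opik. The names `anonymize_nested` and `redact_email_fields` are hypothetical, and using the list index as a path segment is an assumption, not necessarily what RecursiveAnonymizer does.

```python
from typing import Any, Callable

# Hypothetical callback type: receives a string value and its dotted
# field path, and returns the (possibly redacted) string.
TextAnonymizer = Callable[[str, str], str]


def anonymize_nested(data: Any, anonymize_text: TextAnonymizer, path: str = "") -> Any:
    """Walk dicts and lists, calling anonymize_text on every string leaf."""
    if isinstance(data, str):
        return anonymize_text(data, path)
    if isinstance(data, dict):
        return {
            key: anonymize_nested(
                value, anonymize_text, f"{path}.{key}" if path else str(key)
            )
            for key, value in data.items()
        }
    if isinstance(data, list):
        # Assumption: list items are addressed by index, e.g. "tags.0".
        return [
            anonymize_nested(
                item, anonymize_text, f"{path}.{index}" if path else str(index)
            )
            for index, item in enumerate(data)
        ]
    # Non-string leaves (numbers, None, booleans) pass through unchanged.
    return data


def redact_email_fields(text: str, field_path: str) -> str:
    """Example policy: redact any value whose path ends with 'email'."""
    return "[REDACTED]" if field_path.endswith("email") else text
```

Calling `anonymize_nested(payload, redact_email_fields)` returns a new structure and leaves the input untouched, which keeps the sketch easy to test.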

Change checklist

  • User facing
  • Documentation update

Issues

  • OPIK-NA

Testing

Added unit and E2E tests covering multi-level dictionaries and lists.

Documentation

Updated the Anonymizers section of the documentation.

…document advanced anonymization scenarios

- Enhanced `RecursiveAnonymizer` to support anonymization with field path tracking for nested structures, including arrays and dictionaries.
- Added `field_name` and `object_type` context to anonymizers for more granular data processing.
- Refactored `encode_and_anonymize` to integrate `RecursiveAnonymizer` and handle nested anonymization.
- Introduced examples demonstrating advanced use cases like Microsoft Presidio integration and recursive rule support.
- Updated documentation to include nested anonymization strategies, third-party PII tool integration, and context-aware data redaction.
- Added unit and E2E tests covering multi-level dictionaries and lists.
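
As a rough illustration of the `field_name` and `object_type` context mentioned above, the following sketch shows an anonymizer whose `anonymize_text` accepts `**kwargs` and branches on them. The class `ContextAwareEmailAnonymizer`, the regular expression, and the redaction policy are all hypothetical. Only the keyword names `field_name` and `object_type` come from this PR, and the assumption that `field_name` carries a dotted path is mine.

```python
import re
from typing import Any

# Deliberately simple pattern for illustration; not a complete email matcher.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class ContextAwareEmailAnonymizer:
    """Hypothetical anonymizer; it does not subclass any SDK class."""

    def anonymize_text(self, text: str, **kwargs: Any) -> str:
        # Context is optional, so fall back to neutral defaults.
        field_name = kwargs.get("field_name", "")
        object_type = kwargs.get("object_type")

        # Assumed policy: on traces, wipe everything under "metadata.user".
        if object_type == "trace" and field_name.startswith("metadata.user"):
            return "[REDACTED]"

        # Everywhere else, mask only the email addresses inside the text.
        return EMAIL_PATTERN.sub("[EMAIL]", text)
```

Accepting `**kwargs` rather than fixed parameters means an anonymizer written this way keeps working if callers start passing additional context later.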
Copilot AI review requested due to automatic review settings November 14, 2025 15:06
@yaricom yaricom requested review from a team as code owners November 14, 2025 15:06
@github-actions

📋 PR Linter Failed

Incomplete Details Section. The ## Details section cannot be empty.

Copilot AI left a comment

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `test_jsonable_encoder.py` | Removed obsolete anonymization tests that were moved to `test_encoder_helpers.py` |
| `test_encoder_helpers.py` | New test file containing all anonymization tests with enhanced field path tracking validation |
| `test_recursive_anonymizer.py` | Comprehensive test suite for recursive anonymizer field path tracking and nested structure handling |
| `test_anonymization.py` | Fixed comment formatting from multi-line string to proper comment syntax |
| `online_message_processor.py` | Added `object_type` parameter to `encode_and_anonymize` calls |
| `encoder_helpers.py` | Refactored to extract `anonymize_encoded_obj` helper and pass `object_type` to anonymizers |
| `batchers.py` | Added `object_type` parameter to `encode_and_anonymize` calls |
| `jsonable_encoder.py` | Removed `encode_and_anonymize` function (moved to `encoder_helpers.py`) |
| `rules_anonymizer.py` | Updated `anonymize_text` signature to accept `**kwargs` |
| `recursive_anonymizer.py` | Enhanced to build field paths and propagate kwargs through recursive calls |
| `factory.py` | Added mixed-list type hint for anonymizer rules |
| `__init__.py` | Exported `RecursiveAnonymizer` to public API |
| `anonymizers.mdx` | Added documentation for nested anonymization and Microsoft Presidio integration |

github-actions bot commented Nov 14, 2025

SDK E2E Tests Results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit 0cb6d46.

♻️ This comment has been updated with latest results.

@github-actions

🌿 Preview your docs: https://opik-preview-bbd66064-736a-445f-8dbe-3ca28a0724bb.docs.buildwithfern.com/docs/opik

No broken links found

…for span/trace handling

- Updated `encode_and_anonymize` and `anonymize_encoded_obj` to use `Literal["span", "trace"]` for stricter typing of `object_type`.
- Refactored method calls to replace dynamic `object_type` with explicit string literals ("span" or "trace").
- Adjusted documentation and corresponding tests to align with the updated `object_type` usage.
- Improved code clarity and type safety by removing reliance on `Any` or type inference.
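
The stricter typing described in this commit can be sketched as follows. This is an assumption-laden illustration, not the SDK's code: `ObjectType`, `anonymize_payload`, and the runtime check are hypothetical names and choices. Only the `Literal["span", "trace"]` annotation is taken from the commit message.

```python
from typing import Any, Callable, Literal, Optional, get_args

# Restricts the argument to the two object kinds named in the commit.
ObjectType = Literal["span", "trace"]


def anonymize_payload(
    payload: Any,
    object_type: ObjectType,
    anonymize: Optional[Callable[..., Any]] = None,
) -> Any:
    """Hypothetical helper that forwards object_type to an anonymizer."""
    # Literal is only enforced by static type checkers, so this sketch
    # also validates at runtime for callers that skip type checking.
    if object_type not in get_args(ObjectType):
        raise ValueError(f"Unsupported object_type: {object_type!r}")

    if anonymize is None:
        return payload

    # The object type travels as a keyword argument so the anonymizer
    # can adapt its rules to spans versus traces.
    return anonymize(payload, object_type=object_type)
```

With this annotation, a type checker such as mypy would flag a call like `anonymize_payload(data, "dataset")` before the code runs, which is the kind of safety the commit aims for.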
@github-actions

🌿 Preview your docs: https://opik-preview-fa4772af-abcb-4598-af8c-d3cbc162663c.docs.buildwithfern.com/docs/opik

No broken links found

alexkuzmik previously approved these changes Nov 14, 2025
@comet-ml comet-ml deleted a comment from github-actions bot Nov 14, 2025
@github-actions

🌿 Preview your docs: https://opik-preview-8189591b-5a60-421b-9e28-48f38f05f607.docs.buildwithfern.com/docs/opik

No broken links found

@yaricom yaricom requested review from a team as code owners November 17, 2025 11:49
@github-actions

🌿 Preview your docs: https://opik-preview-406ef69c-8b6c-459e-9f0d-feba95eaf3eb.docs.buildwithfern.com/docs/opik

No broken links found

@github-actions

🌿 Preview your docs: https://opik-preview-3b9bd58b-cccb-454e-9a69-04173dc8329a.docs.buildwithfern.com/docs/opik

No broken links found

@github-actions

🌿 Preview your docs: https://opik-preview-a3fa5185-3fbc-46a4-91c0-ee4454403911.docs.buildwithfern.com/docs/opik

No broken links found

…b runner limitations

- Commented out `--guardrails` command in SDK E2E test workflow to prevent storage issues.
- Added module-level skip in `test_guardrails.py` with a reason for disabled guardrails.
@github-actions

🌿 Preview your docs: https://opik-preview-88ca3091-b108-47a9-9e14-24b17cf21b51.docs.buildwithfern.com/docs/opik

No broken links found

@yaricom yaricom merged commit 7fbdc6e into main Nov 17, 2025
100 checks passed
@yaricom yaricom deleted the NA-refactor-and-docs-update-anonymization branch November 17, 2025 12:58