Skip to content

Conversation

@lorenzoriano
Copy link

@lorenzoriano lorenzoriano commented Dec 4, 2025

Summary

This PR fixes a critical bug in the ODM (Object-Document Mapper) layer that prevented proper handling of deeply nested embedded document fields (3+ levels deep). The issue manifested as a TypeError when merging schemas from multiple samples with complex nested structures.

Problem Description

The bug occurred when processing embedded documents with 3 or more levels of nesting, such as:

person.pose.joint.angle  # 3 levels deep

Root Cause

The issue stemmed from two related problems in the embedded document field merging logic:

  1. Type mismatch during recursive merging (_merge_embedded_doc_fields):

    • First sample would add nested fields as a list (the format returned by _parse_embedded_doc_fields)
    • Second sample would attempt to merge, triggering a recursive call
    • The recursive call would try to perform dictionary operations on a list: fields_dict[name] = field
    • This caused: TypeError: 'list' object does not support item assignment
  2. Inconsistent type checking (_finalize_embedded_doc_fields):

    • Used exact equality (==) instead of issubclass() to check for EmbeddedDocumentField
    • Failed to recognize subclasses of EmbeddedDocumentField (commonly used in dynamic label schemas)
    • Resulted in nested fields remaining as dictionaries instead of being properly converted to lists

Solution

Changes to _merge_embedded_doc_fields

  • Introduced defensive normalization that accepts both list and dict formats
  • Added _to_field_mapping() helper to ensure consistent dict representation during merging
  • Modified recursive merging logic to normalize inputs before dictionary operations
  • This ensures the function can handle any input format without throwing type errors

Changes to _finalize_embedded_doc_fields

  • Replaced exact type checking (field["ftype"] == fof.EmbeddedDocumentField) with proper subclass checking
  • Added inspect.isclass() guard before issubclass() to handle both class and instance references
  • Now correctly identifies and processes all EmbeddedDocumentField subclasses
  • Ensures nested schemas are recursively converted from dict to list format

Test Coverage

Added comprehensive test suite in MergeEmbeddedDocFieldsTests:

  1. test_preserves_list_field_merging_behavior: Verifies existing ListField merging still works
  2. test_subfield_conflict_behavior_matches_previous_logic: Ensures conflict resolution unchanged
  3. test_merges_subclassed_embedded_fields: Tests merging with subclassed embedded fields
  4. test_accepts_list_inputs_and_normalizes_output: Validates list-to-dict normalization
  5. test_finalize_handles_subclassed_embedded_fields: Confirms subclass detection in finalization
  6. test_full_pipeline_with_subclassed_embedded_fields: End-to-end pipeline test
  7. test_deeply_nested_embedded_doc_merging: Minimal reproducible example demonstrating the original bug

The final test (test_deeply_nested_embedded_doc_merging) creates a 3-level nested structure and verifies that the merge completes without errors. Before this fix, this test would fail with:

TypeError: 'list' object does not support item assignment

Impact

This fix is critical for:

  • Dynamic label schemas: Particularly those using subclassed embedded document fields
  • Complex nested structures: Any dataset with 3+ levels of embedded documents
  • HRM2 integration: Which uses dynamic pose schemas with deep nesting
  • General robustness: Improves type safety and error handling in the ODM layer

Related Issues

This bug was discovered while implementing the HRM2 model integration, which uses deeply nested pose structures with dynamic label schemas.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of subclassed and nested embedded-document schemas so nested types are correctly recognized and merged, preserving subfields and avoiding incorrect conflicts.
  • Refactor

    • Reworked embedded-document initialization, merge, and finalization to use consistent type detection and recursive, non-mutating merging.
  • Tests

    • Added unit tests verifying nested embedded-document merging and subclass semantics.

✏️ Tip: You can customize this high-level summary in your review settings.

…ntField

- Updated the _finalize_embedded_doc_fields function to correctly identify and process subclasses of EmbeddedDocumentField, ensuring nested fields are converted from dict to list format.
- Added unit tests to verify the handling of subclassed embedded fields and to ensure the full pipeline works correctly with deeply nested structures, addressing a previously encountered TypeError during merging.
@lorenzoriano lorenzoriano requested review from a team as code owners December 4, 2025 21:01
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 4, 2025

Walkthrough

Replaces direct equality checks for EmbeddedDocumentField with class-based detection (using inspect.isclass and issubclass) in fiftyone/core/odm/utils.py. Adjusts logic in _merge_embedded_doc_fields, _init_embedded_doc_fields, and _finalize_embedded_doc_fields to initialize, recursively merge, and finalize nested EmbeddedDocumentField subclasses by recognizing ftype as a class subclass of EmbeddedDocumentField. Removes a blank import line (non-functional). Adds unit tests that exercise merging of nested embedded-document schemas and subclass-preserving behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • fiftyone/core/odm/utils.py: review class-based type checks, recursive merge/finalize paths, and correct handling of EmbeddedDocumentField subclasses.
  • tests/unittests/odm_tests.py: verify new tests for nested merge behavior and subclass preservation.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers proposed changes, test coverage, and impact. However, the Release Notes section required by the template is not filled in. Complete the Release Notes section by selecting whether this is user-facing and, if yes, providing a 1-2 sentence description suitable for release notes.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main fix: handling deeply nested embedded documents in the _finalize_embedded_doc_fields method.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch lorenzo/odm_fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f39e033 and 2e84162.

📒 Files selected for processing (2)
  • fiftyone/core/odm/utils.py (1 hunks)
  • tests/unittests/odm_tests.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
fiftyone/core/odm/utils.py (1)
fiftyone/core/fields.py (1)
  • EmbeddedDocumentField (1775-2100)
tests/unittests/odm_tests.py (2)
fiftyone/core/fields.py (3)
  • EmbeddedDocumentField (1775-2100)
  • FloatField (958-1000)
  • ListField (1045-1095)
fiftyone/core/odm/utils.py (2)
  • _merge_embedded_doc_fields (507-531)
  • _finalize_embedded_doc_fields (545-561)
🪛 Ruff (0.14.7)
tests/unittests/odm_tests.py

68-68: Unnecessary pass statement

Remove unnecessary pass

(PIE790)


70-70: Missing return type annotation for private function _build_embedded_field

(ANN202)


98-98: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


99-99: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


119-119: Use a regular assert instead of unittest-style assertIsNone

Replace assertIsNone(...) with assert ...

(PT009)


128-128: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


129-129: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


137-137: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


138-138: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


139-139: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


169-169: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


170-170: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


174-174: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


175-175: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


180-180: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


201-201: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


202-202: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


205-205: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


208-208: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


210-210: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


229-229: Missing return type annotation for private function build_deeply_nested_field

(ANN202)


253-253: Missing return type annotation for private function build_deeply_nested_field_variant

(ANN202)


290-290: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


294-294: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


295-295: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


296-296: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


300-300: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


301-301: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


302-302: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


306-306: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


307-307: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


312-312: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


313-313: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


317-317: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


320-320: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


323-323: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: lint / eslint
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: build / build
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-app
  • GitHub Check: build
🔇 Additional comments (3)
fiftyone/core/odm/utils.py (1)

550-559: LGTM! Correct fix for subclass detection.

The inspect.isclass(ftype) guard before issubclass() prevents TypeError on non-class values, and using issubclass() instead of exact equality properly handles EmbeddedDocumentField subclasses like dynamic label schemas.

tests/unittests/odm_tests.py (2)

212-323: Excellent regression test for the core bug.

This test directly exercises the 3+ level nesting scenario that triggered the original TypeError. The structure verification at lines 290-323 is thorough—checking each nesting level after both merge and finalize operations.


141-180: Well-documented test case.

The docstring clearly explains the bug scenario (lines 142-153) and what would have failed without the fix. This serves as excellent documentation for future maintainers.

fields.append(field)
if field["ftype"] == fof.EmbeddedDocumentField:
ftype = field["ftype"]
is_embedded_doc = inspect.isclass(ftype) and issubclass(
Copy link
Contributor

@brimoor brimoor Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide the example call to foo.get_implied_field_kwargs() that resulted in a case where there was a field whose type was a strict subclass of EmbeddedDocumentField in this method?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ftype == fof.EmbeddedDocumentField comparisons are used in a couple other methods in this module as well, so I would think that if we need a patch here, then we'd need a patch in those methods too

Copy link
Contributor

@brimoor brimoor Dec 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking, we do add more EmbeddedDocument subclasses to represent different data models in FO over time, but we do not define EmbeddedDocumentField subclasses, as the latter is just the generic concept of a field that contains a document.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup you're right, there isn’t an in-repo call path today that supplies a strict subclass of EmbeddedDocumentField. The issubclass guards are defensive to avoid silently skipping nested schemas if a downstream plugin or future extension ever introduces a subclass (which is what I ended up doing at one point and discovered this issue). I updated the remaining equality check in _finalize_embedded_doc_fields to use the same issubclass pattern we already use in _merge_embedded_doc_fields, so merge/finalize behavior stays consistent and safe for hypothetical subclasses while remaining identical for current inputs.

Copy link
Contributor

@brimoor brimoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still trying to wrap my head around the actual error case here. The methods you're tweaking are strictly invoked by calling get_implied_field_kwargs() so can you provide an example that raises an exception on develop?

import fiftyone.core.odm as foo

# what fails here?
value = ...
kwargs = foo.get_implied_field_kwargs(value, ...)

The unit tests are contrived ways of invoking the private methods directly, which users would never do. I'd like to have "integration" tests that validate the behavior of get_implied_field_kwargs() instead.

return []

if isinstance(fields_dict, list):
# Older callers may still provide a list that already filtered out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we don't need to support backwards-compatibility in these private methods. We can assume they are strictly invoked by get_implied_field_kwargs()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

…nested schemas

- Removed legacy handling of list input in _finalize_embedded_doc_fields to streamline processing.
- Introduced new integration tests for the get_implied_field_kwargs function to validate merging of nested embedded document schemas, ensuring correct field type inference and structure.
@lorenzoriano
Copy link
Author

I'm still trying to wrap my head around the actual error case here. The methods you're tweaking are strictly invoked by calling get_implied_field_kwargs() so can you provide an example that raises an exception on develop?

import fiftyone.core.odm as foo

# what fails here?
value = ...
kwargs = foo.get_implied_field_kwargs(value, ...)

The unit tests are contrived ways of invoking the private methods directly, which users would never do. I'd like to have "integration" tests that validate the behavior of get_implied_field_kwargs() instead.

The bug shows up when get_implied_field_kwargs() is asked to infer a schema for nested embedded-document structures, specifically when:

  • The value is a list of embedded documents, and
  • Those embedded documents themselves contain embedded-document fields.

I had this issue when working on the new (and now removed) labels in HRM2. The old implementation of _merge_embedded_doc_fields() assumed its fields_dict argument was always a dict, but in practice it was sometimes a list of field-specs coming. When merging them, it would end up doing something equivalent to:
fields_dict[name] = field # fields_dict is actually a list here
which on develop raises:
TypeError: list indices must be integers or slices, not str
from inside _merge_embedded_doc_fields().

I’ve added a test that uses get_implied_field_kwargs(). The new GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema builds a list of embedded documents where the nested field is a subclass of EmbeddedDocumentField, and each element’s inner document exposes different fields.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
tests/unittests/odm_tests.py (1)

64-68: Remove unnecessary pass.

The docstring alone is sufficient for the class body.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 684d729 and 6791871.

📒 Files selected for processing (2)
  • fiftyone/core/odm/utils.py (2 hunks)
  • tests/unittests/odm_tests.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/core/odm/utils.py (2)
fiftyone/core/dataset.py (2)
  • name (685-687)
  • name (690-704)
fiftyone/core/fields.py (3)
  • EmbeddedDocumentField (1775-2100)
  • ListField (1045-1095)
  • DictField (1130-1190)
🪛 Ruff (0.14.7)
tests/unittests/odm_tests.py

68-68: Unnecessary pass statement

Remove unnecessary pass

(PIE790)


70-70: Missing return type annotation for private function _build_embedded_field

(ANN202)


98-98: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


99-99: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


119-119: Use a regular assert instead of unittest-style assertIsNone

Replace assertIsNone(...) with assert ...

(PT009)


128-128: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


129-129: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


137-137: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)


138-138: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


139-139: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)


150-150: Unnecessary pass statement

Remove unnecessary pass

(PIE790)


180-180: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


181-181: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


182-182: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


189-189: Use a regular assert instead of unittest-style assertTrue

Replace assertTrue(...) with assert ...

(PT009)


192-192: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


195-195: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

fiftyone/core/odm/utils.py

13-13: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)


13-13: typing.Dict is deprecated, use dict instead

(UP035)


13-13: typing.List is deprecated, use list instead

(UP035)


498-498: Use X | Y for type annotations

(UP007)


501-501: Missing return type annotation for private function _parse_embedded_doc_list_fields

(ANN202)


515-515: Use X | Y for type annotations

Convert to X | Y

(UP007)


536-536: Use X | Y for type annotations

Convert to X | Y

(UP007)


548-548: Use X | Y for type annotations

Convert to X | Y

(UP007)


603-603: Missing return type annotation for private function _finalize_embedded_doc_fields

(ANN202)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-app
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (11)
tests/unittests/odm_tests.py (6)

13-15: LGTM!

The new imports for fof and odmu are correctly added to support the new test classes.


80-99: LGTM!

Test correctly validates that ListField merging behavior is preserved with the new implementation.


101-119: LGTM!

Good coverage of the conflict resolution behavior where mismatched subfield types result in None.


121-129: LGTM!

Key test case validating the issubclass() fix for handling subclassed EmbeddedDocumentField types.


131-139: LGTM!

Validates the defensive normalization of list inputs to dict format.


159-195: LGTM!

Comprehensive integration test that validates the full merge/finalize pipeline through the public get_implied_field_kwargs API. The test correctly exercises the scenario described in the PR objectives.

fiftyone/core/odm/utils.py (5)

496-498: LGTM!

Type aliases clearly document the dual-format (dict/list) representations and improve readability of the merging functions.


501-510: LGTM!

Correctly updated to use the return value from _merge_embedded_doc_fields and finalize the result.


591-598: LGTM!

The recursive merge correctly normalizes existing nested schemas before merging, which is the core fix for the deeply nested document issue.


603-629: LGTM!

Correctly converts the merged dict-form schemas back to list form, with proper recursive handling for nested embedded documents using the issubclass() pattern.


569-572: The review comment's concern about field spec mutation is not applicable. Field specifications are freshly constructed by _parse_embedded_doc_fields on each call and are never reused across multiple merge operations. The mutation at line 572 is safe because:

  1. Each field dict originates from a fresh call to _parse_embedded_doc_fields, which constructs new dict objects.
  2. Within _parse_embedded_doc_list_fields, each iteration creates new specs; they are not cached or stored for reuse.
  3. The assignment merged[name] = field stores a reference to this fresh dict, which is then immediately normalized on the next line without external dependencies.
  4. The docstring explicitly states the current design "avoids mutating list-based inputs in-place, which previously caused TypeError"—this refers to the incoming fields parameter structure, not the safety of normalizing individual field dicts.

The proposed diff in the original comment would not improve safety and would introduce unnecessary overhead without addressing an actual problem.

import json
import numbers
import sys
from typing import Any, Dict, Iterable, List, Union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider modern type annotations.

For Python 3.9+, prefer collections.abc.Iterable and built-in generics (dict, list) over typing equivalents. This is optional if maintaining older Python compatibility.

🧰 Tools
🪛 Ruff (0.14.7)

13-13: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)


13-13: typing.Dict is deprecated, use dict instead

(UP035)


13-13: typing.List is deprecated, use list instead

(UP035)

🤖 Prompt for AI Agents
In fiftyone/core/odm/utils.py around line 13, the file imports typing.Iterable,
Dict, and List; update to modern annotations by importing Iterable from
collections.abc and using built-in generics (dict, list) in your type
annotations (keep typing.Any and typing.Union if still needed for
compatibility), e.g., replace "from typing import Any, Dict, Iterable, List,
Union" with an import of Any and Union from typing plus "from collections.abc
import Iterable" and update any annotations that used Dict[...] and List[...] to
dict[...] and list[...].

@brimoor
Copy link
Contributor

brimoor commented Dec 5, 2025

I’ve added a test that uses get_implied_field_kwargs(). The new GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema builds a list of embedded documents where the nested field is a subclass of EmbeddedDocumentField, and each element’s inner document exposes different fields.

Okay so here's the test case you added:

import fiftyone as fo
import fiftyone.core.odm as foo
import fiftyone.core.fields as fof

class _GetImpliedInner(foo.EmbeddedDocument):
    x = fof.FloatField()
    score = fof.FloatField()

class _GetImpliedSubclassedEmbeddedDocField(fof.EmbeddedDocumentField):
    pass

class _GetImpliedOuter(foo.EmbeddedDocument):
    # option 1
    pose = fof.EmbeddedDocumentField(_GetImpliedInner)
    
    # option 2
    # pose = _GetImpliedSubclassedEmbeddedDocField(_GetImpliedInner)

values = [
    _GetImpliedOuter(pose=_GetImpliedInner(x=1.0)),
    _GetImpliedOuter(pose=_GetImpliedInner(score=0.5)),
]

kwargs = foo.get_implied_field_kwargs(values)
fo.pprint(kwargs)

If I run with option 1 on develop, I get the expected output:

{
    'ftype': <class 'fiftyone.core.fields.ListField'>,
    'subfield': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
    'embedded_doc_type': <class '__main__._GetImpliedOuter'>,
    'fields': [
        {
            'ftype': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
            'db_field': 'pose',
            'description': None,
            'info': None,
            'read_only': False,
            'created_at': None,
            'embedded_doc_type': <class '__main__._GetImpliedInner'>,
            'fields': [
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'x',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'x',
                },
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'score',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'score',
                },
            ],
            'name': 'pose',
        },
    ],
}

If I run option 2 on develop, I don't get an error, but it fails to pick up the score field.

If I only change ftype == fof.EmbeddedDocumentField to issubclass(ftype, fof.EmbeddedDocumentField) on develop, then option 2 works as expected too.

So, given that defining and using EmbeddedDocumentField subclasses is NOT something we officially support, I'm still on the hunt for a concrete example where get_implied_field_kwargs() doesn't work on develop that would justify the changes in this PR 😄

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6791871 and 65f240b.

📒 Files selected for processing (2)
  • fiftyone/core/odm/utils.py (3 hunks)
  • tests/unittests/odm_tests.py (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: swheaton
Repo: voxel51/fiftyone PR: 6023
File: tests/unittests/factory/delegated_operation_doc_tests.py:55-57
Timestamp: 2025-06-16T21:08:14.104Z
Learning: The DelegatedOperationDocument class in fiftyone/factory/repos/delegated_operation_doc.py inherits from object (plain Python class), not from Pydantic BaseModel. Using setattr to modify its attributes in tests is appropriate and does not bypass any validation.
📚 Learning: 2025-09-10T03:01:27.501Z
Learnt from: swheaton
Repo: voxel51/fiftyone PR: 6334
File: fiftyone/core/expressions.py:2727-2741
Timestamp: 2025-09-10T03:01:27.501Z
Learning: FiftyOne requires MongoDB >= 6.0 as documented at https://docs.voxel51.com/user_guide/config.html#using-a-different-mongodb-version

Applied to files:

  • tests/unittests/odm_tests.py
📚 Learning: 2024-11-11T19:26:20.542Z
Learnt from: brimoor
Repo: voxel51/fiftyone PR: 5086
File: fiftyone/operators/store/service.py:64-64
Timestamp: 2024-11-11T19:26:20.542Z
Learning: The codebase requires Python >= 3.9, so using features introduced in Python 3.9, such as type hinting with `list[str]`, is acceptable.

Applied to files:

  • fiftyone/core/odm/utils.py
🧬 Code graph analysis (2)
tests/unittests/odm_tests.py (2)
fiftyone/core/fields.py (3)
  • FloatField (958-1000)
  • EmbeddedDocumentField (1775-2100)
  • ListField (1045-1095)
fiftyone/core/odm/utils.py (1)
  • get_implied_field_kwargs (364-435)
fiftyone/core/odm/utils.py (1)
fiftyone/core/fields.py (1)
  • EmbeddedDocumentField (1775-2100)
🪛 Ruff (0.14.7)
tests/unittests/odm_tests.py

71-71: Unnecessary pass statement

Remove unnecessary pass

(PIE790)


101-101: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


102-102: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


103-103: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


110-110: Use a regular assert instead of unittest-style assertTrue

Replace assertTrue(...) with assert ...

(PT009)


113-113: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


116-116: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

fiftyone/core/odm/utils.py

537-537: Missing return type annotation for private function _init_embedded_doc_fields

Add return type annotation: None

(ANN202)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-app
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: lint / eslint
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build / build
  • GitHub Check: build
🔇 Additional comments (1)
fiftyone/core/odm/utils.py (1)

512-515: Subclass-aware deep embedded-doc merge/finalize logic looks correct

Using inspect.isclass(ftype) and issubclass(ftype, fof.EmbeddedDocumentField) in both _merge_embedded_doc_fields and _finalize_embedded_doc_fields, together with the recursive _init_embedded_doc_fields, ensures that:

  • During merging, all embedded-doc specs (including subclasses) have their field["fields"] normalized to dicts at every depth, so recursive merges no longer hit the “list indices must be integers or slices” failure for 3+ levels of nesting.
  • During finalization, the same subclass-aware check correctly converts those nested dicts back to lists, so callers of get_implied_field_kwargs() always see the expected list schema format even when plugins provide EmbeddedDocumentField subclasses.

The change preserves existing behavior for non-embedded types and base EmbeddedDocumentField while fixing the nested-subclass case exercised by the new tests.

Also applies to: 531-533, 540-543, 556-559

Comment on lines +63 to +116
class _GetImpliedInner(foo.EmbeddedDocument):
x = fof.FloatField()
score = fof.FloatField()


class _GetImpliedSubclassedEmbeddedDocField(fof.EmbeddedDocumentField):
"""Lightweight subclass used to mimic plugin-provided embedded fields."""

pass


class _GetImpliedOuter(foo.EmbeddedDocument):
# Use a subclassed `EmbeddedDocumentField` so that `ftype` in the inferred
# schema is a subclass, not the base `EmbeddedDocumentField`
pose = _GetImpliedSubclassedEmbeddedDocField(_GetImpliedInner)


class GetImpliedFieldKwargsTests(unittest.TestCase):
"""Integration-style tests that exercise `_merge_embedded_doc_fields`
via the public :func:`get_implied_field_kwargs` API.
"""

def test_list_of_embedded_docs_merges_nested_schema(self):
# Two `_GetImpliedOuter` documents whose nested `_GetImpliedInner`
# subdocuments populate
# different fields. The merged schema for `pose` should be the union
# of the observed inner fields.
inner_with_x = _GetImpliedInner(x=1.0)
inner_with_score = _GetImpliedInner(score=0.5)

values = [
_GetImpliedOuter(pose=inner_with_x),
_GetImpliedOuter(pose=inner_with_score),
]

kwargs = foo.get_implied_field_kwargs(values)

# Top-level: list of `_GetImpliedOuter` embedded documents
self.assertEqual(kwargs["ftype"], fof.ListField)
self.assertEqual(kwargs["subfield"], fof.EmbeddedDocumentField)
self.assertEqual(kwargs["embedded_doc_type"], _GetImpliedOuter)

# Nested: `pose` should itself be an embedded document whose schema
# includes both `x` and `score`, demonstrating that nested embedded
# schemas from multiple list elements are correctly merged when the
# field type is a subclass of `EmbeddedDocumentField`
pose_spec = next(f for f in kwargs["fields"] if f["name"] == "pose")
self.assertTrue(
issubclass(pose_spec["ftype"], fof.EmbeddedDocumentField)
)
self.assertEqual(pose_spec["embedded_doc_type"], _GetImpliedInner)

inner_field_names = {f["name"] for f in pose_spec["fields"]}
self.assertEqual(inner_field_names, {"x", "score"})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Good integration test for nested subclassed EmbeddedDocumentField merging

The _GetImpliedInner / _GetImpliedOuter scaffolding and GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema accurately exercise the get_implied_field_kwargs()_parse_embedded_doc_list_fields_merge/_finalize_embedded_doc_fields path, and the assertions on:

  • top-level list + embedded-doc subfield/embedded_doc_type
  • pose_spec["ftype"] being a subclass of EmbeddedDocumentField
  • union of inner field names {"x", "score"}

match the intended semantics of the bugfix and should protect against regressions in the deep-nesting/subclass case.

Optionally, you can drop the redundant pass in _GetImpliedSubclassedEmbeddedDocField since the docstring alone gives it a non-empty body.

🧰 Tools
🪛 Ruff (0.14.7)

71-71: Unnecessary pass statement

Remove unnecessary pass

(PIE790)


101-101: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


102-102: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


103-103: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


110-110: Use a regular assert instead of unittest-style assertTrue

Replace assertTrue(...) with assert ...

(PT009)


113-113: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)


116-116: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

🤖 Prompt for AI Agents
In tests/unittests/odm_tests.py around lines 63 to 116, the review notes the
test is good and suggests an optional cleanup: remove the redundant `pass`
statement from the `_GetImpliedSubclassedEmbeddedDocField` class since the
docstring already makes the body non-empty; simply delete the `pass` line to
keep the class minimal and tidy.

@lorenzoriano
Copy link
Author

I’ve added a test that uses get_implied_field_kwargs(). The new GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema builds a list of embedded documents where the nested field is a subclass of EmbeddedDocumentField, and each element’s inner document exposes different fields.

Okay so here's the test case you added:

import fiftyone as fo
import fiftyone.core.odm as foo
import fiftyone.core.fields as fof

class _GetImpliedInner(foo.EmbeddedDocument):
    x = fof.FloatField()
    score = fof.FloatField()

class _GetImpliedSubclassedEmbeddedDocField(fof.EmbeddedDocumentField):
    pass

class _GetImpliedOuter(foo.EmbeddedDocument):
    # option 1
    pose = fof.EmbeddedDocumentField(_GetImpliedInner)
    
    # option 2
    # pose = _GetImpliedSubclassedEmbeddedDocField(_GetImpliedInner)

values = [
    _GetImpliedOuter(pose=_GetImpliedInner(x=1.0)),
    _GetImpliedOuter(pose=_GetImpliedInner(score=0.5)),
]

kwargs = foo.get_implied_field_kwargs(values)
fo.pprint(kwargs)

If I run with option 1 on develop, I get the expected output:

{
    'ftype': <class 'fiftyone.core.fields.ListField'>,
    'subfield': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
    'embedded_doc_type': <class '__main__._GetImpliedOuter'>,
    'fields': [
        {
            'ftype': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
            'db_field': 'pose',
            'description': None,
            'info': None,
            'read_only': False,
            'created_at': None,
            'embedded_doc_type': <class '__main__._GetImpliedInner'>,
            'fields': [
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'x',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'x',
                },
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'score',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'score',
                },
            ],
            'name': 'pose',
        },
    ],
}

If I run option 2 on develop, I don't get an error, but it fails to pick up the score field.

If I only change ftype == fof.EmbeddedDocumentField to issubclass(ftype, fof.EmbeddedDocumentField) on develop, then option 2 works as expected too.

So, given that defining and using EmbeddedDocumentField subclasses is NOT something we officially support, I'm still on the hunt for a concrete example where get_implied_field_kwargs() doesn't work on develop that would justify the changes in this PR 😄

Makes sense, thanks. I rolled back most changes and just add a minimal behavior change: use issubclass(ftype, EmbeddedDocumentField) in _merge_embedded_doc_fields (and the init/finalize helpers). The concrete failure on develop is captured in GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema: when the field is a subclass of EmbeddedDocumentField, get_implied_field_kwargs() only keeps one of the nested fields (x or score). With the issubclass check, it now merges both.

To clarify, the concrete problem I ran into is your “option 2” scenario: when the schema uses a subclass of EmbeddedDocumentField, get_implied_field_kwargs() on origin/develop fails to merge the nested embedded schema across list elements, so only one of the observed inner fields (x or score) ends up in the inferred spec. I’ve captured that behavior in GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema, which fails on develop and passes in the new code, as we treat subclasses of EmbeddedDocumentField as embedded as well.

Honestly, if you don't think this issue is worth the effort, I will happily drop this PR. Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants