Fix _finalize_embedded_doc_fields to handle deeply nested embedded documents #6639

lorenzoriano · 2025-12-04T21:01:44Z

Summary

This PR fixes a critical bug in the ODM (Object-Document Mapper) layer that prevented proper handling of deeply nested embedded document fields (3+ levels deep). The issue manifested as a TypeError when merging schemas from multiple samples with complex nested structures.

Problem Description

The bug occurred when processing embedded documents with 3 or more levels of nesting, such as:

person.pose.joint.angle  # 3 levels deep

Root Cause

The issue stemmed from two related problems in the embedded document field merging logic:

1. Type mismatch during recursive merging (_merge_embedded_doc_fields):
   - The first sample adds nested fields as a list (the format returned by _parse_embedded_doc_fields)
   - The second sample attempts to merge, triggering a recursive call
   - The recursive call performs a dictionary operation on that list: fields_dict[name] = field
   - This raises: TypeError: list indices must be integers or slices, not str
2. Inconsistent type checking (_finalize_embedded_doc_fields):
   - Used exact equality (==) instead of issubclass() to check for EmbeddedDocumentField
   - Failed to recognize subclasses of EmbeddedDocumentField (commonly used in dynamic label schemas)
   - Left nested fields as dictionaries instead of converting them to lists
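
The first failure mode can be reproduced without FiftyOne. The sketch below uses plain Python containers as stand-ins for field specs; the names `nested_fields` and `field` are illustrative and not taken from the codebase. It shows what happens when a dict-style assignment with a string key is applied to a list:

```python
# Stand-in for a nested schema that is still in list form
nested_fields = [{"name": "angle", "ftype": float}]
field = {"name": "score", "ftype": float}

error_message = None
try:
    # Dict-style assignment with a string key, applied to a list
    nested_fields[field["name"]] = field
except TypeError as e:
    error_message = str(e)

print(error_message)
```

The assignment fails because a list can only be indexed by integers or slices, so any code path that hands a list-form schema to dict-based merging logic will raise at this point.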

Solution

Changes to `_merge_embedded_doc_fields`

- Introduced defensive normalization that accepts both list and dict formats
- Added _to_field_mapping() helper to ensure consistent dict representation during merging
- Modified recursive merging logic to normalize inputs before dictionary operations
- This ensures the function can handle either input format without raising type errors
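
To make the normalization step concrete, here is a minimal sketch of what a helper like _to_field_mapping() could look like. The body is an assumption written for illustration (the PR description only names the helper), and `to_field_mapping` and `FieldSpec` are hypothetical names:

```python
from typing import Any, Union

# Hypothetical alias: a field spec is a dict with at least a "name" key
FieldSpec = dict[str, Any]


def to_field_mapping(
    fields: Union[list[FieldSpec], dict[str, FieldSpec], None],
) -> dict[str, FieldSpec]:
    """Returns a name -> spec dict for either schema representation."""
    if not fields:
        return {}

    if isinstance(fields, dict):
        # Shallow copy so the caller's dict is not mutated by later merging
        return dict(fields)

    # List form: key each spec by its "name" entry
    return {spec["name"]: spec for spec in fields}


as_list = [{"name": "x", "ftype": float}, {"name": "score", "ftype": float}]
mapping = to_field_mapping(as_list)
mapping["angle"] = {"name": "angle", "ftype": float}  # now a safe dict operation
```

Normalizing at the top of the merge function means the rest of the logic only ever deals with one representation.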

Changes to `_finalize_embedded_doc_fields`

- Replaced exact type checking (field["ftype"] == fof.EmbeddedDocumentField) with proper subclass checking
- Added inspect.isclass() guard before issubclass() to handle both class and instance references
- Now correctly identifies and processes all EmbeddedDocumentField subclasses
- Ensures nested schemas are recursively converted from dict to list format
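
The dict-to-list conversion described above can be sketched as follows. This is an illustrative model, not the code in fiftyone/core/odm/utils.py: `EmbeddedDocumentField`, `PoseField`, and `FloatField` are local stand-in classes, and `finalize_fields` is a hypothetical name.

```python
import inspect


class EmbeddedDocumentField:  # stand-in for fof.EmbeddedDocumentField
    pass


class PoseField(EmbeddedDocumentField):  # hypothetical subclass
    pass


class FloatField:  # stand-in for fof.FloatField
    pass


def is_embedded_doc_type(ftype):
    # Subclass-aware check; the isclass() guard keeps issubclass() from raising
    return inspect.isclass(ftype) and issubclass(ftype, EmbeddedDocumentField)


def finalize_fields(fields_dict):
    """Converts a name -> spec dict into a list, recursing into nested schemas."""
    fields = []
    for field in fields_dict.values():
        field = dict(field)  # copy so the input mapping is left untouched
        nested = field.get("fields")
        if is_embedded_doc_type(field.get("ftype")) and isinstance(nested, dict):
            field["fields"] = finalize_fields(nested)
        fields.append(field)
    return fields


angle = {"name": "angle", "ftype": FloatField}
joint = {"name": "joint", "ftype": EmbeddedDocumentField, "fields": {"angle": angle}}
pose = {"name": "pose", "ftype": PoseField, "fields": {"joint": joint}}

result = finalize_fields({"pose": pose})
```

With an exact equality check in place of `is_embedded_doc_type`, the `PoseField` entry would be skipped and its nested schema would stay in dict form.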

Test Coverage

Added comprehensive test suite in MergeEmbeddedDocFieldsTests:

- test_preserves_list_field_merging_behavior: Verifies existing ListField merging still works
- test_subfield_conflict_behavior_matches_previous_logic: Ensures conflict resolution is unchanged
- test_merges_subclassed_embedded_fields: Tests merging with subclassed embedded fields
- test_accepts_list_inputs_and_normalizes_output: Validates list-to-dict normalization
- test_finalize_handles_subclassed_embedded_fields: Confirms subclass detection in finalization
- test_full_pipeline_with_subclassed_embedded_fields: End-to-end pipeline test
- test_deeply_nested_embedded_doc_merging: Minimal reproducible example demonstrating the original bug

The final test (test_deeply_nested_embedded_doc_merging) creates a 3-level nested structure and verifies that the merge completes without errors. Before this fix, this test would fail with:

TypeError: list indices must be integers or slices, not str

Impact

This fix is critical for:

- Dynamic label schemas: Particularly those using subclassed embedded document fields
- Complex nested structures: Any dataset with 3+ levels of embedded documents
- HRM2 integration: Which uses dynamic pose schemas with deep nesting
- General robustness: Improves type safety and error handling in the ODM layer

Related Issues

This bug was discovered while implementing the HRM2 model integration, which uses deeply nested pose structures with dynamic label schemas.

Summary by CodeRabbit

Bug Fixes
- Improved handling of subclassed and nested embedded-document schemas so nested types are correctly recognized and merged, preserving subfields and avoiding incorrect conflicts.
Refactor
- Reworked embedded-document initialization, merge, and finalization to use consistent type detection and recursive, non-mutating merging.
Tests
- Added unit tests verifying nested embedded-document merging and subclass semantics.

…ntField - Updated the _finalize_embedded_doc_fields function to correctly identify and process subclasses of EmbeddedDocumentField, ensuring nested fields are converted from dict to list format. - Added unit tests to verify the handling of subclassed embedded fields and to ensure the full pipeline works correctly with deeply nested structures, addressing a previously encountered TypeError during merging.

coderabbitai · 2025-12-04T21:02:09Z

Walkthrough

Replaces direct equality checks for EmbeddedDocumentField with class-based detection (using inspect.isclass and issubclass) in fiftyone/core/odm/utils.py. Adjusts logic in _merge_embedded_doc_fields, _init_embedded_doc_fields, and _finalize_embedded_doc_fields to initialize, recursively merge, and finalize nested EmbeddedDocumentField subclasses by recognizing ftype as a class subclass of EmbeddedDocumentField. Removes a blank import line (non-functional). Adds unit tests that exercise merging of nested embedded-document schemas and subclass-preserving behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

fiftyone/core/odm/utils.py: review class-based type checks, recursive merge/finalize paths, and correct handling of EmbeddedDocumentField subclasses.
tests/unittests/odm_tests.py: verify new tests for nested merge behavior and subclass preservation.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description covers proposed changes, test coverage, and impact. However, the Release Notes section required by the template is not filled in. | Complete the Release Notes section by selecting whether this is user-facing and, if yes, providing a 1-2 sentence description suitable for release notes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main fix: handling deeply nested embedded documents in the _finalize_embedded_doc_fields method. |

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f39e033 and 2e84162.

📒 Files selected for processing (2)

fiftyone/core/odm/utils.py (1 hunks)
tests/unittests/odm_tests.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (2)

fiftyone/core/odm/utils.py (1)

fiftyone/core/fields.py (1)

EmbeddedDocumentField (1775-2100)

tests/unittests/odm_tests.py (2)

fiftyone/core/fields.py (3)

EmbeddedDocumentField (1775-2100)

FloatField (958-1000)

ListField (1045-1095)

fiftyone/core/odm/utils.py (2)

_merge_embedded_doc_fields (507-531)

_finalize_embedded_doc_fields (545-561)

🪛 Ruff (0.14.7)

tests/unittests/odm_tests.py

68-68: Unnecessary pass statement

Remove unnecessary pass

(PIE790)

70-70: Missing return type annotation for private function _build_embedded_field

(ANN202)

98-98: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

99-99: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

119-119: Use a regular assert instead of unittest-style assertIsNone

Replace assertIsNone(...) with assert ...

(PT009)

128-128: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

129-129: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

137-137: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

138-138: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

139-139: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

169-169: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

170-170: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

174-174: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

175-175: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

180-180: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

201-201: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

202-202: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

205-205: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

208-208: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

210-210: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

229-229: Missing return type annotation for private function build_deeply_nested_field

(ANN202)

253-253: Missing return type annotation for private function build_deeply_nested_field_variant

(ANN202)

290-290: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

294-294: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

295-295: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

296-296: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

300-300: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

301-301: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

302-302: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

306-306: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

307-307: Use a regular assert instead of unittest-style assertIn

Replace assertIn(...) with assert ...

(PT009)

312-312: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

313-313: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

317-317: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

320-320: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

323-323: Use a regular assert instead of unittest-style assertIsInstance

Replace assertIsInstance(...) with assert ...

(PT009)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: test-windows / test-python (windows-latest, 3.12)
GitHub Check: e2e / test-e2e
GitHub Check: lint / eslint
GitHub Check: test-windows / test-python (windows-latest, 3.9)
GitHub Check: test-windows / test-python (windows-latest, 3.11)
GitHub Check: test-windows / test-python (windows-latest, 3.10)
GitHub Check: build / build
GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
GitHub Check: test / test-app
GitHub Check: build

🔇 Additional comments (3)

fiftyone/core/odm/utils.py (1)

550-559: LGTM! Correct fix for subclass detection.

The inspect.isclass(ftype) guard before issubclass() prevents TypeError on non-class values, and using issubclass() instead of exact equality properly handles EmbeddedDocumentField subclasses like dynamic label schemas.
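
A short illustration of why the guard matters, using a local stand-in class rather than the real fof.EmbeddedDocumentField (the helper name `safe_is_subclass` is hypothetical). Python's built-in issubclass() raises TypeError when its first argument is not a class, which covers both `None` and field instances:

```python
import inspect


class EmbeddedDocumentField:  # stand-in for fof.EmbeddedDocumentField
    pass


def safe_is_subclass(ftype, base):
    # Short-circuits to False for None, instances, and other non-class values
    return inspect.isclass(ftype) and issubclass(ftype, base)


def raises_type_error(value):
    """Returns True if an unguarded issubclass() call raises TypeError."""
    try:
        issubclass(value, EmbeddedDocumentField)
    except TypeError:
        return True
    return False


unguarded_none = raises_type_error(None)
unguarded_instance = raises_type_error(EmbeddedDocumentField())
guarded_none = safe_is_subclass(None, EmbeddedDocumentField)
guarded_class = safe_is_subclass(EmbeddedDocumentField, EmbeddedDocumentField)
```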

tests/unittests/odm_tests.py (2)

212-323: Excellent regression test for the core bug.

This test directly exercises the 3+ level nesting scenario that triggered the original TypeError. The structure verification at lines 290-323 is thorough—checking each nesting level after both merge and finalize operations.

141-180: Well-documented test case.

The docstring clearly explains the bug scenario (lines 142-153) and what would have failed without the fix. This serves as excellent documentation for future maintainers.

tests/unittests/odm_tests.py

brimoor · 2025-12-04T21:24:07Z

fiftyone/core/odm/utils.py

            fields.append(field)
-            if field["ftype"] == fof.EmbeddedDocumentField:
+            ftype = field["ftype"]
+            is_embedded_doc = inspect.isclass(ftype) and issubclass(


Can you provide the example call to foo.get_implied_field_kwargs() that resulted in a case where there was a field whose type was a strict subclass of EmbeddedDocumentField in this method?

ftype == fof.EmbeddedDocumentField comparisons are used in a couple other methods in this module as well, so I would think that if we need a patch here, then we'd need a patch in those methods too

Generally speaking, we do add more EmbeddedDocument subclasses to represent different data models in FO over time, but we do not define EmbeddedDocumentField subclasses, as the latter is just the generic concept of a field that contains a document.

Yup you're right, there isn’t an in-repo call path today that supplies a strict subclass of EmbeddedDocumentField. The issubclass guards are defensive to avoid silently skipping nested schemas if a downstream plugin or future extension ever introduces a subclass (which is what I ended up doing at one point and discovered this issue). I updated the remaining equality check in _finalize_embedded_doc_fields to use the same issubclass pattern we already use in _merge_embedded_doc_fields, so merge/finalize behavior stays consistent and safe for hypothetical subclasses while remaining identical for current inputs.

brimoor

I'm still trying to wrap my head around the actual error case here. The methods you're tweaking are strictly invoked by calling get_implied_field_kwargs() so can you provide an example that raises an exception on develop?

import fiftyone.core.odm as foo

# what fails here?
value = ...
kwargs = foo.get_implied_field_kwargs(value, ...)

The unit tests are contrived ways of invoking the private methods directly, which users would never do. I'd like to have "integration" tests that validate the behavior of get_implied_field_kwargs() instead.

brimoor · 2025-12-05T05:32:51Z

fiftyone/core/odm/utils.py

+        return []
+
+    if isinstance(fields_dict, list):
+        # Older callers may still provide a list that already filtered out


nit: we don't need to support backwards-compatibility in these private methods. We can assume they are strictly invoked by get_implied_field_kwargs()

…nested schemas - Removed legacy handling of list input in _finalize_embedded_doc_fields to streamline processing. - Introduced new integration tests for the get_implied_field_kwargs function to validate merging of nested embedded document schemas, ensuring correct field type inference and structure.

lorenzoriano · 2025-12-05T16:45:00Z

> I'm still trying to wrap my head around the actual error case here. The methods you're tweaking are strictly invoked by calling get_implied_field_kwargs() so can you provide an example that raises an exception on develop?
>
> import fiftyone.core.odm as foo
>
> # what fails here?
> value = ...
> kwargs = foo.get_implied_field_kwargs(value, ...)
>
> The unit tests are contrived ways of invoking the private methods directly, which users would never do. I'd like to have "integration" tests that validate the behavior of get_implied_field_kwargs() instead.

The bug shows up when get_implied_field_kwargs() is asked to infer a schema for nested embedded-document structures, specifically when:

- The value is a list of embedded documents, and
- Those embedded documents themselves contain embedded-document fields.

I ran into this while working on the new (and since removed) labels for HRM2. The old implementation of _merge_embedded_doc_fields() assumed its fields_dict argument was always a dict, but in practice it was sometimes a list of field specs. When merging them, it would end up doing something equivalent to:

fields_dict[name] = field  # fields_dict is actually a list here

which on develop raises:

TypeError: list indices must be integers or slices, not str

from inside _merge_embedded_doc_fields().

I’ve added a test that uses get_implied_field_kwargs(). The new GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema builds a list of embedded documents where the nested field is a subclass of EmbeddedDocumentField, and each element’s inner document exposes different fields.
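
For readers following along, the kind of recursive merge being discussed can be modeled roughly as below. This is a simplified sketch under assumptions, not the FiftyOne implementation: schemas are plain name -> spec dicts, `ftype` values are placeholder strings, and `merge_fields` is a hypothetical name.

```python
import copy


def merge_fields(merged, new_fields):
    """Merges two name -> spec dicts recursively without mutating either input."""
    result = copy.deepcopy(merged)
    for name, spec in new_fields.items():
        if name not in result:
            # First time this field is seen: take the spec as-is
            result[name] = copy.deepcopy(spec)
        elif isinstance(spec.get("fields"), dict):
            # Field already known and has a nested schema: merge one level down
            existing = result[name].get("fields") or {}
            result[name]["fields"] = merge_fields(existing, spec["fields"])
    return result


# Two list elements whose inner document exposes different fields
first = {"pose": {"ftype": "embedded", "fields": {"x": {"ftype": "float"}}}}
second = {"pose": {"ftype": "embedded", "fields": {"score": {"ftype": "float"}}}}

merged = merge_fields(first, second)
```

The merged `pose` schema ends up with both `x` and `score`, which is the outcome the integration test is meant to check through get_implied_field_kwargs().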

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

tests/unittests/odm_tests.py (1)

64-68: Remove unnecessary pass.

The docstring alone is sufficient for the class body.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 684d729 and 6791871.

📒 Files selected for processing (2)

fiftyone/core/odm/utils.py (2 hunks)
tests/unittests/odm_tests.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

fiftyone/core/odm/utils.py (2)

fiftyone/core/dataset.py (2)

name (685-687)

name (690-704)

fiftyone/core/fields.py (3)

EmbeddedDocumentField (1775-2100)

ListField (1045-1095)

DictField (1130-1190)

🪛 Ruff (0.14.7)

tests/unittests/odm_tests.py

150-150: Unnecessary pass statement

Remove unnecessary pass

(PIE790)

180-180: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

181-181: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

182-182: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

189-189: Use a regular assert instead of unittest-style assertTrue

Replace assertTrue(...) with assert ...

(PT009)

192-192: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

195-195: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

fiftyone/core/odm/utils.py

13-13: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)

13-13: typing.Dict is deprecated, use dict instead

(UP035)

13-13: typing.List is deprecated, use list instead

(UP035)

498-498: Use X | Y for type annotations

(UP007)

501-501: Missing return type annotation for private function _parse_embedded_doc_list_fields

(ANN202)

515-515: Use X | Y for type annotations

Convert to X | Y

(UP007)

536-536: Use X | Y for type annotations

Convert to X | Y

(UP007)

548-548: Use X | Y for type annotations

Convert to X | Y

(UP007)

603-603: Missing return type annotation for private function _finalize_embedded_doc_fields

(ANN202)

🔇 Additional comments (11)

tests/unittests/odm_tests.py (6)

13-15: LGTM!

The new imports for fof and odmu are correctly added to support the new test classes.

80-99: LGTM!

Test correctly validates that ListField merging behavior is preserved with the new implementation.

101-119: LGTM!

Good coverage of the conflict resolution behavior where mismatched subfield types result in None.
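
The conflict rule this test covers can be pictured with a small sketch. It is an assumption-laden model rather than the actual merge code: the field classes are local stand-ins and `merge_list_specs` is a hypothetical helper.

```python
class ListField:  # stand-in for fof.ListField
    pass


class FloatField:  # stand-in for fof.FloatField
    pass


class StringField:  # stand-in for fof.StringField
    pass


def merge_list_specs(existing, new):
    """Keeps the subfield type when both specs agree, else falls back to None."""
    merged = dict(existing)
    if existing.get("subfield") != new.get("subfield"):
        merged["subfield"] = None
    return merged


floats = {"name": "values", "ftype": ListField, "subfield": FloatField}
strings = {"name": "values", "ftype": ListField, "subfield": StringField}

agreeing = merge_list_specs(floats, floats)
conflicting = merge_list_specs(floats, strings)
```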

121-129: LGTM!

Key test case validating the issubclass() fix for handling subclassed EmbeddedDocumentField types.

131-139: LGTM!

Validates the defensive normalization of list inputs to dict format.

159-195: LGTM!

Comprehensive integration test that validates the full merge/finalize pipeline through the public get_implied_field_kwargs API. The test correctly exercises the scenario described in the PR objectives.

fiftyone/core/odm/utils.py (5)

496-498: LGTM!

Type aliases clearly document the dual-format (dict/list) representations and improve readability of the merging functions.

501-510: LGTM!

Correctly updated to use the return value from _merge_embedded_doc_fields and finalize the result.

591-598: LGTM!

The recursive merge correctly normalizes existing nested schemas before merging, which is the core fix for the deeply nested document issue.

603-629: LGTM!

Correctly converts the merged dict-form schemas back to list form, with proper recursive handling for nested embedded documents using the issubclass() pattern.

569-572: The review comment's concern about field spec mutation is not applicable. Field specifications are freshly constructed by _parse_embedded_doc_fields on each call and are never reused across multiple merge operations. The mutation at line 572 is safe because:

- Each field dict originates from a fresh call to _parse_embedded_doc_fields, which constructs new dict objects.
- Within _parse_embedded_doc_list_fields, each iteration creates new specs; they are not cached or stored for reuse.
- The assignment merged[name] = field stores a reference to this fresh dict, which is then immediately normalized on the next line without external dependencies.

The docstring explicitly states the current design "avoids mutating list-based inputs in-place, which previously caused TypeError"—this refers to the incoming fields parameter structure, not the safety of normalizing individual field dicts.

The proposed diff in the original comment would not improve safety and would introduce unnecessary overhead without addressing an actual problem.

coderabbitai · 2025-12-05T16:48:34Z

fiftyone/core/odm/utils.py

 import json
 import numbers
 import sys
+from typing import Any, Dict, Iterable, List, Union


🧹 Nitpick | 🔵 Trivial

Consider modern type annotations.

For Python 3.9+, prefer collections.abc.Iterable and built-in generics (dict, list) over typing equivalents. This is optional if maintaining older Python compatibility.
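
As a rough illustration of the suggested style, the aliases below use built-in generics and collections.abc.Iterable. The alias and function names (`FieldSpec`, `FieldSpecs`, `field_names`) are hypothetical and are not the ones defined in the PR:

```python
from collections.abc import Iterable
from typing import Any, Union

# Built-in generics (dict[...], list[...]) are subscriptable from Python 3.9
FieldSpec = dict[str, Any]
FieldSpecs = Union[list[FieldSpec], dict[str, FieldSpec]]


def field_names(specs: Iterable[FieldSpec]) -> list[str]:
    """Returns the "name" entry of each spec, in order."""
    return [spec["name"] for spec in specs]


names = field_names([{"name": "x"}, {"name": "score"}])
```

`Union` is kept from `typing` here because the `X | Y` syntax in runtime-evaluated aliases requires Python 3.10 or later.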

🧰 Tools

🪛 Ruff (0.14.7)

13-13: Import from collections.abc instead: Iterable

Import from collections.abc

(UP035)

13-13: typing.Dict is deprecated, use dict instead

(UP035)

13-13: typing.List is deprecated, use list instead

(UP035)

🤖 Prompt for AI Agents

In fiftyone/core/odm/utils.py around line 13, the file imports typing.Iterable, Dict, and List; update to modern annotations by importing Iterable from collections.abc and using built-in generics (dict, list) in your type annotations (keep typing.Any and typing.Union if still needed for compatibility), e.g., replace "from typing import Any, Dict, Iterable, List, Union" with an import of Any and Union from typing plus "from collections.abc import Iterable" and update any annotations that used Dict[...] and List[...] to dict[...] and list[...].

tests/unittests/odm_tests.py

brimoor · 2025-12-05T20:06:44Z

> I’ve added a test that uses get_implied_field_kwargs(). The new GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema builds a list of embedded documents where the nested field is a subclass of EmbeddedDocumentField, and each element’s inner document exposes different fields.

Okay so here's the test case you added:

import fiftyone as fo
import fiftyone.core.odm as foo
import fiftyone.core.fields as fof

class _GetImpliedInner(foo.EmbeddedDocument):
    x = fof.FloatField()
    score = fof.FloatField()

class _GetImpliedSubclassedEmbeddedDocField(fof.EmbeddedDocumentField):
    pass

class _GetImpliedOuter(foo.EmbeddedDocument):
    # option 1
    pose = fof.EmbeddedDocumentField(_GetImpliedInner)
    
    # option 2
    # pose = _GetImpliedSubclassedEmbeddedDocField(_GetImpliedInner)

values = [
    _GetImpliedOuter(pose=_GetImpliedInner(x=1.0)),
    _GetImpliedOuter(pose=_GetImpliedInner(score=0.5)),
]

kwargs = foo.get_implied_field_kwargs(values)
fo.pprint(kwargs)

If I run with option 1 on develop, I get the expected output:

{
    'ftype': <class 'fiftyone.core.fields.ListField'>,
    'subfield': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
    'embedded_doc_type': <class '__main__._GetImpliedOuter'>,
    'fields': [
        {
            'ftype': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
            'db_field': 'pose',
            'description': None,
            'info': None,
            'read_only': False,
            'created_at': None,
            'embedded_doc_type': <class '__main__._GetImpliedInner'>,
            'fields': [
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'x',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'x',
                },
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'score',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'score',
                },
            ],
            'name': 'pose',
        },
    ],
}

If I run option 2 on develop, I don't get an error, but it fails to pick up the score field.

If I only change ftype == fof.EmbeddedDocumentField to issubclass(ftype, fof.EmbeddedDocumentField) on develop, then option 2 works as expected too.
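
The difference between the two checks can be shown with a toy model of the merge. Everything here is a stand-in written for illustration (local classes, a simplified `merge` function, hypothetical predicate names); it is not the code on develop, but it reproduces the reported symptom of a silently missing `score` field:

```python
import inspect


class EmbeddedDocumentField:  # stand-in for fof.EmbeddedDocumentField
    pass


class SubclassedField(EmbeddedDocumentField):  # unsupported-in-practice subclass
    pass


def by_equality(ftype):
    return ftype == EmbeddedDocumentField


def by_subclass(ftype):
    return inspect.isclass(ftype) and issubclass(ftype, EmbeddedDocumentField)


def merge(merged, new_fields, is_embedded):
    """Adds unseen fields; merges nested schemas only if is_embedded() says so."""
    for name, spec in new_fields.items():
        if name not in merged:
            merged[name] = {"ftype": spec["ftype"], "fields": dict(spec["fields"])}
        elif is_embedded(spec["ftype"]):
            merged[name]["fields"].update(spec["fields"])
    return merged


def sample(inner_name):
    return {"pose": {"ftype": SubclassedField, "fields": {inner_name: {"ftype": float}}}}


with_equality = merge(merge({}, sample("x"), by_equality), sample("score"), by_equality)
with_subclass = merge(merge({}, sample("x"), by_subclass), sample("score"), by_subclass)
```

Under the equality predicate the second sample's nested schema is never visited, so no exception is raised and `score` is simply dropped; under the subclass-aware predicate both inner fields survive.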

So, given that defining and using EmbeddedDocumentField subclasses is NOT something we officially support, I'm still on the hunt for a concrete example where get_implied_field_kwargs() doesn't work on develop that would justify the changes in this PR 😄

coderabbitai

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 6791871 and 65f240b.

📒 Files selected for processing (2)

fiftyone/core/odm/utils.py (3 hunks)
tests/unittests/odm_tests.py (2 hunks)

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: swheaton
Repo: voxel51/fiftyone PR: 6023
File: tests/unittests/factory/delegated_operation_doc_tests.py:55-57
Timestamp: 2025-06-16T21:08:14.104Z
Learning: The DelegatedOperationDocument class in fiftyone/factory/repos/delegated_operation_doc.py inherits from object (plain Python class), not from Pydantic BaseModel. Using setattr to modify its attributes in tests is appropriate and does not bypass any validation.

📚 Learning: 2025-09-10T03:01:27.501Z

Learnt from: swheaton
Repo: voxel51/fiftyone PR: 6334
File: fiftyone/core/expressions.py:2727-2741
Timestamp: 2025-09-10T03:01:27.501Z
Learning: FiftyOne requires MongoDB >= 6.0 as documented at https://docs.voxel51.com/user_guide/config.html#using-a-different-mongodb-version

Applied to files:

tests/unittests/odm_tests.py

📚 Learning: 2024-11-11T19:26:20.542Z

Learnt from: brimoor
Repo: voxel51/fiftyone PR: 5086
File: fiftyone/operators/store/service.py:64-64
Timestamp: 2024-11-11T19:26:20.542Z
Learning: The codebase requires Python >= 3.9, so using features introduced in Python 3.9, such as type hinting with `list[str]`, is acceptable.

Applied to files:

fiftyone/core/odm/utils.py

🧬 Code graph analysis (2)

tests/unittests/odm_tests.py (2)

fiftyone/core/fields.py (3)

FloatField (958-1000)

EmbeddedDocumentField (1775-2100)

ListField (1045-1095)

fiftyone/core/odm/utils.py (1)

get_implied_field_kwargs (364-435)

fiftyone/core/odm/utils.py (1)

fiftyone/core/fields.py (1)

EmbeddedDocumentField (1775-2100)

🪛 Ruff (0.14.7)

tests/unittests/odm_tests.py

71-71: Unnecessary pass statement

Remove unnecessary pass

(PIE790)

101-101: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

102-102: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

103-103: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

110-110: Use a regular assert instead of unittest-style assertTrue

Replace assertTrue(...) with assert ...

(PT009)

113-113: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

116-116: Use a regular assert instead of unittest-style assertEqual

Replace assertEqual(...) with assert ...

(PT009)

fiftyone/core/odm/utils.py

537-537: Missing return type annotation for private function _init_embedded_doc_fields

Add return type annotation: None

(ANN202)

🔇 Additional comments (1)

fiftyone/core/odm/utils.py (1)

512-515: Subclass-aware deep embedded-doc merge/finalize logic looks correct

Using inspect.isclass(ftype) and issubclass(ftype, fof.EmbeddedDocumentField) in both _merge_embedded_doc_fields and _finalize_embedded_doc_fields, together with the recursive _init_embedded_doc_fields, ensures that:

During merging, all embedded-doc specs (including subclasses) have their field["fields"] normalized to dicts at every depth, so recursive merges no longer hit the “list indices must be integers or slices” failure for 3+ levels of nesting.

During finalization, the same subclass-aware check correctly converts those nested dicts back to lists, so callers of get_implied_field_kwargs() always see the expected list schema format even when plugins provide EmbeddedDocumentField subclasses.

The change preserves existing behavior for non-embedded types and base EmbeddedDocumentField while fixing the nested-subclass case exercised by the new tests.

Also applies to: 531-533, 540-543, 556-559
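
For readers following along, the finalization step described above can be sketched roughly as follows. This is a simplified model, not the code in fiftyone/core/odm/utils.py: the classes are local stand-ins for the FiftyOne field types, and `is_embedded` and `finalize` are hypothetical names chosen for illustration.

```python
import inspect


class EmbeddedDocumentField:
    """Stand-in for fof.EmbeddedDocumentField (assumption, not the real class)."""


class PluginEmbeddedField(EmbeddedDocumentField):
    """Stand-in for a plugin-provided subclass."""


class FloatField:
    """Stand-in for fof.FloatField."""


def is_embedded(ftype):
    # The isclass() guard keeps non-class values away from issubclass()
    return inspect.isclass(ftype) and issubclass(ftype, EmbeddedDocumentField)


def finalize(fields_dict):
    """Converts a {name: spec} mapping into a list of specs, recursing into
    every embedded-doc spec, including subclasses."""
    fields = []
    for spec in fields_dict.values():
        spec = dict(spec)  # shallow copy so the input mapping is not modified
        if is_embedded(spec.get("ftype")) and isinstance(spec.get("fields"), dict):
            spec["fields"] = finalize(spec["fields"])
        fields.append(spec)
    return fields


# Three levels of nesting (pose -> joint -> angle), stored as dicts during merging
nested = {
    "pose": {
        "name": "pose",
        "ftype": PluginEmbeddedField,
        "fields": {
            "joint": {
                "name": "joint",
                "ftype": EmbeddedDocumentField,
                "fields": {"angle": {"name": "angle", "ftype": FloatField}},
            }
        },
    }
}

result = finalize(nested)
```

With an exact `==` check in place of `is_embedded`, the `pose` entry would keep its nested `fields` as a dict, because `PluginEmbeddedField` is not the base class itself.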

coderabbitai · 2025-12-08T19:01:16Z

tests/unittests/odm_tests.py

+class _GetImpliedInner(foo.EmbeddedDocument):
+    x = fof.FloatField()
+    score = fof.FloatField()
+
+
+class _GetImpliedSubclassedEmbeddedDocField(fof.EmbeddedDocumentField):
+    """Lightweight subclass used to mimic plugin-provided embedded fields."""
+
+    pass
+
+
+class _GetImpliedOuter(foo.EmbeddedDocument):
+    # Use a subclassed `EmbeddedDocumentField` so that `ftype` in the inferred
+    # schema is a subclass, not the base `EmbeddedDocumentField`
+    pose = _GetImpliedSubclassedEmbeddedDocField(_GetImpliedInner)
+
+
+class GetImpliedFieldKwargsTests(unittest.TestCase):
+    """Integration-style tests that exercise `_merge_embedded_doc_fields`
+    via the public :func:`get_implied_field_kwargs` API.
+    """
+
+    def test_list_of_embedded_docs_merges_nested_schema(self):
+        # Two `_GetImpliedOuter` documents whose nested `_GetImpliedInner`
+        # subdocuments populate
+        # different fields. The merged schema for `pose` should be the union
+        # of the observed inner fields.
+        inner_with_x = _GetImpliedInner(x=1.0)
+        inner_with_score = _GetImpliedInner(score=0.5)
+
+        values = [
+            _GetImpliedOuter(pose=inner_with_x),
+            _GetImpliedOuter(pose=inner_with_score),
+        ]
+
+        kwargs = foo.get_implied_field_kwargs(values)
+
+        # Top-level: list of `_GetImpliedOuter` embedded documents
+        self.assertEqual(kwargs["ftype"], fof.ListField)
+        self.assertEqual(kwargs["subfield"], fof.EmbeddedDocumentField)
+        self.assertEqual(kwargs["embedded_doc_type"], _GetImpliedOuter)
+
+        # Nested: `pose` should itself be an embedded document whose schema
+        # includes both `x` and `score`, demonstrating that nested embedded
+        # schemas from multiple list elements are correctly merged when the
+        # field type is a subclass of `EmbeddedDocumentField`
+        pose_spec = next(f for f in kwargs["fields"] if f["name"] == "pose")
+        self.assertTrue(
+            issubclass(pose_spec["ftype"], fof.EmbeddedDocumentField)
+        )
+        self.assertEqual(pose_spec["embedded_doc_type"], _GetImpliedInner)
+
+        inner_field_names = {f["name"] for f in pose_spec["fields"]}
+        self.assertEqual(inner_field_names, {"x", "score"})


🧹 Nitpick | 🔵 Trivial

Good integration test for nested subclassed EmbeddedDocumentField merging

The _GetImpliedInner / _GetImpliedOuter scaffolding and GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema accurately exercise the get_implied_field_kwargs() → _parse_embedded_doc_list_fields → _merge/_finalize_embedded_doc_fields path, and the assertions on:

top-level list + embedded-doc subfield/embedded_doc_type

pose_spec["ftype"] being a subclass of EmbeddedDocumentField

union of inner field names {"x", "score"}

match the intended semantics of the bugfix and should protect against regressions in the deep-nesting/subclass case.

Optionally, you can drop the redundant pass in _GetImpliedSubclassedEmbeddedDocField since the docstring alone gives it a non-empty body.
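
As a minimal illustration of that cleanup, a class whose body is only a docstring is valid Python. The base class below is a local stand-in (an assumption made so the snippet runs without FiftyOne installed), and the subclass name is shortened for the example.

```python
class EmbeddedDocumentField:
    """Stand-in base class (hypothetical; not the FiftyOne class)."""


class SubclassedEmbeddedDocField(EmbeddedDocumentField):
    """Lightweight subclass used to mimic plugin-provided embedded fields."""


# The class can be instantiated even though its body has no `pass`
field = SubclassedEmbeddedDocField()
```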

🤖 Prompt for AI Agents

In tests/unittests/odm_tests.py around lines 63 to 116, the review notes the test is good and suggests an optional cleanup: remove the redundant `pass` statement from the `_GetImpliedSubclassedEmbeddedDocField` class since the docstring already makes the body non-empty; simply delete the `pass` line to keep the class minimal and tidy.

lorenzoriano · 2025-12-08T19:01:49Z

I’ve added a test that uses get_implied_field_kwargs(). The new GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema builds a list of embedded documents where the nested field is a subclass of EmbeddedDocumentField, and each element’s inner document exposes different fields.

Okay so here's the test case you added:

import fiftyone as fo
import fiftyone.core.odm as foo
import fiftyone.core.fields as fof

class _GetImpliedInner(foo.EmbeddedDocument):
    x = fof.FloatField()
    score = fof.FloatField()

class _GetImpliedSubclassedEmbeddedDocField(fof.EmbeddedDocumentField):
    pass

class _GetImpliedOuter(foo.EmbeddedDocument):
    # option 1
    pose = fof.EmbeddedDocumentField(_GetImpliedInner)
    
    # option 2
    # pose = _GetImpliedSubclassedEmbeddedDocField(_GetImpliedInner)

values = [
    _GetImpliedOuter(pose=_GetImpliedInner(x=1.0)),
    _GetImpliedOuter(pose=_GetImpliedInner(score=0.5)),
]

kwargs = foo.get_implied_field_kwargs(values)
fo.pprint(kwargs)

If I run with option 1 on develop, I get the expected output:

{
    'ftype': <class 'fiftyone.core.fields.ListField'>,
    'subfield': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
    'embedded_doc_type': <class '__main__._GetImpliedOuter'>,
    'fields': [
        {
            'ftype': <class 'fiftyone.core.fields.EmbeddedDocumentField'>,
            'db_field': 'pose',
            'description': None,
            'info': None,
            'read_only': False,
            'created_at': None,
            'embedded_doc_type': <class '__main__._GetImpliedInner'>,
            'fields': [
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'x',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'x',
                },
                {
                    'ftype': <class 'fiftyone.core.fields.FloatField'>,
                    'db_field': 'score',
                    'description': None,
                    'info': None,
                    'read_only': False,
                    'created_at': None,
                    'name': 'score',
                },
            ],
            'name': 'pose',
        },
    ],
}

If I run option 2 on develop, I don't get an error, but it fails to pick up the score field.

If I only change ftype == fof.EmbeddedDocumentField to issubclass(ftype, fof.EmbeddedDocumentField) on develop, then option 2 works as expected too.
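
The difference between the two checks can be shown in isolation. The sketch below uses local stand-in classes rather than the real FiftyOne types, and the helper names are hypothetical. The `inspect.isclass()` guard is the one mentioned in the PR description, included here because `issubclass()` raises a TypeError when its first argument is not a class.

```python
import inspect


class EmbeddedDocumentField:
    """Stand-in for fof.EmbeddedDocumentField (not the real class)."""


class SubclassedEmbeddedDocField(EmbeddedDocumentField):
    """Stand-in for an EmbeddedDocumentField subclass."""


def is_embedded_exact(ftype):
    # Exact comparison: only the base class itself matches
    return ftype == EmbeddedDocumentField


def is_embedded_subclass(ftype):
    # Subclass-aware comparison, guarded so non-class values return False
    return inspect.isclass(ftype) and issubclass(ftype, EmbeddedDocumentField)


for ftype in (EmbeddedDocumentField, SubclassedEmbeddedDocField):
    print(ftype.__name__, is_embedded_exact(ftype), is_embedded_subclass(ftype))
```

Under the exact check the subclass is treated like any non-embedded field, which matches the option 2 behavior on develop where nested fields are not merged.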

So, given that defining and using EmbeddedDocumentField subclasses is NOT something we officially support, I'm still on the hunt for a concrete example where get_implied_field_kwargs() doesn't work on develop that would justify the changes in this PR 😄

Makes sense, thanks. I rolled back most changes and just added a minimal behavior change: use issubclass(ftype, EmbeddedDocumentField) in _merge_embedded_doc_fields (and the init/finalize helpers). The concrete failure on develop is captured in GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema: when the field is a subclass of EmbeddedDocumentField, get_implied_field_kwargs() only keeps one of the nested fields (x or score). With the issubclass check, it now merges both.

To clarify, the concrete problem I ran into is your “option 2” scenario: when the schema uses a subclass of EmbeddedDocumentField, get_implied_field_kwargs() on origin/develop fails to merge the nested embedded schema across list elements, so only one of the observed inner fields (x or score) ends up in the inferred spec. I’ve captured that behavior in GetImpliedFieldKwargsTests.test_list_of_embedded_docs_merges_nested_schema, which fails on develop and passes in the new code, as we treat subclasses of EmbeddedDocumentField as embedded as well.
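
A toy model of the merge makes the failure mode easier to see. This is a simplified sketch under stated assumptions, not the FiftyOne implementation: `merge`, `nested_names`, and the predicate functions are hypothetical names, and the field classes are local stand-ins.

```python
import inspect


class EmbeddedDocumentField:
    """Stand-in for fof.EmbeddedDocumentField."""


class SubclassedField(EmbeddedDocumentField):
    """Stand-in for a subclassed embedded field."""


class FloatField:
    """Stand-in for fof.FloatField."""


def exact_match(ftype):
    return ftype == EmbeddedDocumentField


def subclass_match(ftype):
    return inspect.isclass(ftype) and issubclass(ftype, EmbeddedDocumentField)


def merge(fields_dict, new_fields, is_embedded):
    """Merges a list of field specs into a {name: spec} mapping in place."""
    for spec in new_fields:
        name = spec["name"]
        if name not in fields_dict:
            entry = dict(spec)
            if is_embedded(spec["ftype"]):
                # Store nested fields as a dict so later merges can key by name
                entry["fields"] = {}
                merge(entry["fields"], spec["fields"], is_embedded)
            fields_dict[name] = entry
        elif is_embedded(spec["ftype"]):
            # Field already seen: recurse to take the union of nested fields
            merge(fields_dict[name]["fields"], spec["fields"], is_embedded)
    return fields_dict


def nested_names(fields_dict, name):
    nested = fields_dict[name]["fields"]
    specs = nested.values() if isinstance(nested, dict) else nested
    return {s["name"] for s in specs}


sample_1 = [{"name": "pose", "ftype": SubclassedField,
             "fields": [{"name": "x", "ftype": FloatField}]}]
sample_2 = [{"name": "pose", "ftype": SubclassedField,
             "fields": [{"name": "score", "ftype": FloatField}]}]

with_exact = merge(merge({}, sample_1, exact_match), sample_2, exact_match)
with_subclass = merge(merge({}, sample_1, subclass_match), sample_2, subclass_match)
```

In this model, `nested_names(with_exact, "pose")` contains only `x`, while `nested_names(with_subclass, "pose")` contains both `x` and `score`.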

Honestly, if you don't think this issue is worth the effort, I will happily drop this PR. Thanks for the review!

lorenzoriano requested review from a team as code owners December 4, 2025 21:01

coderabbitai bot reviewed Dec 4, 2025

brimoor reviewed Dec 4, 2025

Fixing the tests

684d729

brimoor reviewed Dec 5, 2025

coderabbitai bot reviewed Dec 5, 2025

Simplify the PR

65f240b

coderabbitai bot reviewed Dec 8, 2025
