Fix/desc construction #342
| **base_dataset_kwargs, | ||
| ) | ||
| ) | ||
| for record in self.records |
| for record in self.records | |
| datasets = [ | |
| EEGDashRaw( | |
| record, | |
| self.cache_dir, | |
| description=self._build_description(record, description_fields), | |
| **base_dataset_kwargs, | |
| ) | |
| for record in self.records | |
| ] |
Suggested to minimize the diff.
| record: dict[str, Any], | ||
| description_fields: list[str], | ||
| participants_row: dict[str, Any] | None = None, | ||
| ) -> dict[str, Any]: |
I think this function is more verbose than necessary, and I would move it to an auxiliary place, such as a utilities module. That way, the dataset object does not carry a function that is only used once.
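A minimal sketch of what the suggested move could look like, assuming the helper needs no state from the dataset object. The names `build_description`, `find_key`, and `_normalize_key` are hypothetical and are not taken from the PR; `find_key` only stands in for the lookup the real code performs.

```python
from typing import Any


def _normalize_key(key: str) -> str:
    """Lowercase and drop separators so "Subject-ID" matches "subject_id"."""
    return "".join(ch for ch in key.lower() if ch.isalnum())


def find_key(data: Any, wanted: str) -> Any:
    """Hypothetical stand-in for a recursive, key-insensitive lookup."""
    if not isinstance(data, dict):
        return None
    target = _normalize_key(wanted)
    # Prefer a match at the current level before descending.
    for key, value in data.items():
        if _normalize_key(str(key)) == target:
            return value
    for value in data.values():
        found = find_key(value, wanted)
        if found is not None:
            return found
    return None


def build_description(
    record: dict[str, Any], description_fields: list[str]
) -> dict[str, Any]:
    """Return one entry per requested field; absent fields become None."""
    return {field: find_key(record, field) for field in description_fields}
```

With a free function like this, the dataset class would only call `build_description(record, description_fields)` at the point where each `EEGDashRaw` is created.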
| def _build_description( | ||
| self, | ||
| record: dict[str, Any], | ||
| description_fields: list[str], | ||
| participants_row: dict[str, Any] | None = None, | ||
| ) -> dict[str, Any]: | ||
| """Build a description dict for a single record. | ||
|
| Extracts values for each requested field from the record, then merges | ||
| participant data from either an explicit ``participants_row`` (offline | ||
| path, from a local ``participants.tsv``) or the embedded | ||
| ``participant_tsv`` key inside the record (online paths). Fields still | ||
| absent after the merge are set to ``None`` so the schema is always | ||
| complete. When both the record and participant data carry the same | ||
| field, precedence is determined by ``self._description_precedence``; a | ||
| ``debug``-level log is emitted when the values differ. |
I think we can make this function much more compact too.
| participants_row=part, | ||
| description_fields=description_fields, | ||
| ) | ||
| description = self._build_description(record, description_fields) |
nice simplification
The test suite needs to be much more compact: use parametrization and follow a more pytest-style approach.
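As an illustration of the parametrized style being requested, here is a sketch that folds several cases into one test with `pytest.mark.parametrize`. The `build_description` function below is a simplified, hypothetical stand-in for the helper under test, not the code from this PR.

```python
import pytest


def build_description(record, description_fields):
    """Hypothetical simplified helper: pad absent fields with None."""
    return {field: record.get(field) for field in description_fields}


@pytest.mark.parametrize(
    ("record", "fields", "expected"),
    [
        pytest.param({"subject": "01"}, ["subject"], {"subject": "01"}, id="present"),
        pytest.param(
            {"subject": "01"},
            ["subject", "age"],
            {"subject": "01", "age": None},
            id="padded",
        ),
        pytest.param({}, ["subject"], {"subject": None}, id="empty-record"),
    ],
)
def test_build_description(record, fields, expected):
    # One assertion covers every case listed above.
    assert build_description(record, fields) == expected
```

Each `pytest.param` becomes its own test ID in the report, so a failing case is still easy to locate even though the test body is shared.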
Fix `records=` path producing `'None'` descriptions + unified description construction

Main Bug Fixed
When constructing `EEGDashDataset` via `records=`, every recording's `description` was the string `"None"`. The `records=` path never built or forwarded a `description` dict to `EEGDashRaw`; braindecode then stringified the missing `None` default.

Changes
`eegdash/dataset/dataset.py`

Bug fix: `records=` path now builds and passes a `description` dict to each `EEGDashRaw`, matching the behaviour of the query and offline paths.

Unified `_build_description` helper: All three construction paths (`records=`, offline, query) previously built descriptions independently. A shared `_build_description` method now covers all three, eliminating divergence. It:
- sets missing fields to `None` (so `desc["subject"]` never raises `KeyError`)
- looks up fields with `_find_key_in_nested_dict` (recursive, case/separator-insensitive, handles both v1 and v2 record formats)
- merges `participant_tsv` data with configurable precedence (see below)
- emits a `DEBUG` log when a conflict is detected

Configurable precedence (`description_precedence`): New constructor parameter controls which source wins when the same field appears in both the record and `participant_tsv`:
- `"record"` (default): existing behaviour, record-level value is kept
- `"participant_tsv"`: `participant_tsv` overwrites the record value, including `None` (documented intentional behaviour: choosing this mode means fully trusting that source)
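The precedence rule can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `merge_description` and `VALID_PRECEDENCE` are hypothetical names, and the sketch assumes that participant data fills in fields the record lacks in both modes.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

VALID_PRECEDENCE = ("record", "participant_tsv")


def merge_description(
    record_values: dict[str, Any],
    participant_values: dict[str, Any],
    precedence: str = "record",
) -> dict[str, Any]:
    """Merge participant data into record data under the chosen precedence."""
    if precedence not in VALID_PRECEDENCE:
        raise ValueError(f"precedence must be one of {VALID_PRECEDENCE}")
    merged = dict(record_values)
    for field, tsv_value in participant_values.items():
        if field in merged and merged[field] != tsv_value:
            kept = merged[field] if precedence == "record" else tsv_value
            logger.debug(
                "Conflict on %r: record=%r, participant_tsv=%r, kept %r",
                field, merged[field], tsv_value, kept,
            )
        # In "participant_tsv" mode the tsv value always wins, even when None.
        if precedence == "participant_tsv" or field not in merged:
            merged[field] = tsv_value
    return merged
```

For example, `merge_description({"age": 10}, {"age": None}, "participant_tsv")` yields `{"age": None}`, which mirrors the documented choice that this mode fully trusts `participant_tsv`.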
All-None field warning: After construction, emits a `WARNING` if any `description_fields` entry is `None` across all recordings, which surfaces typos like `"sbject"` early.

`_normalize_records` called on the full batch: Preserves deduplication correctness (per-record calls broke `_dedupe_records`).

`tests/unit_tests/dataset/test_build_description.py` (new file)
- `test_build_description_precedence_conflict`: in `"record"` mode, the top-level value wins and `"kept"` is logged
- `test_build_description_missing_fields_padding`: missing fields are padded with `None`, no `KeyError`
- `test_build_description_key_insensitivity`: `"Subject-ID"` in the record maps to `"subject_id"` in fields
- `test_dataset_initialization_path_parity`: parity of the `description` DataFrames across construction paths
- `test_build_description_participant_tsv_precedence`: in `"participant_tsv"` mode, the tsv value wins; `None` in the tsv overwrites a real value
- `test_dataset_invalid_description_precedence`: an invalid `description_precedence` raises `ValueError` at construction
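The all-`None` field warning listed among the `dataset.py` changes could work roughly like the sketch below. The function name `all_none_fields` and the message wording are hypothetical; the sketch also assumes that an empty dataset should not trigger the warning.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def all_none_fields(
    descriptions: list[dict[str, Any]], description_fields: list[str]
) -> list[str]:
    """Return the requested fields that are None in every description."""
    if not descriptions:
        # Nothing was constructed, so there is nothing to flag.
        return []
    empty = [
        field
        for field in description_fields
        if all(desc.get(field) is None for desc in descriptions)
    ]
    if empty:
        logger.warning(
            "description_fields %s are None for all recordings; "
            "check for typos in the field names",
            empty,
        )
    return empty
```

A misspelled field such as `"sbject"` would never match any record, so it would be `None` everywhere and appear in the returned list.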