Fix/desc construction by GalAshkenazi1 · Pull Request #342 · eegdash/EEGDash

GalAshkenazi1 · 2026-05-13T07:47:12Z

Fix `records=` path producing `'None'` descriptions + unified description construction

Main Bug Fixed

When constructing EEGDashDataset via records=, every recording's description was
the string "None". The records= path never built or forwarded a description dict to
EEGDashRaw; braindecode then stringified the missing None default.

Changes

`eegdash/dataset/dataset.py`

- Bug fix: the `records=` path now builds and passes a description dict to each `EEGDashRaw`, matching the behaviour of the query and offline paths.
- Unified `_build_description` helper: all three construction paths (`records=`, offline, query) previously built descriptions independently. A shared `_build_description` method now covers all three, eliminating divergence. It:
  - Pre-fills every requested field with `None` (so `desc["subject"]` never raises `KeyError`)
  - Looks up fields via `_find_key_in_nested_dict` (recursive, case/separator-insensitive, handles both v1 and v2 record formats)
  - Merges `participant_tsv` data with configurable precedence (see below)
  - Emits a DEBUG log when a conflict is detected
- Configurable precedence (`description_precedence`): a new constructor parameter controls which source wins when the same field appears in both the record and `participant_tsv`:
  - `"record"` (default): existing behaviour, the record-level value is kept
  - `"participant_tsv"`: `participant_tsv` overwrites the record value, including `None` (documented intentional behaviour: choosing this mode means fully trusting that source)
- All-`None` field warning: after construction, emits a WARNING if any `description_fields` entry is `None` across all recordings, which surfaces typos like `"sbject"` early.
- `_normalize_records` called on the full batch: preserves deduplication correctness (per-record calls broke `_dedupe_records`).
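The behaviour listed above can be sketched roughly as follows. This is a standalone illustration, not the code in the PR: `build_description`, `find_key`, `all_none_fields`, and the sample record layout are hypothetical names and shapes chosen for the example, and the real `_build_description` and `_find_key_in_nested_dict` may differ in detail.

```python
import logging
import re
from typing import Any

logger = logging.getLogger("description_sketch")  # hypothetical logger name
VALID_PRECEDENCE = ("record", "participant_tsv")
_MISSING = object()  # sentinel: distinguishes "key absent" from "value is None"


def _norm(key: str) -> str:
    """Lower-case and drop separators so 'Subject-ID' matches 'subject_id'."""
    return re.sub(r"[^a-z0-9]", "", key.lower())


def find_key(data: Any, field: str) -> Any:
    """Search nested dicts for `field`, shallowest level first."""
    if not isinstance(data, dict):
        return _MISSING
    target = _norm(field)
    for key, value in data.items():
        if isinstance(key, str) and _norm(key) == target:
            return value
    for value in data.values():
        found = find_key(value, field)
        if found is not _MISSING:
            return found
    return _MISSING


def build_description(record, fields, participants_row=None, precedence="record"):
    """Build one description dict; every requested field is always present."""
    if precedence not in VALID_PRECEDENCE:
        raise ValueError(f"description_precedence must be one of {VALID_PRECEDENCE}")
    tsv = participants_row if participants_row is not None else (
        record.get("participant_tsv") or {}
    )
    rec = {k: v for k, v in record.items() if k != "participant_tsv"}
    desc = dict.fromkeys(fields)  # pre-fill with None
    for field in fields:
        rec_val, tsv_val = find_key(rec, field), find_key(tsv, field)
        in_rec, in_tsv = rec_val is not _MISSING, tsv_val is not _MISSING
        if in_rec and in_tsv:
            if rec_val != tsv_val:
                logger.debug("conflict on %r: %r vs %r", field, rec_val, tsv_val)
            desc[field] = rec_val if precedence == "record" else tsv_val
        elif in_rec:
            desc[field] = rec_val
        elif in_tsv:
            desc[field] = tsv_val
    return desc


def all_none_fields(descriptions, fields):
    """Return (and warn about) fields that are None in every description."""
    empty = [
        f for f in fields
        if descriptions and all(d.get(f) is None for d in descriptions)
    ]
    if empty:
        logger.warning("fields are None for all recordings: %s", empty)
    return empty


# Usage with a made-up record; "sbject" is a deliberate typo.
record = {
    "subject": "01",
    "entities": {"Task-Name": "rest"},
    "participant_tsv": {"subject": "sub-01", "age": 9},
}
fields = ["subject", "task_name", "age", "sbject"]
desc = build_description(record, fields)
```

With the default `"record"` precedence the record-level `subject` is kept, `age` is filled from the participant data, and the misspelled `sbject` stays `None`, which is exactly the case `all_none_fields` is meant to flag.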

`tests/unit_tests/dataset/test_build_description.py` (new file)

| Test | What it checks |
|---|---|
| `test_build_description_precedence_conflict` | Default `"record"` mode: top-level value wins, `"kept"` logged |
| `test_build_description_missing_fields_padding` | Absent fields → `None`, no `KeyError` |
| `test_build_description_key_insensitivity` | `"Subject-ID"` in record maps to `"subject_id"` in fields |
| `test_dataset_initialization_path_parity` | All three init paths produce identical `description` DataFrames |
| `test_build_description_participant_tsv_precedence` | `"participant_tsv"` mode: tsv value wins; `None` in tsv overwrites real value |
| `test_dataset_invalid_description_precedence` | Unknown `description_precedence` raises `ValueError` at construction |


github-actions · 2026-05-13T08:12:13Z

📚 Documentation Preview

📦 Download Documentation Artifact

Download the documentation-html artifact from the workflow run to view the docs locally.

💡 To enable live previews, add a SURGE_TOKEN secret to this repository. See surge.sh for setup instructions.

bruAristimunha · 2026-05-13T11:54:56Z

+                        **base_dataset_kwargs,
+                    )
                )
-                for record in self.records


Suggested change

-                for record in self.records
+            datasets = [
+                EEGDashRaw(
+                    record,
+                    self.cache_dir,
+                    description=self._build_description(record, description_fields),
+                    **base_dataset_kwargs,
+                )
+                for record in self.records
+            ]

Minimizing the diff.

bruAristimunha · 2026-05-13T11:58:00Z

+        record: dict[str, Any],
+        description_fields: list[str],
+        participants_row: dict[str, Any] | None = None,
+    ) -> dict[str, Any]:


I think this function is more verbose than necessary, and I would move it to an auxiliary location such as a utilities module. That way, the dataset object does not carry a function that is only used once.

bruAristimunha · 2026-05-13T11:59:37Z

+    def _build_description(
+        self,
+        record: dict[str, Any],
+        description_fields: list[str],
+        participants_row: dict[str, Any] | None = None,
+    ) -> dict[str, Any]:
+        """Build a description dict for a single record.
+
+        Extracts values for each requested field from the record, then merges
+        participant data from either an explicit ``participants_row`` (offline
+        path, from a local ``participants.tsv``) or the embedded
+        ``participant_tsv`` key inside the record (online paths).  Fields still
+        absent after the merge are set to ``None`` so the schema is always
+        complete.  When both the record and participant data carry the same
+        field, precedence is determined by ``self._description_precedence``; a
+        ``debug``-level log is emitted when the values differ.


I think we can make this function much more compact too.

bruAristimunha · 2026-05-13T12:00:02Z

-                    participants_row=part,
-                    description_fields=description_fields,
-                )
+            description = self._build_description(record, description_fields)


nice simplification

bruAristimunha · 2026-05-13T12:00:44Z

The test suite needs to be much more compact: use parametrization and a more idiomatic pytest style.
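As a rough illustration of the parametrized style being requested, several precedence and padding cases can share one test body. The `build_description` function below is a simplified, hypothetical stand-in so that the example is self-contained; it is not the helper from this PR, and the case data is made up.

```python
import pytest


def build_description(record, fields, precedence="record"):
    """Hypothetical stand-in for the helper under test (flat lookup only)."""
    if precedence not in ("record", "participant_tsv"):
        raise ValueError("unknown description_precedence")
    tsv = record.get("participant_tsv") or {}
    # The preferred source is consulted first; the other is the fallback.
    sources = (record, tsv) if precedence == "record" else (tsv, record)
    desc = dict.fromkeys(fields)  # every requested field starts as None
    for field in fields:
        for source in sources:
            if field in source:
                desc[field] = source[field]
                break
    return desc


RECORD = {"subject": "01", "participant_tsv": {"subject": None, "age": 9}}

# (precedence, requested fields, expected description)
CASES = [
    ("record", ["subject", "age"], {"subject": "01", "age": 9}),
    ("participant_tsv", ["subject", "age"], {"subject": None, "age": 9}),
    ("record", ["sbject"], {"sbject": None}),
]


@pytest.mark.parametrize(
    ("precedence", "fields", "expected"),
    CASES,
    ids=["record-wins", "tsv-none-overwrites", "missing-field-padded"],
)
def test_build_description(precedence, fields, expected):
    assert build_description(RECORD, fields, precedence) == expected


def test_invalid_precedence_raises():
    with pytest.raises(ValueError, match="description_precedence"):
        build_description(RECORD, ["subject"], precedence="bogus")
```

Each tuple in `CASES` becomes its own test with a readable id, so adding a scenario is a one-line change rather than a new test function.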

GalAshkenazi1 added 4 commits May 12, 2026 20:14

first try to fix (no test yet)

c81bd1b

added tests - not ready yet!

e66e05e

Added 2 more tests, fixed previous tests, added option to override tsv/record info.

561b2f1

ruff fixes.

de39e98

simplify the change

ee5acb7

bruAristimunha reviewed May 13, 2026

GalAshkenazi1 wants to merge 5 commits into `develop` from `fix/desc_construction`


