Add Schrag2026Pediatric SSVEP dataset (n=47, ages 5-18) by bruAristimunha · Pull Request #1069 · NeuroTechX/moabb

bruAristimunha · 2026-05-07T16:41:11Z

Summary

Adds Schrag2026Pediatric — pediatric SSVEP-BCI dataset from Schrag et al. (2026) hosted on Zenodo (CC-BY-ND-4.0). 47 children (ages 5–18, 40.4% female) recorded at 256 Hz on 16 channels with the g.tec g.GAMMAsys + g.USBamp + g.GAMMAcap system, ground Fpz, earlobe reference.

Each subject contributes:

Personalization (T1) — 12 visual stimuli (4 contrasts × 3 sizes), all flickering at 10 Hz, used by the original authors to pick a per-child personalized stimulus.
SSVEP game (T2 + T3) — 4-target online game at 6.25 / 10 / 11.11 / 14.28 Hz, played twice (once with each subject's personal stimulus, once with a high-contrast standard) across two themed maps.

By default the loader exposes the SSVEP game runs only — two sessions per subject ("0standard", "1personal"), 5 s trials at four target frequencies. include_personalization=True also loads T1 as a third session ("2personalization"); all its trials carry the "10" event since every personalization stimulus flickers at 10 Hz.

Important caveat (also in the class docstring): trial labels for the game sessions come from the recorded fbCCA classifier output (the Selected SPO column of the per-game movement CSV), i.e. the frequency the system identified during the live game, which then drove avatar movement. They are not ground-truth target frequencies, and treating y as such biases benchmarks toward fbCCA's behaviour. When the trial count and the CSV row count differ by a small amount (≤ 10 percent), both are truncated to the shorter length; otherwise the run's labels are dropped to avoid silently shifted labels.
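
The count-drift safeguard described above can be sketched roughly as follows. This is a minimal illustration, not the code in schrag2026.py: the function name align_labels is hypothetical, and the choice of denominator for the drift ratio (the larger of the two counts) is an assumption, since the description only states the 10 percent threshold.

```python
import logging

log = logging.getLogger(__name__)

MAX_DRIFT = 0.10  # threshold stated in the PR description


def align_labels(trial_starts, csv_labels, max_drift=MAX_DRIFT):
    """Pair trial onsets with CSV labels, or drop the labels entirely.

    Returns (trial_starts, labels). labels is None when the two counts
    disagree by more than max_drift, so that callers never receive a
    label vector that may be shifted against the trials.
    """
    n_trials, n_labels = len(trial_starts), len(csv_labels)
    if n_trials == 0 or n_labels == 0:
        return trial_starts, None

    # Assumption: drift is measured relative to the larger count.
    drift = abs(n_trials - n_labels) / max(n_trials, n_labels)
    if drift > max_drift:
        log.error(
            "trial/CSV count drift %.0f%% exceeds %.0f%%; dropping labels",
            100 * drift, 100 * max_drift,
        )
        return trial_starts, None

    # Small drift: truncate both sequences to the shorter length.
    n = min(n_trials, n_labels)
    return trial_starts[:n], csv_labels[:n]
```

Returning None rather than a truncated vector for large drift makes the failure visible to the caller instead of hiding a possible off-by-k shift inside otherwise valid-looking labels.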

Files

moabb/datasets/schrag2026.py — new dataset class + helpers
moabb/datasets/__init__.py — registration
moabb/datasets/summary_ssvep.csv — summary row
docs/source/api.rst — autosummary entry under SSVEP datasets
docs/source/whats_new.rst — changelog entry

Implementation notes

XDF + Unity markers via pyxdf (soft-imported). Modelled after aguilera_rodriguez2025.py (XDF) and kumar2024.py (single-zip Zenodo). Marker stream is selected by name (UnityMarkerStream) — each XDF also carries an empty gUSBamp-1Markers stream that wins a type-based match in some files.
Single 1.2 GB DatasetData.zip is downloaded once and extracted per-subject on demand via safe_extract_zip(... members=...) so first-use latency stays in seconds for one-subject runs. Extraction is staged into a sibling temp dir then os.replace-d into place — race-safe under concurrent pytest workers.
Demographics (_AGES, _SEXES) hardcoded from Participant_Demographic_Info.csv in the deposit; verified to match byte-for-byte.
License set to CC-BY-ND-4.0 per the live Zenodo deposit (the preprint PDF says CC-BY-4.0; Zenodo metadata is authoritative for the data — comment in the source explains the discrepancy).
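
The staged per-subject extraction mentioned in the notes above could look roughly like the sketch below. It is written under several assumptions: extract_subject and the ".staging-" prefix are hypothetical names, the subject's files are assumed to share a path prefix inside the archive, and the standard library's zipfile.extractall stands in for MOABB's safe_extract_zip.

```python
import os
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_subject(zip_path, prefix, dest):
    """Extract archive members starting with prefix into dest, atomically."""
    dest = Path(dest)
    if dest.exists():  # already extracted: nothing to do
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Stage into a sibling directory so the final rename stays on the
    # same filesystem, which is what makes os.replace atomic.
    staging = Path(tempfile.mkdtemp(dir=dest.parent, prefix=".staging-"))
    try:
        with zipfile.ZipFile(zip_path) as zf:
            members = [m for m in zf.namelist() if m.startswith(prefix)]
            zf.extractall(staging, members=members)
        try:
            os.replace(staging, dest)
        except OSError:
            # Another worker may have won the race; keep its result.
            if not dest.exists():
                raise
    finally:
        # No-op after a successful replace; cleans up after a lost race.
        shutil.rmtree(staging, ignore_errors=True)
    return dest
```

Readers of dest either see no directory or a fully populated one, so a concurrent worker never observes a half-extracted subject.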

Cross-checks performed

31/31 metadata fields verified against (a) preprint PDF, (b) Participant_Demographic_Info.csv, (c) live Zenodo API.
Loaded subjects 1, 2, 10, 20, 30, 40, 47 end-to-end. Trial counts 33–87 per session, all 4 SSVEP classes represented in each clean run.
SSVEP paradigm round-trip on 5 clean subjects: X.shape=(428, 16, 1281), balanced classes {6.25: 119, 10: 116, 11.11: 95, 14.28: 98}.
Post 7–45 Hz bandpass channel std ~15–34 µV (sane for pediatric EEG).
Atomic _extract_subject verified idempotent; no temp leftovers after concurrent extraction races.
P001 personal session has 15% trial-vs-CSV drift; the safeguard correctly drops labels with a clear log.error rather than silently shifting them.
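
The reported round-trip shape can be checked by hand. The short sketch below assumes the epoch window is 0 to 5 s with both endpoints included (the convention MNE uses when cutting epochs), which is what would give 1281 rather than 1280 samples; the per-class counts are copied from the cross-check list above.

```python
sfreq = 256      # Hz, from the PR summary
trial_s = 5      # seconds per trial, from the PR summary

# Assumption: both window endpoints are kept, hence the extra sample.
n_samples = int(trial_s * sfreq) + 1

# Per-class trial counts reported for the 5 clean subjects.
counts = {"6.25": 119, "10": 116, "11.11": 95, "14.28": 98}
n_trials = sum(counts.values())

# Share of trials per class, useful when judging class balance.
shares = {freq: n / n_trials for freq, n in counts.items()}

print(n_trials, n_samples)
for freq, share in shares.items():
    print(f"{freq} Hz: {share:.1%}")
```

The sample count and the class total agree with X.shape=(428, 16, 1281); the per-class shares show how evenly the fbCCA-derived labels are spread across the four frequencies.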

Style note

The class follows MOABB's existing BaseDataset shape but its module-level helpers (_load_xdf_streams, _read_unity_markers, _build_raw, _load_game_run, _load_personalization_run, _match_freq, _extract_subject) are deliberately flat / procedural and use the variable names (marker_ts, markers, eeg_stream, …) from the upstream Schrag / Comaduran reference notebooks (epoching-example.ipynb in the Zenodo deposit), so the original authors can read it top-to-bottom.
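
As an illustration of the flat helper style, the name-based marker stream selection mentioned in the implementation notes might be sketched as below. The function name pick_stream is hypothetical. The nested layout (stream["info"]["name"][0]) follows what pyxdf.load_xdf returns; the stream contents and the "Markers" type strings in the usage example are made up for the demonstration.

```python
def pick_stream(streams, name):
    """Return the first stream whose LSL name matches exactly.

    Matching by name rather than by type avoids picking up an empty
    marker stream that happens to share the same type.
    """
    for stream in streams:
        if stream["info"]["name"][0] == name:
            return stream
    available = [s["info"]["name"][0] for s in streams]
    raise ValueError(f"no stream named {name!r}; found {available}")


# Hypothetical streams shaped like pyxdf output.
streams = [
    {"info": {"name": ["gUSBamp-1Markers"], "type": ["Markers"]},
     "time_series": [], "time_stamps": []},
    {"info": {"name": ["UnityMarkerStream"], "type": ["Markers"]},
     "time_series": [["trial start"]], "time_stamps": [12.5]},
]

marker_stream = pick_stream(streams, "UnityMarkerStream")
markers = marker_stream["time_series"]
marker_ts = marker_stream["time_stamps"]
```

A type-based match on these two streams would return the empty gUSBamp-1Markers entry first, which is the failure mode the notes describe.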

Test plan

References

Preprint: Schrag, E., Comaduran Marquez, D., Kirton, A., & Kinney-Lang, E. (2026). A steady-state visual evoked potential-based brain-computer interface dataset in children and adolescents. Research Square. https://doi.org/10.21203/rs.3.rs-9347306/v1
Data: https://doi.org/10.5281/zenodo.19440997
Original code: https://github.com/kirtonBCIlab/ES_Masters

Pediatric SSVEP-BCI dataset from Schrag et al. (2026): 47 children (ages 5-18, 40.4% female) recorded with g.tec g.GAMMAsys + g.USBamp at 256 Hz on 16 scalp channels.

Two-stage protocol:
- Stimulus personalization (12 stimuli, 4 contrasts x 3 sizes at 10 Hz)
- Online 4-target SSVEP game (6.25 / 10 / 11.11 / 14.28 Hz), played twice per subject (personal vs standard stimulus, two maps).

By default the loader exposes the SSVEP game runs as two sessions ("0standard", "1personal") with 5 s trials at four target frequencies; include_personalization=True opens a third "2personalization" session (all trials labelled "10" -- the shared 10 Hz flicker).

Trial labels for the game come from the live fbCCA classifier output (Selected SPO column in the per-game movement CSV); this is documented in the class docstring as not-quite-ground-truth. Trial / CSV count drift is min-truncated when small (<= 10 percent), otherwise the run's labels are dropped to avoid silent shifts.

Data hosted as a single 1.2 GB Zenodo zip (10.5281/zenodo.19440997, CC-BY-ND-4.0); per-subject extraction is staged via tempfile + os.replace for race-safe concurrent runs.

Preprint DOI: 10.21203/rs.3.rs-9347306/v1

- Add moabb/datasets/schrag2026.py
- Register in moabb/datasets/__init__.py and summary_ssvep.csv
- Add to docs/source/api.rst SSVEP autosummary and whats_new.rst

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1446b3ff31

chatgpt-codex-connector · 2026-05-07T16:43:16Z

+        if zip_path.suffix != ".zip":
+            target = zip_path.with_suffix(".zip")
+            if not target.exists():
+                zip_path.rename(target)
+            zip_path = target


Preserve the downloader cache filename

Because the Zenodo API URL ends in /content, data_dl caches this archive at a path named content; renaming it to content.zip removes the exact file that data_dl checks on the next call. Since data_path() calls data_dl() before checking whether the subject was already extracted, any later access to an already-extracted subject will still re-download the full ~1.2 GB archive every time. Keep the cached path intact or download/cache under the final zip filename instead.
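
One way to keep the cached path intact, as the review suggests, is to read the archive under whatever name the downloader gave it: Python's zipfile module identifies archives by content, not by file extension. The sketch below is only an illustration of that option; list_subject_members is a hypothetical helper, and whether safe_extract_zip accepts a path without a .zip suffix is not confirmed here.

```python
import zipfile
from pathlib import Path


def list_subject_members(cache_path, prefix):
    """List archive members under prefix without renaming the cache file."""
    cache_path = Path(cache_path)
    # is_zipfile inspects the file's contents, so a cache entry named
    # "content" (no suffix) is accepted as long as it is a real zip.
    if not zipfile.is_zipfile(cache_path):
        raise ValueError(f"{cache_path} is not a zip archive")
    with zipfile.ZipFile(cache_path) as zf:
        return [m for m in zf.namelist() if m.startswith(prefix)]
```

Because the file is never renamed, the downloader's own existence check on the next call still finds it and can skip the 1.2 GB download.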

Audit of the dataset class showed six private helpers each used by exactly one caller, which obscured the linear flow when reading top-down. Inline the small ones; keep helpers used by both loaders.

- Inline _normalize_spo (now: ``_match_freq`` next to its caller)
- Inline _personalization_label as a 2-line ``rsplit`` in the loader
- Inline _movement_csv_for_eeg as a path expression at the call site
- Inline _wanted_session_keys, _find_game_files, _safe_pair_count
- Move _load_xdf_streams, _read_unity_markers, _build_raw, _load_*_run, _extract_subject to module level so the file reads top-to-bottom
- Rename ``marker_text`` -> ``markers`` and ``start_idx`` -> ``trial_starts`` to match the variable names used in the upstream Schrag/Comaduran reference notebooks (``epoching-example.ipynb`` in the Zenodo deposit)

Behavior unchanged. All previously-verified properties hold:

- demographics still match the Zenodo CSV byte-for-byte
- METADATA fields preserved (DOI, license, freqs, n_classes, n_subjects)
- 10 percent drift safeguard still drops shifted-label runs
- include_personalization=True still yields 40 T1 trials
- session filtering ("personal" and "0standard" forms) still works
- _extract_subject is still atomic (tempfile + os.replace)
- SSVEP paradigm round-trip identical (428 trials, balanced 4 classes)

bruAristimunha wants to merge 2 commits into develop from add-schrag2026-pediatric-ssvep
