This repository provides utilities for extracting, transforming, and loading publicly available C. elegans neuroscience datasets.
git clone https://github.com/flavell-lab/pub_utils.git
cd pub_utils
uv sync
The package requires Python >= 3.10. Dependencies are managed in pyproject.toml.
The pub_utils package (src/pub_utils/) exports:
- NeuronFeatures (core.py): Stores neuroanatomical features as a neurons × features matrix. Features are grouped into the categories cellType, sensoryType, segment, and process. Provides fast lookup by neuron ID or feature name.
- NeuronInteraction (core.py): Square adjacency-matrix wrapper for connectome data. Handles source→recipient relationships, reciprocal-pair detection, breadth-first-search (BFS) shortest paths, and degree analysis. Matrix values represent connection strength (e.g., the number of unique ligand-receptor pairs).
- plot_connectome_matrix / plot_reciprocal_network (plot.py): Visualization functions using seaborn heatmaps and networkx graphs with discrete colormaps.
Many other utilities are exported as well; see src/pub_utils/__init__.py for the full list.
- Gene Info: data/HobertLab/NT_uptake_synthesis_release_gene_info.csv maps functional categories (uptake, synthesis, release) to gene names for each neurotransmitter (NT)
- Pairing Info:
  - data/Altun2013/NT_receptor_info.csv: original receptor-ligand mappings with confidence scores and receptor-type flags
  - connectomes/NT_receptor_info.csv: merged database combining Altun2013 receptor data with DeepResearch polarity annotations (excitatory/inhibitory/variable/unknown)
- Release Data: binary neuron × gene matrices from literature, reporter, and staining methods
- Receptor Data: binary neuron × receptor matrices from sequencing, reporter, and literature methods
File paths are registered in assets.json with the following layout:
assets
├── neuron_features
│
├── connectomes
│ ├── preassembled/
│ │ ├── structural/ (accessed through OpenWorm)
│ │ │ ├── chemical
│ │ │ └── electrical
│ │ └── molecular/ (from Bentley2016 and RipollSanchez2023)
│ │
│ ├── candy_assembly/ (customized logic, all are molecular)
│ │ ├── dopamine/
│ │ ├── serotonin/
│ │ ├── tyramine/
│ │ ├── octopamine/
│ │ ├── acetylcholine/
│ │ ├── gaba/
│ │ ├── glutamate/
│ │ ├── individual_neuropeptides/
│ │ ├── aggregated_neuropeptides/
│ │ ├── aggregated_synapticNT/
│ │ └── aggregated_extrasynapticNT/
│ │
│ └── reproducibility_tests/ (validation reports on the molecular assembly, generated with Claude Code)
│
├── pairing_info
│ ├── neurotransmitter
│ │ ├── Altun2013 (original receptor-ligand mappings)
│ │ └── merged (combined with DeepResearch polarity data)
│ └── neuropeptide
│
├── release
│ ├── neurotransmitter
│ │ ├── literature
│ │ ├── reporter
│ │ └── staining
│ │
│ └── neuropeptide
│ ├── literature
│ └── sequencing
│
└── receptor
├── neurotransmitter
│ ├── acetylcholine
│ │ ├── sequencing
│ │ └── reporter (metabotropic only)
│ ├── glutamate
│ │ ├── sequencing
│ │ └── reporter (metabotropic only)
│ ├── gaba
│ │ ├── sequencing
│ │ └── reporter
│ ├── dopamine
│ │ ├── reporter
│ │ └── sequencing
│ ├── serotonin
│ │ └── reporter
│ ├── tyramine
│ │ └── sequencing
│ ├── octopamine
│ │ └── sequencing
│ └── all
│ └── literature
│
└── neuropeptide
├── literature
└── sequencing
Note: {username}_assembly/ directories contain custom connectomes assembled via notebook/assemble_connectomes.ipynb. Single-molecule neurotransmitter connectomes go into molecule subdirectories; neuropeptide and aggregated connectomes are saved flat in the root.
data/
├── Altun2013/
│ ├── NPP_receptor_info.csv
│ └── NT_receptor_info.csv
│
├── Bentley2016/
│ ├── NPP_receptor_info.csv
│ ├── NPP_receptor_metabotropic_literature.csv
│ ├── NPP_release_literature.csv
│ ├── NT_receptor_all_literature.csv
│ ├── NT_release_literature.csv
│ ├── dopamine_receptor_all_reporter.csv
│ ├── octopamine_receptor_all_literature.csv
│ ├── serotonin_receptor_all_literature.csv
│ ├── tyramine_receptor_all_literature.csv
│ ├── monoamine_expression.csv (raw)
│ ├── monoamine_receptor_expression.csv (raw)
│ ├── neuropeptide_expression.csv (raw)
│ ├── neuropeptide_receptor_expression.csv (raw)
│ └── supplementary_references.csv (raw)
│
├── Dag2023/
│ ├── serotonin_receptor_all_reporter.csv
│ └── 5htr_expression_dv_final.csv (raw)
│
├── Fenyves2020/
│ ├── acetylcholine_receptor_ionotropic_sequencing.csv
│ ├── gaba_receptor_ionotropic_sequencing.csv
│ ├── glutamate_receptor_ionotropic_sequencing.csv
│ ├── NT_receptor_expression.csv (raw)
│ └── NT_receptor_polarity.csv (raw)
│
├── HobertLab/
│ ├── NT_uptake_synthesis_release_gene_info.csv
│ ├── acetylcholine_receptor_metabotropic_reporter.csv
│ ├── gaba_receptor_all_reporter.csv
│ ├── MA_gaba_release_expression_sequencing.csv (raw)
│ └── NT_receptors.R (raw)
│
├── Muralidhara2025/
│ ├── dopamine_receptor_all_reporter.csv
│ └── dopamine_receptor_all_sequencing.csv
│
├── RipollSanchez2023/
│ ├── NPP_receptor_info.csv
│ ├── NPP_receptor_all_sequencing.csv
│ ├── NPP_release_sequencing.csv
│ ├── neuroanatomy.csv
│ ├── NPP_connectome_short_range_01022024.csv (raw)
│ ├── NPP_connectome_mid_range_01022024.csv (raw)
│ ├── NPP_connectome_long_range_01022024.csv (raw)
│ ├── monoamine_connectome_08062023.csv (raw)
│ ├── GPCR_per_neuron.csv (raw)
│ ├── NPP_per_neuron.csv (raw)
│ ├── NPPpairsbyneuron_*.csv (raw)
│ ├── 30072020_CENGEN_*.csv (raw)
│ ├── group/ (raw)
│ └── individual/ (raw)
│
├── Wang2024/
│ ├── NT_release_reporter.csv
│ ├── NT_release_staining.csv
│ ├── NT_release_reporter_male.csv
│ └── NT_release_staining_male.csv
│
└── DeepResearch20260119/
├── NT_receptor_info.csv (polarity annotations with sources)
├── acetylcholine_receptor_info.csv (detailed mechanism info)
├── gaba_receptor_info.csv
├── glutamate_receptor_info.csv
└── monoamine_receptor_info.csv
Note: Files marked (raw) are original source files kept for reference but not directly used in assets.json.
- Connectome matrices are square DataFrames: rows = source neurons, columns = recipient neurons
- Values: 1 = known connection; 0.5 = variable connection; 0 = evidence of absence; NaN = absence of evidence
- Neuron IDs follow WormAtlas naming (e.g., ASEL, ASER, DA01)
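Because 0 (evidence of absence) and NaN (absence of evidence) mean different things, summaries should keep them apart instead of calling fillna(0) first. A minimal pandas sketch, assuming a matrix that follows the value convention above; the helper name summarize_evidence is hypothetical and the matrix values are made up.

```python
import numpy as np
import pandas as pd


def summarize_evidence(conn: pd.DataFrame) -> dict[str, int]:
    """Count matrix entries per evidence category without collapsing NaN into 0."""
    values = conn.to_numpy(dtype=float)
    return {
        "known": int(np.sum(values == 1)),
        "variable": int(np.sum(values == 0.5)),
        "absent": int(np.sum(values == 0)),
        "unknown": int(np.isnan(values).sum()),
    }


# Toy matrix with made-up values: rows = source neurons, columns = recipient neurons.
neurons = ["N1", "N2", "N3"]
conn = pd.DataFrame(
    [[np.nan, 1, 0], [0.5, np.nan, 1], [0, np.nan, np.nan]],
    index=neurons,
    columns=neurons,
)
print(summarize_evidence(conn))  # {'known': 2, 'variable': 1, 'absent': 2, 'unknown': 4}

# Share of evidence-bearing entries that are known connections (NaN cells excluded).
known_rate = (conn == 1).sum().sum() / conn.notna().sum().sum()
print(known_rate)  # 0.4
```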
- Extract: notebooks in notebook/ convert source formats (RData, XLSX) to CSV
- Transform: one-hot encode categorical features, standardize neuron ordering via standardize_dataframe(df, neuron_order), and save to CSV
- Load: wrap in NeuronFeatures or NeuronInteraction classes for analysis
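The Transform step can be sketched with pandas alone. The helpers one_hot_features and standardize_order below are hypothetical stand-ins written for illustration; standardize_order is not the package's standardize_dataframe, whose exact behavior may differ. The sketch assumes neurons are the row index and that neurons missing from a source should appear as all-NaN rows (absence of evidence) rather than be dropped. The category values are illustrative.

```python
import pandas as pd


def one_hot_features(raw: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """One-hot encode categorical columns into 0/1 indicator columns."""
    return pd.get_dummies(raw, columns=columns, dtype=int)


def standardize_order(df: pd.DataFrame, neuron_order: list[str]) -> pd.DataFrame:
    """Reorder rows to a fixed neuron list; also reorder columns if they are neurons."""
    out = df.reindex(index=neuron_order)  # neurons missing from df become all-NaN rows
    if set(df.columns) <= set(neuron_order):  # square connectome: align columns too
        out = out.reindex(columns=neuron_order)
    return out


raw = pd.DataFrame(
    {"cellType": ["sensory", "interneuron"], "segment": ["head", "head"]},
    index=["ASEL", "AIYL"],
)
features = standardize_order(
    one_hot_features(raw, ["cellType", "segment"]), ["AIYL", "ASEL", "ASER"]
)
print(features)
```

The standardized frame can then be written out with features.to_csv(path), matching the "saved to CSV" step.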
Connectome[source, target] = Release[source, NT] AND Receptor[target, receptor] AND (optional) Constraint
where (NT, receptor) is a valid pair from pairing_info with confidence >= 1, and Constraint is derived from structural chemical synapses and is applicable to classical NTs (acetylcholine, glutamate).
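The rule above can be expressed as a small pandas function. This is a minimal sketch, not the repository's implementation: the names kleene_and and assemble_pairs, the input layout (release and receptor as neuron-indexed DataFrames with one column per molecule or receptor, pairs as (NT, receptor, confidence) tuples), the receptor names, and the three-valued NaN handling (0 wins over NaN, NaN wins over 1) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def kleene_and(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Three-valued AND on {0, 1, NaN}: 0 if either side is 0, 1 if both are 1, else NaN."""
    out = np.full(np.broadcast(a, b).shape, np.nan)
    out[(a == 1) & (b == 1)] = 1.0
    out[(a == 0) | (b == 0)] = 0.0
    return out


def assemble_pairs(release, receptor, pairs, constraint=None, min_conf=1.0):
    """Return {receptor: source x target DataFrame} for each pair with confidence >= min_conf."""
    neurons = release.index
    per_pair = {}
    for nt, rec, conf in pairs:
        if conf < min_conf:  # confidence < 1 is treated as "not a valid pair"
            continue
        src = release[nt].to_numpy(dtype=float)[:, None]  # column vector over sources
        tgt = receptor[rec].reindex(neurons).to_numpy(dtype=float)[None, :]  # row vector over targets
        mat = kleene_and(src, tgt)
        if constraint is not None:  # e.g. structural chemical synapses
            c = constraint.reindex(index=neurons, columns=neurons).to_numpy(dtype=float)
            mat = kleene_and(mat, c)
        per_pair[rec] = pd.DataFrame(mat, index=neurons, columns=neurons)
    return per_pair


# Toy inputs (made-up values and hypothetical receptor names, not real data).
neurons = ["N1", "N2", "N3"]
release = pd.DataFrame({"dopamine": [1, 0, np.nan]}, index=neurons)
receptor = pd.DataFrame({"rec-a": [0, 1, 1], "rec-b": [1, 1, 1]}, index=neurons)
pairs = [("dopamine", "rec-a", 2.0), ("dopamine", "rec-b", 0.5)]  # rec-b is below threshold

per_pair = assemble_pairs(release, receptor, pairs)
count = sum((m == 1).astype(int) for m in per_pair.values())  # receptor types per connection
binary = (count > 0).astype(int)  # 1 if any receptor type connects source -> target
print(per_pair["rec-a"])
```

Keeping one matrix per receptor and deriving count and binary from them mirrors the per-pair, count, and binary outputs the CLI writes, although the CLI's exact aggregation may differ.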
- Release filtering: User chooses AND or OR gate across data sources; AND is recommended
- Receptor filtering: User chooses AND or OR gate across data sources; OR is recommended
- Confidence threshold: Applied in code logic (confidence < 1 treated as 0)
- Missing values: Preserved as NaN throughout, all functions robust to NaN
- Output location: Assembled connectomes are saved under connectomes/{username}_assembly in assets.json
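Combining several datasets for the same molecule can be sketched the same way. The helper combine_sources is hypothetical, and its NaN semantics are an assumption: the AND gate returns 0 if any source reports 0, 1 only if every source reports 1, and NaN otherwise; the OR gate returns 1 if any source reports 1, 0 only if every source reports 0, and NaN otherwise. Neurons absent from a source are treated as NaN.

```python
import numpy as np
import pandas as pd


def combine_sources(sources: list[pd.Series], gate: str = "and") -> pd.Series:
    """Combine per-neuron binary calls (1/0/NaN) from several datasets with an AND or OR gate."""
    frame = pd.concat(sources, axis=1)  # union of neurons; missing entries become NaN
    values = frame.to_numpy(dtype=float)
    if gate == "and":
        any_zero = (values == 0).any(axis=1)
        all_one = (values == 1).all(axis=1)
        out = np.where(any_zero, 0.0, np.where(all_one, 1.0, np.nan))
    elif gate == "or":
        any_one = (values == 1).any(axis=1)
        all_zero = (values == 0).all(axis=1)
        out = np.where(any_one, 1.0, np.where(all_zero, 0.0, np.nan))
    else:
        raise ValueError(f"gate must be 'and' or 'or', got {gate!r}")
    return pd.Series(out, index=frame.index)


# Toy calls from two hypothetical release datasets (made-up values).
reporter = pd.Series({"N1": 1, "N2": 1, "N3": 0})
literature = pd.Series({"N1": 1, "N2": np.nan, "N4": 1})
release_and = combine_sources([reporter, literature], gate="and")
release_or = combine_sources([reporter, literature], gate="or")
```

With these toy inputs, AND gives 1 for N1, 0 for N3, and NaN for N2 and N4, while OR gives 1 for N1, N2, and N4 and NaN for N3.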
For reproducible assemblies, use TOML configuration files with the CLI runner:
# Validate a config file
python scripts/assemble_connectome.py --validate-only configs/examples/nt_dopamine_example.toml
# Run an assembly
python scripts/assemble_connectome.py configs/examples/nt_dopamine_example.toml
# Run quietly (suppress progress messages)
python scripts/assemble_connectome.py -q configs/examples/nt_dopamine_example.toml
Example TOML config (configs/examples/nt_dopamine_example.toml):
version = "1.0"
[metadata]
name = "dopamine_reporter_constrained"
description = "Dopamine connectome using Wang2024 reporter data"
[assembly]
molecule_type = "neurotransmitter"
molecule = "dopamine"
[assembly.release]
markers = ["release"] # cat-1 vesicular monoamine transporter
sources = ["reporter:Wang2024"]
gate = "or"
[assembly.receptor]
sources = ["reporter:Muralidhara2025"]
gate = "or"
type = "all"
[constraint]
enabled = true
structural_dataset = "Cook2019"
mode = "binary"
[output]
directory = "connectomes/candy_assembly/dopamine"
basename = "dopamine_reporter_Cook2019"
save_per_pair = true
save_count = true
save_binary = true
Output files:
- {basename}_binary.csv: binary adjacency matrix (1 if any connection exists)
- {basename}_count.csv: count matrix (number of receptor types per connection)
- {basename}_per_pair/{receptor}.csv: per-receptor adjacency matrices
- {basename}_metadata.json: assembly metadata (config hash, timestamp, stats)
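A config can also be inspected from Python before running the CLI. This sketch uses the standard-library tomllib (Python 3.11+; on 3.10 the third-party tomli package offers the same interface). The helper check_config and the keys it checks are assumptions drawn from the example config above; they are not the validation performed by scripts/assemble_connectome.py --validate-only.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:  # Python 3.10: pip install tomli
    import tomli as tomllib

# Trimmed copy of the example config. In the repository you would instead use:
#   with open("configs/examples/nt_dopamine_example.toml", "rb") as fh:
#       cfg = tomllib.load(fh)
EXAMPLE = '''
version = "1.0"
[assembly]
molecule_type = "neurotransmitter"
molecule = "dopamine"
[assembly.release]
sources = ["reporter:Wang2024"]
gate = "or"
[assembly.receptor]
sources = ["reporter:Muralidhara2025"]
gate = "or"
[output]
basename = "dopamine_reporter_Cook2019"
'''


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a parsed assembly config (empty if none)."""
    problems = []
    assembly = cfg.get("assembly", {})
    for key in ("molecule_type", "molecule"):
        if key not in assembly:
            problems.append(f"assembly.{key} is missing")
    for side in ("release", "receptor"):
        section = assembly.get(side, {})
        if not section.get("sources"):
            problems.append(f"assembly.{side}.sources is empty")
        if section.get("gate") not in ("and", "or"):
            problems.append(f"assembly.{side}.gate must be 'and' or 'or'")
    return problems


cfg = tomllib.loads(EXAMPLE)
basename = cfg["output"]["basename"]
expected = [f"{basename}_binary.csv", f"{basename}_count.csv", f"{basename}_metadata.json"]
print(check_config(cfg))  # []
```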
Available example configs:
- configs/examples/nt_dopamine_example.toml: dopamine with Wang2024 reporter + Muralidhara2025 receptors
- configs/examples/nt_acetylcholine_ionotropic_example.toml: acetylcholine, ionotropic receptors only
- configs/examples/npp_flp1_example.toml: FLP-1 neuropeptide
Data source coverage:
| Molecule | Release Sources | Receptor Sources |
|---|---|---|
| Dopamine | reporter:Wang2024, literature:Bentley2016 | reporter:Muralidhara2025, sequencing:Muralidhara2025 |
| Serotonin | reporter:Wang2024, literature:Bentley2016 | reporter:Dag2023 |
| Acetylcholine | reporter:Wang2024 | sequencing:Fenyves2020 (ionotropic), reporter:HobertLab (metabotropic) |
| GABA | reporter:Wang2024 | sequencing:Fenyves2020 (ionotropic), reporter:HobertLab (all) |
| Glutamate | reporter:Wang2024 | sequencing:Fenyves2020 (ionotropic) |
| Tyramine | literature:Bentley2016 | literature:Bentley2016 |
| Octopamine | literature:Bentley2016 | literature:Bentley2016 |
| Neuropeptides | sequencing:RipollSanchez2023, literature:Bentley2016 | sequencing:RipollSanchez2023, literature:Bentley2016 |
Note: literature:Bentley2016 release data cover only monoamines (cat-2, dat-1, mod-5, tbh-1, tdc-1, tph-1). For acetylcholine, GABA, and glutamate release, use reporter:Wang2024, which includes unc-17 (ACh), unc-25 (GABA), and eat-4 (glutamate).
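The coverage table can also be encoded as a lookup, so that requested sources are checked before assembly. The dictionary below is transcribed from the table above (the "neuropeptide" key name is chosen here, and the ionotropic/metabotropic qualifiers are dropped); the helper unsupported_sources is hypothetical and not part of pub_utils.

```python
COVERAGE = {
    "dopamine": {
        "release": {"reporter:Wang2024", "literature:Bentley2016"},
        "receptor": {"reporter:Muralidhara2025", "sequencing:Muralidhara2025"},
    },
    "serotonin": {
        "release": {"reporter:Wang2024", "literature:Bentley2016"},
        "receptor": {"reporter:Dag2023"},
    },
    "acetylcholine": {
        "release": {"reporter:Wang2024"},
        "receptor": {"sequencing:Fenyves2020", "reporter:HobertLab"},
    },
    "gaba": {
        "release": {"reporter:Wang2024"},
        "receptor": {"sequencing:Fenyves2020", "reporter:HobertLab"},
    },
    "glutamate": {
        "release": {"reporter:Wang2024"},
        "receptor": {"sequencing:Fenyves2020"},
    },
    "tyramine": {"release": {"literature:Bentley2016"}, "receptor": {"literature:Bentley2016"}},
    "octopamine": {"release": {"literature:Bentley2016"}, "receptor": {"literature:Bentley2016"}},
    "neuropeptide": {
        "release": {"sequencing:RipollSanchez2023", "literature:Bentley2016"},
        "receptor": {"sequencing:RipollSanchez2023", "literature:Bentley2016"},
    },
}


def unsupported_sources(molecule: str, side: str, sources: list[str]) -> list[str]:
    """Return the requested sources that the coverage table does not list for this molecule."""
    available = COVERAGE[molecule.lower()][side]
    return [s for s in sources if s not in available]


# Bentley2016 has no acetylcholine release data, so it is flagged.
print(unsupported_sources("acetylcholine", "release", ["reporter:Wang2024", "literature:Bentley2016"]))
# ['literature:Bentley2016']
```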
See docs/molecular_connectome_assembly_with_config.md for full documentation, including MCP server plans.
WormAtlas: https://www.wormatlas.org/neurons/Individual%20Neurons/Neuronframeset.html
Witvliet 2021: https://github.com/dwitvliet/nature2021/tree/master
RipollSanchez 2023 (subcellular localization): https://github.com/LidiaRipollSanchez/Neuropeptide-Connectome
WormAtlas - Altun 2013: https://www.wormatlas.org/NTRmainframe.htm
RipollSanchez...Schaeffer 2023 (fluorescent reporter & scRNAseq): https://github.com/LidiaRipollSanchez/NemaMod/tree/main https://github.com/LidiaRipollSanchez/Neuropeptide-Connectome
Wang...Hobert, 2025: https://pmc.ncbi.nlm.nih.gov/articles/PMC11488851/#s6 https://iiif.elifesciences.org/lax:95402%2Felife-95402-fig3-v1.tif/full/,1500/0/default.jpg
WormAtlas: https://www.wormatlas.org/neurotransmitterstable.htm
GABA-A receptors (fluorescent reporter) - Gendrel...Hobert 2016: https://elifesciences.org/articles/17686#tbl4
GABA-B receptors (fluorescent reporter) - Yemini...Hobert 2023: https://pmc.ncbi.nlm.nih.gov/articles/PMC10494711/#SM1
Dopamine receptors (fluorescent reporter & scRNAseq) - Muralidhara & Hardege 2025: https://pmc.ncbi.nlm.nih.gov/articles/PMC12539964/table/T4
Serotonin receptors (fluorescent reporter) - Dag...Flavell 2023: CSV curated by Ugur Dag for Di Kang to make Figure 7
White 1986, Varshney 2011, Cook 2019, Cook 2020, Witvliet 2021 - accessed via OpenWorm C. elegans Connectome Toolbox: https://openworm.org/ConnectomeToolbox/