P6

Peter's Parse and Processing of Prenatal Particulars via Pandas

A simple, extensible CLI for downloading the Human Phenotype Ontology (HPO), parsing genotype/phenotype Excel workbooks, and producing Phenopackets as defined by the GA4GH Phenopacket Schema. P6 can download the latest or a specified HPO JSON release, auto-classify Excel sheets as genotype or phenotype data, normalize column names and HPO IDs, and write one Phenopacket per record. An additional command audits workbooks for header normalization, sheet classification, and required variant columns. Built for easy integration and reproducibility, P6 supports rapid preparation of phenotypic data for research and clinical workflows, and runs locally after a simple pip installation. The end goal of this project is to convert existing digital records of phenotypic data into Phenopackets so that they can be linked to their corresponding VCFs and integrated into a larger federated repository system.

Features

Download: fetch the latest or a specific hp.json release from GitHub
Parse: autodetect genotype vs phenotype sheets in any Excel workbook
Normalize: clean up column names, HPO IDs, timestamps, and data types
Generate: emit individual Phenopacket files, one per record (the output file extension may change in a future release)
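The normalization step can be pictured with a minimal sketch. The helper names and the exact rules below (lower-case snake_case headers; HPO IDs coerced to the canonical `HP:` plus seven digits form) are assumptions for illustration, not P6's actual implementation:

```python
import re


def normalize_column_name(name: str) -> str:
    """Hypothetical rule: lower-case, trim, and collapse non-alphanumeric runs to '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower())
    return cleaned.strip("_")


# Accepts 'HP:0001250', 'hp_0001250', 'HP 0001250' or a bare number like '1250'.
_HPO_PATTERN = re.compile(r"^(?:HP)?[:_\s]?(\d{1,7})$", re.IGNORECASE)


def normalize_hpo_id(raw: str) -> str:
    """Coerce common HPO ID spellings to the canonical 'HP:NNNNNNN' CURIE."""
    match = _HPO_PATTERN.match(str(raw).strip())
    if match is None:
        raise ValueError(f"not an HPO identifier: {raw!r}")
    # Zero-pad to the seven digits HPO uses.
    return f"HP:{int(match.group(1)):07d}"
```

With pandas, the same helpers could be applied as `df.rename(columns=normalize_column_name)` and `df["hpo_id"].map(normalize_hpo_id)`, where the `hpo_id` column name is itself hypothetical.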

Installation

Clone the repo:

git clone https://github.com/VarenyaJ/P6.git
cd P6

(Recommended) Create a virtual environment (venv or Conda):

=== Simple Venv setup ===

python3 -m venv .venv
source .venv/bin/activate

=== or with Conda ===

conda env create -f requirements/environment.yml -y
conda activate P6

Install via pip:

python3 -m pip install -r requirements/requirements.txt .

Verify the installation:

p6 --help

You should see something like:

Usage: p6 [OPTIONS] COMMAND [ARGS]...

  P6: Peter's Parse and Processing of Prenatal Particulars via Pandas.

Options:
  --help  Show this message and exit.

Commands:
  download    Download a specific or the latest HPO JSON release into...
  parse-excel Read each sheet, check column order, then: - Identify as a...

Quickstart

Download HPO JSON

Fetch the latest release into tests/data/ (the default directory):

p6 download

After running, you’ll have tests/data/hp.json.
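To check what was downloaded, the file can be read with the standard library. This sketch assumes the obographs JSON layout that HPO uses for hp.json (a top-level `graphs` list whose `nodes` carry an `id` PURL and an `lbl` label); the function names are hypothetical and not part of P6:

```python
from __future__ import annotations

import json
from pathlib import Path

# HPO terms appear as PURLs such as http://purl.obolibrary.org/obo/HP_0001250
OBO_PREFIX = "http://purl.obolibrary.org/obo/HP_"


def hpo_labels(graph_doc: dict) -> dict[str, str]:
    """Map 'HP:NNNNNNN' CURIEs to labels from an obographs-style document."""
    labels: dict[str, str] = {}
    for graph in graph_doc.get("graphs", []):
        for node in graph.get("nodes", []):
            node_id = node.get("id", "")
            # Skip non-HPO nodes (imported terms) and nodes without a label.
            if node_id.startswith(OBO_PREFIX) and "lbl" in node:
                labels["HP:" + node_id[len(OBO_PREFIX):]] = node["lbl"]
    return labels


def load_hpo_labels(path: str | Path = "tests/data/hp.json") -> dict[str, str]:
    """Load hp.json from disk (default path as in the Quickstart) and index its labels."""
    with open(path, encoding="utf-8") as fh:
        return hpo_labels(json.load(fh))
```

For example, `load_hpo_labels().get("HP:0001250")` should return the label for that term if the download succeeded.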

Parse Excel to Phenopackets

With your HPO JSON in place at tests/data/hp.json, run:

p6 parse-excel -e tests/data/Sydney_Python_transformation.xlsx

The resulting Phenopacket files are written to a timestamped run directory:

phenopacket_from_excel/$(date "+%Y-%m-%d_%H-%M-%S")/phenopackets/
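The directory layout above can be reproduced in Python with `datetime.strftime` using the same format string as the `date` command. The sketch below also writes one JSON file per record; the helper names, the `id`-based file naming, and the simplified record shape are assumptions, not P6's actual writer:

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path


def run_directory(base: str | Path = "phenopacket_from_excel", now: datetime | None = None) -> Path:
    """Return base/<YYYY-MM-DD_HH-MM-SS>/phenopackets, mirroring the layout above."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return Path(base) / stamp / "phenopackets"


def write_one_per_record(records: list[dict], out_dir: Path) -> list[Path]:
    """Write each record to its own JSON file named after its (hypothetical) 'id' field."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        target = out_dir / f"{record['id']}.json"
        target.write_text(json.dumps(record, indent=2), encoding="utf-8")
        written.append(target)
    return written
```

A record here might look like `{"id": "patient-1", "subject": {"id": "patient-1"}, "phenotypicFeatures": [{"type": {"id": "HP:0001250", "label": "Seizure"}}]}`. A schema-valid Phenopacket also requires `metaData`, which this simplified example omits.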

Audit Excel Workbooks

Quickly check each sheet in an Excel file for header normalization, sheet classification, and presence of required variant columns.

p6 audit-excel -e tests/data/Sydney_Python_transformation.xlsx

By default the audit is printed as a table; add -r (--report-json) to print it to the console as JSON instead:

p6 audit-excel -e tests/data/Sydney_Python_transformation.xlsx -r
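The kind of per-sheet report the audit produces can be sketched as follows. The required column names, the report fields, and the table layout are all hypothetical; P6 defines its own. For simplicity the sketch checks every sheet, whereas a real audit would presumably require variant columns only on genotype sheets:

```python
from __future__ import annotations

import json

# Hypothetical required variant columns; P6's actual list may differ.
REQUIRED_VARIANT_COLUMNS = ("chromosome", "position", "reference", "alternate")


def audit_sheet(name: str, headers: list[str]) -> dict:
    """Summarize one sheet: header count, headers changed by normalization, missing columns."""
    normalized = [h.strip().lower().replace(" ", "_") for h in headers]
    renamed = sum(1 for raw, norm in zip(headers, normalized) if raw != norm)
    missing = [c for c in REQUIRED_VARIANT_COLUMNS if c not in normalized]
    return {
        "sheet": name,
        "header_count": len(headers),
        "headers_renamed": renamed,
        "missing_variant_columns": missing,
    }


def audit_report(sheets: dict[str, list[str]], as_json: bool = False) -> str:
    """Render the audit as a JSON array (like -r) or as a plain-text table (the default)."""
    rows = [audit_sheet(name, headers) for name, headers in sheets.items()]
    if as_json:
        return json.dumps(rows, indent=2)
    lines = [f"{'sheet':<12}{'headers':>8}{'renamed':>9}  missing"]
    for r in rows:
        missing = ", ".join(r["missing_variant_columns"]) or "-"
        lines.append(f"{r['sheet']:<12}{r['header_count']:>8}{r['headers_renamed']:>9}  {missing}")
    return "\n".join(lines)
```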

CLI Reference

p6 download

Usage: p6 download [OPTIONS]

Options:

    -d, --data-path PATH        where to save HPO JSON (default: tests/data)
    -v, --hpo-version TEXT      exact HPO release tag (e.g. 2025-03-03 or v2025-03-03)
    --help                      Show this help message and exit.

Examples:

Fetch a specific release tag (e.g. v2025-03-03 or 2025-03-03) into tests/data/ (the default directory):

p6 download -v 2025-03-03
p6 download --hpo-version 2025-03-03

Fetch a specific release tag (e.g. v2025-03-03 or 2025-03-03) into a custom directory:

p6 download -d src/P6 -v 2025-03-03
p6 download --data-path src/P6 --hpo-version 2025-03-03
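Since -v accepts the tag with or without the leading "v", the two spellings must be reconciled before a download URL can be built. The sketch assumes the release assets come from the upstream obophenotype/human-phenotype-ontology GitHub repository and uses GitHub's standard `/releases/download/<tag>/<asset>` and `/releases/latest/download/<asset>` URL patterns; it is an illustration, not P6's downloader:

```python
from __future__ import annotations

import re

# Assumed upstream source of hp.json release assets.
RELEASES = "https://github.com/obophenotype/human-phenotype-ontology/releases"


def normalize_tag(version: str) -> str:
    """Accept '2025-03-03' or 'v2025-03-03' and return the 'v'-prefixed tag."""
    tag = version.strip()
    if not re.fullmatch(r"v?\d{4}-\d{2}-\d{2}", tag):
        raise ValueError(f"unexpected HPO release tag: {version!r}")
    return tag if tag.startswith("v") else f"v{tag}"


def hp_json_url(version: str | None = None) -> str:
    """Build the hp.json asset URL for a tagged release, or for the latest release."""
    if version is None:
        return f"{RELEASES}/latest/download/hp.json"
    return f"{RELEASES}/download/{normalize_tag(version)}/hp.json"
```

The file itself could then be fetched with `urllib.request.urlretrieve(hp_json_url("2025-03-03"), "tests/data/hp.json")`.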

p6 parse-excel

Read an Excel workbook, classify sheets, normalize fields, and emit Phenopackets as Protocol Buffers messages, one file per record.
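Sheet classification can be approximated by counting characteristic column names. The marker sets and the tie-breaking rule below are hypothetical stand-ins for P6's own logic. `pandas.read_excel(path, sheet_name=None)` returns a dict of DataFrames keyed by sheet name, and it needs an Excel engine such as openpyxl:

```python
from __future__ import annotations

# Hypothetical marker columns; P6's real classification rules may differ.
GENOTYPE_MARKERS = {"chromosome", "position", "reference", "alternate", "hgvs"}
PHENOTYPE_MARKERS = {"hpo_id", "hpo_term", "onset", "phenotype"}


def classify_sheet(columns: list[str]) -> str:
    """Return 'genotype', 'phenotype' or 'unknown' by counting marker columns."""
    cols = {str(c).strip().lower().replace(" ", "_") for c in columns}
    geno = len(cols & GENOTYPE_MARKERS)
    pheno = len(cols & PHENOTYPE_MARKERS)
    if geno == pheno:  # includes sheets with no markers at all
        return "unknown"
    return "genotype" if geno > pheno else "phenotype"


def classify_workbook(path: str) -> dict[str, str]:
    """Classify every sheet of a workbook (requires pandas plus an engine such as openpyxl)."""
    import pandas as pd  # imported lazily so classify_sheet works without pandas

    sheets = pd.read_excel(path, sheet_name=None)  # dict: sheet name -> DataFrame
    return {name: classify_sheet(list(df.columns)) for name, df in sheets.items()}
```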

Usage: p6 parse-excel [OPTIONS] EXCEL_FILE

Options:

    -e, --excel-path FILE       path to the Excel workbook  [required]
    -hpo, --custom-hpo FILE     path to a custom HPO JSON file (defaults to `tests/data/hp.json`)
    --help                      Show this message and exit.

Example:

Explicitly point at a custom HPO file:

p6 parse-excel -e tests/data/Sydney_Python_transformation.xlsx -hpo src/P6/hp.json

p6 audit-excel

Run a lightweight audit on each sheet in an Excel workbook, reporting header counts, sheet classification, and any missing variant columns.

Usage: p6 audit-excel [OPTIONS] EXCEL_FILE

Options:

    -e, --excel-path FILE   path to the Excel workbook  [required]
    -r, --report-json       output audit report as JSON instead of table
    --help                  Show this message and exit.

Development & Testing

Install dev requirements:

python3 -m pip install -r requirements/requirements.txt -r requirements/requirements_test.txt .

This installs P6 along with the dependencies needed for development and testing.

Run the full test suite:

pytest -q

Lint & format with ruff:

ruff check .
ruff format .

Contributing

Fork the repo & create a feature branch
Make your changes & add tests
Ensure all tests pass & lint is clean
Submit a pull request against main
Please follow the project's code of conduct and the terms of the AGPL-3.0 license.

License

This project is licensed under the AGPL-3.0. See LICENSE for details.

Contact

Varenya Jain, GitHub: @VarenyaJ
