AustralianCancerDataNetwork/orm-loader: SQLAlchemy helper tools for managing data loading and model generation

orm-loader

A lightweight, reusable foundation for building and validating SQLAlchemy-based clinical (and non-clinical) data models.

This library provides general-purpose ORM infrastructure that sits below any specific data model (OMOP, PCORnet, custom CDMs, etc.), focusing on:

declarative base configuration
bulk ingestion patterns
file-based validation & loading
table introspection
model-agnostic validation scaffolding
safe, database-portable operational helpers

It intentionally contains no domain logic and no assumptions about a specific schema.

What this library provides:

This library provides a small set of composable building blocks for defining, loading, inspecting, and validating SQLAlchemy-based data models. All components are model-agnostic and can be selectively combined in downstream libraries.

A minimal, opinionated ORM table base

ORMTableBase provides structural introspection utilities for SQLAlchemy-mapped tables, without imposing any domain semantics.

It supports:

mapper access and inspection
primary key discovery
required (non-nullable) column detection
consistent primary key handling across models
simple ID allocation helpers for sequence-less databases

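from sqlalchemy import Column, Integer
from orm_loader.metadata import Base
from orm_loader.tables import ORMTableBase

class MyTable(ORMTableBase, Base):
    __tablename__ = "my_table"
    # SQLAlchemy cannot map a table without a primary key
    id = Column(Integer, primary_key=True)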

This base is intended to be inherited by all ORM tables, either directly or via higher-level mixins.
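The introspection features listed above (primary key discovery, required-column detection, ID allocation) can all be built on SQLAlchemy's public `inspect()` API. A minimal sketch, assuming a single integer primary key: `IntrospectMixin`, its methods and the `Person` model are hypothetical illustrations, not ORMTableBase's actual interface, and `next_id` uses max(id) + 1, which is only safe when one writer allocates IDs at a time.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, inspect, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class IntrospectMixin:
    """Hypothetical helpers in the spirit of ORMTableBase (not its real API)."""

    @classmethod
    def pk_names(cls) -> list[str]:
        # Mapper.primary_key is a tuple of the mapped primary key Column objects
        return [col.name for col in inspect(cls).primary_key]

    @classmethod
    def required_columns(cls) -> list[str]:
        # Non-nullable columns, skipping the primary key itself
        return [
            col.name
            for col in inspect(cls).columns
            if not col.nullable and not col.primary_key
        ]

    @classmethod
    def next_id(cls, session: Session) -> int:
        # max(pk) + 1; assumes one integer primary key and a single writer
        (pk,) = inspect(cls).primary_key
        current = session.scalar(select(func.max(pk)))
        return (current or 0) + 1


class Person(IntrospectMixin, Base):
    __tablename__ = "person"
    person_id = Column(Integer, primary_key=True, autoincrement=False)
    name = Column(String, nullable=False)
    note = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    new_id = Person.next_id(session)
    session.add(Person(person_id=new_id, name="Ada"))
    session.commit()
```

Because `next_id` reads the current maximum, two concurrent loaders could allocate the same value; databases with sequences or identity columns avoid this race.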

CSV-based ingestion mixins

CSVLoadableTableInterface adds opt-in CSV loading support for ORM tables using pandas, with a focus on correctness and scalability.

Features include:

chunked loading for large files
optional per-table normalisation logic
optional deduplication against existing database rows
safe bulk inserts using SQLAlchemy sessions

class MyTable(CSVLoadableTableInterface, ORMTableBase, Base):
    __tablename__ = "my_table"

Downstream models may override the following hooks to implement table-specific ingestion policies:

normalise_dataframe(...)
dedupe_dataframe(...)
csv_columns()
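The ingestion flow described above (chunked reads, per-table normalisation, deduplication against rows already in the database, bulk insert) can be sketched with pandas. This is not CSVLoadableTableInterface's implementation: `CsvLoaderSketch` is hypothetical, the hook signatures are assumptions, and it writes through the standard-library `sqlite3` module with `DataFrame.to_sql` rather than an SQLAlchemy session to stay short. The table and key names are interpolated into SQL, so they must be trusted constants.

```python
import sqlite3

import pandas as pd


class CsvLoaderSketch:
    """Hypothetical loader; hook names follow the README, signatures are assumed."""

    table = "person"
    key = "person_id"

    @classmethod
    def csv_columns(cls) -> list[str]:
        # Only these CSV columns are read; anything else in the file is ignored
        return ["person_id", "name"]

    @classmethod
    def normalise_dataframe(cls, df: pd.DataFrame) -> pd.DataFrame:
        # Example policy: trim stray whitespace from names
        return df.assign(name=df["name"].str.strip())

    @classmethod
    def dedupe_dataframe(cls, df: pd.DataFrame, conn: sqlite3.Connection) -> pd.DataFrame:
        # Drop repeats inside the chunk, then keys already in the table
        # (rows inserted by earlier chunks are visible on the same connection)
        df = df.drop_duplicates(subset=[cls.key])
        existing = {row[0] for row in conn.execute(f"SELECT {cls.key} FROM {cls.table}")}
        return df[~df[cls.key].isin(existing)]

    @classmethod
    def load_csv(cls, source, conn: sqlite3.Connection, chunksize: int = 10_000) -> int:
        inserted = 0
        # With chunksize, read_csv yields DataFrames of at most `chunksize` rows,
        # so memory use is bounded by the chunk rather than the whole file
        for chunk in pd.read_csv(source, usecols=cls.csv_columns(), chunksize=chunksize):
            chunk = cls.dedupe_dataframe(cls.normalise_dataframe(chunk), conn)
            if not chunk.empty:
                chunk.to_sql(cls.table, conn, if_exists="append", index=False)
                inserted += len(chunk)
        conn.commit()
        return inserted
```

Reading every existing key per chunk keeps the sketch simple but grows with the table; restricting the lookup to the keys present in each chunk scales better.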

Structured serialisation and hashing

SerialisableTableInterface adds lightweight, explicit serialisation helpers for ORM rows.

It supports:

conversion to dictionaries
JSON serialisation
stable row-level fingerprints
iterator-style access to field/value pairs

row = session.get(MyTable, 1)
row.to_dict()
row.to_json()
row.fingerprint()

This is useful for:

debugging
auditing
reproducibility checks
downstream APIs or exports
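A stable row fingerprint needs a canonical serialisation: the same column values must always produce the same bytes. A minimal sketch of that idea, assuming JSON with sorted keys hashed by SHA-256; `SerialiseSketch` and `Visit` are hypothetical, and the library's own `to_dict`, `to_json` and `fingerprint` may differ in detail.

```python
import hashlib
import json
from datetime import date

from sqlalchemy import Column, Date, Integer, String, inspect
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class SerialiseSketch:
    """Hypothetical mixin; method names follow the README, behaviour is assumed."""

    def to_dict(self) -> dict:
        # column_attrs lists mapped columns only, so relationships are skipped
        return {attr.key: getattr(self, attr.key) for attr in inspect(self).mapper.column_attrs}

    def to_json(self) -> str:
        # sort_keys gives a canonical key order; default=str handles dates
        return json.dumps(self.to_dict(), sort_keys=True, default=str)

    def fingerprint(self) -> str:
        # Same column values -> same canonical JSON -> same SHA-256 digest
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()

    def __iter__(self):
        # Iterator-style access to (field, value) pairs
        return iter(self.to_dict().items())


class Visit(SerialiseSketch, Base):
    __tablename__ = "visit"
    visit_id = Column(Integer, primary_key=True)
    site = Column(String)
    visit_date = Column(Date)


row = Visit(visit_id=1, site="A", visit_date=date(2024, 1, 2))
print(row.to_json())  # {"site": "A", "visit_date": "2024-01-02", "visit_id": 1}
```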

Model registry and validation scaffolding

The library includes model-agnostic validation infrastructure, designed to compare ORM models against external specifications.

This includes:

a model registry
table and field descriptors
validator contracts
a validation runner
structured validation reports

Specifications can be loaded from CSV today, with support for other formats (e.g. LinkML) planned.

registry = ModelRegistry(model_version="1.0")
registry.load_table_specs(table_csv, field_csv)
registry.register_models([MyTable])

runner = ValidationRunner(validators=always_on_validators())
report = runner.run(registry)

Validation output is available as:

human-readable text
structured dictionaries
JSON (for CI/CD integration)
exit codes suitable for pipelines
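The shape of a validator (read a specification, compare it with a model, emit structured issues and an exit code) can be shown without the library. The following is a stand-alone sketch, not the ModelRegistry or ValidationRunner API: `check_fields`, `to_report` and `Issue` are hypothetical, and the `table,field,required` CSV layout is an assumption rather than the library's specification format.

```python
import csv
import json
from dataclasses import asdict, dataclass
from typing import Iterable

from sqlalchemy import Column, Integer, MetaData, String, Table


@dataclass
class Issue:
    table: str
    field: str
    message: str


def check_fields(table: Table, spec_lines: Iterable[str]) -> list[Issue]:
    """Hypothetical validator: compare a Table with spec rows of table,field,required."""
    issues = []
    for spec in csv.DictReader(spec_lines):
        if spec["table"] != table.name:
            continue  # spec row belongs to another table
        column = table.columns.get(spec["field"])
        if column is None:
            issues.append(Issue(table.name, spec["field"], "missing from model"))
        elif spec["required"] == "yes" and column.nullable:
            issues.append(Issue(table.name, spec["field"], "should be NOT NULL"))
    return issues


def to_report(issues: list[Issue]) -> tuple[str, int]:
    # JSON text for CI logs plus a pipeline-style exit code (0 means clean)
    return json.dumps([asdict(i) for i in issues]), (1 if issues else 0)


metadata = MetaData()
person = Table(
    "person",
    metadata,
    Column("person_id", Integer, primary_key=True),
    Column("name", String),  # nullable, although the spec below requires it
)
spec_csv = "table,field,required\nperson,person_id,yes\nperson,name,yes\nperson,birth_date,no\n"
body, exit_code = to_report(check_fields(person, spec_csv.splitlines()))
```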

Database bootstrap helpers

The library provides lightweight helpers for schema creation and bootstrapping, without imposing a migration strategy.

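from sqlalchemy import create_engine
from orm_loader.metadata import Base
from orm_loader.bootstrap import bootstrap

# Any SQLAlchemy engine works; SQLite is used here for illustration
engine = create_engine("sqlite:///example.db")
bootstrap(engine, create=True)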
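Schema creation without a migration tool usually reduces to `MetaData.create_all`, which issues `CREATE TABLE` only for tables that do not already exist, so it is safe to re-run. A minimal sketch under that assumption; `bootstrap_sketch` is hypothetical and is not the signature or behaviour of `orm_loader.bootstrap.bootstrap`.

```python
from sqlalchemy import Column, Integer, MetaData, Table, create_engine, inspect
from sqlalchemy.engine import Engine


def bootstrap_sketch(engine: Engine, metadata: MetaData, create: bool = False) -> list[str]:
    """Hypothetical helper: optionally create missing tables, then list what exists."""
    if create:
        # create_all checks for existing tables first, so re-running is harmless
        metadata.create_all(engine)
    return sorted(inspect(engine).get_table_names())


metadata = MetaData()
Table("person", metadata, Column("person_id", Integer, primary_key=True))

engine = create_engine("sqlite://")
print(bootstrap_sketch(engine, metadata))               # []
print(bootstrap_sketch(engine, metadata, create=True))  # ['person']
```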

Safe bulk-loading utilities

A reusable context manager simplifies trusted bulk ingestion workflows:

temporarily disables foreign key checks where supported
suppresses autoflush for performance
ensures reliable rollback on failure
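The behaviour described above can be sketched as a `contextlib` context manager around an SQLAlchemy session. `bulk_load_sketch` is a hypothetical name, not the library's helper, and only SQLite's `PRAGMA foreign_keys` is handled; other databases need different statements (for example MySQL's `SET FOREIGN_KEY_CHECKS = 0`) that this sketch does not attempt. Disabling foreign key checks lets orphan rows through, so it belongs only in loads of already-validated data.

```python
from contextlib import contextmanager
from typing import Iterator

from sqlalchemy import Column, ForeignKey, Integer, text
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    parent_id = Column(Integer, primary_key=True)


class Child(Base):
    __tablename__ = "child"
    child_id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.parent_id"), nullable=False)


@contextmanager
def bulk_load_sketch(session: Session) -> Iterator[Session]:
    """Hypothetical bulk-load guard; only SQLite foreign keys are handled here."""
    is_sqlite = session.get_bind().dialect.name == "sqlite"
    fk_was_on = False
    if is_sqlite:
        # PRAGMA foreign_keys is ignored inside an open transaction; this relies on
        # the default pysqlite driver not emitting BEGIN until the first DML statement
        fk_was_on = bool(session.execute(text("PRAGMA foreign_keys")).scalar())
        session.execute(text("PRAGMA foreign_keys=OFF"))
    try:
        # no_autoflush stops queries inside the block from flushing pending rows
        with session.no_autoflush:
            yield session
        session.commit()
    except Exception:
        # Any failure discards everything added inside the block
        session.rollback()
        raise
    finally:
        if fk_was_on:
            session.execute(text("PRAGMA foreign_keys=ON"))
```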

Summary

This library intentionally focuses on infrastructure, not semantics.

It provides:

reusable ORM mixins
safe ingestion patterns
validation scaffolding
database-portable utilities

while leaving domain rules, business logic, and schema semantics to downstream libraries.

This makes it suitable as a shared foundation for:

clinical data models
research data marts
registry schemas
synthetic data pipelines
