Merge v1 Feature Branch to Main #535

Draft: wants to merge 59 commits into base: main

59 commits
8eeddea
update project metadata and deps
tasansal Apr 29, 2025
293a714
update project metadata and deps
tasansal Apr 29, 2025
faacbaa
add schemas
tasansal Apr 29, 2025
1a1e743
Relocate quickstart notebook to tutorials directory
tasansal Apr 29, 2025
3c8ea4d
update docs dependencies
tasansal Apr 29, 2025
53ecaa9
add new docs
tasansal Apr 29, 2025
da09c4b
remove incorrect exclude
tasansal Apr 29, 2025
0d59c95
remove duplicate doc directive
tasansal Apr 29, 2025
793f3f1
fix creation notebook
tasansal Apr 29, 2025
0e64e09
Add basic unit test for v1 dataset schema validation
tasansal Apr 29, 2025
a1a312e
update lockfile
tasansal Apr 30, 2025
dfe7751
fix broken creation nb
tasansal May 7, 2025
f6b2c34
update lockfile
tasansal May 7, 2025
dbec58e
lint v1 files
tasansal May 7, 2025
ecb69d6
update lock file
tasansal May 27, 2025
9dd9fbc
schema_v1-dataset_builder-add_dimension
dmitriyrepin Jun 24, 2025
5816f83
V1 schema review (#553)
BrianMichell Jun 26, 2025
f88531e
Merge remote-tracking branch 'upstream/v1' into v1
dmitriyrepin Jun 26, 2025
1358f95
First take on add_dimension(), add_coordinate(), add_variable()
dmitriyrepin Jun 27, 2025
e5261cb
Finished add_dimension, add_coordinate, add_variable
dmitriyrepin Jun 28, 2025
95c01d8
Work on build
dmitriyrepin Jun 30, 2025
f391e23
Merge branch 'main' into v1
tasansal Jul 1, 2025
46f82f0
Generalize _to_dictionary()
dmitriyrepin Jul 1, 2025
0dc7cc8
build
dmitriyrepin Jul 1, 2025
fe4af2b
[v1] Update dependencies to latest (#567)
tasansal Jul 2, 2025
79863ac
Dataset Build - pass one
dmitriyrepin Jul 2, 2025
ec480f1
Merge the latest TGSAI/mdio-python:v1 branch
dmitriyrepin Jul 2, 2025
4062a77
unpin zarr because breaking bug fixed (#569)
tasansal Jul 7, 2025
fa81ea2
Merge branch 'v1' into v1
tasansal Jul 7, 2025
4b2b163
Revert .container changes
dmitriyrepin Jul 7, 2025
c532c3b
PR review: remove DEVELOPER_NOTES.md
dmitriyrepin Jul 7, 2025
08798cd
PR Review: add_coordinate() should accept only data_type: ScalarType
dmitriyrepin Jul 7, 2025
e8febe4
PR review: add_variable() data_type remove default
dmitriyrepin Jul 7, 2025
0a4be3f
RE review: do not add dimension variable
dmitriyrepin Jul 8, 2025
7b25d6b
PR Review: get api version from the package version
dmitriyrepin Jul 8, 2025
7ca3ed8
PR Review: remove add_dimension_coordinate
dmitriyrepin Jul 9, 2025
4d1ec9c
PR Review: add_coordinate() remove data_type default value
dmitriyrepin Jul 9, 2025
99fcf43
PR Review: improve unit tests by extracting common functionality in v…
dmitriyrepin Jul 9, 2025
0778fdd
Remove the Dockerfile changes. They are not supposed to be a part of …
dmitriyrepin Jul 9, 2025
7e74567
PR Review: run ruff
dmitriyrepin Jul 9, 2025
0aaa5f6
PR Review: fix pre-commit errors
dmitriyrepin Jul 10, 2025
1904dee
remove some noqa overrides
tasansal Jul 10, 2025
90d31a1
Implement MDIO Dataset builder to create in-memory instance of schema…
dmitriyrepin Jul 10, 2025
4c7c833
Writing XArray / Zarr
dmitriyrepin Jul 10, 2025
4b39ffa
gitignore
dmitriyrepin Jul 10, 2025
e772a4f
Merge remote-tracking branch 'upstream/v1' into v1
dmitriyrepin Jul 11, 2025
cea7308
to_zarr() fix compression
dmitriyrepin Jul 11, 2025
850135e
Fix precommit issues
dmitriyrepin Jul 11, 2025
82f1960
Use only make_campos_3d_acceptance_dataset
dmitriyrepin Jul 11, 2025
b5ee31e
PR Review: address the review comments
dmitriyrepin Jul 14, 2025
7b3ba70
Update _get_fill_value for StructuredType
dmitriyrepin Jul 14, 2025
a4ff4a9
Fix fill type issue for the Structured Types
dmitriyrepin Jul 16, 2025
81bfa76
Improve code coverage
dmitriyrepin Jul 16, 2025
0447659
Fix spelling
dmitriyrepin Jul 17, 2025
d08e2c4
Revert "Fix spelling"
dmitriyrepin Jul 17, 2025
657d2cf
extend per-file ignores for PLR2004 and remove noqa overrides in spec…
tasansal Jul 17, 2025
bfab1d7
Refactor tests: clarify Zarr-related test names, fix type hints, and …
tasansal Jul 17, 2025
9a033de
merge main into v1
tasansal Jul 17, 2025
5878e97
MDIO v1 Templates and Template Registry (#573)
dmitriyrepin Jul 22, 2025
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -13,7 +13,7 @@ jobs:
matrix:
include:
- { python: "3.13", os: "ubuntu-latest", session: "pre-commit" }
- { python: "3.13", os: "ubuntu-latest", session: "safety" }
# - { python: "3.13", os: "ubuntu-latest", session: "safety" }
# - { python: "3.13", os: "ubuntu-latest", session: "mypy" }
# - { python: "3.12", os: "ubuntu-latest", session: "mypy" }
# - { python: "3.11", os: "ubuntu-latest", session: "mypy" }
10 changes: 10 additions & 0 deletions docs/conf.py
@@ -17,6 +17,7 @@
"sphinx.ext.napoleon",
"sphinx.ext.intersphinx",
"sphinx.ext.autosummary",
"sphinxcontrib.autodoc_pydantic",
"sphinx.ext.autosectionlabel",
"sphinx_click",
"sphinx_copybutton",
@@ -38,6 +39,7 @@
intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"pydantic": ("https://docs.pydantic.dev/latest/", None),
"zarr": ("https://zarr.readthedocs.io/en/stable/", None),
}

@@ -50,6 +52,14 @@
autoclass_content = "class"
autosectionlabel_prefix_document = True

autodoc_pydantic_field_list_validators = False
autodoc_pydantic_field_swap_name_and_alias = True
autodoc_pydantic_field_show_alias = False
autodoc_pydantic_model_show_config_summary = False
autodoc_pydantic_model_show_validator_summary = False
autodoc_pydantic_model_show_validator_members = False
autodoc_pydantic_model_show_field_summary = False

html_theme = "furo"

myst_number_code_blocks = ["python"]
154 changes: 154 additions & 0 deletions docs/data_models/chunk_grids.md
@@ -0,0 +1,154 @@
```{eval-rst}
:tocdepth: 3
```

```{currentModule} mdio.schemas.chunk_grid

```

# Chunk Grid Models

```{article-info}
:author: Altay Sansal
:date: "{sub-ref}`today`"
:read-time: "{sub-ref}`wordcount-minutes` min read"
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

Variables in the MDIO data model can use different types of chunk grids.
A chunk grid determines how a multi-dimensional array is split into chunks for
storage and access. This page covers four data models in the MDIO schema: a grid
model and a chunk-shape model for each of the two grid types, regular and rectilinear.

MDIO implements data models following the guidelines of the Zarr v3 spec and ZEPs:

- [Zarr core specification (version 3)](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html)
- [ZEP 1 — Zarr specification version 3](https://zarr.dev/zeps/accepted/ZEP0001.html)
- [ZEP 3 — Variable chunking](https://zarr.dev/zeps/draft/ZEP0003.html)

## Regular Grid

The regular grid models represent a rectangular grid of regularly spaced chunks:
every chunk has the same shape, except chunks at the array boundary, which are truncated.

```{eval-rst}
.. autosummary::
RegularChunkGrid
RegularChunkShape
```

For a 1D array with `size = 31`{l=python}, a chunk size of 7 divides it into 5 chunks:
four full chunks of 7 elements and a last chunk truncated to 3 elements to match the size of the array.

`{ "name": "regular", "configuration": { "chunkShape": [7] } }`{l=json}

Using the above schema, the resulting array chunks look like this:

```bash
 ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ←─ 7 ─→ ↔ 3
┌───────┬───────┬───────┬───────┬───┐
└───────┴───────┴───────┴───────┴───┘
```
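The chunk extents in this diagram follow from integer division of the array size by the chunk size. A minimal sketch in plain Python; the helper `regular_chunk_sizes` is hypothetical and not part of the MDIO API:

```python
def regular_chunk_sizes(size: int, chunk: int) -> list[int]:
    """Return the extent of each chunk along one dimension of a regular grid.

    Hypothetical helper (not part of MDIO). Every chunk has length ``chunk``
    except the last, which is truncated so the chunks exactly cover ``size``.
    """
    if size < 0 or chunk <= 0:
        raise ValueError("size must be >= 0 and chunk must be > 0")
    full, remainder = divmod(size, chunk)
    # Append the truncated tail chunk only if the size is not a multiple of chunk.
    return [chunk] * full + ([remainder] if remainder else [])


print(regular_chunk_sizes(31, 7))  # [7, 7, 7, 7, 3]
```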

For a 2D array with shape `rows, cols = (7, 17)`{l=python}, a chunk shape of (3, 7) divides
it into 9 chunks (3 × 3). Chunks in the last row and last column are truncated.

`{ "name": "regular", "configuration": { "chunkShape": [3, 7] } }`{l=json}

Using the above schema, the resulting 2D array chunks will look like below.
Note that the rows and columns are conceptual and visually not to scale.

```bash
 ←─ 7 ─→ ←─ 7 ─→ ↔ 3
┌───────┬───────┬───┐
│       ╎       ╎   │ ↑
│       ╎       ╎   │ 3
│       ╎       ╎   │ ↓
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│       ╎       ╎   │ ↑
│       ╎       ╎   │ 3
│       ╎       ╎   │ ↓
├╶╶╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│       ╎       ╎   │ ↕ 1
└───────┴───────┴───┘
```

## Rectilinear Grid

The [RectilinearChunkGrid](RectilinearChunkGrid) model extends
the concept of chunk grids to rectangular but irregularly spaced chunks.
This model is useful for data structures that need non-uniform chunk sizes.
[RectilinearChunkShape](RectilinearChunkShape) specifies the chunk sizes for each
dimension as a list, which allows irregular intervals.

```{eval-rst}
.. autosummary::
RectilinearChunkGrid
RectilinearChunkShape
```

:::{note}
Make sure that the chunk sizes listed for each dimension in `chunkShape`
sum to the size of that array dimension.
:::
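The sum constraint in this note can be checked mechanically. A minimal sketch, assuming `chunkShape` is given as one list of chunk sizes per dimension; `validate_rectilinear` is a hypothetical helper and does not describe how MDIO's own Pydantic models validate their input:

```python
def validate_rectilinear(shape: tuple[int, ...], chunk_shape: list[list[int]]) -> None:
    """Raise ValueError if a rectilinear chunkShape does not tile ``shape``.

    Hypothetical helper for illustration; not part of the MDIO API.
    """
    if len(chunk_shape) != len(shape):
        raise ValueError(f"expected {len(shape)} dimensions, got {len(chunk_shape)}")
    for dim, (size, sizes) in enumerate(zip(shape, chunk_shape)):
        # Every chunk must hold at least one element.
        if any(s <= 0 for s in sizes):
            raise ValueError(f"dimension {dim}: chunk sizes must be positive")
        # The chunks along a dimension must cover it exactly.
        if sum(sizes) != size:
            raise ValueError(f"dimension {dim}: chunk sizes sum to {sum(sizes)}, expected {size}")


validate_rectilinear((39,), [[10, 7, 5, 7, 10]])  # passes silently
validate_rectilinear((7, 25), [[3, 1, 3], [10, 5, 7, 3]])  # passes silently
```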

For a 1D array with `size = 39`{l=python}, we can divide it into 5 irregularly sized
chunks.

`{ "name": "rectilinear", "configuration": { "chunkShape": [[10, 7, 5, 7, 10]] } }`{l=json}

Using the above schema, the resulting array chunks look like this:

```bash
 ←── 10 ──→ ←─ 7 ─→ ← 5 → ←─ 7 ─→ ←── 10 ──→
┌──────────┬───────┬─────┬───────┬──────────┐
└──────────┴───────┴─────┴───────┴──────────┘
```
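Chunk boundaries along a rectilinear dimension are the running sums of the chunk sizes. A minimal sketch using `itertools.accumulate`; the helper name `chunk_bounds` is hypothetical:

```python
from itertools import accumulate


def chunk_bounds(sizes: list[int]) -> list[tuple[int, int]]:
    """Half-open [start, stop) element ranges for each chunk along one dimension.

    Hypothetical helper; not part of the MDIO API.
    """
    stops = list(accumulate(sizes))  # exclusive end offset of each chunk
    starts = [0, *stops[:-1]]  # each chunk starts where the previous one stops
    return list(zip(starts, stops))


print(chunk_bounds([10, 7, 5, 7, 10]))
# [(0, 10), (10, 17), (17, 22), (22, 29), (29, 39)]
```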

For a 2D array with shape `rows, cols = (7, 25)`{l=python}, we can divide it into 12
rectilinear (rectangular but irregular) chunks: 3 along the rows and 4 along the columns.
Note that the rows and columns are conceptual and visually not to scale.

`{ "name": "rectilinear", "configuration": { "chunkShape": [[3, 1, 3], [10, 5, 7, 3]] } }`{l=json}

```bash
 ←── 10 ──→ ← 5 → ←─ 7 ─→ ↔ 3
┌──────────┬─────┬───────┬───┐
│          ╎     ╎       ╎   │ ↑
│          ╎     ╎       ╎   │ 3
│          ╎     ╎       ╎   │ ↓
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│          ╎     ╎       ╎   │ ↕ 1
├╶╶╶╶╶╶╶╶╶╶┼╶╶╶╶╶┼╶╶╶╶╶╶╶┼╶╶╶┤
│          ╎     ╎       ╎   │ ↑
│          ╎     ╎       ╎   │ 3
│          ╎     ╎       ╎   │ ↓
└──────────┴─────┴───────┴───┘
```

## Model Reference

:::{dropdown} RegularChunkGrid
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: RegularChunkGrid

----------

.. autopydantic_model:: RegularChunkShape
```

:::
:::{dropdown} RectilinearChunkGrid
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: RectilinearChunkGrid

----------

.. autopydantic_model:: RectilinearChunkShape
```

:::
100 changes: 100 additions & 0 deletions docs/data_models/compressors.md
@@ -0,0 +1,100 @@
```{eval-rst}
:tocdepth: 3
```

```{currentModule} mdio.schemas.compressors

```

# Compressors

```{article-info}
:author: Altay Sansal
:date: "{sub-ref}`today`"
:read-time: "{sub-ref}`wordcount-minutes` min read"
:class-container: sd-p-0 sd-outline-muted sd-rounded-3 sd-font-weight-light
```

## Dataset Compression

MDIO relies on [numcodecs] for data compression. We provide good defaults for each
compressor, based on opinionated and limited heuristics for various energy datasets.
These data models let you customize the compression when the defaults do not fit your data.

[Numcodecs] is a project that provides a convenient interface to different compression
libraries. We selected the [Blosc] and [ZFP] compressors for lossless and lossy
compression of energy data.

## Blosc

Blosc is a high-performance compressor optimized for binary data. It combines fast
compression with a byte-shuffle filter for better compression ratios, and it is
particularly effective on numerical arrays in multi-threaded environments.

For more details about compression modes, see [Blosc Documentation].

```{eval-rst}
.. autosummary::
Blosc
```
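The byte-shuffle filter regroups the bytes of fixed-size elements so that bytes of equal significance sit next to each other, which tends to give the compressor longer runs to work with. The sketch below only illustrates that idea in plain Python, with `zlib` standing in for Blosc's internal codecs; it is not Blosc's implementation and not how MDIO configures compression, and the helper names are hypothetical:

```python
import struct
import zlib


def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Group byte k of every element together (the idea behind a byte shuffle)."""
    if len(data) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    return b"".join(data[k::itemsize] for k in range(itemsize))


def byte_unshuffle(data: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle, restoring element-major byte order."""
    n = len(data) // itemsize
    planes = [data[k * n:(k + 1) * n] for k in range(itemsize)]
    return bytes(planes[k][i] for i in range(n) for k in range(itemsize))


# A slowly varying float32 "trace": neighbouring values share high-order bytes.
raw = struct.pack("<1000f", *(0.001 * i for i in range(1000)))
shuffled = byte_shuffle(raw, itemsize=4)
assert byte_unshuffle(shuffled, itemsize=4) == raw  # the filter itself is lossless
print(len(zlib.compress(raw)), len(zlib.compress(shuffled)))
```

On slowly varying data like this, the shuffled buffer typically compresses to fewer bytes, but the gain depends on the data.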

## ZFP

ZFP is a compression algorithm tailored for floating-point and integer arrays, offering
lossy and lossless compression with customizable precision, well-suited for large
scientific datasets with a focus on balancing data fidelity and compression ratio.

For more details about compression modes, see [ZFP Documentation].

```{eval-rst}
.. autosummary::
ZFP
```

[numcodecs]: https://github.com/zarr-developers/numcodecs
[blosc]: https://github.com/Blosc/c-blosc
[blosc documentation]: https://www.blosc.org/python-blosc/python-blosc.html
[zfp]: https://github.com/LLNL/zfp
[zfp documentation]: https://computing.llnl.gov/projects/zfp

## Model Reference

:::{dropdown} Blosc
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: Blosc

----------

.. autoclass:: BloscAlgorithm()
:members:
:undoc-members:
:member-order: bysource

----------

.. autoclass:: BloscShuffle()
:members:
:undoc-members:
:member-order: bysource
```

:::

:::{dropdown} ZFP
:animate: fade-in-slide-down

```{eval-rst}
.. autopydantic_model:: ZFP

----------

.. autoclass:: ZFPMode()
:members:
:undoc-members:
:member-order: bysource
```

:::