audit: VTK XML format compliance gaps and missing features #52

@szaghi

Overview

A systematic audit of VTKFortran against the VTK XML file format specification reveals missing features, spec violations, and areas for improvement. This issue collects them in one place, grouped by area, with a priority ranking at the end.


1. Dataset type coverage

| Type | Extension | Serial write | Parallel write | Serial read | Parallel read |
|---|---|---|---|---|---|
| ImageData | .vti / .pvti | ❌ missing | ❌ missing | ❌ missing | ❌ missing |
| RectilinearGrid | .vtr / .pvtr | ✅ | ⚠️ ASCII-only | ❌ missing | ❌ missing |
| StructuredGrid | .vts / .pvts | ✅ | ⚠️ ASCII-only | ❌ missing | ❌ missing |
| PolyData | .vtp / .pvtp | ❌ missing | ❌ missing | ❌ missing | ❌ missing |
| UnstructuredGrid | .vtu / .pvtu | ✅ | ⚠️ ASCII-only | ❌ missing | ❌ missing |
| MultiBlock | .vtm | ⚠️ ASCII-only | | ❌ missing | |
| Time series | .pvd | ❌ missing | | ❌ missing | |

1.1 ImageData (.vti) — missing

The simplest topology: no explicit geometry at all, only WholeExtent, Origin, Spacing as attributes on the dataset element. No Points or Coordinates needed. Should be the lowest-effort new topology to add.
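To illustrate how little a .vti file needs, here is a minimal Python sketch (not VTKFortran code) that builds an ASCII ImageData document. The function name `image_data_xml` and its parameters are hypothetical, and the byte order is assumed to be little-endian; only the element and attribute names come from the format described above.

```python
import xml.etree.ElementTree as ET

def image_data_xml(nx, ny, nz, origin, spacing, name, values):
    """Build a minimal ASCII .vti document holding one point-data scalar.

    Hypothetical helper for illustration; nx, ny, nz are point counts per axis.
    """
    if len(values) != nx * ny * nz:
        raise ValueError("values must hold one entry per point")
    # Extents are inclusive point index ranges: imin imax jmin jmax kmin kmax.
    extent = f"0 {nx - 1} 0 {ny - 1} 0 {nz - 1}"
    root = ET.Element("VTKFile", type="ImageData", version="1.0",
                      byte_order="LittleEndian")  # byte order is an assumption
    image = ET.SubElement(root, "ImageData", WholeExtent=extent,
                          Origin=" ".join(map(str, origin)),
                          Spacing=" ".join(map(str, spacing)))
    # No Points or Coordinates element: the geometry is implied by the attributes.
    piece = ET.SubElement(image, "Piece", Extent=extent)
    point_data = ET.SubElement(piece, "PointData", Scalars=name)
    array = ET.SubElement(point_data, "DataArray", type="Float64",
                          Name=name, format="ascii")
    array.text = " ".join(repr(float(v)) for v in values)
    return ET.tostring(root, encoding="unicode")
```

A Fortran implementation would presumably need only a new dataset-element writer plus the existing DataArray machinery, which is why this topology looks like the cheapest one to add.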

1.2 PolyData (.vtp) — missing

A heavily used topology for surface meshes. A Piece carries five count attributes (NumberOfPoints, NumberOfVerts, NumberOfLines, NumberOfStrips, NumberOfPolys) and five geometry sub-elements (Points, Verts, Lines, Strips, Polys). Points holds a single three-component coordinate DataArray; each of the four cell sub-elements (Verts, Lines, Strips, Polys) holds a connectivity + offsets DataArray pair. Note: this is also a prerequisite for the VTKHDF roadmap (see #51).
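The Piece layout can be sketched in a few lines of Python (illustrative only, not a proposed VTKFortran API). The helper names `_ascii_array` and `polydata_piece` are hypothetical, and the choice of Int64 for the index arrays is an assumption. The sketch writes polygons only and leaves Verts, Lines, and Strips empty.

```python
import xml.etree.ElementTree as ET

def _ascii_array(parent, name, vtk_type, values, components=None):
    """Append an ASCII DataArray to parent (hypothetical helper)."""
    attrs = {"type": vtk_type, "Name": name, "format": "ascii"}
    if components is not None:
        attrs["NumberOfComponents"] = str(components)
    arr = ET.SubElement(parent, "DataArray", attrs)
    arr.text = " ".join(str(v) for v in values)
    return arr

def polydata_piece(points, polys):
    """points: list of (x, y, z) tuples; polys: list of point-index lists."""
    piece = ET.Element("Piece", {
        "NumberOfPoints": str(len(points)),
        "NumberOfVerts": "0",
        "NumberOfLines": "0",
        "NumberOfStrips": "0",
        "NumberOfPolys": str(len(polys)),
    })
    # Points carries one flat, three-component coordinate array.
    pts = ET.SubElement(piece, "Points")
    _ascii_array(pts, "Points", "Float64",
                 [c for p in points for c in p], components=3)
    # Each cell sub-element carries a connectivity + offsets pair.
    cells = {"Verts": [], "Lines": [], "Strips": [], "Polys": polys}
    for tag, cell_list in cells.items():
        elem = ET.SubElement(piece, tag)
        connectivity, offsets, end = [], [], 0
        for cell in cell_list:
            connectivity.extend(cell)
            end += len(cell)
            offsets.append(end)  # offset of the end of each cell
        _ascii_array(elem, "connectivity", "Int64", connectivity)
        _ascii_array(elem, "offsets", "Int64", offsets)
    return piece
```

For two triangles sharing an edge, `polydata_piece([(0,0,0), (1,0,0), (0,1,0), (1,1,0)], [[0,1,2], [1,3,2]])` yields a Polys connectivity of `0 1 2 1 3 2` and offsets of `3 6`.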


2. Reading — entirely absent

VTKFortran is write-only. No reader exists for any format. The XML structure is fully self-describing, so reading is feasible in principle.

  • No vtk_file reader (serial, all types)
  • No pvtk_file reader (parallel P* types)
  • No vtm_file reader (multi-block)

3. Encoding and format gaps

3.1 Compression — not implemented

The spec defines a compressor attribute on VTKFile and a multi-block compression header:

nbblocks | blocksize | lbsize | cblocksize[0] | ... | cblocksize[n-1] | <compressed blocks>

Three compressors are defined: vtkZLibDataCompressor, vtkLZMADataCompressor, vtkLZ4DataCompressor. None is implemented. For large-scale scientific files, uncompressed binary can be 2–10× larger than zlib-compressed output. ParaView defaults to writing zlib-compressed binary.
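As a rough illustration of the header layout above, the following Python sketch compresses one DataArray payload block by block with zlib and prepends the header. The function names and the 32768-byte default block size are assumptions made for this example. The sketch also assumes that lbsize is the size of the last partial block and is 0 when the payload divides evenly into full blocks, which should be checked against the spec before implementing.

```python
import struct
import zlib

def compress_array(raw, block_size=32768, header_fmt="<I"):
    """Return header + zlib-compressed blocks for one DataArray payload.

    header_fmt "<I" corresponds to UInt32 headers, "<Q" to UInt64
    (little-endian assumed).
    """
    n_full, last = divmod(len(raw), block_size)
    n_blocks = n_full + (1 if last else 0)
    blocks = [zlib.compress(raw[i * block_size:(i + 1) * block_size])
              for i in range(n_blocks)]
    # nbblocks | blocksize | lbsize | cblocksize[0] | ... | cblocksize[n-1]
    header = [n_blocks, block_size, last] + [len(b) for b in blocks]
    packed = b"".join(struct.pack(header_fmt, v) for v in header)
    return packed + b"".join(blocks)

def decompress_array(payload, header_fmt="<I"):
    """Inverse of compress_array, useful for round-trip checks."""
    width = struct.calcsize(header_fmt)

    def read(i):
        return struct.unpack_from(header_fmt, payload, i * width)[0]

    n_blocks = read(0)
    sizes = [read(3 + i) for i in range(n_blocks)]
    pos = (3 + n_blocks) * width
    out = []
    for size in sizes:
        out.append(zlib.decompress(payload[pos:pos + size]))
        pos += size
    return b"".join(out)
```

A round trip such as `decompress_array(compress_array(data)) == data` is a cheap first test for any implementation, before comparing output against files written by VTK itself.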

3.2 header_type hardcoded to UInt32

The spec's header_type attribute controls the width of every DataArray length prefix in binary and appended modes. VTKFortran always writes a 4-byte (I4P) header, meaning a single DataArray cannot exceed ~4 GB. UInt64 headers (introduced in format version 1.0) are unimplemented.
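The size limit follows directly from the width of the length prefix. A small Python sketch (the name `length_prefix` and the little-endian formats are assumptions for illustration) makes the difference between the two header types concrete:

```python
import struct

# Little-endian struct formats for the two header widths (byte order assumed).
HEADER_FORMATS = {"UInt32": "<I", "UInt64": "<Q"}

def length_prefix(n_bytes, header_type="UInt32"):
    """Pack the byte count that precedes a binary or appended DataArray."""
    fmt = HEADER_FORMATS[header_type]
    limit = 2 ** (8 * struct.calcsize(fmt))
    if n_bytes >= limit:
        raise OverflowError(
            f"{n_bytes} bytes do not fit in a {header_type} length prefix")
    return struct.pack(fmt, n_bytes)
```

With a UInt32 prefix the largest representable payload is 2^32 - 1 bytes, so a 5 GiB array (`5 * 2**30` bytes) is rejected, while the same call with `header_type="UInt64"` returns an 8-byte prefix.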

3.3 Parallel files locked to ASCII

pvtk_file allocates xml_writer_ascii_local unconditionally. P* files are purely XML metadata (no DataArrays, only PDataArray type descriptors), so there is nothing to encode in the parallel header itself. The real impact is that the associated piece files (.vts, .vtu, etc.) end up written in ASCII rather than binary or raw-appended.


4. VTKFile element attribute gaps

| Attribute | Spec status | VTKFortran |
|---|---|---|
| type | required | ✅ written |
| version | required | ✅ always "1.0" |
| byte_order | required | ⚠️ supported but not always explicitly written |
| header_type | optional (UInt32 default) | ❌ not written; hardcoded UInt32 behaviour |
| compressor | optional | ❌ not written (no compression) |

5. DataArray type gaps — unsigned integers

The spec defines 10 data types. VTKFortran covers the six signed/floating types via PENF. The four unsigned types are absent:

| Spec type | VTKFortran | Notes |
|---|---|---|
| UInt8 | ❌ missing | Used by the spec itself for Cells/types and vtkGhostType |
| UInt16 | ❌ missing | Rare in practice |
| UInt32 | ❌ missing | Used in some index arrays |
| UInt64 | ❌ missing | Used for large index arrays |

Spec violation: the types DataArray in UnstructuredGrid/Cells is specified as UInt8. VTKFortran writes it as I1P and emits type="Int8" — this is incorrect and may cause read failures in strict VTK readers.
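One observation that may keep the fix small: for values in the range 0 to 127, a signed and an unsigned 8-bit integer have the same byte representation, so the binary payload would not change, only the emitted type string. The Python sketch below (function names are hypothetical) checks this for a few common cell type codes and shows what happens above 127.

```python
import struct

def same_bytes_as_unsigned(values):
    """True if packing values as Int8 and as UInt8 yields identical bytes.

    Only valid for values in 0..127; larger values cannot be packed as Int8.
    """
    signed = struct.pack(f"{len(values)}b", *values)
    unsigned = struct.pack(f"{len(values)}B", *values)
    return signed == unsigned

def reinterpret_as_int8(value):
    """Show how a UInt8 byte would be read back if mislabelled as Int8."""
    return struct.unpack("b", struct.pack("B", value))[0]

# VTK cell type codes: 5 = triangle, 10 = tetrahedron, 12 = hexahedron.
print(same_bytes_as_unsigned([5, 10, 12]))
print(reinterpret_as_int8(200))
```

The first call confirms the bytes match for these codes; the second shows that a value such as 200 would be misread as a negative number under the wrong label, which is the case an I1P-backed implementation would need to guard against.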


6. PointData / CellData active-array designation

The spec defines optional attributes to mark which DataArray is active for each role:

<PointData Scalars="pressure" Vectors="velocity" Normals="n" Tensors="stress" TCoords="uv">

VTKFortran opens the tag but never populates these attributes. Readers fall back to picking the first array of appropriate size, which is not always the right one.
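One possible shape for the missing feature is a set of optional arguments on the call that opens the tag. The Python sketch below is only an illustration of that idea; `open_point_data` is a hypothetical name, not an existing VTKFortran procedure.

```python
import xml.etree.ElementTree as ET

# Roles taken from the spec example above.
ROLES = ("Scalars", "Vectors", "Normals", "Tensors", "TCoords")

def open_point_data(**active):
    """Build a PointData element, setting only the roles the caller names."""
    unknown = set(active) - set(ROLES)
    if unknown:
        raise ValueError(f"unknown active-array roles: {sorted(unknown)}")
    # Keep the spec's role order regardless of keyword order.
    attrs = {role: active[role] for role in ROLES if role in active}
    return ET.Element("PointData", attrs)
```

Calling `open_point_data(Scalars="pressure", Vectors="velocity")` sets exactly those two attributes, and calling it with no arguments reproduces the current behaviour of a bare tag, so the change could be backward compatible.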


7. FieldData limitations

write_fielddata is implemented only for rank-0 scalars of R8P and I8P. The spec allows DataArrays of any type, rank, and NumberOfTuples inside FieldData. Unsupported cases include:

  • Integer cycle counters of I4P
  • Multi-component metadata arrays
  • NumberOfTuples-keyed arrays for time series metadata
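A more general FieldData writer mainly needs to derive NumberOfTuples from the data length and the component count. The following Python sketch shows that bookkeeping; `field_array` is a hypothetical helper, and the array names used in the usage note are made up for the example.

```python
import xml.etree.ElementTree as ET

def field_array(name, vtk_type, values, components=1):
    """Build an ASCII FieldData DataArray of any type and tuple count."""
    if components < 1 or len(values) % components != 0:
        raise ValueError("len(values) must be a multiple of components")
    arr = ET.Element("DataArray", {
        "type": vtk_type,
        "Name": name,
        "NumberOfComponents": str(components),
        "NumberOfTuples": str(len(values) // components),
        "format": "ascii",
    })
    arr.text = " ".join(str(v) for v in values)
    return arr
```

For instance, `field_array("CYCLE", "Int32", [42])` produces a one-tuple integer counter, and `field_array("bounds", "Float64", [0.0, 1.0, 0.0, 2.0, 0.0, 3.0], components=2)` produces three two-component tuples.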

8. Parallel file gaps

8.1 GhostLevel attribute missing

All P* dataset elements carry a GhostLevel integer attribute. VTKFortran's pvtk_file does not expose it (absent or implicitly 0).

8.2 PDataArray schema not validated

Parallel header files' PPointData/PCellData sections declare the data schema via PDataArray elements. There is no mechanism to verify that this schema matches what the piece files actually contain.
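A consistency check of this kind could compare the (Name, type, NumberOfComponents) triples declared in the header with those found in a piece file. The Python sketch below shows the idea on in-memory XML strings; the function names are hypothetical, and treating a missing NumberOfComponents as 1 is an assumption.

```python
import xml.etree.ElementTree as ET

def _schema(root, section, tag):
    """Collect (Name, type, NumberOfComponents) for arrays under section."""
    found = set()
    for sec in root.iter(section):
        for arr in sec.findall(tag):
            found.add((arr.get("Name"), arr.get("type"),
                       arr.get("NumberOfComponents", "1")))
    return found

def schema_mismatches(parallel_xml, piece_xml):
    """Return differences between a P* header schema and one piece file."""
    header = ET.fromstring(parallel_xml)
    piece = ET.fromstring(piece_xml)
    problems = {}
    for kind in ("PointData", "CellData"):
        declared = _schema(header, "P" + kind, "PDataArray")
        actual = _schema(piece, kind, "DataArray")
        if declared != actual:
            problems[kind] = {
                "missing_in_piece": declared - actual,
                "undeclared_in_header": actual - declared,
            }
    return problems
```

An empty result means the piece matches the header. Since VTKFortran has no reader yet, a check inside the library would more likely compare the arrays registered with pvtk_file against those written by each piece at write time, rather than re-parsing files.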

8.3 PImageData and PPolyData absent

These parallel variants are blocked by their missing serial counterparts.


9. Multi-piece files

The spec allows multiple Piece elements within a single dataset element. VTKFortran's write_piece open/close pattern technically permits this, but there is no test coverage, no documentation, and no API guidance for it.


10. Time series — .pvd not supported

VTK defines a Collection format for timestep grouping:

<VTKFile type="Collection" version="0.1" byte_order="LittleEndian">
  <Collection>
    <DataSet timestep="0.0" part="0" file="sim_0000.vtu"/>
    <DataSet timestep="0.1" part="0" file="sim_0001.vtu"/>
  </Collection>
</VTKFile>

No .pvd writer exists. Users must assemble these files manually.
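Because the Collection format is only a list of DataSet entries, a writer is short. The Python sketch below generates the document shown above from a list of (time, filename) pairs; `pvd_document` is a hypothetical name, and a fixed `part="0"` is assumed.

```python
import xml.etree.ElementTree as ET

def pvd_document(steps):
    """Build a .pvd Collection from an iterable of (time, filename) pairs."""
    root = ET.Element("VTKFile", type="Collection", version="0.1",
                      byte_order="LittleEndian")
    collection = ET.SubElement(root, "Collection")
    for time, filename in steps:
        ET.SubElement(collection, "DataSet", timestep=repr(float(time)),
                      part="0", file=filename)
    return ET.tostring(root, encoding="unicode")
```

A Fortran counterpart would presumably keep the file open across the simulation and append one DataSet entry per output step, closing the Collection element at finalization.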


Priority ranking

| Priority | Gap | Effort |
|---|---|---|
| 1 | ImageData write (.vti) | Low |
| 2 | PolyData write (.vtp) | Medium |
| 3 | Fix Cells/types type string to UInt8 | Low |
| 4 | Compression (zlib first) | Medium-high |
| 5 | .pvd time series writer | Low |
| 6 | UInt64 header_type | Medium |
| 7 | Reading (all types) | High |
| 8 | PImageData / PPolyData parallel writers | Low (needs serial first) |
| 9 | pvtk_file binary/raw mode | Low |
| 10 | FieldData breadth | Low |
| 11 | Active-array designation on PointData/CellData | Low |
| 12 | GhostLevel on parallel headers | Low |
| 13 | header_type attribute written explicitly | Trivial |
