This module defines the Trieste file format as well as the rules for manipulating Trieste files. Trieste files are intended to contain archival data. Once written, they are not intended to be modified, only read.
Trieste files can store N-dimensional arrays, tables, and collections of arrays or tables. Every array, table, and collection has a name and contains a metadata dictionary (a map data structure) that can be augmented by the file's author to provide documentation regarding the content of each object stored in the file, as well as instructions on using the data in the file. The goal is to provide the person reading the file with as much documentation and forensic information as possible about the file's contents and the process by which the file was created.
The current prototype of Trieste is a Python-specific format consisting of a specially-formatted, compressed NumPy .npz file in which every data object (NumPy array) has a corresponding metadata dictionary. Some of the metadata attributes are automatically added. Others, such as documentation strings (comments / READMEs) and object names, are strongly encouraged, but can be left empty. The automatically-added metadata are intended for software version compatibility checks and record-keeping / traceability. For example, the following are automatically added to the metadata:
- The versions of NumPy, Python, and the Trieste module that were used to generate the file.
- The OS, platform, and CPU architecture with which the file was created.
- The file's creation time / date.
- The username under which the file was created.
- The hostname of the system on which the file was created.
- The path to the active directory on the host machine when the file was created.
- If the file is generated within an IPython session or Jupyter notebook session, the command history of the session is also stored, as a string.
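As a rough illustration of how most of the items above can be gathered, here is a minimal sketch that uses only the Python standard library. The function name `collect_provenance` and the dictionary keys are hypothetical and are not Trieste's actual metadata schema. The Trieste module version and the IPython / Jupyter command history are left out.

```python
import datetime
import getpass
import os
import platform
import socket
from importlib import metadata


def collect_provenance():
    """Gather record-keeping metadata similar to the items listed above.

    Hypothetical helper: the key names are illustrative, not Trieste's schema.
    """
    try:
        numpy_version = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        numpy_version = None  # NumPy is not installed in this environment

    try:
        username = getpass.getuser()
    except Exception:
        username = None  # no login name could be determined

    return {
        "python_version": platform.python_version(),
        "numpy_version": numpy_version,
        "os": platform.system(),
        "platform": platform.platform(),
        "cpu_architecture": platform.machine(),
        # Timezone-aware UTC timestamp in ISO 8601 form
        "created": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "username": username,
        "hostname": socket.gethostname(),
        "working_directory": os.getcwd(),
    }


if __name__ == "__main__":
    for key, value in collect_provenance().items():
        print(f"{key}: {value}")
```

A dictionary like this could be merged with author-supplied entries, such as a documentation string, before being written alongside the data.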
The Trieste module contains 2 primary stand-alone functions:
- load(): for loading a Trieste file from the file system.
- save(): for saving a Trieste file to the file system.
There are 4 classes:
- Array: for storing N-dimensional arrays.
- Table: for storing Arrays with labeled columns of potentially different data types (analogous with a spreadsheet, ASCII table, or database table).
- Collection: for storing multiple Array or Table objects. Collections are only allowed to store a single type of object. For example, a collection can store multiple 2-D Arrays (like layers of an image) or multiple Tables, but not a Table and an Array. Furthermore, the names of the objects in a Collection must be unique. Among other benefits, this allows Collection objects to be indexed using the name of the object, so the syntax collection['red'] can be used to access the object in the collection whose name is red. Collections are also ordered containers, which means that they can be indexed by position, with an integer subscript, as in collection[3]. Collections are iterable, so that the syntax for object in collection: can be used to iterate through the contents.
- File: for interfacing with a file, after the file has been loaded.
There are two types of files, in general:
- files containing one object
- files containing multiple objects
When loading a file containing only one object, the load function constructs the object itself (i.e., an instance of Array, Table, or Collection). When a file containing multiple objects is loaded, a File instance is created. Just like Collection objects, File objects are iterable and can be indexed by object name or position.
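The container behavior described for Collection objects (ordered, uniquely named, single-type, indexable by name or position, iterable) can be sketched in a few lines of plain Python. This is an assumption-laden illustration and not Trieste's implementation: `NamedCollection` and `Layer` are hypothetical names, and the stored objects are assumed to expose a `name` attribute. The single-type check mirrors the rule stated for Collection; whether File applies the same restriction is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """Stand-in for a named data object (assumed to expose a .name attribute)."""
    name: str
    values: list = field(default_factory=list)


class NamedCollection:
    """Ordered container of same-typed objects with unique names.

    Hypothetical sketch; not Trieste's Collection or File class.
    """

    def __init__(self, items=()):
        self._items = []    # preserves insertion order for positional access
        self._by_name = {}  # maps name -> object for access by name
        for item in items:
            self.append(item)

    def append(self, item):
        # Enforce a single object type across the whole container.
        if self._items and type(item) is not type(self._items[0]):
            raise TypeError("all objects in a collection must share one type")
        # Enforce unique names so that lookup by name is unambiguous.
        if item.name in self._by_name:
            raise ValueError(f"duplicate object name: {item.name!r}")
        self._items.append(item)
        self._by_name[item.name] = item

    def __getitem__(self, key):
        # A string selects by name; an integer (or slice) selects by position.
        if isinstance(key, str):
            return self._by_name[key]
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    layers = NamedCollection([Layer("red"), Layer("green"), Layer("blue")])
    assert layers["green"] is layers[1]  # same object by name and by position
    for layer in layers:
        print(layer.name)
```

Keeping both a list and a dictionary costs a little extra memory, but it makes positional and name-based lookups equally direct, and the uniqueness check is what keeps the name-based lookup well defined.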