This library aims to support the transformation of serial patient signals (i.e. data points for a patient over time) into a data set with observations over time per patient. This data set can in turn be used for further statistical and machine learning purposes.
The theory and background of this library are described in this blog.
The current data definition set that is being used for the UMCU PICU can be found here.
Once data from an electronic patient record has been extracted to Signals, these can be processed by this library. A Signal has the following attributes:

- Id: the id of the parameter.
- Name: the name of the parameter, for example lab_creatine or mon_heartrate.
- Time Stamp: the date time or period in which the signal occurs. The time stamp can be empty, meaning that the signal is time independent, for example the gender of a patient.
- Patient Id: the id of the patient the signal belongs to.
- Value: the value of the signal, which can be:
  - Text
  - Numeric
  - DateTime
  - or no value
- Validated
Examples of signals can be found in this Google spreadsheet (on the Signals sheet).
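The attributes above could be modelled in F# roughly as follows. This is a hypothetical sketch: the `Value` and `TimeStamp` union cases and the boolean `Validated` flag are assumptions for illustration, not necessarily the library's own types.

```fsharp
open System

// Hypothetical: the four kinds of value a signal can have.
type Value =
    | Text of string
    | Numeric of float
    | Date of DateTime
    | NoValue

// Hypothetical: no time stamp (time independent), a moment, or a period.
type TimeStamp =
    | NoTime
    | At of DateTime
    | Period of DateTime * DateTime

type Signal =
    { Id : string
      Name : string
      TimeStamp : TimeStamp
      PatientId : string
      Value : Value
      Validated : bool } // assumption: a simple validated / not validated flag

// A time dependent signal: a heart rate measured at 08:00.
let heartRate =
    { Id = "hr1"
      Name = "mon_heartrate"
      TimeStamp = At(DateTime(2017, 1, 1, 8, 0, 0))
      PatientId = "p1"
      Value = Numeric 120.0
      Validated = true }

// A time independent signal: the gender of a patient has no time stamp.
let gender =
    { Id = "g1"
      Name = "gender"
      TimeStamp = NoTime
      PatientId = "p1"
      Value = Text "male"
      Validated = true }
```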
The list of signals is processed according to a data definition. A definition has the following structure:
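
```fsharp
// Aggregates the signals of an observation into a single value
type Collapse = Signal list -> Value
// Converts a signal, for example its value
type Convert = Signal -> Signal
// Selects the signals that are used for an observation
type Filter = Signal list -> Signal list
// Optional id used to match a source to a signal
type SourceId = string option

// An observation becomes a column in the resulting data set
type Observation =
    {
        Name : string
        Type : string
        Length : int option
        Filters : Filter list
        Collapse : Collapse
        Sources : Source list
    }
// A source is matched to signals by its Id or Name
and Source =
    {
        Id : SourceId
        Name : string
        Conversions : Convert list
    }
```

Observations are turned into columns, and each Observation is derived from a list of Sources. The Id or Name of a Source is in turn matched to the Id or Name of a Signal. This allows different sources to be combined into a single observation. For example, urine output can be calculated by adding the catheter urine output and the spontaneous urine output sources.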
Also, more complex calculated observations are possible by combining different sources into one calculation.
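A minimal sketch of how sources could be matched to signals and combined, using the urine output example. The simplified `Signal` and `Source` records, the `matches`, `signalsFor` and `sumValues` functions and the signal names `urine_catheter` and `urine_spontaneous` are hypothetical.

```fsharp
// Hypothetical, simplified types: only the fields needed for matching are included.
type Signal =
    { Id : string
      Name : string
      PatientId : string
      Value : float option }

type Source =
    { Id : string option
      Name : string }

// A signal belongs to a source when the source Id or the source Name matches the signal.
let matches (source : Source) (signal : Signal) =
    source.Id = Some signal.Id || source.Name = signal.Name

// Keep only the signals that match at least one of the sources of an observation.
let signalsFor (sources : Source list) (signals : Signal list) =
    signals
    |> List.filter (fun signal -> sources |> List.exists (fun source -> matches source signal))

// A collapse that adds up all available numeric values.
let sumValues (signals : Signal list) =
    signals |> List.choose (fun signal -> signal.Value) |> List.sum

let urineSources =
    [ { Source.Id = None; Name = "urine_catheter" }
      { Source.Id = None; Name = "urine_spontaneous" } ]

let signals =
    [ { Signal.Id = "1"; Name = "urine_catheter"; PatientId = "p1"; Value = Some 40.0 }
      { Signal.Id = "2"; Name = "urine_spontaneous"; PatientId = "p1"; Value = Some 25.0 }
      { Signal.Id = "3"; Name = "mon_heartrate"; PatientId = "p1"; Value = Some 120.0 } ]

// Total urine output of both sources: 40.0 + 25.0 = 65.0; the heart rate is ignored.
let urineOutput = signals |> signalsFor urineSources |> sumValues
```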
Besides sources, the following actions can be defined:
- Conversion: converts a signal value to a different value
- Filter: filters the list of signals for an observation
- Collapse: aggregates the list of signals into one observed value
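The sketch below shows one way conversions, filters and a collapse could be chained for a single observation. It assumes conversions are applied first, then filters, then the collapse; that order, the simplified `Signal` type, the `float option` result and the helpers `gramToKg`, `onlyValidated`, `lastValue` and `observe` are assumptions for illustration.

```fsharp
// Hypothetical, simplified signal: only the fields this sketch needs.
type Signal =
    { Name : string
      Value : float option
      Validated : bool }

type Convert = Signal -> Signal
type Filter = Signal list -> Signal list
// Simplified: collapses to a float option instead of a Value type.
type Collapse = Signal list -> float option

// Conversion: a weight in grams to kilograms.
let gramToKg (signal : Signal) : Signal =
    { signal with Value = signal.Value |> Option.map (fun v -> v / 1000.0) }

// Filter: keep only validated signals.
let onlyValidated (signals : Signal list) : Signal list =
    signals |> List.filter (fun signal -> signal.Validated)

// Collapse: take the last available value.
let lastValue (signals : Signal list) : float option =
    signals |> List.choose (fun signal -> signal.Value) |> List.tryLast

// Assumed order: conversions, then filters, then the collapse.
let observe (conversions : Convert list) (filters : Filter list) (collapse : Collapse) (signals : Signal list) =
    // apply every conversion, in order, to every signal
    let converted =
        signals
        |> List.map (fun signal -> List.fold (fun acc (convert : Convert) -> convert acc) signal conversions)
    // apply the filters one after the other
    let filtered = List.fold (fun acc (filter : Filter) -> filter acc) converted filters
    // collapse the remaining signals into one value
    collapse filtered

let weights =
    [ { Name = "weight"; Value = Some 3200.0; Validated = true }
      { Name = "weight"; Value = Some 3500.0; Validated = true }
      { Name = "weight"; Value = Some 9999.0; Validated = false } ]

// The unvalidated 9999 g entry is filtered out, so the result is Some 3.5 (kg).
let weightKg = observe [ gramToKg ] [ onlyValidated ] lastValue weights
```

Because each action is just a function, conversions and filters compose with an ordinary fold, which keeps the definition of an observation declarative.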
The output is a data set with the following structure:

```fsharp
type DataSet =
    { Columns : Column list
      // one row per patient and row time
      Data : (PatientId * RowTime * DataRow) list }
and Column =
    {
        Name: string
        Type: string
    }
// one value per column
and DataRow = Value list
// an exact date time or a relative time point
and RowTime =
    | Exact of DateTime
    | Relative of int
```

This equates to a flat, regular table with one row of observations for each patient and date time. Observations without a value at a specific date time get a null value.
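To illustrate how observed values end up in this flat table, here is a hypothetical pivot that builds one row per patient and time point, using `None` where the data set would contain a null. The `Observed` record and the `toRows` function are assumptions for illustration, not part of the library.

```fsharp
open System

type RowTime =
    | Exact of DateTime
    | Relative of int

// Hypothetical: one observed value for a patient, time point and column.
type Observed =
    { PatientId : string
      Time : RowTime
      Column : string
      Value : float }

// Pivot observed values into one row per patient and time point.
// Columns without a value at that time point become None (null in the data set).
let toRows (columns : string list) (observed : Observed list) =
    observed
    |> List.groupBy (fun o -> o.PatientId, o.Time)
    |> List.map (fun ((patientId, time), values) ->
        let row =
            columns
            |> List.map (fun column ->
                values
                |> List.tryFind (fun o -> o.Column = column)
                |> Option.map (fun o -> o.Value))
        patientId, time, row)

let t0 = Exact(DateTime(2017, 1, 1, 8, 0, 0))
let t1 = Exact(DateTime(2017, 1, 1, 9, 0, 0))

let rows =
    toRows
        [ "mon_heartrate"; "lab_creatine" ]
        [ { PatientId = "p1"; Time = t0; Column = "mon_heartrate"; Value = 120.0 }
          { PatientId = "p1"; Time = t0; Column = "lab_creatine"; Value = 35.0 }
          { PatientId = "p1"; Time = t1; Column = "mon_heartrate"; Value = 118.0 } ]

// At t1 only the heart rate was observed, so that row is [ Some 118.0; None ].
```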
The following features are implemented or will be implemented:
- Defining observations in an online google spreadsheet
- Ability to set a time resolution so data can be aggregated over a time period
- Anonymizing a data set with encrypted patient ids and relative date time points
- Writing a resulting `DataSet` to a `csv` file
- Filtering out columns without any values
- Allowing parameterized collapse, filter and convert function definitions
- Writing a resulting `DataSet` to a database
- Using automated unit conversion (e.g. mg and gram can be added, or kPa can be converted to mmHg)
- Handling missing values and interpolation
- Using Paket for dependency management
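Two of the listed features, a time resolution and relative date time points, could be sketched as below. `toResolution` truncates times to a fixed number of minutes and `toRelative` replaces the exact times of one patient by whole hours since the first time point; both functions and the choice of hours as the relative unit are hypothetical.

```fsharp
open System

type RowTime =
    | Exact of DateTime
    | Relative of int

// Truncate a time to a resolution in minutes (e.g. 60 gives one row per hour).
let toResolution (minutes : int) (time : DateTime) =
    let resolution = TimeSpan.TicksPerMinute * int64 minutes
    DateTime(time.Ticks - time.Ticks % resolution, time.Kind)

// Replace the exact times of one patient by whole hours since the first time point.
let toRelative (times : DateTime list) =
    match times with
    | [] -> []
    | _ ->
        let start = List.min times
        times |> List.map (fun time -> Relative(int ((time - start).TotalHours)))

let times =
    [ DateTime(2017, 1, 1, 8, 12, 0)
      DateTime(2017, 1, 1, 9, 47, 0)
      DateTime(2017, 1, 1, 11, 5, 0) ]

// 08:12, 09:47 and 11:05 become 08:00, 09:00 and 11:00
let hourly = times |> List.map (toResolution 60)
// [ Relative 0; Relative 1; Relative 3 ]
let relative = toRelative hourly
```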