Data ranges and presence #4

### Data ranges

For numerical feature types such as `REAL`, the schema could store descriptive statistics (min, max, mean, standard deviation) to make it more expressive. This would let us:
1. Validate data at inference time. For example, a transformer can validate incoming data points: when a feature value falls outside the range defined by the min/max values, it can log the error, for example by incrementing an outlier counter/metric.
2. Compare the training data distribution against the distribution computed over batches of inference requests. For example, a KL-divergence-based distance could be used to increment a skew/drift detection counter/metric.
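As a rough illustration of both points, the sketch below checks a batch of values against a stored range and against a stored training histogram. Everything here is an assumption made for the example: the `RealStats` attribute names, the histogram representation, the direction of the divergence (inference batch relative to training), and the `drift_threshold` value are hypothetical and not part of the current schema.

```python
import math
from dataclasses import dataclass


@dataclass
class RealStats:
    """Hypothetical schema attributes for a REAL feature (names are illustrative)."""
    min_value: float
    max_value: float
    bin_edges: list   # histogram bin edges computed on the training data
    bin_probs: list   # fraction of training values that fell into each bin


def count_outliers(values, stats):
    """Number of values outside the [min_value, max_value] training range."""
    return sum(1 for v in values if not (stats.min_value <= v <= stats.max_value))


def bin_fractions(values, edges):
    """Fraction of in-range values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions, smoothed to avoid log(0)."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    p_sum, q_sum = sum(p), sum(q)
    return sum((a / p_sum) * math.log((a / p_sum) / (b / q_sum)) for a, b in zip(p, q))


def validate_batch(values, stats, drift_threshold=0.1):
    """Return counter increments for one batch of inference values."""
    drift = kl_divergence(bin_fractions(values, stats.bin_edges), stats.bin_probs)
    return {"outliers": count_outliers(values, stats), "drift": int(drift > drift_threshold)}


# Example: training values were spread evenly over [0, 10].
stats = RealStats(min_value=0.0, max_value=10.0, bin_edges=[0.0, 5.0, 10.0], bin_probs=[0.5, 0.5])
increments = validate_batch([1.0, 2.0, 3.0, 12.0], stats)
```

In a real transformer the returned increments would be added to whatever metrics backend is in use; a plain dict is used here only to keep the sketch self-contained.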

Similarly, for categorical features, store the distribution of the values in `category_map`.
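A possible shape for the categorical case is sketched below. It assumes, hypothetically, that `category_map` maps raw category values to integer indices, and it uses total variation distance as a simple stand-in for whichever distance measure is eventually chosen. Values not present in `category_map` are grouped under `None` so that unseen categories also show up as drift.

```python
from collections import Counter


def category_distribution(values, category_map):
    """Relative frequency of each known category; unknown values are grouped under None."""
    counts = Counter(v if v in category_map else None for v in values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two categorical distributions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical category_map: raw value -> index.
category_map = {"red": 0, "blue": 1}
training_dist = category_distribution(["red", "red", "blue", "blue"], category_map)
batch_dist = category_distribution(["red", "red", "red", "green"], category_map)
distance = total_variation(training_dist, batch_dist)
```

The training distribution would be stored in the schema once, while the batch distribution is recomputed per batch of inference requests.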

### Data presence

For all feature types, define an attribute that specifies whether the feature is mandatory at inference time. For example, if a feature has no missing values during training, we would most likely want to require it in the inference request. A transformer performing data validation can treat a missing mandatory feature as an error and increment an anomaly detection counter/metric.
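A minimal sketch of this idea follows. The attribute name `required`, the dict-based schema, and the rule "required if never missing in training" are assumptions made for illustration, not an existing API.

```python
def infer_required(training_rows, feature_names):
    """Mark a feature as required when it is never missing (None or absent) in training."""
    return {
        name: {"required": all(row.get(name) is not None for row in training_rows)}
        for name in feature_names
    }


def missing_required(request, schema):
    """Names of required features that are absent or None in an inference request."""
    return [
        name
        for name, spec in schema.items()
        if spec.get("required", False) and request.get(name) is None
    ]


# Hypothetical training data: "age" is always present, "income" is sometimes missing.
training_rows = [{"age": 31, "income": 52000.0}, {"age": 45, "income": None}]
schema = infer_required(training_rows, ["age", "income"])

anomaly_counter = 0
missing = missing_required({"income": 48000.0}, schema)
if missing:
    anomaly_counter += 1  # a transformer would report this to its metrics backend
```

Inferring the attribute from training data is only a default; the schema author could still override it per feature.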
