PPC Error and Residual Plots #361

@behramulukir

There have been requests (#350, #349) for new PPC error plot functionality. These requests are reasonable and worth implementing. However, to avoid cluttering the current PPC error documentation page, and to avoid offering functions that people are unlikely to use while leaving out useful ones, it makes sense to first decide which error and residual plots we want to provide. Here is what we currently have:

PPC Error

Right now, we have PPC error plots, which plot the predictive errors y - yrep.

  1. ppc_error_hist(): A separate histogram is plotted for the predictive errors computed from y and each dataset (row) in yrep. For this plot yrep should have only a small number of rows.
  2. ppc_error_hist_grouped(): Like ppc_error_hist(), except errors are computed within levels of a grouping variable. The number of histograms is therefore equal to the product of the number of rows in yrep and the number of groups (unique values of group).
  3. ppc_error_scatter(): A separate scatterplot is displayed for y vs. the predictive errors computed from y and each dataset (row) in yrep. For this plot yrep should have only a small number of rows.
  4. ppc_error_scatter_avg(): A single scatterplot of y vs. the average of the errors computed from y and each dataset (row) in yrep. For each individual data point y[n], the average error is the average of the errors for y[n] computed over the draws from the posterior predictive distribution.
  5. ppc_error_scatter_avg_vs_x(): Same as ppc_error_scatter_avg(), except the average is plotted on the y-axis and a predictor variable x is plotted on the x-axis.
  6. ppc_error_binned(): Intended for use with binomial data. A separate binned error plot (similar to arm::binnedplot()) is generated for each dataset (row) in yrep. For this plot y and yrep should contain proportions rather than counts, and yrep should have only a small number of rows.
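
To make the quantities behind these plots concrete, here is a minimal sketch of the underlying arithmetic in plain Python. It is not the bayesplot implementation: the helper names (predictive_errors, average_errors, binned_errors) are hypothetical, yrep is assumed to be a list of rows with one replicated dataset per row, and the equal-size binning rule is an assumption made only for illustration.

```python
from statistics import mean

def predictive_errors(y, yrep):
    """Return one row of errors y - yrep[s] per replicated dataset (row of yrep)."""
    return [[y_n - yrep_n for y_n, yrep_n in zip(y, row)] for row in yrep]

def average_errors(y, yrep):
    """For each observation y[n], average its error over all draws."""
    errors = predictive_errors(y, yrep)
    return [mean(column) for column in zip(*errors)]

def binned_errors(expected, errors, n_bins):
    """Sort by expected value, split into at most n_bins groups, average each group.

    The equal-size binning used here is an illustrative assumption.
    """
    pairs = sorted(zip(expected, errors))
    size = -(-len(pairs) // n_bins)  # ceiling division
    bins = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    return [(mean(e for e, _ in b), mean(r for _, r in b)) for b in bins]

# Toy data: 3 observations, 2 posterior predictive draws.
y = [1.0, 2.0, 4.0]
yrep = [[0.5, 2.5, 3.0], [1.5, 1.5, 4.0]]

errors = predictive_errors(y, yrep)   # what the hist/scatter plots show per row
avg = average_errors(y, yrep)         # what the *_avg plots show per observation
binned = binned_errors(yrep[0], errors[0], n_bins=2)  # one row, as in a binned plot
```

In this sketch, the per-row errors correspond to what ppc_error_hist() and ppc_error_scatter() display, and the per-observation averages correspond to ppc_error_scatter_avg().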

Proposed Plots

Here is a small list of functions that people proposed to be implemented:

  1. ppc_residual_scatter(): A single scatterplot of y vs. the errors computed from y and a summary of each dataset (row) in yrep. For each individual data point y[n], the error is computed as the difference between y[n] and the summary of the draws from the posterior predictive distribution. (source)
  2. ppc_error_pava(): The PAVA-residual plot is of the form stat(cep_y - p_pred) where cep_y is a matrix of conditional event probabilities obtained by PAVA transforming y based on the predictive probability samples in p_pred. (source)
  3. ppc_residual_binned(): A residual plot that allows for discrete observations, similar to ppc_error_binned(). (source)
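
As a rough illustration of the first two proposals, the sketch below computes residuals against a summary of yrep, and PAVA-based residuals for binary outcomes, in plain Python. None of this is taken from the linked proposals: the function names (summary_residuals, pava, pava_residuals) are hypothetical, the summary defaults to the mean as an assumption, and p_pred is treated here as a single vector of predictive probabilities, whereas the proposal applies a stat over a matrix of samples.

```python
from statistics import mean

def summary_residuals(y, yrep, summary=mean):
    """Residual y[n] - summary(yrep[:, n]); the default summary (mean) is an assumption."""
    return [y_n - summary(column) for y_n, column in zip(y, zip(*yrep))]

def pava(values):
    """Non-decreasing least-squares fit via the pool-adjacent-violators algorithm."""
    blocks = []  # each block stores [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last block mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def pava_residuals(y, p_pred):
    """Conditional event probabilities (PAVA of y ordered by p_pred) minus p_pred."""
    order = sorted(range(len(y)), key=lambda n: p_pred[n])
    fitted_sorted = pava([y[n] for n in order])
    cep = [0.0] * len(y)
    for rank, n in enumerate(order):
        cep[n] = fitted_sorted[rank]  # map back to the original observation order
    return [c - p for c, p in zip(cep, p_pred)]

# Toy continuous data for the summary residuals.
y_cont = [1.0, 2.0, 4.0]
yrep_cont = [[0.5, 2.5, 3.0], [1.5, 1.5, 4.0]]
resid = summary_residuals(y_cont, yrep_cont)

# Toy binary data for the PAVA residuals.
y_bin = [0, 1, 0, 1]
p_pred = [0.1, 0.4, 0.6, 0.9]
cep_resid = pava_residuals(y_bin, p_pred)
```

One thing the sketch makes visible: when the summary is the mean, y[n] minus the mean of the draws equals the mean of the errors for y[n], so a mean-based ppc_residual_scatter() would show the same values as ppc_error_scatter_avg(). That overlap may be worth weighing when deciding which functions to keep.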

Future of PPC Error and Residual

Now we need to decide which of these existing or proposed plots are useful, and which other error or residual plots are needed. Please mention which functionality you would like to have and which existing functionality is not needed!
