This module implements transformation functions for adapting response data to GAMLSS family domains. The transformation system supports two modes:
- Strict mode: Domain-preserving transformations with observation exclusion via masking
- Safe mode: Global affine transformations allowing reversible domain adaptation
All transformations include Jacobian correction to ensure information criterion comparability across families.
`transform_response()` is the unified transformation dispatcher supporting both strict and safe modes. It takes the following arguments:
- `y`: Numeric vector of response values
- `fam`: Character scalar specifying GAMLSS family name (e.g., `"PO"`, `"GA"`, `"BE"`, `"NO"`)
- `mode`: Character scalar, either `"strict"` (default) or `"safe"`
- `eps`: Numeric scalar for epsilon handling in boundary domains (default: `1e-6`)
- `allow_eps`: Logical; whether to apply epsilon adjustments at boundaries (strict mode only)

It returns a list with five elements:
- `y`: Numeric vector of transformed response values
- `mask`: Logical vector indicating valid observations
- `logJ_per_obs`: Numeric vector of per-observation log-Jacobian values
- `meta`: List containing transformation metadata (`kind`, `params`)
- `mode_used`: Character scalar indicating applied transformation mode
Strict mode enforces theoretical domain constraints without modifying individual observations; invalid observations are excluded via the mask:
- Count families (`PO`, `NBI`, `ZIP`, `ZINBI`, `ZIP2`, `BI`, `BB`):
  - Transform: Identity `z = y`
  - Validity: `y ≥ 0` and `y ∈ ℤ`
  - Jacobian: `log|∂z/∂y| = 0`
- Positive continuous families (`GA`, `GG`, `LOGNO`, `IG`):
  - Transform: Identity `z = y`
  - Validity: `y > 0`
  - Jacobian: `log|∂z/∂y| = 0`
- Unit interval families (`BE`, `BEINF`, `BEO`, `BEZI`, `BEo`, `BEINF0`):
  - Transform: Min-max scaling `z = (y - a)/(b - a)` where `a = min(y)`, `b = max(y)`
  - Validity: Depends on inflation variant (`0 < z < 1` for `BE`, `0 ≤ z < 1` for `BEZI`, etc.)
  - Jacobian: `log|∂z/∂y| = -log(b - a)` (constant across observations)
- Real-valued families (`NO`, `TF`, `GU`):
  - Transform: Z-score standardization `z = (y - μ)/σ` where `μ = mean(y)`, `σ = sd(y)`
  - Validity: All finite `y`
  - Jacobian: `log|∂z/∂y| = -log(σ)` (constant across observations)
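
The strict rules above can be sketched in base R for two of the groups. The helper name `strict_sketch()` and its `group` argument are hypothetical and are not part of this module's API; the sketch only illustrates how a mask and a per-observation log-Jacobian might be produced.

```r
# Hypothetical helper (not the package implementation): strict-mode rules
# for the "positive" and "real" groups only.
strict_sketch <- function(y, group = c("positive", "real")) {
  group <- match.arg(group)
  finite <- is.finite(y)
  if (group == "positive") {
    # Identity transform; only y > 0 is valid; log-Jacobian is 0.
    z <- y
    mask <- finite & y > 0
    logJ <- rep(0, length(y))
  } else {
    # Z-score standardization; every finite y is valid.
    mu <- mean(y[finite])
    sigma <- sd(y[finite])
    z <- (y - mu) / sigma
    mask <- finite
    logJ <- rep(-log(sigma), length(y))
  }
  list(y = z, mask = mask, logJ_per_obs = logJ)
}

res_pos <- strict_sketch(c(-1, 0, 1, 2, 3), "positive")
res_pos$mask   # FALSE FALSE TRUE TRUE TRUE

res_real <- strict_sketch(c(-1, 0, 1, 2, 3), "real")
res_real$logJ_per_obs[1]   # equals -log(sd(c(-1, 0, 1, 2, 3)))
```

Invalid observations keep their original value in `y`; only the mask marks them as excluded, which is one possible reading of "exclusion via masking".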
Safe mode applies global affine transformations `z = ay + b` with `a > 0`:
- Positive continuous families (`GA`, `GG`, `LOGNO`, `IG`):
  - If `min(y) ≤ 0`: Apply shift `b = -min(y) + ε`, `a = 1`
  - Otherwise: Identity `a = 1`, `b = 0`
  - Result: `z ∈ [ε, +∞)` after a shift (the minimum maps exactly to `ε`); otherwise `z = y > 0`
  - Jacobian: `log|∂z/∂y| = log(a) = 0`
- Unit interval families (`BE`, `BEINF`, `BEO`, `BEZI`, `BEo`, `BEINF0`):
  - Without epsilon: `a = 1/(max(y) - min(y))`, `b = -min(y) · a`
  - With epsilon (when 0 or 1 excluded): `a = (1 - 2ε)/(max(y) - min(y))`, `b = ε - min(y) · a`
  - Result: `z ∈ [0, 1]` or `z ∈ [ε, 1-ε]` depending on family
  - Jacobian: `log|∂z/∂y| = log(a)` (constant across observations)
- Real-valued families (`NO`, `TF`, `GU`):
  - Same as strict mode: Z-score standardization
  - Jacobian: `log|∂z/∂y| = -log(σ)`
- Count families (`PO`, `NBI`, `ZIP`, `ZINBI`, `ZIP2`, `BI`, `BB`):
  - Transform: Identity `z = y` (no rounding applied)
  - Validity: All finite `y`
  - Jacobian: `log|∂z/∂y| = 0`
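
The affine parameters listed above can be computed by hand. The following is a minimal sketch, assuming finite input with `max(y) > min(y)`; the helper names `safe_positive_sketch()` and `safe_unit_sketch()` are hypothetical and not part of the module.

```r
# Hypothetical helpers (not the package implementation).
# Safe-mode affine map for positive families: shift only if min(y) <= 0.
safe_positive_sketch <- function(y, eps = 1e-6) {
  a <- 1
  b <- if (min(y) <= 0) -min(y) + eps else 0
  list(y = a * y + b, params = list(a = a, b = b),
       logJ_per_obs = rep(log(a), length(y)))
}

# Safe-mode affine map for unit-interval families with open boundaries:
# maps [min(y), max(y)] onto [eps, 1 - eps].
safe_unit_sketch <- function(y, eps = 1e-6) {
  a <- (1 - 2 * eps) / (max(y) - min(y))
  b <- eps - min(y) * a
  list(y = a * y + b, params = list(a = a, b = b),
       logJ_per_obs = rep(log(a), length(y)))
}

y <- c(-1, 0, 1, 2, 3)
res_pos <- safe_positive_sketch(y)
res_unit <- safe_unit_sketch(y)
range(res_unit$y)   # approximately eps and 1 - eps

# Reversibility: y = (z - b) / a
y_back <- (res_pos$y - res_pos$params$b) / res_pos$params$a
all.equal(y, y_back)
```

Because `a` and `b` are computed once from the whole vector, every observation is moved by the same rule, which is what makes the map invertible.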

Both modes rely on the following properties:
- Jacobian correction: For any monotonic transformation `z = g(y)`, the log-likelihood on the original scale is: `log L(θ; y) = log L(θ; z) + Σᵢ log|∂zᵢ/∂yᵢ|`
- Affine transforms: For `z = ay + b` with `a > 0`:
  - Jacobian: `∂z/∂y = a` (constant)
  - Reversibility: `y = (z - b)/a`
- Information criterion adjustment: IC values become comparable across families by adding `-2 · Σᵢ log|∂zᵢ/∂yᵢ|`
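
As a worked check of the adjustment, the sketch below fits an intercept-only Normal model by closed-form maximum likelihood, once on the raw response and once on the z-scored response. The data are simulated purely for illustration and `normal_loglik()` is a hypothetical helper standing in for a GAMLSS fit.

```r
set.seed(1)                         # simulated data, for illustration only
y <- rnorm(50, mean = 10, sd = 3)
n <- length(y)

# Maximum-likelihood Normal fit; returns the maximised log-likelihood.
normal_loglik <- function(x) {
  mu_hat <- mean(x)
  sd_hat <- sqrt(mean((x - mu_hat)^2))   # MLE uses divisor n
  sum(dnorm(x, mu_hat, sd_hat, log = TRUE))
}

# Fit on the original scale (two parameters).
aic_y <- -2 * normal_loglik(y) + 2 * 2

# Fit on the z-score scale, then apply the Jacobian correction.
s <- sd(y)
z <- (y - mean(y)) / s
log_jacobian_sum <- n * (-log(s))        # log|dz/dy| = -log(s) per observation
aic_z_raw <- -2 * normal_loglik(z) + 2 * 2
aic_z_adj <- aic_z_raw - 2 * log_jacobian_sum

c(original = aic_y, unadjusted = aic_z_raw, adjusted = aic_z_adj)
```

The adjusted value agrees with the AIC obtained on the original scale, while the unadjusted value differs by `2 * n * log(s)`.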
```r
# Strict mode: excludes invalid observations
y <- c(-1, 0, 1, 2, 3)
res_strict <- transform_response(y, "GA", mode = "strict")
# res_strict$mask: c(FALSE, FALSE, TRUE, TRUE, TRUE)
# res_strict$y[1:2]: excluded (epsilon or NA)

# Safe mode: global shift to positive domain
res_safe <- transform_response(y, "GA", mode = "safe")
# res_safe$mask: c(TRUE, TRUE, TRUE, TRUE, TRUE)
# res_safe$y: c(1e-6, 1+1e-6, 2+1e-6, 3+1e-6, 4+1e-6)
# res_safe$meta$kind: "affine"
# res_safe$meta$params: list(a = 1, b = 1+1e-6)

# Reversibility in safe mode
y_back <- inverse_transform(res_safe$y, res_safe$meta)
# y_back ≈ y (within numerical tolerance)
```

Status: Legacy. Use `transform_response()` for new code.
`transform_for_family()` is a simplified transformation function without Jacobian correction. It applies observation-wise clipping and rounding. Arguments:
- `y`: Numeric vector
- `fam`: GAMLSS family name
- `strategy`: `"safe"` (default) or `"strict"`
- `eps`: Epsilon value

With `strategy = "safe"`, values are clipped or rounded to fit the domain (e.g., `y[y <= 0] <- eps` for positive families). With `strategy = "strict"`, invalid values are replaced with `NA`. Limitations:
- No Jacobian correction
- Observation-wise operations (non-global)
- Not suitable for family comparison
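
To illustrate why observation-wise clipping is not reversible while a global shift is, here is a small base R comparison. It is a sketch of the two ideas only and does not call the legacy function.

```r
y <- c(-1, 0, 1, 2, 3)
eps <- 1e-6

# Observation-wise clipping (the legacy "safe" idea): -1 and 0 both become eps,
# so the original values cannot be recovered.
z_clip <- ifelse(y <= 0, eps, y)

# Global shift (the safe-mode idea): every value moves by the same amount,
# so order and spacing are preserved and the map can be inverted.
b <- -min(y) + eps
z_shift <- y + b

anyDuplicated(z_clip) > 0       # TRUE: two inputs collapsed to one value
all.equal(z_shift - b, y)       # TRUE: the shift is reversible
```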
Status: Internal. Called by transform_response() when mode = "strict".
`transform_for_family_strict()` is the direct implementation of strict mode transformations. It returns the same structure as `transform_response()` but without the `mode_used` field.
`inverse_transform()` reverses transformations applied by `transform_response()` or `transform_for_family_strict()`.
Given transformed data `z` and metadata `meta`, it recovers the original scale `y`:
- Identity (`kind = "identity"`): `y = z`
- Z-score (`kind = "zscore"`): `y = μ + σz` where `μ = meta$params$center`, `σ = meta$params$scale`
- Min-max (`kind = "minmax"`): `y = a + z(b - a)` where `a = meta$params$min`, `b = meta$params$max`
- Affine (`kind = "affine"`): `y = (z - b)/a` where `a = meta$params$a`, `b = meta$params$b`
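
The four inverse rules above amount to a dispatch on `meta$kind`. The following is a hypothetical re-implementation for illustration; the name `inverse_sketch()` is an assumption, and only the `kind` values and `params` field names come from the list above.

```r
# Hypothetical re-implementation of the four inverse rules (not package code).
inverse_sketch <- function(z, meta) {
  p <- meta$params
  switch(meta$kind,
    identity = z,
    zscore   = p$center + p$scale * z,
    minmax   = p$min + z * (p$max - p$min),
    affine   = (z - p$b) / p$a,
    stop("unknown transformation kind: ", meta$kind)
  )
}

# Round trip for a min-max transform built by hand.
y <- c(0.2, 0.5, 1.1)
meta <- list(kind = "minmax", params = list(min = min(y), max = max(y)))
z <- (y - meta$params$min) / (meta$params$max - meta$params$min)
all.equal(inverse_sketch(z, meta), y)
```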
Arguments:
- `z`: Numeric vector on transformed scale
- `meta`: List with elements `kind` (character) and `params` (list)

Returns a numeric vector on the original scale.
```r
# Z-score inversion
y <- c(-2, -1, 0, 1, 2)
res <- transform_response(y, "NO", mode = "strict")
y_back <- inverse_transform(res$y, res$meta)
all.equal(y, y_back)  # TRUE (within tolerance)

# Affine inversion (safe mode)
y <- c(-1, 0, 1, 2, 3)
res <- transform_response(y, "GA", mode = "safe")
y_back <- inverse_transform(res$y, res$meta)
all.equal(y, y_back)  # TRUE

# Min-max inversion
y <- c(0.1, 0.3, 0.5, 0.7, 0.9)
res <- transform_response(y, "BE", mode = "strict")
y_back <- inverse_transform(res$y, res$meta)
all.equal(y, y_back)  # TRUE
```

`family_groups()` returns GAMLSS families partitioned by theoretical support.
The result is a list with four named elements:
- `count`: Character vector of count families (`"PO"`, `"NBI"`, `"ZIP"`, `"ZINBI"`, `"ZIP2"`, `"BI"`, `"BB"`)
- `unit`: Character vector of unit interval families (`"BE"`, `"BEINF"`, `"BEO"`, `"BEZI"`, `"BEo"`, `"BEINF0"`)
- `positive`: Character vector of positive continuous families (`"GA"`, `"GG"`, `"LOGNO"`, `"IG"`)
- `real`: Character vector of real-valued families (`"NO"`, `"TF"`, `"GU"`)

The theoretical support `𝕊` of each group is:
- Count: `𝕊 = ℤ₊ ∪ {0} = {0, 1, 2, ...}`
- Unit: `𝕊 = (0, 1)` or `𝕊 = [0, 1]` depending on inflation variant
- Positive: `𝕊 = ℝ₊ = (0, +∞)`
- Real: `𝕊 = ℝ = (-∞, +∞)`
```r
groups <- family_groups()
groups$count     # "PO" "NBI" "ZIP" "ZINBI" "ZIP2" "BI" "BB"
groups$positive  # "GA" "GG" "LOGNO" "IG"
```

`infer_support()` determines the empirical support of a numeric vector. The rules are applied in this order:
- Remove non-finite values
- Check if all values are integers ≥ 0 → `"count"`
- Else if all values in `[0, 1]` → `"unit"`
- Else if all values > 0 → `"positive"`
- Otherwise → `"real"`
Arguments:
- `y`: Numeric vector

Returns a character scalar: `"count"`, `"unit"`, `"positive"`, `"real"`, or `"none"` (if no valid data).
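
The decision rules for empirical support can be sketched in a few lines of base R. This is a hypothetical re-implementation named `infer_support_sketch()`, not the package code; the `tol` default assumes the documented integer-detection threshold of `1e-8`.

```r
# Hypothetical re-implementation of the decision rules (not package code).
infer_support_sketch <- function(y, tol = 1e-8) {
  y <- y[is.finite(y)]                       # drop NA, NaN, Inf, -Inf
  if (length(y) == 0) return("none")
  is_int <- abs(y - round(y)) < tol
  if (all(is_int & y >= 0)) return("count")
  if (all(y >= 0 & y <= 1)) return("unit")
  if (all(y > 0)) return("positive")
  "real"
}

infer_support_sketch(c(0, 1, 2, 5, 10))
infer_support_sketch(c(0.1, 0.3, 0.5, 0.9))
infer_support_sketch(c(NA, NaN, Inf))
```

Because the count check runs first, a vector such as `c(0, 1)` is classified as `"count"` rather than `"unit"` under these rules.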
```r
infer_support(c(0, 1, 2, 5, 10))      # "count"
infer_support(c(0.1, 0.3, 0.5, 0.9))  # "unit"
infer_support(c(1.2, 3.5, 7.8))       # "positive"
infer_support(c(-2, -1, 0, 1, 2))     # "real"
infer_support(c(NA, NaN, Inf))        # "none"
```

- Threshold for integer detection: `|y - round(y)| < 1e-8`
- Empirical support may differ from theoretical support (e.g., data with small positive values might be count data rounded to avoid zeros)
`jacobian_sum()` computes the total log-Jacobian contribution over valid observations.
For transformation z = g(y) with Jacobian J(y) = ∂z/∂y:
jacobian_sum = Σᵢ log|J(yᵢ)| = Σᵢ log|∂zᵢ/∂yᵢ|
Used to adjust log-likelihood:
log L(θ; y) = log L(θ; z) + jacobian_sum
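
The sum above is a masked sum of the per-observation log-Jacobians. A minimal sketch follows, assuming the default mask keeps all finite values; `jacobian_sum_sketch()` is a hypothetical name and not the package function.

```r
# Hypothetical re-implementation (not package code).
jacobian_sum_sketch <- function(logJ_per_obs, mask = is.finite(logJ_per_obs)) {
  sum(logJ_per_obs[mask])
}

# Z-score example: every observation contributes -log(sd(y)).
y <- c(1, 2, 3, 4, 5)
logJ <- rep(-log(sd(y)), length(y))
total_all <- jacobian_sum_sketch(logJ)                                  # 5 * -log(sd(y))
total_sub <- jacobian_sum_sketch(logJ, c(TRUE, TRUE, TRUE, FALSE, FALSE))  # 3 * -log(sd(y))
```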
Arguments:
- `logJ_per_obs`: Numeric vector of per-observation log-Jacobians (from `transform_response()`)
- `mask`: Optional logical vector indicating which observations to include (default: all finite values)

Returns a numeric scalar.
```r
y <- c(1, 2, 3, 4, 5)
res <- transform_response(y, "GA", mode = "strict")

# Sum over all observations
jacobian_sum(res$logJ_per_obs)

# Sum over masked observations only
mask <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
jacobian_sum(res$logJ_per_obs, mask)
```

Workflow for comparing families (strict mode):
- Data preparation: Ensure `y` contains numeric finite values
- Transform: Use `transform_response(y, fam, mode = "strict")` for each candidate family
- Common mask: Compute `mask_common = Reduce('&', list_of_masks)`
- Subset data: `z_masked = z[mask_common]` for each family
- Fit models: Fit GAMLSS on `z_masked` with design matrix
- Compute IC: Adjust with `-2 * jacobian_sum(logJ_per_obs, mask_common)`
- Select best: Choose family minimizing adjusted IC
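
The comparison workflow can be sketched without any package dependency. In the example below the transforms are written by hand, intercept-only closed-form maximum-likelihood fits stand in for GAMLSS fits, and the data are simulated for illustration; none of these choices are prescribed by the module.

```r
set.seed(42)                                   # simulated data, illustration only
y <- c(rlnorm(40, meanlog = 1, sdlog = 0.5), -0.3, 0)  # last two invalid for LOGNO

# Transform: strict-style results written by hand for two candidates.
s <- sd(y)
cand <- list(
  NO    = list(z = (y - mean(y)) / s, mask = is.finite(y),
               logJ = rep(-log(s), length(y))),
  LOGNO = list(z = y, mask = is.finite(y) & y > 0,
               logJ = rep(0, length(y)))
)

# Common mask: keep observations valid for every candidate.
mask_common <- Reduce(`&`, lapply(cand, `[[`, "mask"))

# Subset and fit: intercept-only maximum-likelihood fits on the masked data.
loglik <- c(
  NO = {
    z <- cand$NO$z[mask_common]
    sum(dnorm(z, mean(z), sqrt(mean((z - mean(z))^2)), log = TRUE))
  },
  LOGNO = {
    z <- cand$LOGNO$z[mask_common]
    lz <- log(z)
    sum(dlnorm(z, mean(lz), sqrt(mean((lz - mean(lz))^2)), log = TRUE))
  }
)

# Compute IC: Jacobian-adjusted AIC (both models have two parameters).
jac <- sapply(cand, function(f) sum(f$logJ[mask_common]))
aic_adj <- -2 * loglik + 2 * 2 - 2 * jac

# Select best: family with the smallest adjusted AIC.
best <- names(which.min(aic_adj))
```

Both candidates are scored on the same 40 observations, so the adjusted AIC values differ only because of model fit, not because of sample size.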

Workflow for prediction (safe mode):
- Transform: Use `transform_response(y, fam, mode = "safe")` to adapt data to family domain
- Fit model: Fit GAMLSS on transformed data
- Predictions: Generate predictions on transformed scale
- Back-transform: Use `inverse_transform(predictions, meta)` to obtain original scale
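
A compact illustration of the transform, fit, predict, back-transform sequence follows. It uses `lm()` as a stand-in for a GAMLSS fit, a hand-written z-score with `center` and `scale` metadata, and simulated data; all of these are assumptions made for the sake of a self-contained example.

```r
set.seed(7)                                  # simulated data, illustration only
x <- runif(30)
y <- 5 + 2 * x + rnorm(30, sd = 0.3)

# Transform: z-score with stored metadata (hand-written stand-in).
meta <- list(kind = "zscore", params = list(center = mean(y), scale = sd(y)))
z <- (y - meta$params$center) / meta$params$scale

# Fit model and predict on the transformed scale (lm() stands in for GAMLSS).
fit_z <- lm(z ~ x)
new_x <- data.frame(x = c(0.25, 0.75))
pred_z <- predict(fit_z, newdata = new_x)

# Back-transform: y = center + scale * z
pred_y <- meta$params$center + meta$params$scale * pred_z

# For a linear model the result matches fitting on the original scale directly.
pred_direct <- predict(lm(y ~ x), newdata = new_x)
all.equal(unname(pred_y), unname(pred_direct))
```

The agreement with the direct fit holds here because least squares is equivariant under affine maps of the response; for other model classes the back-transformed predictions should be checked rather than assumed identical.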

Choosing a scale for plots:
- For direct plotting: Use `transform_response()` with `mode = "safe"` or legacy `transform_for_family()`
- For diagnostic plots: Use transformed scale directly (e.g., Q-Q plots on z-score scale)
- For presentation: Back-transform to original scale using `inverse_transform()`
When comparing models fitted to transformed data, information criteria (AIC, BIC, GAIC) must account for the transformation Jacobian to be valid on the original data scale. Without correction, IC values are incomparable across families using different transformations.
Proof: For transformation z = g(y) with density f_Z(z; θ), the density on original scale is:
f_Y(y; θ) = f_Z(g(y); θ) · |J(y)|
Therefore:
log f_Y(y; θ) = log f_Z(g(y); θ) + log|J(y)|
Summing over observations:
log L(θ; y) = log L(θ; z) + Σᵢ log|J(yᵢ)|
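
The role of the Jacobian factor in the derivation above can be checked numerically. The sketch below takes a standard Normal as `f_Z` and a z-score map with arbitrary illustrative values of `mu` and `sigma`, then integrates the implied density on the original scale with and without the factor `|J|`.

```r
mu <- 10; sigma <- 3            # arbitrary illustrative values
g <- function(y) (y - mu) / sigma
J <- 1 / sigma                  # dz/dy for the z-score map

# Density on the original scale with and without the Jacobian factor.
f_with    <- function(y) dnorm(g(y)) * abs(J)
f_without <- function(y) dnorm(g(y))

# Integrate over mu +/- 10 sigma, which covers essentially all of the mass.
lo <- mu - 10 * sigma
hi <- mu + 10 * sigma
area_with    <- integrate(f_with, lo, hi)$value      # approximately 1
area_without <- integrate(f_without, lo, hi)$value   # approximately sigma
```

Only the version with the Jacobian factor integrates to one, which is why likelihoods computed on the transformed scale need the correction before they are compared.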
For `z = ay + b` with `a > 0`:
- Linearity preservation: Differences are scaled by `a`
- Monotonicity: Order preservation (strictly increasing)
- Bijectivity: One-to-one mapping
- Constant Jacobian: `∂z/∂y = a` for all `y`
Using a common mask across all candidate families ensures:
- Fair comparison: All models fitted to same observations
- Consistent sample size: IC penalties comparable
- Valid inference: Standard errors account for actual data used
The alternative approach, family-specific masks, would confound model fit quality with data availability and bias selection toward families with less restrictive domains.

Count families:

| Family | Full Name | Domain | Parameters | Inflation |
|---|---|---|---|---|
| PO | Poisson | ℤ₊ ∪ {0} | μ | No |
| NBI | Negative Binomial (Type I) | ℤ₊ ∪ {0} | μ, σ | No |
| ZIP | Zero-Inflated Poisson | ℤ₊ ∪ {0} | μ, σ | At 0 |
| ZINBI | Zero-Inflated NBI | ℤ₊ ∪ {0} | μ, σ, ν | At 0 |
| ZIP2 | Zero-Inflated Poisson (alt) | ℤ₊ ∪ {0} | μ, σ | At 0 |
| BI | Binomial | {0, 1, ..., n} | μ, bd | No |
| BB | Beta-Binomial | {0, 1, ..., n} | μ, σ, bd | No |

Unit interval families:

| Family | Full Name | Domain | Inflation |
|---|---|---|---|
| BE | Beta | (0, 1) | No |
| BEINF | Beta Inflated | [0, 1] | At 0 and 1 |
| BEO | Beta Inflated at One | (0, 1] | At 1 |
| BEZI | Beta Inflated at Zero | [0, 1) | At 0 |
| BEo | Beta (original parameterization) | (0, 1) | No |
| BEINF0 | Beta Inflated at Zero | [0, 1) | At 0 |

Positive continuous families:

| Family | Full Name | Domain | Skewness |
|---|---|---|---|
| GA | Gamma | (0, +∞) | Right |
| GG | Generalized Gamma | (0, +∞) | Flexible |
| LOGNO | Log-Normal | (0, +∞) | Right |
| IG | Inverse Gaussian | (0, +∞) | Right |

Real-valued families:

| Family | Full Name | Domain | Tails |
|---|---|---|---|
| NO | Normal (Gaussian) | ℝ | Light |
| TF | t Family | ℝ | Heavy |
| GU | Gumbel | ℝ | Asymmetric |