Title: | Fit Distributions and Neural Networks to Censored and Truncated Data |
---|---|
Description: | Define distribution families and fit them to interval-censored and interval-truncated data, where the truncation bounds may depend on the individual observation. The defined distributions feature density, probability, sampling and fitting methods as well as efficient implementations of the log-density log f(x) and log-probability log P(x0 <= X <= x1) for use in 'TensorFlow' neural networks via the 'tensorflow' package. Allows training parametric neural networks on interval-censored and interval-truncated data with flexible parameterization. Applications include Claims Development in Non-Life Insurance, e.g. modelling reporting delay distributions from incomplete data, see Bücher, Rosenstock (2022) <doi:10.1007/s13385-022-00314-4>. |
Authors: | Alexander Rosenstock [aut, cre, cph] |
Maintainer: | Alexander Rosenstock <[email protected]> |
License: | GPL |
Version: | 0.0.3.9000 |
Built: | 2025-01-21 04:59:16 UTC |
Source: | https://github.com/ashesitr/reservr |
Convert TensorFlow tensors to distribution parameters recursively
as_params(x)
as_params(x)
x |
possibly nested list structure of |
A nested list of vectors suitable as distribution parameters
if (interactive()) { tf_params <- list( probs = k_matrix(t(c(0.5, 0.3, 0.2))), shapes = k_matrix(t(c(1L, 2L, 3L)), dtype = "int32"), scale = keras3::as_tensor(1.0, keras3::config_floatx()) ) params <- as_params(tf_params) dist <- dist_erlangmix(vector("list", 3L)) dist$sample(10L, with_params = params) }
if (interactive()) { tf_params <- list( probs = k_matrix(t(c(0.5, 0.3, 0.2))), shapes = k_matrix(t(c(1L, 2L, 3L)), dtype = "int32"), scale = keras3::as_tensor(1.0, keras3::config_floatx()) ) params <- as_params(tf_params) dist <- dist_erlangmix(vector("list", 3L)) dist$sample(10L, with_params = params) }
Transition functions for blended distributions
blended_transition(x, u, eps, .gradient = FALSE, .extend_na = FALSE) blended_transition_inv(x, u, eps, .component)
blended_transition(x, u, eps, .gradient = FALSE, .extend_na = FALSE) blended_transition_inv(x, u, eps, .component)
x |
Points to evaluate at |
u |
Sorted vector of blending thresholds, or rowwise sorted matrix of blending thresholds |
eps |
Corresponding vector or matrix of blending bandwidths.
Must be positive and the same dimensions as |
.gradient |
Also evaluate the gradient with respect to |
.extend_na |
Extend out-of range transitions by the last in-range value (i.e. the corresponding u) or by NA? |
.component |
Component index (up to |
blended_transition
returns a matrix with length(x)
rows and length(u) + 1
columns containing the
transformed values for each of the blending components.
If .gradient
is TRUE, an attribute "gradient"
is attached with the same dimensions, containing the derivative
of the respective transition component with respect to x
.
blended_transition_inv
returns a vector with length(x)
values containing the inverse of the transformed
values for the .component
th blending component.
library(ggplot2) xx <- seq(from = 0, to = 20, length.out = 101) blend_mat <- blended_transition(xx, u = 10, eps = 3, .gradient = TRUE) ggplot( data.frame( x = rep(xx, 2L), fun = rep(c("p", "q"), each = length(xx)), y = as.numeric(blend_mat), relevant = c(xx <= 13, xx >= 7) ), aes(x = x, y = y, color = fun, linetype = relevant) ) %+% geom_line() %+% theme_bw() %+% theme( legend.position = "bottom", legend.box = "horizontal" ) %+% guides(color = guide_legend(direction = "horizontal", title = ""), linetype = guide_none()) %+% scale_linetype_manual(values = c("TRUE" = 1, "FALSE" = 3)) ggplot( data.frame( x = rep(xx, 2L), fun = rep(c("p'", "q'"), each = length(xx)), y = as.numeric(attr(blend_mat, "gradient")), relevant = c(xx <= 13, xx >= 7) ), aes(x = x, y = y, color = fun, linetype = relevant) ) %+% geom_line() %+% theme_bw() %+% theme( legend.position = "bottom", legend.box = "horizontal" ) %+% guides(color = guide_legend(direction = "horizontal", title = ""), linetype = guide_none()) %+% scale_linetype_manual(values = c("TRUE" = 1, "FALSE" = 3))
library(ggplot2) xx <- seq(from = 0, to = 20, length.out = 101) blend_mat <- blended_transition(xx, u = 10, eps = 3, .gradient = TRUE) ggplot( data.frame( x = rep(xx, 2L), fun = rep(c("p", "q"), each = length(xx)), y = as.numeric(blend_mat), relevant = c(xx <= 13, xx >= 7) ), aes(x = x, y = y, color = fun, linetype = relevant) ) %+% geom_line() %+% theme_bw() %+% theme( legend.position = "bottom", legend.box = "horizontal" ) %+% guides(color = guide_legend(direction = "horizontal", title = ""), linetype = guide_none()) %+% scale_linetype_manual(values = c("TRUE" = 1, "FALSE" = 3)) ggplot( data.frame( x = rep(xx, 2L), fun = rep(c("p'", "q'"), each = length(xx)), y = as.numeric(attr(blend_mat, "gradient")), relevant = c(xx <= 13, xx >= 7) ), aes(x = x, y = y, color = fun, linetype = relevant) ) %+% geom_line() %+% theme_bw() %+% theme( legend.position = "bottom", legend.box = "horizontal" ) %+% guides(color = guide_legend(direction = "horizontal", title = ""), linetype = guide_none()) %+% scale_linetype_manual(values = c("TRUE" = 1, "FALSE" = 3))
Provides a keras callback similar to keras3::callback_reduce_lr_on_plateau()
but which also restores the weights
to the best seen so far whenever a learning rate reduction occurs, and with slightly more restrictive improvement
detection.
callback_adaptive_lr( monitor = "val_loss", factor = 0.1, patience = 10L, verbose = 0L, mode = c("auto", "min", "max"), delta_abs = 1e-04, delta_rel = 0, cooldown = 0L, min_lr = 0, restore_weights = TRUE )
callback_adaptive_lr( monitor = "val_loss", factor = 0.1, patience = 10L, verbose = 0L, mode = c("auto", "min", "max"), delta_abs = 1e-04, delta_rel = 0, cooldown = 0L, min_lr = 0, restore_weights = TRUE )
monitor |
quantity to be monitored. |
factor |
factor by which the learning rate will be reduced. |
patience |
number of epochs with no significant improvement after which the learning rate will be reduced. |
verbose |
integer. Set to 1 to receive update messages. |
mode |
Optimisation mode. "auto" detects the mode from the name of |
delta_abs |
Minimum absolute metric improvement per epoch. The learning rate will be reduced if the average
improvement is less than |
delta_rel |
Minimum relative metric improvement per epoch. The learning rate will be reduced if the average
improvement is less than |
cooldown |
number of epochs to wait before resuming normal operation after learning rate has been reduced.
The minimum number of epochs between two learning rate reductions is |
min_lr |
lower bound for the learning rate. If a learning rate reduction would lower the learning rate below
|
restore_weights |
Bool. If TRUE, the best weights will be restored at each learning rate reduction. This is very useful if the metric oscillates. |
Note that while keras3::callback_reduce_lr_on_plateau()
automatically logs the learning rate as a metric 'lr',
this is currently impossible from R.
Thus, if you want to also log the learning rate, you should add keras3::callback_reduce_lr_on_plateau()
with a high
min_lr
to effectively disable the callback but still monitor the learning rate.
A KerasCallback
suitable for passing to keras3::fit()
.
dist <- dist_exponential() group <- sample(c(0, 1), size = 100, replace = TRUE) x <- dist$sample(100, with_params = list(rate = group + 1)) global_fit <- fit(dist, x) if (interactive()) { library(keras3) l_in <- layer_input(shape = 1L) mod <- tf_compile_model( inputs = list(l_in), intermediate_output = l_in, dist = dist, optimizer = optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_initialise_model(mod, global_fit$params) fit_history <- fit( mod, x = as_tensor(group, config_floatx()), y = as_trunc_obs(x), epochs = 20L, callbacks = list( callback_adaptive_lr("loss", factor = 0.5, patience = 2L, verbose = 1L, min_lr = 1.0e-4), callback_reduce_lr_on_plateau("loss", min_lr = 1.0) # to track lr ) ) plot(fit_history) predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx())) }
dist <- dist_exponential() group <- sample(c(0, 1), size = 100, replace = TRUE) x <- dist$sample(100, with_params = list(rate = group + 1)) global_fit <- fit(dist, x) if (interactive()) { library(keras3) l_in <- layer_input(shape = 1L) mod <- tf_compile_model( inputs = list(l_in), intermediate_output = l_in, dist = dist, optimizer = optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_initialise_model(mod, global_fit$params) fit_history <- fit( mod, x = as_tensor(group, config_floatx()), y = as_trunc_obs(x), epochs = 20L, callbacks = list( callback_adaptive_lr("loss", factor = 0.5, patience = 2L, verbose = 1L, min_lr = 1.0e-4), callback_reduce_lr_on_plateau("loss", min_lr = 1.0) # to track lr ) ) plot(fit_history) predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx())) }
Provides a keras callback to monitor the individual components of the censored and truncated likelihood. Useful for debugging TensorFlow implementations of Distributions.
callback_debug_dist_gradients( object, data, obs, keep_grads = FALSE, stop_on_na = TRUE, verbose = TRUE )
callback_debug_dist_gradients( object, data, obs, keep_grads = FALSE, stop_on_na = TRUE, verbose = TRUE )
object |
A |
data |
Input data for the model. |
obs |
Observations associated to |
keep_grads |
Log actual gradients? (memory hungry!) |
stop_on_na |
Stop if any likelihood component as NaN in its gradients? |
verbose |
Print a message if training is halted? The Message will contain information about which likelihood components have NaN in their gradients. |
A KerasCallback
suitable for passing to keras3::fit()
.
dist <- dist_exponential() group <- sample(c(0, 1), size = 100, replace = TRUE) x <- dist$sample(100, with_params = list(rate = group + 1)) global_fit <- fit(dist, x) if (interactive()) { library(keras3) l_in <- layer_input(shape = 1L) mod <- tf_compile_model( inputs = list(l_in), intermediate_output = l_in, dist = dist, optimizer = optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_initialise_model(mod, global_fit$params) gradient_tracker <- callback_debug_dist_gradients( mod, as_tensor(group, config_floatx()), x, keep_grads = TRUE ) fit_history <- fit( mod, x = as_tensor(group, config_floatx()), y = x, epochs = 20L, callbacks = list( callback_adaptive_lr("loss", factor = 0.5, patience = 2L, verbose = 1L, min_lr = 1.0e-4), gradient_tracker, callback_reduce_lr_on_plateau("loss", min_lr = 1.0) # to track lr ) ) gradient_tracker$gradient_logs[[20]]$dens plot(fit_history) predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx())) }
dist <- dist_exponential() group <- sample(c(0, 1), size = 100, replace = TRUE) x <- dist$sample(100, with_params = list(rate = group + 1)) global_fit <- fit(dist, x) if (interactive()) { library(keras3) l_in <- layer_input(shape = 1L) mod <- tf_compile_model( inputs = list(l_in), intermediate_output = l_in, dist = dist, optimizer = optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_initialise_model(mod, global_fit$params) gradient_tracker <- callback_debug_dist_gradients( mod, as_tensor(group, config_floatx()), x, keep_grads = TRUE ) fit_history <- fit( mod, x = as_tensor(group, config_floatx()), y = x, epochs = 20L, callbacks = list( callback_adaptive_lr("loss", factor = 0.5, patience = 2L, verbose = 1L, min_lr = 1.0e-4), gradient_tracker, callback_reduce_lr_on_plateau("loss", min_lr = 1.0) # to track lr ) ) gradient_tracker$gradient_logs[[20]]$dens plot(fit_history) predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx())) }
Constructs a BDEGP-Family distribution with fixed number of components and blending interval.
dist_bdegp(n, m, u, epsilon)
dist_bdegp(n, m, u, epsilon)
n |
Number of dirac components, starting with a point mass at 0. |
m |
Number of erlang components, translated by |
u |
Blending cut-off, must be a positive real. |
epsilon |
Blending radius, must be a positive real less than |
A MixtureDistribution
of
n
DiracDistribution
s at 0 .. n - 1 and
a BlendedDistribution
object with child Distributions
a TranslatedDistribution
with offset n - 0.5
of an ErlangMixtureDistribution
with m
shapes
and a GeneralizedParetoDistribution
with shape parameter restricted to [0, 1] and location parameter fixed
at u
With break u
and bandwidth epsilon
.
Other Distributions:
Distribution
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
dist <- dist_bdegp(n = 1, m = 2, u = 10, epsilon = 3) params <- list( dists = list( list(), list( dists = list( list( dist = list( shapes = list(1L, 2L), scale = 1.0, probs = list(0.7, 0.3) ) ), list( sigmau = 1.0, xi = 0.1 ) ), probs = list(0.1, 0.9) ) ), probs = list(0.95, 0.05) ) x <- dist$sample(100, with_params = params) plot_distributions( theoretical = dist, empirical = dist_empirical(x), .x = seq(0, 20, length.out = 101), with_params = list(theoretical = params) )
dist <- dist_bdegp(n = 1, m = 2, u = 10, epsilon = 3) params <- list( dists = list( list(), list( dists = list( list( dist = list( shapes = list(1L, 2L), scale = 1.0, probs = list(0.7, 0.3) ) ), list( sigmau = 1.0, xi = 0.1 ) ), probs = list(0.1, 0.9) ) ), probs = list(0.95, 0.05) ) x <- dist$sample(100, with_params = params) plot_distributions( theoretical = dist, empirical = dist_empirical(x), .x = seq(0, 20, length.out = 101), with_params = list(theoretical = params) )
See stats::Beta
dist_beta(shape1 = NULL, shape2 = NULL, ncp = NULL)
dist_beta(shape1 = NULL, shape2 = NULL, ncp = NULL)
shape1 |
First scalar shape parameter, or |
shape2 |
Second scalar shape parameter, or |
ncp |
Scalar non-centrality parameter, or |
All parameters can be overridden with
with_params = list(shape = ..., scale = ...)
.
A BetaDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_beta <- dist_beta(shape1 = 2, shape2 = 2, ncp = 0) x <- d_beta$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_beta, estimated = d_beta, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "beta")$estimate ) ), .x = seq(0, 2, length.out = 100) )
d_beta <- dist_beta(shape1 = 2, shape2 = 2, ncp = 0) x <- d_beta$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_beta, estimated = d_beta, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "beta")$estimate ) ), .x = seq(0, 2, length.out = 100) )
See stats::Binomial
dist_binomial(size = NULL, prob = NULL)
dist_binomial(size = NULL, prob = NULL)
size |
Number of trials parameter (integer), or |
prob |
Success probability parameter, or |
Both parameters can be overridden with
with_params = list(size = ..., prob = ...)
.
A BinomialDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_binom <- dist_binomial(size = 10, prob = 0.5) x <- d_binom$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_binom, estimated = d_binom, with_params = list( estimated = list( size = max(x), prob = mean(x) / max(x) ) ), .x = 0:max(x) )
d_binom <- dist_binomial(size = 10, prob = 0.5) x <- d_binom$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_binom, estimated = d_binom, with_params = list( estimated = list( size = max(x), prob = mean(x) / max(x) ) ), .x = 0:max(x) )
Blended distribution
dist_blended(dists, probs = NULL, breaks = NULL, bandwidths = NULL)
dist_blended(dists, probs = NULL, breaks = NULL, bandwidths = NULL)
dists |
A list of k >= 2 component Distributions. |
probs |
k Mixture weight parameters |
breaks |
k - 1 Centers of the blending zones.
|
bandwidths |
k - 1 Radii of the blending zones.
The i-th blending zone will begin at |
A BlendedDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
bd <- dist_blended( list( dist_normal(mean = 0.0, sd = 1.0), dist_genpareto(u = 3.0, sigmau = 1.0, xi = 3.0) ), breaks = list(3.0), bandwidths = list(0.5), probs = list(0.9, 0.1) ) plot_distributions( bd, .x = seq(-3, 10, length.out = 100), plots = c("d", "p") )
bd <- dist_blended( list( dist_normal(mean = 0.0, sd = 1.0), dist_genpareto(u = 3.0, sigmau = 1.0, xi = 3.0) ), breaks = list(3.0), bandwidths = list(0.5), probs = list(0.9, 0.1) ) plot_distributions( bd, .x = seq(-3, 10, length.out = 100), plots = c("d", "p") )
A degenerate distribution with all mass at a single point.
dist_dirac(point = NULL)
dist_dirac(point = NULL)
point |
The point with probability mass 1. |
The parameter can be overridden with
with_params = list(point = ...)
.
A DiracDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_dirac <- dist_dirac(1.5) d_dirac$sample(2L) d_dirac$sample(2L, list(point = 42.0))
d_dirac <- dist_dirac(1.5) d_dirac$sample(2L) d_dirac$sample(2L, list(point = 42.0))
A full-flexibility discrete distribution with values from 1 to size
.
dist_discrete(size = NULL, probs = NULL)
dist_discrete(size = NULL, probs = NULL)
size |
Number of classes parameter (integer). Required if |
probs |
Vector of probabilties parameter, or |
Parameters can be overridden with
with_params = list(probs = ...)
.
A DiscreteDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_discrete <- dist_discrete(probs = list(0.5, 0.25, 0.15, 0.1)) x <- d_discrete$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_discrete, estimated = d_discrete, with_params = list( estimated = list( size = max(x), probs = as.list(unname(table(x)) / 100) ) ), .x = 0:max(x) )
d_discrete <- dist_discrete(probs = list(0.5, 0.25, 0.15, 0.1)) x <- d_discrete$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_discrete, estimated = d_discrete, with_params = list( estimated = list( size = max(x), probs = as.list(unname(table(x)) / 100) ) ), .x = 0:max(x) )
Creates an empirical distribution object from a sample.
Assumes iid. samples. with_params
should not be used with this
distribution because estimation of the relevant indicators happens during
construction.
dist_empirical(sample, positive = FALSE, bw = "nrd0")
dist_empirical(sample, positive = FALSE, bw = "nrd0")
sample |
Sample to build the empirical distribution from |
positive |
Is the underlying distribution known to be positive?
This will effect the density estimation procedure.
|
bw |
Bandwidth parameter for density estimation. Passed to the density
estimation function selected by |
sample()
samples iid. from sample
. This approach is similar to
bootstrapping.
density()
evaluates a kernel density estimate, approximating with zero
outside of the known support. This estimate is either obtained using
stats::density or logKDE::logdensity_fft, depending on positive
.
probability()
evaluates the empirical cumulative density function
obtained by stats::ecdf.
quantile()
evaluates the empirical quantiles using stats::quantile
hazard()
estimates the hazard rate using the density estimate and the
empirical cumulative density function: h(t) = df(t) / (1 - cdf(t))
.
An EmpiricalDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
x <- rexp(20, rate = 1) dx <- dist_empirical(sample = x, positive = TRUE) y <- rnorm(20) dy <- dist_empirical(sample = y) plot_distributions( exponential = dx, normal = dy, .x = seq(-3, 3, length.out = 100) )
x <- rexp(20, rate = 1) dx <- dist_empirical(sample = x, positive = TRUE) y <- rnorm(20) dy <- dist_empirical(sample = y) plot_distributions( exponential = dx, normal = dy, .x = seq(-3, 3, length.out = 100) )
Erlang Mixture distribution
dist_erlangmix(shapes, scale = NULL, probs = NULL)
dist_erlangmix(shapes, scale = NULL, probs = NULL)
shapes |
Shape parameters, a trunc_erlangmix fit, or |
scale |
Common scale parameter, or |
probs |
Mixing probabilities, or |
An ErlangMixtureDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
params <- list(scale = 1.0, probs = list(0.5, 0.3, 0.2), shapes = list(1L, 2L, 3L)) dist <- dist_erlangmix(vector("list", 3L)) x <- dist$sample(20, with_params = params) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = dist, with_params = list( theoretical = params ), .x = seq(1e-4, 5, length.out = 100) )
params <- list(scale = 1.0, probs = list(0.5, 0.3, 0.2), shapes = list(1L, 2L, 3L)) dist <- dist_erlangmix(vector("list", 3L)) x <- dist$sample(20, with_params = params) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = dist, with_params = list( theoretical = params ), .x = seq(1e-4, 5, length.out = 100) )
See stats::Exponential.
dist_exponential(rate = NULL)
dist_exponential(rate = NULL)
rate |
Scalar rate parameter, or |
The parameter can be overridden with with_params = list(rate = ...)
.
An ExponentialDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
rate <- 1 d_exp <- dist_exponential() x <- d_exp$sample(20, with_params = list(rate = rate)) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = d_exp, estimated = d_exp, with_params = list( theoretical = list(rate = rate), estimated = list(rate = 1 / mean(x)) ), .x = seq(1e-4, 5, length.out = 100) )
rate <- 1 d_exp <- dist_exponential() x <- d_exp$sample(20, with_params = list(rate = rate)) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = d_exp, estimated = d_exp, with_params = list( theoretical = list(rate = rate), estimated = list(rate = 1 / mean(x)) ), .x = seq(1e-4, 5, length.out = 100) )
See stats::GammaDist.
dist_gamma(shape = NULL, rate = NULL)
dist_gamma(shape = NULL, rate = NULL)
shape |
Scalar shape parameter, or |
rate |
Scalar rate parameter, or |
Both parameters can be overridden with
with_params = list(shape = ..., rate = ...)
.
A GammaDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
alpha <- 2 beta <- 2 d_gamma <- dist_gamma(shape = alpha, rate = beta) x <- d_gamma$sample(100) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = d_gamma, estimated = d_gamma, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "gamma")$estimate ) ), .x = seq(1e-3, max(x), length.out = 100) )
alpha <- 2 beta <- 2 d_gamma <- dist_gamma(shape = alpha, rate = beta) x <- d_gamma$sample(100) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = d_gamma, estimated = d_gamma, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "gamma")$estimate ) ), .x = seq(1e-3, max(x), length.out = 100) )
See evmix::gpd
dist_genpareto(u = NULL, sigmau = NULL, xi = NULL) dist_genpareto1(u = NULL, sigmau = NULL, xi = NULL)
dist_genpareto(u = NULL, sigmau = NULL, xi = NULL) dist_genpareto1(u = NULL, sigmau = NULL, xi = NULL)
u |
Scalar location parameter, or |
sigmau |
Scalar scale parameter, or |
xi |
Scalar shape parameter, or |
All parameters can be overridden with
with_params = list(u = ..., sigmau = ..., xi = ...)
.
dist_genpareto1
is equivalent to dist_genpareto
but enforces
bound constraints on xi
to [0, 1]
.
This ensures unboundedness and finite expected value unless xi == 1.0
.
A GeneralizedParetoDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_genpareto <- dist_genpareto(u = 0, sigmau = 1, xi = 1) x <- d_genpareto$sample(100) d_emp <- dist_empirical(x) d_genpareto$export_functions("gpd") # so fitdistrplus finds it plot_distributions( empirical = d_emp, theoretical = d_genpareto, estimated = d_genpareto, with_params = list( estimated = fit(dist_genpareto(), x)$params ), .x = seq(0, 5, length.out = 100) )
d_genpareto <- dist_genpareto(u = 0, sigmau = 1, xi = 1) x <- d_genpareto$sample(100) d_emp <- dist_empirical(x) d_genpareto$export_functions("gpd") # so fitdistrplus finds it plot_distributions( empirical = d_emp, theoretical = d_genpareto, estimated = d_genpareto, with_params = list( estimated = fit(dist_genpareto(), x)$params ), .x = seq(0, 5, length.out = 100) )
See stats::Lognormal.
dist_lognormal(meanlog = NULL, sdlog = NULL)
dist_lognormal(meanlog = NULL, sdlog = NULL)
meanlog |
Scalar mean parameter on the log scale,
or |
sdlog |
Scalar standard deviation parameter on the log scale,
or |
Both parameters can be overridden with
with_params = list(meanlog = ..., sdlog = ...)
.
A LognormalDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
mu <- 0 sigma <- 1 d_lnorm <- dist_lognormal(meanlog = mu, sdlog = sigma) x <- d_lnorm$sample(20) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = d_lnorm, estimated = d_lnorm, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "lnorm")$estimate ) ), .x = seq(1e-3, 5, length.out = 100) )
mu <- 0 sigma <- 1 d_lnorm <- dist_lognormal(meanlog = mu, sdlog = sigma) x <- d_lnorm$sample(20) d_emp <- dist_empirical(x, positive = TRUE) plot_distributions( empirical = d_emp, theoretical = d_lnorm, estimated = d_lnorm, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "lnorm")$estimate ) ), .x = seq(1e-3, 5, length.out = 100) )
Parameters of mixing components can be overridden with
with_params = list(dists = list(..., ..., ...))
.
#' Mixing probabilites can be overridden with
with_params = list(probs = list(..., ..., ...))
.
The number of components cannot be overridden.
dist_mixture(dists = list(), probs = NULL)
dist_mixture(dists = list(), probs = NULL)
dists |
A list of mixing distributions. May contain placeholders and duplicates. |
probs |
A list of mixing probabilities with the same length as |
Does not support the quantile()
capability!
A MixtureDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
# A complicated way to define a uniform distribution on \[0, 2\] dist_mixture( dists = list( dist_uniform(min = 0, max = 1), dist_uniform(min = 1, max = 2) ), probs = list(0.5, 0.5) )
# A complicated way to define a uniform distribution on \[0, 2\] dist_mixture( dists = list( dist_uniform(min = 0, max = 1), dist_uniform(min = 1, max = 2) ), probs = list(0.5, 0.5) )
dist_negbinomial(size = NULL, mu = NULL)
dist_negbinomial(size = NULL, mu = NULL)
size |
Number of successful trials parameter, or |
mu |
Mean parameter, or |
Both parameters can be overridden with
with_params = list(size = ..., prob = ...)
.
A NegativeBinomialDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_nbinom <- dist_negbinomial(size = 3.5, mu = 8.75) x <- d_nbinom$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_nbinom, estimated = d_nbinom, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "nbinom")$estimate ) ), .x = 0:max(x) )
d_nbinom <- dist_negbinomial(size = 3.5, mu = 8.75) x <- d_nbinom$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_nbinom, estimated = d_nbinom, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "nbinom")$estimate ) ), .x = 0:max(x) )
See stats::Normal.
dist_normal(mean = NULL, sd = NULL)
dist_normal(mean = NULL, sd = NULL)
mean |
Scalar mean parameter, or |
sd |
Scalar standard deviation parameter, or |
Both parameters can be overridden with
with_params = list(mean = ..., sd = ...)
.
A NormalDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
mu <- 0 sigma <- 1 d_norm <- dist_normal(mean = mu, sd = sigma) x <- d_norm$sample(20) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_norm, estimated = d_norm, with_params = list( estimated = list(mean = mean(x), sd = sd(x)) ), .x = seq(-3, 3, length.out = 100) )
mu <- 0 sigma <- 1 d_norm <- dist_normal(mean = mu, sd = sigma) x <- d_norm$sample(20) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_norm, estimated = d_norm, with_params = list( estimated = list(mean = mean(x), sd = sd(x)) ), .x = seq(-3, 3, length.out = 100) )
See Pareto
dist_pareto(shape = NULL, scale = NULL)
dist_pareto(shape = NULL, scale = NULL)
shape |
Scalar shape parameter, or |
scale |
Scalar scale parameter, or |
Both parameters can be overridden with
with_params = list(shape = ..., scale = ...)
.
A ParetoDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_pareto <- dist_pareto(shape = 3, scale = 1) x <- d_pareto$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_pareto, estimated = d_pareto, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "pareto")$estimate ) ), .x = seq(0, 2, length.out = 100) )
d_pareto <- dist_pareto(shape = 3, scale = 1) x <- d_pareto$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_pareto, estimated = d_pareto, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "pareto")$estimate ) ), .x = seq(0, 2, length.out = 100) )
See stats::Poisson
dist_poisson(lambda = NULL)
dist_poisson(lambda = NULL)
lambda |
Scalar rate parameter, or |
The parameter can be overridden with
with_params = list(lambda = ...)
.
A PoissonDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_pois <- dist_poisson(lambda = 5.0) x <- d_pois$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_pois, estimated = d_pois, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "pois")$estimate ) ), .x = 0:max(x) )
d_pois <- dist_poisson(lambda = 5.0) x <- d_pois$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_pois, estimated = d_pois, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "pois")$estimate ) ), .x = 0:max(x) )
Tranlsated distribution
dist_translate(dist = NULL, offset = NULL, multiplier = 1)
dist_translate(dist = NULL, offset = NULL, multiplier = 1)
dist |
An underlying distribution, or |
offset |
Offset to be added to each observation, or |
multiplier |
Factor to multiply each observation by, or |
A TranslatedDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
d_norm <- dist_normal(mean = 0, sd = 1) d_tnorm <- dist_translate(dist = d_norm, offset = 1) plot_distributions(d_norm, d_tnorm, .x = seq(-2, 3, length.out = 100))
d_norm <- dist_normal(mean = 0, sd = 1) d_tnorm <- dist_translate(dist = d_norm, offset = 1) plot_distributions(d_norm, d_tnorm, .x = seq(-2, 3, length.out = 100))
Truncated distribution
dist_trunc(dist = NULL, min = NULL, max = NULL, offset = 0, max_retry = 100)
dist_trunc(dist = NULL, min = NULL, max = NULL, offset = 0, max_retry = 100)
dist |
An underlying distribution, or |
min |
Minimum value to truncate at (exclusive),
or |
max |
Maxmimum value to truncate at (inclusive),
or |
offset |
Offset to be added to each observation after truncation,
or |
max_retry |
Maximum number of resample attempts when trying to sample with rejection. |
A TruncatedDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_uniform()
,
dist_weibull()
d_norm <- dist_normal(mean = 0, sd = 1) d_tnorm <- dist_trunc(dist = d_norm, min = -2, max = 2, offset = 1) plot_distributions(d_norm, d_tnorm, .x = seq(-2, 3, length.out = 100))
d_norm <- dist_normal(mean = 0, sd = 1) d_tnorm <- dist_trunc(dist = d_norm, min = -2, max = 2, offset = 1) plot_distributions(d_norm, d_tnorm, .x = seq(-2, 3, length.out = 100))
See stats::Uniform
dist_uniform(min = NULL, max = NULL)
dist_uniform(min = NULL, max = NULL)
min |
Lower limit, or |
max |
Upper limit, or |
Both parameters can be overridden with
with_params = list(min = ..., max = ...)
.
A UniformDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_weibull()
d_unif <- dist_uniform(min = 0, max = 1) x <- d_unif$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_unif, estimated = d_unif, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "unif")$estimate ) ), .x = seq(0, 1, length.out = 100) )
d_unif <- dist_uniform(min = 0, max = 1) x <- d_unif$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_unif, estimated = d_unif, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "unif")$estimate ) ), .x = seq(0, 1, length.out = 100) )
See stats::Weibull
dist_weibull(shape = NULL, scale = NULL)
dist_weibull(shape = NULL, scale = NULL)
shape |
Scalar shape parameter, or |
scale |
Scalar scale parameter, or |
Both parameters can be overridden with
with_params = list(shape = ..., scale = ...)
.
A WeibullDistribution
object.
Other Distributions:
Distribution
,
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
d_weibull <- dist_weibull(shape = 3, scale = 1) x <- d_weibull$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_weibull, estimated = d_weibull, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "weibull")$estimate ) ), .x = seq(0, 2, length.out = 100) )
d_weibull <- dist_weibull(shape = 3, scale = 1) x <- d_weibull$sample(100) d_emp <- dist_empirical(x) plot_distributions( empirical = d_emp, theoretical = d_weibull, estimated = d_weibull, with_params = list( estimated = inflate_params( fitdistrplus::fitdist(x, distr = "weibull")$estimate ) ), .x = seq(0, 2, length.out = 100) )
Represents a modifiable Distribution family
default_params
Get or set (non-recursive) default parameters of a Distribution
param_bounds
Get or set (non-recursive) parameter bounds (box constraints) of a Distribution
new()
Distribution$new(type, caps, params, name, default_params)
type
Type of distribution. This is a string constant for the
default implementation. Distributions with non-constant type must
override the get_type()
function.
caps
Character vector of capabilities to fuel the default
implementations of has_capability()
and require_capability()
.
Distributions with dynamic capabilities must override the
has_capability()
function.
params
Initial parameter bounds structure, backing the
param_bounds
active binding (usually a list of intervals).
name
Name of the Distribution class. Should be CamelCase
and end
with "Distribution"
.
default_params
Initial fixed parameters backing the
default_params
active binding (usually a list of numeric / NULLs).
Construct a Distribution instance
Used internally by the dist_*
functions.
sample()
Distribution$sample(n, with_params = list())
n
number of samples to draw.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length n
. In that
case the i
-th sample will use the i
-th parameters.
Sample from a Distribution
A length n
vector of i.i.d. random samples from the
Distribution with the specified parameters.
dist_exponential(rate = 2.0)$sample(10)
density()
Distribution$density(x, log = FALSE, with_params = list())
x
Vector of points to evaluate the density at.
log
Flag. If TRUE
, return the log-density instead.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(x)
.
In that case, the i
-th density point will use the i
-th parameters.
Density of a Distribution
A numeric vector of (log-)densities
dist_exponential()$density(c(1.0, 2.0), with_params = list(rate = 2.0))
tf_logdensity()
Distribution$tf_logdensity()
Compile a TensorFlow function for log-density evaluation
A tf_function
taking arguments x
and args
returning the
log-density of the Distribution evaluated at x
with parameters args
.
probability()
Distribution$probability( q, lower.tail = TRUE, log.p = FALSE, with_params = list() )
q
Vector of points to evaluate the probability function at.
lower.tail
If TRUE
, return P(X <= q). Otherwise return P(X > q).
log.p
If TRUE
, probabilities are returned as log(p)
.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(q)
.
In that case, the i
-th probability point will use the i
-th
parameters.
Cumulative probability of a Distribution
A numeric vector of (log-)probabilities
dist_exponential()$probability( c(1.0, 2.0), with_params = list(rate = 2.0) )
tf_logprobability()
Distribution$tf_logprobability()
Compile a TensorFlow function for log-probability evaluation
A tf_function
taking arguments qmin
, qmax
and args
returning the log-probability of the Distribution evaluated over the
closed interval [qmin
, qmax
] with parameters args
.
quantile()
Distribution$quantile( p, lower.tail = TRUE, log.p = FALSE, with_params = list() )
p
Vector of probabilities.
lower.tail
If TRUE
, return P(X <= q). Otherwise return P(X > q).
log.p
If TRUE
, probabilities are returned as log(p)
.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(p)
.
In that case, the i
-th quantile will use the i
-th parameters.
Quantile function of a Distribution
A numeric vector of quantiles
dist_exponential()$quantile(c(0.1, 0.5), with_params = list(rate = 2.0))
hazard()
Distribution$hazard(x, log = FALSE, with_params = list())
x
Vector of points.
log
Flag. If TRUE
, return the log-hazard instead.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(x)
.
In that case, the i
-th hazard point will use the i
-th parameters.
Hazard function of a Distribution
A numeric vector of (log-)hazards
dist_exponential(rate = 2.0)$hazard(c(1.0, 2.0))
diff_density()
Distribution$diff_density(x, log = FALSE, with_params = list())
x
Vector of points.
log
Flag. If TRUE
, return the gradient of the log-density
instead.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(x)
.
In that case, the i
-th density point will use the i
-th parameters.
Gradients of the density of a Distribution
A list structure containing the (log-)density gradients of all
free parameters of the Distribution evaluated at x
.
dist_exponential()$diff_density( c(1.0, 2.0), with_params = list(rate = 2.0) )
diff_probability()
Distribution$diff_probability( q, lower.tail = TRUE, log.p = FALSE, with_params = list() )
q
Vector of points to evaluate the probability function at.
lower.tail
If TRUE
, return P(X <= q). Otherwise return P(X > q).
log.p
If TRUE
, probabilities are returned as log(p)
.
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(q)
.
In that case, the i
-th probability point will use the i
-th
parameters.
Gradients of the cumulative probability of a Distribution
A list structure containing the cumulative (log-)probability
gradients of all free parameters of the Distribution evaluated at q
.
dist_exponential()$diff_probability( c(1.0, 2.0), with_params = list(rate = 2.0) )
is_in_support()
Distribution$is_in_support(x, with_params = list())
x
Vector of points
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(x)
.
In that case, the i
-th point will use the i
-th parameters.
Determine if a value is in the support of a Distribution
A logical vector with the same length as x
indicating whether
x
is part of the support of the distribution given its parameters.
dist_exponential(rate = 1.0)$is_in_support(c(-1.0, 0.0, 1.0))
is_discrete_at()
Distribution$is_discrete_at(x, with_params = list())
x
Vector of points
with_params
Distribution parameters to use.
Each parameter value can also be a numeric vector of length length(x)
.
In that case, the i
-th point will use the i
-th parameters.
Determine if a value has positive probability
A logical vector with the same length as x
indicating whether
there is a positive probability mass at x
given the Distribution
parameters.
dist_dirac(point = 0.0)$is_discrete_at(c(0.0, 1.0))
tf_is_discrete_at()
Distribution$tf_is_discrete_at()
Compile a TensorFlow function for discrete support checking
A tf_function
taking arguments x
and args
returning whether
the Distribution has a point mass at x
given parameters args
.
has_capability()
Distribution$has_capability(caps)
caps
Character vector of capabilities
Check if a capability is present
A logical vector the same length as caps
.
dist_exponential()$has_capability("density")
get_type()
Distribution$get_type()
Get the type of a Distribution. Type can be one of discrete
,
continuous
or mixed
.
A string representing the type of the Distribution.
dist_exponential()$get_type() dist_dirac()$get_type() dist_mixture(list(dist_dirac(), dist_exponential()))$get_type() dist_mixture(list(dist_dirac(), dist_binomial()))$get_type()
get_components()
Distribution$get_components()
Get the component Distributions of a transformed Distribution.
A possibly empty list of Distributions
dist_trunc(dist_exponential())$get_components() dist_dirac()$get_components() dist_mixture(list(dist_exponential(), dist_gamma()))$get_components()
is_discrete()
Distribution$is_discrete()
Check if a Distribution is discrete, i.e. it has a density with respect to the counting measure.
TRUE
if the Distribution is discrete, FALSE
otherwise.
Note that mixed distributions are not discrete but can have point masses.
dist_exponential()$is_discrete() dist_dirac()$is_discrete()
is_continuous()
Distribution$is_continuous()
Check if a Distribution is continuous, i.e. it has a density with respect to the Lebesgue measure.
TRUE
if the Distribution is continuous, FALSE
otherwise.
Note that mixed distributions are not continuous.
dist_exponential()$is_continuous() dist_dirac()$is_continuous()
require_capability()
Distribution$require_capability( caps, fun_name = paste0(sys.call(-1)[[1]], "()") )
caps
Character vector of Capabilities to require
fun_name
Frienly text to use for generating the error message in case of failure.
Ensure that a Distribution has all required capabilities. Will throw an error if any capability is missing.
Invisibly TRUE
.
dist_exponential()$require_capability("diff_density")
get_dof()
Distribution$get_dof()
Get the number of degrees of freedom of a Distribution family. Only parameters without a fixed default are considered free.
An integer representing the degrees of freedom suitable e.g. for AIC calculations.
dist_exponential()$get_dof() dist_exponential(rate = 1.0)$get_dof()
get_placeholders()
Distribution$get_placeholders()
Get Placeholders of a Distribution family.
Returns a list of free parameters of the family.
Their values will be NULL
.
If the Distribution has Distributions as parameters, placeholders will be computed recursively.
A named list containing any combination of (named or unnamed)
lists and NULL
s.
dist_exponential()$get_placeholders() dist_mixture(list(dist_dirac(), dist_exponential()))$get_placeholders()
get_params()
Distribution$get_params(with_params = list())
with_params
Optional parameter overrides with the same structure
as dist$get_params()
. Given Parameter values are expected to be length
1.
Get a full list of parameters, possibly including placeholders.
A list representing the (recursive) parameter structure of the
Distribution with values for specified parameters and NULL
for free
parameters that are missing both in the Distributions parameters and in
with_params
.
dist_mixture(list(dist_dirac(), dist_exponential()))$get_params( with_params = list(probs = list(0.5, 0.5)) )
tf_make_constants()
Distribution$tf_make_constants(with_params = list())
with_params
Optional parameter overrides with the same structure
as dist$tf_make_constants()
. Given Parameter values are expected to be
length 1.
Get a list of constant TensorFlow parameters
A list representing the (recursive) constant parameters of the
Distribution with values sprecified by parameters. Each constant is a
TensorFlow Tensor of dtype floatx
.
tf_compile_params()
Distribution$tf_compile_params(input, name_prefix = "")
input
A keras layer to bind all outputs to
name_prefix
Prefix to use for layer names
Compile distribution parameters into tensorflow outputs
A list with two elements
outputs
a flat list of keras output layers, one for each parameter.
output_inflater
a function taking keras output layers and
transforming them into a list structure suitable for passing to the
loss function returned by tf_compile_model()
get_param_bounds()
Distribution$get_param_bounds()
Get Interval bounds on all Distribution parameters
A list representing the free (recursive) parameter structure of
the Distribution with Interval
objects as values representing the
bounds of the respective free parameters.
dist_mixture( list(dist_dirac(), dist_exponential()), probs = list(0.5, 0.5) )$get_param_bounds() dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_bounds() dist_genpareto()$get_param_bounds() dist_genpareto1()$get_param_bounds()
get_param_constraints()
Distribution$get_param_constraints()
Get additional (non-linear) equality constraints on Distribution parameters
NULL
if the box constraints specified by
dist$get_param_bounds()
are sufficient, or a function taking full
Distribution parameters and returning either a numeric vector
(which must be 0 for valid parameter combinations) or a list with
elements
constraints
: The numeric vector of constraints
jacobian
: The Jacobi matrix of the constraints with respect to the
parameters
dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_constraints()
export_functions()
Distribution$export_functions( name, envir = parent.frame(), with_params = list() )
name
common suffix of the exported functions
envir
Environment to export the functions to
with_params
Optional list of parameters to use as default values for the exported functions
Export sampling, density, probability and quantile functions to plain R functions
Creates new functions in envir
named {r,d,p,q}<name>
which implement
dist$sample
, dist$density
, dist$probability
and dist$quantile
as
plain functions with default arguments specified by with_params
or the
fixed parameters.
The resulting functions will have signatures taking all parameters as separate arguments.
Invisibly NULL
.
tmp_env <- new.env(parent = globalenv()) dist_exponential()$export_functions( name = "exp", envir = tmp_env, with_params = list(rate = 2.0) ) evalq( fitdistrplus::fitdist(rexp(100), "exp"), envir = tmp_env )
clone()
The objects of this class are cloneable with this method.
Distribution$clone(deep = FALSE)
deep
Whether to make a deep clone.
Other Distributions:
dist_bdegp()
,
dist_beta()
,
dist_binomial()
,
dist_blended()
,
dist_dirac()
,
dist_discrete()
,
dist_empirical()
,
dist_erlangmix()
,
dist_exponential()
,
dist_gamma()
,
dist_genpareto()
,
dist_lognormal()
,
dist_mixture()
,
dist_negbinomial()
,
dist_normal()
,
dist_pareto()
,
dist_poisson()
,
dist_translate()
,
dist_trunc()
,
dist_uniform()
,
dist_weibull()
# Example for param_bounds: # Create an Exponential Distribution with rate constrained to (0, 2) # instead of (0, Inf) my_exp <- dist_exponential() my_exp$param_bounds$rate <- interval(c(0, 2)) my_exp$get_param_bounds() fit_dist(my_exp, rexp(100, rate = 3), start = list(rate = 1))$params$rate ## ------------------------------------------------ ## Method `Distribution$sample` ## ------------------------------------------------ dist_exponential(rate = 2.0)$sample(10) ## ------------------------------------------------ ## Method `Distribution$density` ## ------------------------------------------------ dist_exponential()$density(c(1.0, 2.0), with_params = list(rate = 2.0)) ## ------------------------------------------------ ## Method `Distribution$probability` ## ------------------------------------------------ dist_exponential()$probability( c(1.0, 2.0), with_params = list(rate = 2.0) ) ## ------------------------------------------------ ## Method `Distribution$quantile` ## ------------------------------------------------ dist_exponential()$quantile(c(0.1, 0.5), with_params = list(rate = 2.0)) ## ------------------------------------------------ ## Method `Distribution$hazard` ## ------------------------------------------------ dist_exponential(rate = 2.0)$hazard(c(1.0, 2.0)) ## ------------------------------------------------ ## Method `Distribution$diff_density` ## ------------------------------------------------ dist_exponential()$diff_density( c(1.0, 2.0), with_params = list(rate = 2.0) ) ## ------------------------------------------------ ## Method `Distribution$diff_probability` ## ------------------------------------------------ dist_exponential()$diff_probability( c(1.0, 2.0), with_params = list(rate = 2.0) ) ## ------------------------------------------------ ## Method `Distribution$is_in_support` ## ------------------------------------------------ dist_exponential(rate = 1.0)$is_in_support(c(-1.0, 0.0, 1.0)) ## ------------------------------------------------ ## Method `Distribution$is_discrete_at` ## ------------------------------------------------ dist_dirac(point = 0.0)$is_discrete_at(c(0.0, 1.0)) ## ------------------------------------------------ ## Method `Distribution$has_capability` ## ------------------------------------------------ dist_exponential()$has_capability("density") ## ------------------------------------------------ ## Method `Distribution$get_type` ## ------------------------------------------------ dist_exponential()$get_type() dist_dirac()$get_type() dist_mixture(list(dist_dirac(), dist_exponential()))$get_type() dist_mixture(list(dist_dirac(), dist_binomial()))$get_type() ## ------------------------------------------------ ## Method `Distribution$get_components` ## ------------------------------------------------ dist_trunc(dist_exponential())$get_components() dist_dirac()$get_components() dist_mixture(list(dist_exponential(), dist_gamma()))$get_components() ## ------------------------------------------------ ## Method `Distribution$is_discrete` ## ------------------------------------------------ dist_exponential()$is_discrete() dist_dirac()$is_discrete() ## ------------------------------------------------ ## Method `Distribution$is_continuous` ## ------------------------------------------------ dist_exponential()$is_continuous() dist_dirac()$is_continuous() ## ------------------------------------------------ ## Method `Distribution$require_capability` ## ------------------------------------------------ dist_exponential()$require_capability("diff_density") ## ------------------------------------------------ ## Method `Distribution$get_dof` ## ------------------------------------------------ dist_exponential()$get_dof() dist_exponential(rate = 1.0)$get_dof() ## ------------------------------------------------ ## Method `Distribution$get_placeholders` ## ------------------------------------------------ dist_exponential()$get_placeholders() dist_mixture(list(dist_dirac(), dist_exponential()))$get_placeholders() ## ------------------------------------------------ ## Method `Distribution$get_params` ## ------------------------------------------------ dist_mixture(list(dist_dirac(), dist_exponential()))$get_params( with_params = list(probs = list(0.5, 0.5)) ) ## ------------------------------------------------ ## Method `Distribution$get_param_bounds` ## ------------------------------------------------ dist_mixture( list(dist_dirac(), dist_exponential()), probs = list(0.5, 0.5) )$get_param_bounds() dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_bounds() dist_genpareto()$get_param_bounds() dist_genpareto1()$get_param_bounds() ## ------------------------------------------------ ## Method `Distribution$get_param_constraints` ## ------------------------------------------------ dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_constraints() ## ------------------------------------------------ ## Method `Distribution$export_functions` ## ------------------------------------------------ tmp_env <- new.env(parent = globalenv()) dist_exponential()$export_functions( name = "exp", envir = tmp_env, with_params = list(rate = 2.0) ) evalq( fitdistrplus::fitdist(rexp(100), "exp"), envir = tmp_env )
# Example for param_bounds: # Create an Exponential Distribution with rate constrained to (0, 2) # instead of (0, Inf) my_exp <- dist_exponential() my_exp$param_bounds$rate <- interval(c(0, 2)) my_exp$get_param_bounds() fit_dist(my_exp, rexp(100, rate = 3), start = list(rate = 1))$params$rate ## ------------------------------------------------ ## Method `Distribution$sample` ## ------------------------------------------------ dist_exponential(rate = 2.0)$sample(10) ## ------------------------------------------------ ## Method `Distribution$density` ## ------------------------------------------------ dist_exponential()$density(c(1.0, 2.0), with_params = list(rate = 2.0)) ## ------------------------------------------------ ## Method `Distribution$probability` ## ------------------------------------------------ dist_exponential()$probability( c(1.0, 2.0), with_params = list(rate = 2.0) ) ## ------------------------------------------------ ## Method `Distribution$quantile` ## ------------------------------------------------ dist_exponential()$quantile(c(0.1, 0.5), with_params = list(rate = 2.0)) ## ------------------------------------------------ ## Method `Distribution$hazard` ## ------------------------------------------------ dist_exponential(rate = 2.0)$hazard(c(1.0, 2.0)) ## ------------------------------------------------ ## Method `Distribution$diff_density` ## ------------------------------------------------ dist_exponential()$diff_density( c(1.0, 2.0), with_params = list(rate = 2.0) ) ## ------------------------------------------------ ## Method `Distribution$diff_probability` ## ------------------------------------------------ dist_exponential()$diff_probability( c(1.0, 2.0), with_params = list(rate = 2.0) ) ## ------------------------------------------------ ## Method `Distribution$is_in_support` ## ------------------------------------------------ dist_exponential(rate = 1.0)$is_in_support(c(-1.0, 0.0, 1.0)) ## ------------------------------------------------ ## Method `Distribution$is_discrete_at` ## ------------------------------------------------ dist_dirac(point = 0.0)$is_discrete_at(c(0.0, 1.0)) ## ------------------------------------------------ ## Method `Distribution$has_capability` ## ------------------------------------------------ dist_exponential()$has_capability("density") ## ------------------------------------------------ ## Method `Distribution$get_type` ## ------------------------------------------------ dist_exponential()$get_type() dist_dirac()$get_type() dist_mixture(list(dist_dirac(), dist_exponential()))$get_type() dist_mixture(list(dist_dirac(), dist_binomial()))$get_type() ## ------------------------------------------------ ## Method `Distribution$get_components` ## ------------------------------------------------ dist_trunc(dist_exponential())$get_components() dist_dirac()$get_components() dist_mixture(list(dist_exponential(), dist_gamma()))$get_components() ## ------------------------------------------------ ## Method `Distribution$is_discrete` ## ------------------------------------------------ dist_exponential()$is_discrete() dist_dirac()$is_discrete() ## ------------------------------------------------ ## Method `Distribution$is_continuous` ## ------------------------------------------------ dist_exponential()$is_continuous() dist_dirac()$is_continuous() ## ------------------------------------------------ ## Method `Distribution$require_capability` ## ------------------------------------------------ dist_exponential()$require_capability("diff_density") ## ------------------------------------------------ ## Method `Distribution$get_dof` ## ------------------------------------------------ dist_exponential()$get_dof() dist_exponential(rate = 1.0)$get_dof() ## ------------------------------------------------ ## Method `Distribution$get_placeholders` ## ------------------------------------------------ dist_exponential()$get_placeholders() dist_mixture(list(dist_dirac(), dist_exponential()))$get_placeholders() ## ------------------------------------------------ ## Method `Distribution$get_params` ## ------------------------------------------------ dist_mixture(list(dist_dirac(), dist_exponential()))$get_params( with_params = list(probs = list(0.5, 0.5)) ) ## ------------------------------------------------ ## Method `Distribution$get_param_bounds` ## ------------------------------------------------ dist_mixture( list(dist_dirac(), dist_exponential()), probs = list(0.5, 0.5) )$get_param_bounds() dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_bounds() dist_genpareto()$get_param_bounds() dist_genpareto1()$get_param_bounds() ## ------------------------------------------------ ## Method `Distribution$get_param_constraints` ## ------------------------------------------------ dist_mixture( list(dist_dirac(), dist_exponential()) )$get_param_constraints() ## ------------------------------------------------ ## Method `Distribution$export_functions` ## ------------------------------------------------ tmp_env <- new.env(parent = globalenv()) dist_exponential()$export_functions( name = "exp", envir = tmp_env, with_params = list(rate = 2.0) ) evalq( fitdistrplus::fitdist(rexp(100), "exp"), envir = tmp_env )
Fit a Blended mixture using an ECME-Algorithm
fit_blended( dist, obs, start, min_iter = 0L, max_iter = 100L, skip_first_e = FALSE, tolerance = 1e-05, trace = FALSE, ... )
fit_blended( dist, obs, start, min_iter = 0L, max_iter = 100L, skip_first_e = FALSE, tolerance = 1e-05, trace = FALSE, ... )
dist |
A |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
min_iter |
Minimum number of EM-Iterations |
max_iter |
Maximum number of EM-Iterations (weight updates) |
skip_first_e |
Skip the first E-Step (update Probability weights)? This can help if the initial values cause a mixture component to vanish in the first E-Step before the starting values can be improved. |
tolerance |
Numerical tolerance. |
trace |
Include tracing information in output?
If |
... |
Passed to |
A list with elements
params
the fitted parameters in the same structure as init
.
params_hist
(if trace
is TRUE) the history of parameters
(after each e- and m- step)
iter
the number of outer EM-iterations
logLik
the final log-likelihood
Other distribution fitting functions:
fit_dist()
,
fit_erlang_mixture()
,
fit_mixture()
dist <- dist_blended( list( dist_exponential(), dist_genpareto() ) ) params <- list( probs = list(0.9, 0.1), dists = list( list(rate = 2.0), list(u = 1.5, xi = 0.2, sigmau = 1.0) ), breaks = list(1.5), bandwidths = list(0.3) ) x <- dist$sample(100L, with_params = params) dist$default_params$breaks <- params$breaks dist$default_params$bandwidths <- params$bandwidths if (interactive()) { fit_blended(dist, x) }
dist <- dist_blended( list( dist_exponential(), dist_genpareto() ) ) params <- list( probs = list(0.9, 0.1), dists = list( list(rate = 2.0), list(u = 1.5, xi = 0.2, sigmau = 1.0) ), breaks = list(1.5), bandwidths = list(0.3) ) x <- dist$sample(100L, with_params = params) dist$default_params$breaks <- params$breaks dist$default_params$bandwidths <- params$bandwidths if (interactive()) { fit_blended(dist, x) }
The default implementation performs maximum likelihood estimation on all placeholder parameters.
fit_dist(dist, obs, start, ...) fit_dist_direct(dist, obs, start, ..., .start_with_default = FALSE) ## S3 method for class 'Distribution' fit(object, obs, start, ...)
fit_dist(dist, obs, start, ...) fit_dist_direct(dist, obs, start, ..., .start_with_default = FALSE) ## S3 method for class 'Distribution' fit(object, obs, start, ...)
dist |
A |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
... |
Distribution-specific arguments for the fitting procedure |
.start_with_default |
Before directly optimising the likelihood, use an optimised algorithm for finding better starting values? |
object |
same as parameter |
For Erlang mixture distributions and for Mixture distributions, an EM-Algorithm is instead used to improve stability.
fit()
and fit_dist()
will chose an optimisation method optimized for the specific distribution given.
fit_dist_direct()
can be used to force direct maximisation of the likelihood.
A list with at least the elements
params
the fitted parameters in the same structure as init
.
logLik
the final log-likelihood
Additional information may be provided depending on dist
.
Other distribution fitting functions:
fit_blended()
,
fit_erlang_mixture()
,
fit_mixture()
Other distribution fitting functions:
fit_blended()
,
fit_erlang_mixture()
,
fit_mixture()
x <- rexp(100) lambda_hat <- 1 / mean(x) lambda_hat2 <- fit_dist(dist_exponential(), x)$params$rate identical(lambda_hat, lambda_hat2) dist <- dist_mixture(list(dist_normal(), dist_translate(dist_exponential(), offset = 6))) params <- list( dists = list(list(mean = 5, sd = 1), list(dist = list(rate = 1))), probs = list(0.95, 0.05) ) set.seed(2000) u <- runif(100, 10, 20) x <- dist$sample(100, with_params = params) obs <- trunc_obs(x = x[x <= u], tmin = -Inf, tmax = u[x <= u]) default_fit <- fit_dist(dist, obs) direct_fit <- fit_dist_direct(dist, obs) # NB: direct optimisation steps with pre-run take a few seconds direct_fit_init <- fit_dist_direct(dist, obs, start = default_fit$params) direct_fit_auto_init <- fit_dist_direct(dist, obs, .start_with_default = TRUE) stopifnot(direct_fit_init$logLik == direct_fit_auto_init$logLik) c(default_fit$logLik, direct_fit$logLik, direct_fit_init$logLik)
x <- rexp(100) lambda_hat <- 1 / mean(x) lambda_hat2 <- fit_dist(dist_exponential(), x)$params$rate identical(lambda_hat, lambda_hat2) dist <- dist_mixture(list(dist_normal(), dist_translate(dist_exponential(), offset = 6))) params <- list( dists = list(list(mean = 5, sd = 1), list(dist = list(rate = 1))), probs = list(0.95, 0.05) ) set.seed(2000) u <- runif(100, 10, 20) x <- dist$sample(100, with_params = params) obs <- trunc_obs(x = x[x <= u], tmin = -Inf, tmax = u[x <= u]) default_fit <- fit_dist(dist, obs) direct_fit <- fit_dist_direct(dist, obs) # NB: direct optimisation steps with pre-run take a few seconds direct_fit_init <- fit_dist_direct(dist, obs, start = default_fit$params) direct_fit_auto_init <- fit_dist_direct(dist, obs, .start_with_default = TRUE) stopifnot(direct_fit_init$logLik == direct_fit_auto_init$logLik) c(default_fit$logLik, direct_fit$logLik, direct_fit_init$logLik)
Find starting values for distribution parameters
## S3 method for class 'MixtureDistribution' fit_dist_start(dist, obs, dists_start = NULL, ...) fit_dist_start(dist, obs, ...)
## S3 method for class 'MixtureDistribution' fit_dist_start(dist, obs, dists_start = NULL, ...) fit_dist_start(dist, obs, ...)
dist |
A |
obs |
Observations to fit to. |
dists_start |
List of initial parameters for all component
distributions. If left empty, initialisation will be automatically performed
using |
... |
Additional arguments for the initialisation procedure |
A list of initial parameters suitable for passing to fit_dist()
.
fit_dist_start(dist_exponential(), rexp(100))
fit_dist_start(dist_exponential(), rexp(100))
Fit an Erlang mixture using an ECME-Algorithm
fit_erlang_mixture( dist, obs, start, min_iter = 0L, max_iter = 100L, skip_first_e = FALSE, tolerance = 1e-05, trace = FALSE, parallel = FALSE, ... )
fit_erlang_mixture( dist, obs, start, min_iter = 0L, max_iter = 100L, skip_first_e = FALSE, tolerance = 1e-05, trace = FALSE, parallel = FALSE, ... )
dist |
An |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
min_iter |
Minimum number of EM-Iterations |
max_iter |
Maximum number of EM-Iterations (weight updates) |
skip_first_e |
Skip the first E-Step (update Probability weights)? This can help if the initial values cause a mixture component to vanish in the first E-Step before the starting values can be improved. |
tolerance |
Numerical tolerance. |
trace |
Include tracing information in output?
If |
parallel |
Enable experimental parallel evaluation of expected log-likelihood? |
... |
Passed to |
A list with elements
params
the fitted parameters in the same structure as init
.
params_hist
(if trace
is TRUE) the history of parameters
(after each e- and m- step). Otherwise an empty list.
iter
the number of outer EM-iterations
logLik
the final log-likelihood
Other distribution fitting functions:
fit_blended()
,
fit_dist()
,
fit_mixture()
dist <- dist_erlangmix(list(NULL, NULL, NULL)) params <- list( shapes = list(1L, 4L, 12L), scale = 2.0, probs = list(0.5, 0.3, 0.2) ) x <- dist$sample(100L, with_params = params) fit_erlang_mixture(dist, x, init = "kmeans")
dist <- dist_erlangmix(list(NULL, NULL, NULL)) params <- list( shapes = list(1L, 4L, 12L), scale = 2.0, probs = list(0.5, 0.3, 0.2) ) x <- dist$sample(100L, with_params = params) fit_erlang_mixture(dist, x, init = "kmeans")
Fit a generic mixture using an ECME-Algorithm
fit_mixture( dist, obs, start, min_iter = 0L, max_iter = 100L, skip_first_e = FALSE, tolerance = 1e-05, trace = FALSE, ... )
fit_mixture( dist, obs, start, min_iter = 0L, max_iter = 100L, skip_first_e = FALSE, tolerance = 1e-05, trace = FALSE, ... )
dist |
A |
obs |
Set of observations as produced by |
start |
Initial values of all placeholder parameters.
If missing, starting values are obtained from |
min_iter |
Minimum number of EM-Iterations |
max_iter |
Maximum number of EM-Iterations (weight updates) |
skip_first_e |
Skip the first E-Step (update Probability weights)? This can help if the initial values cause a mixture component to vanish in the first E-Step before the starting values can be improved. |
tolerance |
Numerical tolerance. |
trace |
Include tracing information in output?
If |
... |
Passed to |
A list with elements
params
the fitted parameters in the same structure as init
.
params_hist
(if trace
is TRUE) the history of parameters
(after each e- and m- step)
iter
the number of outer EM-iterations
logLik
the final log-likelihood
Other distribution fitting functions:
fit_blended()
,
fit_dist()
,
fit_erlang_mixture()
dist <- dist_mixture( list( dist_dirac(0.0), dist_exponential() ) ) params <- list( probs = list(0.1, 0.9), dists = list( list(), list(rate = 1.0) ) ) x <- dist$sample(100L, with_params = params) fit_mixture(dist, x)
dist <- dist_mixture( list( dist_dirac(0.0), dist_exponential() ) ) params <- list( probs = list(0.1, 0.9), dists = list( list(), list(rate = 1.0) ) ) x <- dist$sample(100L, with_params = params) fit_mixture(dist, x)
This function delegates most work to keras3::fit.keras.src.models.model.Model()
and performs additional consistency
checks to make sure tf_compile_model()
was called with the appropriate options to support fitting the observations
y
as well as automatically converting y
to a n x 6 matrix needed by the compiled loss function.
## S3 method for class 'reservr_keras_model' fit( object, x, y, batch_size = NULL, epochs = 10, verbose = getOption("keras.fit_verbose", default = 1), callbacks = NULL, view_metrics = getOption("keras.view_metrics", default = "auto"), validation_split = 0, validation_data = NULL, shuffle = TRUE, class_weight = NULL, sample_weight = NULL, initial_epoch = 0, steps_per_epoch = NULL, validation_steps = NULL, ... )
## S3 method for class 'reservr_keras_model' fit( object, x, y, batch_size = NULL, epochs = 10, verbose = getOption("keras.fit_verbose", default = 1), callbacks = NULL, view_metrics = getOption("keras.view_metrics", default = "auto"), validation_split = 0, validation_data = NULL, shuffle = TRUE, class_weight = NULL, sample_weight = NULL, initial_epoch = 0, steps_per_epoch = NULL, validation_steps = NULL, ... )
object |
A compiled |
x |
A list of input tensors (predictors) |
y |
A |
batch_size |
Integer or |
epochs |
Integer. Number of epochs to train the model.
An epoch is an iteration over the entire |
verbose |
|
callbacks |
List of |
view_metrics |
View realtime plot of training metrics (by epoch). The
default ( |
validation_split |
Float between 0 and 1.
Fraction of the training data to be used as validation data.
The model will set apart this fraction of the training data,
will not train on it, and will evaluate
the loss and any model metrics
on this data at the end of each epoch.
The validation data is selected from the last samples
in the |
validation_data |
Data on which to evaluate
the loss and any model metrics at the end of each epoch.
The model will not be trained on this data. Thus, note the fact
that the validation loss of data provided using
|
shuffle |
Boolean, whether to shuffle the training data
before each epoch. This argument is
ignored when |
class_weight |
Optional named list mapping class indices (integers, 0-based)
to a weight (float) value, used for weighting the loss function
(during training only).
This can be useful to tell the model to
"pay more attention" to samples from
an under-represented class. When |
sample_weight |
Optional array of weights for
the training samples, used for weighting the loss function
(during training only). You can either pass a flat (1D)
array/vector with the same length as the input samples
(1:1 mapping between weights and samples),
or in the case of temporal data,
you can pass a 2D array (matrix) with shape
|
initial_epoch |
Integer. Epoch at which to start training (useful for resuming a previous training run). |
steps_per_epoch |
Integer or |
validation_steps |
Only relevant if |
... |
Unused. If old arguments are supplied, an error message will be raised informing how to fix the issue. |
Additionally, the default batch_size
is min(nrow(y), 10000)
instead of keras default of 32
because the latter
is a very bad choice for fitting most distributions since the involved loss is much less stable than typical losses
used in machine learning, leading to divergence for small batch sizes.
A history
object that contains all information collected during training.
The model object will be updated in-place as a side-effect.
predict.reservr_keras_model tf_compile_model keras3::fit.keras.src.models.model.Model
dist <- dist_exponential() params <- list(rate = 1.0) N <- 100L rand_input <- runif(N) x <- dist$sample(N, with_params = params) if (interactive()) { tf_in <- keras3::layer_input(1L) mod <- tf_compile_model( inputs = list(tf_in), intermediate_output = tf_in, dist = dist, optimizer = keras3::optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_fit <- fit( object = mod, x = k_matrix(rand_input), y = x, epochs = 10L, callbacks = list( callback_debug_dist_gradients(mod, k_matrix(rand_input), x, keep_grads = TRUE) ) ) }
dist <- dist_exponential() params <- list(rate = 1.0) N <- 100L rand_input <- runif(N) x <- dist$sample(N, with_params = params) if (interactive()) { tf_in <- keras3::layer_input(1L) mod <- tf_compile_model( inputs = list(tf_in), intermediate_output = tf_in, dist = dist, optimizer = keras3::optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_fit <- fit( object = mod, x = k_matrix(rand_input), y = x, epochs = 10L, callbacks = list( callback_debug_dist_gradients(mod, k_matrix(rand_input), x, keep_grads = TRUE) ) ) }
Flatten / Inflate parameter lists / vectors
flatten_params(params) flatten_params_matrix(params) flatten_bounds(bounds) inflate_params(flat_params)
flatten_params(params) flatten_params_matrix(params) flatten_bounds(bounds) inflate_params(flat_params)
params |
A named list of parameters to be flattened.
Should be in a form to be passed as the |
bounds |
List of parameter bounds as returned by
|
flat_params |
A named numeric vector of parameters |
flatten_params
returns a 'flattened' vector of parameters.
It is intended as an adapter for multi-dimensional optimisation functions
to distribution objects.
flatten_params_matrix
returns a 'flattened' matrix of parameters.
It is intended as an adapter for multi-dimensional optimisation functions
to distribution objects. Each column corresponds to one input element.
flatten_bounds
returns a named list of vectors with names lower
and upper
. Containing the upper and lower bounds of each parameter.
inflate_params
returns an 'inflated' list of parameters.
This can be passed as the with_params
argument to most distribution
functions.
library(ggplot2) mm <- dist_mixture(list( dist_exponential(NULL), dist_lognormal(0.5, NULL) ), list(NULL, 1)) ph <- mm$get_placeholders() ph_flat <- flatten_params(ph) ph_reinflated <- inflate_params(ph_flat) ph_flat[] <- c(1, 1, 6) ph_sample <- inflate_params(ph_flat) x <- mm$sample( 100, with_params = ph_sample ) emp_cdf <- ecdf(x) ggplot(data.frame(t = seq(from = min(x), to = max(x), length.out = 100))) %+% geom_point(aes(x = t, y = emp_cdf(t))) %+% geom_line(aes(x = t, y = mm$probability(t, with_params = ph_sample)), linetype = 2)
library(ggplot2) mm <- dist_mixture(list( dist_exponential(NULL), dist_lognormal(0.5, NULL) ), list(NULL, 1)) ph <- mm$get_placeholders() ph_flat <- flatten_params(ph) ph_reinflated <- inflate_params(ph_flat) ph_flat[] <- c(1, 1, 6) ph_sample <- inflate_params(ph_flat) x <- mm$sample( 100, with_params = ph_sample ) emp_cdf <- ecdf(x) ggplot(data.frame(t = seq(from = min(x), to = max(x), length.out = 100))) %+% geom_point(aes(x = t, y = emp_cdf(t))) %+% geom_line(aes(x = t, y = mm$probability(t, with_params = ph_sample)), linetype = 2)
These functions provide information about the generalized Pareto distribution
with threshold u
. dgpd
gives the density, pgpd
gives the distribution
function, qgpd
gives the quantile function and rgpd
generates random
deviates.
rgpd(n = 1L, u = 0, sigmau = 1, xi = 0) dgpd(x, u = 0, sigmau = 1, xi = 0, log = FALSE) pgpd(q, u = 0, sigmau = 1, xi = 0, lower.tail = TRUE, log.p = FALSE) qgpd(p, u = 0, sigmau = 1, xi = 0, lower.tail = TRUE, log.p = FALSE)
rgpd(n = 1L, u = 0, sigmau = 1, xi = 0) dgpd(x, u = 0, sigmau = 1, xi = 0, log = FALSE) pgpd(q, u = 0, sigmau = 1, xi = 0, lower.tail = TRUE, log.p = FALSE) qgpd(p, u = 0, sigmau = 1, xi = 0, lower.tail = TRUE, log.p = FALSE)
n |
integer number of observations. |
u |
threshold parameter (minimum value). |
sigmau |
scale parameter (must be positive). |
xi |
shape parameter |
x , q
|
vector of quantiles. |
log , log.p
|
logical; if |
lower.tail |
logical; if |
p |
vector of probabilities. |
If u
, sigmau
or xi
are not specified, they assume the default values of
0
, 1
and 0
respectively.
The generalized Pareto distribution has density
where and
if
is 0.
The support is
for
and
for
.
The Expected value exists if and is equal to
k-th moments exist in general for .
rgpd
generates random deviates.
dgpd
gives the density.
pgpd
gives the distribution function.
qgpd
gives the quantile function.
https://en.wikipedia.org/wiki/Generalized_Pareto_distribution
x <- rgpd(1000, u = 1, sigmau = 0.5, xi = 0.1) xx <- seq(-1, 10, 0.01) hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 10)) lines(xx, dgpd(xx, u = 1, sigmau = 0.5, xi = 0.1)) plot(xx, dgpd(xx, u = 1, sigmau = 1, xi = 0), type = "l") lines(xx, dgpd(xx, u = 0.5, sigmau = 1, xi = -0.3), col = "blue", lwd = 2) lines(xx, dgpd(xx, u = 1.5, sigmau = 1, xi = 0.3), col = "red", lwd = 2) plot(xx, dgpd(xx, u = 1, sigmau = 1, xi = 0), type = "l") lines(xx, dgpd(xx, u = 1, sigmau = 0.5, xi = 0), col = "blue", lwd = 2) lines(xx, dgpd(xx, u = 1, sigmau = 2, xi = 0), col = "red", lwd = 2)
x <- rgpd(1000, u = 1, sigmau = 0.5, xi = 0.1) xx <- seq(-1, 10, 0.01) hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 10)) lines(xx, dgpd(xx, u = 1, sigmau = 0.5, xi = 0.1)) plot(xx, dgpd(xx, u = 1, sigmau = 1, xi = 0), type = "l") lines(xx, dgpd(xx, u = 0.5, sigmau = 1, xi = -0.3), col = "blue", lwd = 2) lines(xx, dgpd(xx, u = 1.5, sigmau = 1, xi = 0.3), col = "red", lwd = 2) plot(xx, dgpd(xx, u = 1, sigmau = 1, xi = 0), type = "l") lines(xx, dgpd(xx, u = 1, sigmau = 0.5, xi = 0), col = "blue", lwd = 2) lines(xx, dgpd(xx, u = 1, sigmau = 2, xi = 0), col = "red", lwd = 2)
Integrates fun over the bounds [ lower, upper ] vectorized over lower
and
upper
. Vectorized list structures of parameters can also be passed.
integrate_gk( fun, lower, upper, params = list(), .tolerance = .Machine$double.eps^0.25, .max_iter = 100L )
integrate_gk( fun, lower, upper, params = list(), .tolerance = .Machine$double.eps^0.25, .max_iter = 100L )
fun |
A function to integrate. Must be vectorized and take one or two arguments, the first being points to evaluate at and the second (optionally) being parameters to apply. It must return a numeric vector the same length as its first input. Currently, infinite bounds are not supported. |
lower , upper
|
Integration bounds. Must have the same length. |
params |
Parameters to pass as a second argument to |
.tolerance |
Absolute element-wise tolerance. |
.max_iter |
Maximum number of iterations. The number of
integration intervals will be at most |
The integration error is estimated by the Gauss-Kronrod quadrature as the
absolute difference between the 7-point quadrature and the 15-point
quadrature. Integrals that did not converge will be bisected at the midpoint.
The params
object will be recursively subsetted on all numeric vectors with
the same length as the number of observations.
A vector of integrals with the i-th entry containing an approximation of the integral of fun(t, pick_params_at(params, i)) dt over the interval lower[i] to upper[i]
# Argument recycling and parallel integration of two intervals integrate_gk(sin, 0, c(pi, 2 * pi)) dist <- dist_exponential() integrate_gk( function(x, p) dist$density(x, with_params = p), lower = 0, upper = 1:10, params = list(rate = 1 / 1:10) ) dist$probability(1:10, with_params = list(rate = 1 / 1:10))
# Argument recycling and parallel integration of two intervals integrate_gk(sin, 0, c(pi, 2 * pi)) dist <- dist_exponential() integrate_gk( function(x, p) dist$density(x, with_params = p), lower = 0, upper = 1:10, params = list(rate = 1 / 1:10) ) dist$probability(1:10, with_params = list(rate = 1 / 1:10))
Intervals
interval( range = c(-Inf, Inf), ..., include_lowest = closed, include_highest = closed, closed = FALSE, integer = FALSE, read_only = FALSE ) is.Interval(x)
interval( range = c(-Inf, Inf), ..., include_lowest = closed, include_highest = closed, closed = FALSE, integer = FALSE, read_only = FALSE ) is.Interval(x)
range |
The interval boundaries as a sorted two-element numeric vector. |
... |
First argument is used as the endpoint if |
include_lowest |
Is the lower boundary part of the interval? |
include_highest |
Is the upper boundary part of the interval? |
closed |
Is the interval closed? |
integer |
Is the interval only over the integers? |
read_only |
Make the interval object read-only? |
x |
An object. |
interval
returns an Interval
.
is.Interval
returns TRUE
if x
is an Interval
, FALSE
otherwise.
interval-operations
# The real line interval() # Closed unit interval interval(c(0, 1), closed = TRUE) # Alternative form interval(0, 1, closed = TRUE) # Non-negative real line interval(c(0, Inf), include_lowest = TRUE)
# The real line interval() # Closed unit interval interval(c(0, 1), closed = TRUE) # Alternative form interval(0, 1, closed = TRUE) # Non-negative real line interval(c(0, Inf), include_lowest = TRUE)
Convex union and intersection of intervals
interval_union(..., intervals = list()) interval_intersection(..., intervals = list())
interval_union(..., intervals = list()) interval_intersection(..., intervals = list())
... |
appened to |
intervals |
A list of |
interval_union
returns the convex union of all intervals in intervals
.
This is the smallest interval completely containing all intervals.
interval_intersection
returns the set intersection of all intervals in
intervals
. The empty set is represented by the open interval (0, 0).
interval
interval_union( interval(c(0, 1), closed = TRUE), interval(c(1, 2)) ) interval_union( interval(c(0, 5)), interval(c(1, 4), closed = TRUE) ) # Convex union is not equal to set union: interval_union( interval(c(0, 1)), interval(c(2, 3)) ) # The empty union is {} interval_union() interval_intersection( interval(c(0, 1)), interval(c(0.5, 2)) ) interval_intersection( interval(c(0, Inf)), interval(c(-Inf, 0)) ) interval_intersection( interval(c(0, Inf), include_lowest = TRUE), interval(c(-Inf, 0), include_highest = TRUE) ) interval_intersection( interval(c(0, 5)), interval(c(1, 6), closed = TRUE) ) # The empty intersection is (-Inf, Inf) interval_intersection()
interval_union( interval(c(0, 1), closed = TRUE), interval(c(1, 2)) ) interval_union( interval(c(0, 5)), interval(c(1, 4), closed = TRUE) ) # Convex union is not equal to set union: interval_union( interval(c(0, 1)), interval(c(2, 3)) ) # The empty union is {} interval_union() interval_intersection( interval(c(0, 1)), interval(c(0.5, 2)) ) interval_intersection( interval(c(0, Inf)), interval(c(-Inf, 0)) ) interval_intersection( interval(c(0, Inf), include_lowest = TRUE), interval(c(-Inf, 0), include_highest = TRUE) ) interval_intersection( interval(c(0, 5)), interval(c(1, 6), closed = TRUE) ) # The empty intersection is (-Inf, Inf) interval_intersection()
Test if object is a Distribution
is.Distribution(object)
is.Distribution(object)
object |
An R object. |
TRUE
if object
is a Distribution, FALSE
otherwise.
is.Distribution(dist_dirac())
is.Distribution(dist_dirac())
Cast to a TensorFlow matrix
k_matrix(x, dtype = NULL)
k_matrix(x, dtype = NULL)
x |
Numeric object to be converted to a matrix Tensor. |
dtype |
Type of the elements of the resulting tensor. Defaults to |
A two-dimensional tf.Tensor
with values from x
.
The shape will be (nrow(x), ncol(x))
where x
is first converted to an R matrix via as.matrix()
.
if (interactive()) { k_matrix(diag(1:3)) k_matrix(diag(1:3), dtype = "int32") # Vectors are converted to columns: k_matrix(1:3) }
if (interactive()) { k_matrix(diag(1:3)) k_matrix(diag(1:3), dtype = "int32") # Vectors are converted to columns: k_matrix(1:3) }
These functions provide information about the Pareto distribution.
dpareto
gives the density, ppareto
gives the distribution
function, qpareto
gives the quantile function and rpareto
generates random
deviates.
rpareto(n = 1L, shape = 0, scale = 1) dpareto(x, shape = 1, scale = 1, log = FALSE) ppareto(q, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE) qpareto(p, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE)
rpareto(n = 1L, shape = 0, scale = 1) dpareto(x, shape = 1, scale = 1, log = FALSE) ppareto(q, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE) qpareto(p, shape = 1, scale = 1, lower.tail = TRUE, log.p = FALSE)
n |
integer number of observations. |
shape |
shape parameter (must be positive). |
scale |
scale parameter (must be positive). |
x , q
|
vector of quantiles. |
log , log.p
|
logical; if |
lower.tail |
logical; if |
p |
vector of probabilities. |
If shape
or scale
are not specified, they assume the default values of 1
.
The Pareto distribution with scale and shape
has density
The support is .
The Expected value exists if and is equal to
k-th moments exist in general for .
rpareto
generates random deviates.
dpareto
gives the density.
ppareto
gives the distribution function.
qpareto
gives the quantile function.
https://en.wikipedia.org/wiki/Pareto_distribution - named Lomax therein.
x <- rpareto(1000, shape = 10, scale = 5) xx <- seq(-1, 10, 0.01) hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 10)) lines(xx, dpareto(xx, shape = 10, scale = 5)) plot(xx, dpareto(xx, shape = 10, scale = 5), type = "l") lines(xx, dpareto(xx, shape = 3, scale = 5), col = "red", lwd = 2) plot(xx, dpareto(xx, shape = 10, scale = 10), type = "l") lines(xx, dpareto(xx, shape = 10, scale = 5), col = "blue", lwd = 2) lines(xx, dpareto(xx, shape = 10, scale = 20), col = "red", lwd = 2)
x <- rpareto(1000, shape = 10, scale = 5) xx <- seq(-1, 10, 0.01) hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 10)) lines(xx, dpareto(xx, shape = 10, scale = 5)) plot(xx, dpareto(xx, shape = 10, scale = 5), type = "l") lines(xx, dpareto(xx, shape = 3, scale = 5), col = "red", lwd = 2) plot(xx, dpareto(xx, shape = 10, scale = 10), type = "l") lines(xx, dpareto(xx, shape = 10, scale = 5), col = "blue", lwd = 2) lines(xx, dpareto(xx, shape = 10, scale = 20), col = "red", lwd = 2)
Plot several distributions
plot_distributions( ..., distributions = list(), .x, plots = c("density", "probability", "hazard"), with_params = list(), as_list = FALSE )
plot_distributions( ..., distributions = list(), .x, plots = c("density", "probability", "hazard"), with_params = list(), as_list = FALSE )
... |
distribution objects (must be named) |
distributions |
Named list of distribution objects.
This is concatenated with |
.x |
Numeric vector of points to evaluate at. |
plots |
Plots to be created. May be abbreviated. The plots will be stacked in the order given from top to bottom. |
with_params |
list of distribution parameters to be given to each
distribution using |
as_list |
return a list of ggplots instead of a patchwork? |
A stacked patchwork of the requested ggplots
rate <- 1 x <- rexp(20, rate) d_emp <- dist_empirical(x, positive = TRUE) d_exp <- dist_exponential() plot_distributions( empirical = d_emp, theoretical = d_exp, estimated = d_exp, with_params = list( theoretical = list(rate = rate), estimated = list(rate = 1 / mean(x)) ), .x = seq(1e-4, 5, length.out = 100) )
rate <- 1 x <- rexp(20, rate) d_emp <- dist_empirical(x, positive = TRUE) d_exp <- dist_exponential() plot_distributions( empirical = d_emp, theoretical = d_exp, estimated = d_exp, with_params = list( theoretical = list(rate = rate), estimated = list(rate = 1 / mean(x)) ), .x = seq(1e-4, 5, length.out = 100) )
Predict individual distribution parameters
## S3 method for class 'reservr_keras_model' predict(object, data, as_matrix = FALSE, ...)
## S3 method for class 'reservr_keras_model' predict(object, data, as_matrix = FALSE, ...)
object |
A compiled and trained |
data |
Input data compatible with the model. |
as_matrix |
Return a parameter matrix instead of a list structure? |
... |
ignored |
A parameter list suitable for the with_params
argument of the distribution family used for the model.
Contains one set of parameters per row in data
.
if (interactive()) { dist <- dist_exponential() params <- list(rate = 1.0) N <- 100L rand_input <- runif(N) x <- dist$sample(N, with_params = params) tf_in <- keras3::layer_input(1L) mod <- tf_compile_model( inputs = list(tf_in), intermediate_output = tf_in, dist = dist, optimizer = keras3::optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_fit <- fit( object = mod, x = k_matrix(rand_input), y = x, epochs = 10L, callbacks = list( callback_debug_dist_gradients(mod, k_matrix(rand_input), x) ) ) tf_preds <- predict(mod, data = k_matrix(rand_input)) }
if (interactive()) { dist <- dist_exponential() params <- list(rate = 1.0) N <- 100L rand_input <- runif(N) x <- dist$sample(N, with_params = params) tf_in <- keras3::layer_input(1L) mod <- tf_compile_model( inputs = list(tf_in), intermediate_output = tf_in, dist = dist, optimizer = keras3::optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_fit <- fit( object = mod, x = k_matrix(rand_input), y = x, epochs = 10L, callbacks = list( callback_debug_dist_gradients(mod, k_matrix(rand_input), x) ) ) tf_preds <- predict(mod, data = k_matrix(rand_input)) }
Determines the probability that claims occuring under a Poisson process with
arrival intensity expo
and reporting delay distribution dist
during the
time between t_min
and t_max
are reported between tau_min
and
tau_max
.
prob_report( dist, intervals, expo = NULL, with_params = list(), .tolerance = .Machine$double.eps^0.5, .max_iter = 100L, .try_compile = TRUE )
prob_report( dist, intervals, expo = NULL, with_params = list(), .tolerance = .Machine$double.eps^0.5, .max_iter = 100L, .try_compile = TRUE )
dist |
A reporting delay Distribution, or a compiled interval probability function. |
intervals |
A data frame with columns |
expo |
Poisson intensity. If given, must be a vectorised function that
yields the intensity of the claim arrival process at a specified time.
|
with_params |
Parameters of |
.tolerance |
Absolute element-wise tolerance. |
.max_iter |
Maximum number of iterations. The number of
integration intervals will be at most |
.try_compile |
Try compiling the distributions probability function to speed up integration? |
The reporting probability is given by
P(x + d in [tmin, tmax] | x in [xmin, xmax]) = E(P(x + d in [tmin, tmax] | x) | x in [xmin, xmax]) / P(x in [xmin, xmax]) = int_[xmin, xmax] expo(x) P(x + d in [tmin, tmax]) dx = int_[xmin, xmax] expo(x) P(d in [tmin - x, tmax - x]) dx / int_[xmin, xmax] expo(x) dx
prob_report
uses integrate_gk()
to compute the two integrals.
A vector of reporting probabilities, with one entry per row of intervals
.
dist <- dist_exponential() ints <- data.frame( xmin = 0, xmax = 1, tmin = seq_len(10) - 1.0, tmax = seq_len(10) ) params <- list(rate = rep(c(1, 0.5), each = 5)) prob_report(dist, ints, with_params = params)
dist <- dist_exponential() ints <- data.frame( xmin = 0, xmax = 1, tmin = seq_len(10) - 1.0, tmax = seq_len(10) ) params <- list(rate = rep(c(1, 0.5), each = 5)) prob_report(dist, ints, with_params = params)
Produces quantiles corresponding to the given probabilities with configurable distribution parameters.
## S3 method for class 'Distribution' quantile(x, probs = seq(0, 1, 0.25), with_params = list(), ..., .start = 0)
## S3 method for class 'Distribution' quantile(x, probs = seq(0, 1, 0.25), with_params = list(), ..., .start = 0)
x |
A |
probs |
Quantiles to compute. |
with_params |
Optional list of distribution parameters. Note that if |
... |
ignored |
.start |
Starting value if quantiles are computed numerically. Must be within the support of |
If x$has_capability("quantile")
is true, this returns the same as x$quantile(probs, with_params = with_params)
.
In this case, with_params
may contain separate sets of parameters for each quantile to be determined.
Otherwise, a numerical estimation of the quantiles is done using the density and probability function.
This method assumes with_params
to cantain only one set of parameters.
The strategy uses two steps:
Find the smallest and largest quantiles in probs
using a newton method starting from .start
.
Find the remaining quantiles with bisection using stats::uniroot()
.
The quantiles of x
corresponding to probs
with parameters with_params
.
# With quantiles available dist <- dist_normal(sd = 1) qqs <- quantile(dist, probs = rep(0.5, 3), with_params = list(mean = 1:3)) stopifnot(all.equal(qqs, 1:3)) # Without quantiles available dist <- dist_erlangmix(shapes = list(1, 2, 3), scale = 1.0) my_probs <- c(0, 0.01, 0.25, 0.5, 0.75, 1) qqs <- quantile( dist, probs = my_probs, with_params = list(probs = list(0.5, 0.3, 0.2)), .start = 2 ) all.equal(dist$probability(qqs, with_params = list(probs = list(0.5, 0.3, 0.2))), my_probs) # Careful: Numerical estimation of extreme quantiles can result in out-of-bounds values. # The correct 0-quantile would be 0 in this case, but it was estimated < 0. qqs[1L]
# With quantiles available dist <- dist_normal(sd = 1) qqs <- quantile(dist, probs = rep(0.5, 3), with_params = list(mean = 1:3)) stopifnot(all.equal(qqs, 1:3)) # Without quantiles available dist <- dist_erlangmix(shapes = list(1, 2, 3), scale = 1.0) my_probs <- c(0, 0.01, 0.25, 0.5, 0.75, 1) qqs <- quantile( dist, probs = my_probs, with_params = list(probs = list(0.5, 0.3, 0.2)), .start = 2 ) all.equal(dist$probability(qqs, with_params = list(probs = list(0.5, 0.3, 0.2))), my_probs) # Careful: Numerical estimation of extreme quantiles can result in out-of-bounds values. # The correct 0-quantile would be 0 in this case, but it was estimated < 0. qqs[1L]
Softmax for a vector x is defined as
softmax(x) dsoftmax(x)
softmax(x) dsoftmax(x)
x |
A numeric vector or matrix |
It satisfies sum(s) == 1.0
and can be used to smoothly enforce a sum
constraint.
softmax
returns the softmax of x
; rowwise if x
is a matrix.
dsoftmax
returns the Jacobi-matrix of softmax(x)
at x
. x
must be a vector.
softmax(c(5, 5)) softmax(diag(nrow = 5, ncol = 6))
softmax(c(5, 5)) softmax(diag(nrow = 5, ncol = 6))
Compile a Keras model for truncated data under dist
tf_compile_model( inputs, intermediate_output, dist, optimizer, censoring = TRUE, truncation = TRUE, metrics = NULL, weighted_metrics = NULL )
tf_compile_model( inputs, intermediate_output, dist, optimizer, censoring = TRUE, truncation = TRUE, metrics = NULL, weighted_metrics = NULL )
inputs |
List of keras input layers |
intermediate_output |
Intermediate model layer to be used as input to distribution parameters |
dist |
A |
optimizer |
String (name of optimizer) or optimizer instance. See
|
censoring |
A flag, whether the compiled model should support censored
observations. Set to |
truncation |
A flag, whether the compiled model should support truncated
observations. Set to |
metrics |
List of metrics to be evaluated by the model during training and testing. Each of these can be:
Typically you will use
If providing an anonymous R function, you can customize the printed name
during training by assigning |
weighted_metrics |
List of metrics to be evaluated and weighted by
|
A reservr_keras_model
that can be used to train truncated
and censored observations from dist
based on input data from inputs
.
dist <- dist_exponential() params <- list(rate = 1.0) N <- 100L rand_input <- runif(N) x <- dist$sample(N, with_params = params) if (interactive()) { tf_in <- keras3::layer_input(1L) mod <- tf_compile_model( inputs = list(tf_in), intermediate_output = tf_in, dist = dist, optimizer = keras3::optimizer_adam(), censoring = FALSE, truncation = FALSE ) }
dist <- dist_exponential() params <- list(rate = 1.0) N <- 100L rand_input <- runif(N) x <- dist$sample(N, with_params = params) if (interactive()) { tf_in <- keras3::layer_input(1L) mod <- tf_compile_model( inputs = list(tf_in), intermediate_output = tf_in, dist = dist, optimizer = keras3::optimizer_adam(), censoring = FALSE, truncation = FALSE ) }
Initialises a compiled reservr_keras_model
weights such that the predictions are equal to, or close to, the
distribution parameters given by params
.
tf_initialise_model( model, params, mode = c("scale", "perturb", "zero", "none") )
tf_initialise_model( model, params, mode = c("scale", "perturb", "zero", "none") )
model |
A |
params |
A list of distribution parameters compatible with |
mode |
An initialisation mode
|
Invisibly model
with changed weights
dist <- dist_exponential() group <- sample(c(0, 1), size = 100, replace = TRUE) x <- dist$sample(100, with_params = list(rate = group + 1)) global_fit <- fit(dist, x) if (interactive()) { library(keras3) l_in <- layer_input(shape = 1L) mod <- tf_compile_model( inputs = list(l_in), intermediate_output = l_in, dist = dist, optimizer = optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_initialise_model(mod, global_fit$params) fit_history <- fit( mod, x = group, y = x, epochs = 200L ) predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx())) }
dist <- dist_exponential() group <- sample(c(0, 1), size = 100, replace = TRUE) x <- dist$sample(100, with_params = list(rate = group + 1)) global_fit <- fit(dist, x) if (interactive()) { library(keras3) l_in <- layer_input(shape = 1L) mod <- tf_compile_model( inputs = list(l_in), intermediate_output = l_in, dist = dist, optimizer = optimizer_adam(), censoring = FALSE, truncation = FALSE ) tf_initialise_model(mod, global_fit$params) fit_history <- fit( mod, x = group, y = x, epochs = 200L ) predicted_means <- predict(mod, data = as_tensor(c(0, 1), config_floatx())) }
If x
is missing, both xmin
and xmax
must be specified.
trunc_obs(x, xmin = x, xmax = x, tmin = -Inf, tmax = Inf, w = 1) as_trunc_obs(.data) truncate_obs(.data, tmin_new = -Inf, tmax_new = Inf, .partial = FALSE) repdel_obs(.data, accident, delay, time, .truncate = FALSE)
trunc_obs(x, xmin = x, xmax = x, tmin = -Inf, tmax = Inf, w = 1) as_trunc_obs(.data) truncate_obs(.data, tmin_new = -Inf, tmax_new = Inf, .partial = FALSE) repdel_obs(.data, accident, delay, time, .truncate = FALSE)
x |
Observations |
xmin , xmax
|
Censoring bounds. If |
tmin , tmax
|
Truncation bounds. May vary per observation. |
w |
Case weights |
.data |
A data frame or numeric vector. |
tmin_new |
New truncation minimum |
tmax_new |
New truncation maximum |
.partial |
Enable partial truncation of censored observations? This could potentially create inconsistent data if the actual observation lies outside of the truncation bounds but the censoring interval overlaps. |
accident |
accident time (unquoted, evaluated in |
delay |
reporting delay (unquoted, evaluated in |
time |
evaluation time (unquoted, evaluated in |
.truncate |
Should claims reported after |
Uncensored observations must satisfy tmin <= xmin = x = xmax <= tmax
.
Censored observations must satisfy tmin <= xmin < xmax <= tmax
and x = NA
.
trunc_obs: A trunc_obs
tibble with columns x
, xmin
, xmax
,
tmin
and tmax
describing possibly interval-censored observations with
truncation
as_trunc_obs
returns a trunc_obs
tibble.
truncate_obs
returns a trunc_obs
tibble with possibly fewer
observations than .data
and updated truncation bounds.
repdel_obs
returns a trunc_obs
tibble corresponding to the reporting
delay observations of each claim. If .truncate
is FALSE
, the result is
guaranteed to have the same number of rows as .data
.
N <- 100 x <- rexp(N, 0.5) # Random, observation dependent truncation intervals tmin <- runif(N, 0, 1) tmax <- tmin + runif(N, 1, 2) oob <- x < tmin | x > tmax x <- x[!oob] tmin <- tmin[!oob] tmax <- tmax[!oob] # Number of observations after truncation N <- length(x) # Randomly interval censor 30% of observations cens <- rbinom(N, 1, 0.3) == 1L xmin <- x xmax <- x xmin[cens] <- pmax(tmin[cens], floor(x[cens])) xmax[cens] <- pmin(tmax[cens], ceiling(x[cens])) x[cens] <- NA trunc_obs(x, xmin, xmax, tmin, tmax) as_trunc_obs(c(1, 2, 3)) as_trunc_obs(data.frame(x = 1:3, tmin = 0, tmax = 10)) as_trunc_obs(data.frame(x = c(1, NA), xmin = c(1, 2), xmax = c(1, 3))) truncate_obs(1:10, tmin_new = 2.0, tmax_new = 8.0)
N <- 100 x <- rexp(N, 0.5) # Random, observation dependent truncation intervals tmin <- runif(N, 0, 1) tmax <- tmin + runif(N, 1, 2) oob <- x < tmin | x > tmax x <- x[!oob] tmin <- tmin[!oob] tmax <- tmax[!oob] # Number of observations after truncation N <- length(x) # Randomly interval censor 30% of observations cens <- rbinom(N, 1, 0.3) == 1L xmin <- x xmax <- x xmin[cens] <- pmax(tmin[cens], floor(x[cens])) xmax[cens] <- pmin(tmax[cens], ceiling(x[cens])) x[cens] <- NA trunc_obs(x, xmin, xmax, tmin, tmax) as_trunc_obs(c(1, 2, 3)) as_trunc_obs(data.frame(x = 1:3, tmin = 0, tmax = 10)) as_trunc_obs(data.frame(x = c(1, NA), xmin = c(1, 2), xmax = c(1, 3))) truncate_obs(1:10, tmin_new = 2.0, tmax_new = 8.0)
Truncate claims data subject to reporting delay
truncate_claims(data, accident, delay, time, .report_col = "report")
truncate_claims(data, accident, delay, time, .report_col = "report")
data |
Full claims data including IBNR |
accident |
Accident times. May be an unquoted column name from |
delay |
Reporting delays. May be an unquoted column name from |
time |
Observation time (scalar number or one per claim).
Claims with |
.report_col |
|
Truncated data
.
The reporting time is stored in a colnumn named by .report_col
unless
.report_col
is NULL
.
If both .report_col
is NULL
and time
contains only Inf
s,
a warning will be issued since data
will be
returned unchanged and no work will be done.
claims_full <- data.frame( acc = runif(100), repdel = rexp(100) ) tau <- 2.0 truncate_claims(claims_full, acc, repdel, tau)
claims_full <- data.frame( acc = runif(100), repdel = rexp(100) ) tau <- 2.0 truncate_claims(claims_full, acc, repdel, tau)
Compute weighted moments
weighted_moments(x, w, n = 2L, center = TRUE)
weighted_moments(x, w, n = 2L, center = TRUE)
x |
Observations |
w |
Case weights (optional) |
n |
Number of moments to calculate |
center |
Calculate centralized moments (default) or noncentralized moments, i.e. E((X - E(X))^k) or E(X^k). |
A vector of length n
where the k
th entry is the k
th weighted
moment of x
with weights w
. If center
is TRUE
the moments are
centralized, i.e. E((X - E(X))^k). The first moment is never centralized.
The moments are scaled with 1 / sum(w), so they are not de-biased.
e.g. the second central weighted moment
weighted_moment(x, w)[2L]
is equal to
var(rep(x, w)) * (sum(w) - 1) / sum(w)
for integer w
Other weighted statistics:
weighted_quantile()
,
weighted_tabulate()
weighted_moments(rexp(100)) weighted_moments(c(1, 2, 3), c(1, 2, 3)) c(mean(rep(1:3, 1:3)), var(rep(1:3, 1:3)) * 5 / 6)
weighted_moments(rexp(100)) weighted_moments(c(1, 2, 3), c(1, 2, 3)) c(mean(rep(1:3, 1:3)), var(rep(1:3, 1:3)) * 5 / 6)
Compute weighted quantiles
weighted_quantile(x, w, probs) weighted_median(x, w)
weighted_quantile(x, w, probs) weighted_median(x, w)
x |
Observations |
w |
Case weights (optional) |
probs |
Quantiles to calculate |
A vector the same length as probs
with the corresponding weighted
quantiles of x
with weight w
. For integer weights, this is equivalent to
quantile(rep(x, w), probs)
The weighted median of x
with weights w
.
For integer weights, this is equivalent to median(rep(x, w))
Other weighted statistics:
weighted_moments()
,
weighted_tabulate()
weighted_median(1:6) weighted_median(1:3, c(1, 4, 9)) weighted_median(1:3, c(9, 4, 1)) weighted_quantile(1:3, c(1, 4, 9), seq(0.0, 1.0, by = 0.25)) quantile(rep(1:3, c(1, 4, 9)), seq(0.0, 1.0, by = 0.25))
weighted_median(1:6) weighted_median(1:3, c(1, 4, 9)) weighted_median(1:3, c(9, 4, 1)) weighted_quantile(1:3, c(1, 4, 9), seq(0.0, 1.0, by = 0.25)) quantile(rep(1:3, c(1, 4, 9)), seq(0.0, 1.0, by = 0.25))
Computes the sum of w
grouped by bin
.
If w
is missing the result is equivalent to
tabulate(bin, nbins)
weighted_tabulate(bin, w, nbins = max(1L, bin, na.rm = TRUE))
weighted_tabulate(bin, w, nbins = max(1L, bin, na.rm = TRUE))
bin |
An integer vector with values from |
w |
Weights per entry in |
nbins |
Number of bins |
A vector with length nbins
where the i
th result is equal to
sum(w[bin == i])
or sum(bin == i)
if w
is missing.
For integer weights, this is equivalent to tabulate(rep(bin, w), nbins)
.
Other weighted statistics:
weighted_moments()
,
weighted_quantile()
weighted_tabulate(c(1, 1, 2)) weighted_tabulate(c(1, 1, 2), nbins = 3L) weighted_tabulate(c(1, 1, 2), w = c(0.5, 0.5, 1), nbins = 3L)
weighted_tabulate(c(1, 1, 2)) weighted_tabulate(c(1, 1, 2), nbins = 3L) weighted_tabulate(c(1, 1, 2), w = c(0.5, 0.5, 1), nbins = 3L)