| Title: | Vectorised Probability Distributions |
|---|---|
| Description: | Vectorised distribution objects with tools for manipulating, visualising, and using probability distributions. Designed to allow model prediction outputs to return distributions rather than their parameters, allowing users to directly interact with predictive distributions in a data-oriented workflow. In addition to providing generic replacements for p/d/q/r functions, other useful statistics can be computed including means, variances, intervals, and highest density regions. |
| Authors: | Mitchell O'Hara-Wild [aut, cre] (ORCID: <https://orcid.org/0000-0001-6729-7695>), Matthew Kay [aut] (ORCID: <https://orcid.org/0000-0001-9446-0419>), Alex Hayes [aut] (ORCID: <https://orcid.org/0000-0002-4985-5160>), Rob Hyndman [aut] (ORCID: <https://orcid.org/0000-0002-2140-5352>), Earo Wang [ctb] (ORCID: <https://orcid.org/0000-0001-6448-5260>), Vencislav Popov [ctb] (ORCID: <https://orcid.org/0000-0002-8073-4199>) |
| Maintainer: | Mitchell O'Hara-Wild |
| License: | GPL-3 |
| Version: | 0.7.0.9000 |
| Built: | 2026-05-27 14:38:33 UTC |
| Source: | https://github.com/mitchelloharawild/distributional |
```r
cdf(x, q, ..., log = FALSE)

## S3 method for class 'distribution'
cdf(x, q, ...)
```
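| Argument | Description |
|---|---|
| `x` | The distribution(s). |
| `q` | The quantile at which the cdf is calculated. |
| `...` | Additional arguments passed to methods. |
| `log` | If `TRUE`, probabilities will be given as log probabilities. |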
A generic function for computing the covariance of an object.
```r
covariance(x, ...)
```
| Argument | Description |
|---|---|
| `x` | An object. |
| `...` | Additional arguments used by methods. |
See also: `covariance.distribution()`, `variance()`
Returns the covariance of the probability distribution. If no covariance method exists for the distribution, the covariance of a random sample from it is returned instead.
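The sampling fallback amounts to estimating the covariance from simulated draws. The base-R sketch below illustrates the idea only, and is not the package's implementation. It assumes a bivariate normal with a hypothetical covariance matrix `Sigma` and compares `cov()` of the draws with the true value.

```r
set.seed(1)
# Hypothetical target covariance of a bivariate normal
Sigma <- matrix(c(1, 0.5, 0.5, 2), nrow = 2)
n <- 1e5
# Rows of Z are independent N(0, I); Z %*% chol(Sigma) then has covariance Sigma
Z <- matrix(rnorm(2 * n), ncol = 2)
X <- Z %*% chol(Sigma)
# Monte Carlo estimate of the covariance
Sigma_hat <- cov(X)
Sigma_hat
```

The estimate's error shrinks roughly like $1/\sqrt{n}$, so a sample-based covariance is only approximate.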
```r
## S3 method for class 'distribution'
covariance(x, ...)
```
| Argument | Description |
|---|---|
| `x` | The distribution(s). |
| `...` | Additional arguments used by methods. |
Computes the probability density function for a continuous distribution, or the probability mass function for a discrete distribution.
```r
## S3 method for class 'distribution'
density(x, at, ..., log = FALSE)
```
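| Argument | Description |
|---|---|
| `x` | The distribution(s). |
| `at` | The point at which to compute the density/mass. |
| `...` | Additional arguments passed to methods. |
| `log` | If `TRUE`, probabilities will be given as log probabilities. |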
Bernoulli distributions are used to represent events like coin flips
when there is a single trial that is either successful or unsuccessful.
The Bernoulli distribution is a special case of the Binomial()
distribution with n = 1.
```r
dist_bernoulli(prob)
```
| Argument | Description |
|---|---|
| `prob` | The probability of success on each trial; `prob` can be any value in $[0, 1]$. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_bernoulli.html
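In the following, let $X$ be a Bernoulli random variable with parameter
prob = $p$. Some textbooks also define $q = 1 - p$, or use $\pi$
instead of $p$.
The Bernoulli probability distribution is widely used to model
binary variables, such as 'failure' and 'success'. The most
typical example is the flip of a coin, where $p$ is thought of as the
probability of flipping a head, and $q = 1 - p$ is the
probability of flipping a tail.

- Support: $\{0, 1\}$
- Mean: $p$
- Variance: $p(1 - p) = pq$
- Probability mass function (p.m.f): $P(X = x) = p^x (1 - p)^{1 - x} = p^x q^{1 - x}$
- Cumulative distribution function (c.d.f): $P(X \le x) = 0$ for $x < 0$, $1 - p$ for $0 \le x < 1$, and $1$ for $x \ge 1$
- Moment generating function (m.g.f): $E(e^{tX}) = (1 - p) + p e^t$
- Skewness: $\frac{1 - 2p}{\sqrt{p(1 - p)}} = \frac{q - p}{\sqrt{pq}}$
- Excess Kurtosis: $\frac{1 - 6p(1 - p)}{p(1 - p)} = \frac{1 - 6pq}{pq}$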
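The moment formulas can be checked numerically by summing over the two-point support. This base-R sketch uses `stats::dbinom()` with `size = 1` as the Bernoulli p.m.f. It does not call the distributional package, and `p = 0.3` is an arbitrary illustrative value.

```r
p <- 0.3
x <- 0:1
pmf <- dbinom(x, size = 1, prob = p)   # Bernoulli p.m.f.: p^x (1 - p)^(1 - x)
m <- sum(x * pmf)                      # mean
v <- sum((x - m)^2 * pmf)              # variance
skew <- sum((x - m)^3 * pmf) / v^1.5   # skewness
exkurt <- sum((x - m)^4 * pmf) / v^2 - 3  # excess kurtosis
c(mean = m, variance = v, skewness = skew, excess_kurtosis = exkurt)
```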
```r
library(distributional)
dist <- dist_bernoulli(prob = c(0.05, 0.5, 0.3, 0.9, 0.1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The Beta distribution is a continuous probability distribution defined on the interval [0, 1], commonly used to model probabilities and proportions.
```r
dist_beta(shape1, shape2)
```
| Argument | Description |
|---|---|
| `shape1`, `shape2` | The non-negative shape parameters of the Beta distribution. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_beta.html
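In the following, let $X$ be a Beta random variable with parameters
shape1 = $\alpha$ and shape2 = $\beta$.

- Support: $x \in [0, 1]$
- Mean: $\frac{\alpha}{\alpha + \beta}$
- Variance: $\frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$
- Probability density function (p.d.f): $f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$, where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$ is the Beta function.
- Cumulative distribution function (c.d.f): $F(x) = I_x(\alpha, \beta) = \frac{B(x; \alpha, \beta)}{B(\alpha, \beta)}$, where $I_x(\alpha, \beta)$ is the regularized incomplete beta function and $B(x; \alpha, \beta) = \int_0^x t^{\alpha - 1}(1 - t)^{\beta - 1}\,dt$.
- Moment generating function (m.g.f): The moment generating function does not have a simple closed form, but the moments can be calculated as $E(X^k) = \prod_{r=0}^{k-1} \frac{\alpha + r}{\alpha + \beta + r}$.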
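To sanity-check the raw-moment formula, this sketch compares it with numerical integration of $x^k f(x)$ using `stats::dbeta()`. The parameter values are arbitrary.

```r
a <- 2; b <- 5; k <- 3
# Closed-form raw moment: prod over r = 0..k-1 of (a + r) / (a + b + r)
closed <- prod((a + 0:(k - 1)) / (a + b + 0:(k - 1)))
# Numerical check: integrate x^k times the Beta density over [0, 1]
num <- integrate(function(x) x^k * dbeta(x, a, b), 0, 1)$value
c(closed = closed, numerical = num)
```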
```r
library(distributional)
dist <- dist_beta(shape1 = c(0.5, 5, 1, 2, 2), shape2 = c(0.5, 1, 3, 2, 5))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
Binomial distributions are used to represent situations that can
be thought of as the result of $n$ Bernoulli experiments (here $n$
is defined as the size of the experiment). The classical
example is $n$ independent coin flips, where each coin flip has
probability $p$ of success. In this case, the individual probability of
flipping heads or tails is given by the Bernoulli($p$) distribution,
and the probability of having $x$ equal results ($x$ heads,
for example) in $n$ trials is given by the Binomial($n$, $p$) distribution.
The equation of the Binomial distribution is directly derived from
the equation of the Bernoulli distribution.
```r
dist_binomial(size, prob)
```
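| Argument | Description |
|---|---|
| `size` | The number of trials. Must be an integer greater than or equal to one. When `size = 1L`, the Binomial distribution reduces to the Bernoulli distribution. |
| `prob` | The probability of success on each trial; `prob` can be any value in $[0, 1]$. |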
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_binomial.html
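The Binomial distribution comes up when you are interested in the number
(or proportion) of people who do a thing. The Binomial distribution
also comes up in the sign test, sometimes called the Binomial test
(see stats::binom.test()), where you may need the Binomial C.D.F. to
compute p-values.
In the following, let $X$ be a Binomial random variable with parameters
size = $n$ and prob = $p$. Some textbooks define $q = 1 - p$,
or use $\pi$ instead of $p$.

- Support: $\{0, 1, 2, \ldots, n\}$
- Mean: $np$
- Variance: $np(1 - p) = npq$
- Probability mass function (p.m.f): $P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
- Cumulative distribution function (c.d.f): $P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i (1 - p)^{n - i}$
- Moment generating function (m.g.f): $E(e^{tX}) = (1 - p + p e^t)^n$
- Skewness: $\frac{1 - 2p}{\sqrt{np(1 - p)}}$
- Excess kurtosis: $\frac{1 - 6p(1 - p)}{np(1 - p)}$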
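This base-R sketch checks the mean, variance, skewness and c.d.f. formulas against the p.m.f. from `stats::dbinom()`. The values `n = 10` and `p = 0.3` are arbitrary.

```r
n <- 10; p <- 0.3
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)
m <- sum(k * pmf)                        # should equal n p
v <- sum((k - m)^2 * pmf)                # should equal n p (1 - p)
skew <- sum((k - m)^3 * pmf) / v^1.5     # should equal (1 - 2p) / sqrt(n p (1 - p))
cdf4 <- sum(pmf[k <= 4])                 # c.d.f. at 4 as a partial sum of the p.m.f.
c(mean = m, variance = v, skewness = skew, cdf_at_4 = cdf4)
```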
```r
library(distributional)
dist <- dist_binomial(size = 1:5, prob = c(0.05, 0.5, 0.3, 0.9, 0.1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The Burr distribution (Type XII) is a flexible continuous probability distribution often used for modeling income distributions, reliability data, and failure times.
```r
dist_burr(shape1, shape2, rate = 1, scale = 1/rate)
```
| Argument | Description |
|---|---|
| `shape1`, `shape2`, `scale` | parameters. Must be strictly positive. |
| `rate` | an alternative way to specify the scale. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_burr.html
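In the following, let $X$ be a Burr random variable with parameters
shape1 = $\alpha$, shape2 = $\gamma$, and rate = $\lambda$.

- Support: $x \in (0, \infty)$
- Mean: $E(X) = \frac{\Gamma(\alpha - 1/\gamma)\,\Gamma(1 + 1/\gamma)}{\lambda\,\Gamma(\alpha)}$ (for $\alpha\gamma > 1$)
- Variance: $\frac{\Gamma(\alpha - 2/\gamma)\,\Gamma(1 + 2/\gamma)}{\lambda^2\,\Gamma(\alpha)} - \left[E(X)\right]^2$ (for $\alpha\gamma > 2$)
- Probability density function (p.d.f): $f(x) = \frac{\alpha\gamma(\lambda x)^{\gamma}}{x\left[1 + (\lambda x)^{\gamma}\right]^{\alpha + 1}}$
- Cumulative distribution function (c.d.f): $F(x) = 1 - \left[1 + (\lambda x)^{\gamma}\right]^{-\alpha}$
- Quantile function: $F^{-1}(p) = \frac{1}{\lambda}\left[(1 - p)^{-1/\alpha} - 1\right]^{1/\gamma}$
- Moment generating function (m.g.f): Does not exist in closed form.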
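Here is a base-R sketch of the c.d.f. and quantile formulas, assuming the shape1/shape2/rate parameterisation written above. The helpers `pburr_xii()` and `qburr_xii()` are hypothetical and not exported by distributional. The mean is checked using $E(X) = \int_0^1 F^{-1}(p)\,dp$.

```r
# Hypothetical helpers (not package functions): shape1 = a, shape2 = g, rate = lambda
pburr_xii <- function(x, a, g, lambda = 1) 1 - (1 + (lambda * x)^g)^(-a)
qburr_xii <- function(p, a, g, lambda = 1) ((1 - p)^(-1 / a) - 1)^(1 / g) / lambda

a <- 2; g <- 3; lambda <- 0.5
p <- c(0.1, 0.5, 0.9)
x <- qburr_xii(p, a, g, lambda)
pburr_xii(x, a, g, lambda)   # recovers p

# Mean (requires a * g > 1): closed form vs. integral of the quantile function
mean_closed <- gamma(a - 1 / g) * gamma(1 + 1 / g) / (lambda * gamma(a))
mean_num <- integrate(function(u) qburr_xii(u, a, g, lambda), 0, 1)$value
c(closed = mean_closed, numerical = mean_num)
```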
```r
library(distributional)
dist <- dist_burr(shape1 = c(1, 1, 1, 2, 3, 0.5), shape2 = c(1, 2, 3, 1, 1, 2))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
Categorical distributions are used to represent events with multiple
outcomes, such as what number appears on the roll of a die. This is also
referred to as the 'generalised Bernoulli' or 'multinoulli' distribution.
The Categorical distribution is a special case of the Multinomial()
distribution with n = 1.
```r
dist_categorical(prob, outcomes = NULL)
```
| Argument | Description |
|---|---|
| `prob` | A list of probabilities of observing each outcome category. |
| `outcomes` | The list of vectors where each value represents each outcome. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_categorical.html
In the following, let $X$ be a Categorical random variable with
probability parameters prob = $\{p_1, p_2, \ldots, p_k\}$.
The Categorical probability distribution is widely used to model the
occurrence of multiple events. A simple example is the roll of a die, where
$p = \{1/6, 1/6, 1/6, 1/6, 1/6, 1/6\}$, giving equal chance of observing
each number on a 6-sided die.

- Support: $\{1, \ldots, k\}$
- Mean: Not defined for unordered categories. For ordered categories with integer outcomes $\{1, 2, \ldots, k\}$, the mean is $E(X) = \sum_{i=1}^{k} i\,p_i$.
- Variance: Not defined for unordered categories. For ordered categories with integer outcomes $\{1, 2, \ldots, k\}$, the variance is $\mathrm{Var}(X) = \sum_{i=1}^{k} i^2 p_i - \left(\sum_{i=1}^{k} i\,p_i\right)^2$.
- Probability mass function (p.m.f): $P(X = i) = p_i$
- Cumulative distribution function (c.d.f): The c.d.f is undefined for unordered categories. For ordered categories with outcomes $x_1 < x_2 < \cdots < x_k$, the c.d.f is $P(X \le x_j) = \sum_{i=1}^{j} p_i$.
- Moment generating function (m.g.f): $E(e^{tX}) = \sum_{i=1}^{k} e^{ti} p_i$
- Skewness: Approximated numerically for ordered categories.
- Kurtosis: Approximated numerically for ordered categories.
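For ordered integer outcomes, the mean, variance and c.d.f. reduce to weighted sums. This base-R sketch works through the fair-die example; it does not use the package.

```r
k <- 1:6                  # ordered integer outcomes of a fair die
p <- rep(1 / 6, 6)        # equal probabilities
m <- sum(k * p)           # E(X) = sum_i i * p_i
v <- sum(k^2 * p) - m^2   # Var(X) = sum_i i^2 * p_i - E(X)^2
cdf_vals <- cumsum(p)     # F(j) = p_1 + ... + p_j
c(mean = m, variance = v)
```

For a fair die this gives a mean of 3.5 and a variance of 35/12.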
```r
library(distributional)
dist <- dist_categorical(prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)))
dist
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)

# The outcomes aren't ordered, so many statistics are not applicable.
cdf(dist, 0.6)
quantile(dist, 0.7)
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)

# Some of these statistics are meaningful for ordered outcomes
dist <- dist_categorical(list(rpois(26, 3)), list(ordered(letters)))
dist
cdf(dist, "m")
quantile(dist, 0.5)

dist <- dist_categorical(
  prob = list(c(0.05, 0.5, 0.15, 0.2, 0.1), c(0.3, 0.1, 0.6)),
  outcomes = list(letters[1:5], letters[24:26])
)
generate(dist, 10)
density(dist, "a")
density(dist, "z", log = TRUE)
```
The Cauchy distribution is Student's t distribution with one degree of freedom. The Cauchy distribution does not have a well-defined mean or variance. Cauchy distributions often appear as priors in Bayesian contexts due to their heavy tails.
```r
dist_cauchy(location, scale)
```
| Argument | Description |
|---|---|
| `location`, `scale` | location and scale parameters. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_cauchy.html
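In the following, let $X$ be a Cauchy random variable with parameters
location = $x_0$ and scale = $\gamma$.

- Support: $\mathbb{R}$, the set of all real numbers
- Mean: Undefined.
- Variance: Undefined.
- Probability density function (p.d.f): $f(x) = \frac{1}{\pi\gamma\left[1 + \left(\frac{x - x_0}{\gamma}\right)^2\right]}$
- Cumulative distribution function (c.d.f): $F(x) = \frac{1}{\pi}\arctan\left(\frac{x - x_0}{\gamma}\right) + \frac{1}{2}$
- Moment generating function (m.g.f): Does not exist.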
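The closed-form p.d.f. and c.d.f. can be compared directly with `stats::dcauchy()` and `stats::pcauchy()`. This is a small base-R check with arbitrary parameter values.

```r
x0 <- -2; gam <- 1   # location and scale
x <- c(-5, -2, 0, 4)
pdf_closed <- 1 / (pi * gam * (1 + ((x - x0) / gam)^2))
cdf_closed <- atan((x - x0) / gam) / pi + 1 / 2
rbind(pdf_closed, dcauchy(x, x0, gam), cdf_closed, pcauchy(x, x0, gam))
```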
```r
library(distributional)
dist <- dist_cauchy(location = c(0, 0, 0, -2), scale = c(0.5, 1, 2, 1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
Chi-square distributions show up often in frequentist settings as the sampling distribution of test statistics, especially in maximum likelihood estimation settings.
```r
dist_chisq(df, ncp = 0)
```
| Argument | Description |
|---|---|
| `df` | Degrees of freedom. Can be any positive real number. |
| `ncp` | Non-centrality parameter. Can be any non-negative real number. Defaults to 0 (central chi-squared distribution). |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_chisq.html
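In the following, let $X$ be a $\chi^2$ random variable with
df = $k$ and ncp = $\lambda$.

- Support: $\mathbb{R}^+$, the set of positive real numbers
- Mean: $k + \lambda$
- Variance: $2(k + 2\lambda)$
- Probability density function (p.d.f):
  - For the central chi-squared distribution ($\lambda = 0$): $f(x) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\,x^{k/2 - 1} e^{-x/2}$
  - For the non-central chi-squared distribution ($\lambda > 0$): $f(x) = \frac{1}{2} e^{-(x + \lambda)/2} \left(\frac{x}{\lambda}\right)^{k/4 - 1/2} I_{k/2 - 1}\left(\sqrt{\lambda x}\right)$, where $I_\nu(y)$ is the modified Bessel function of the first kind.
- Cumulative distribution function (c.d.f):
  - For the central chi-squared distribution ($\lambda = 0$): $F(x) = \frac{\gamma(k/2, x/2)}{\Gamma(k/2)} = P\left(\frac{k}{2}, \frac{x}{2}\right)$, where $\gamma(s, x)$ is the lower incomplete gamma function and $P(s, x)$ is the regularized gamma function.
  - For the non-central chi-squared distribution ($\lambda > 0$): $F(x) = \sum_{j=0}^{\infty} e^{-\lambda/2}\frac{(\lambda/2)^j}{j!}\,P\left(\frac{k}{2} + j, \frac{x}{2}\right)$. This is approximated numerically.
- Moment generating function (m.g.f):
  - For the central chi-squared distribution ($\lambda = 0$): $E(e^{tX}) = (1 - 2t)^{-k/2}$ for $t < 1/2$
  - For the non-central chi-squared distribution ($\lambda > 0$): $E(e^{tX}) = \frac{\exp\left(\frac{\lambda t}{1 - 2t}\right)}{(1 - 2t)^{k/2}}$ for $t < 1/2$
- Skewness: $\frac{2^{3/2}(k + 3\lambda)}{(k + 2\lambda)^{3/2}}$. For the central case ($\lambda = 0$), this simplifies to $\sqrt{8/k}$.
- Excess Kurtosis: $\frac{12(k + 4\lambda)}{(k + 2\lambda)^2}$. For the central case ($\lambda = 0$), this simplifies to $12/k$.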
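This base-R sketch checks the non-central mean and variance by numerically integrating `stats::dchisq()`. It also checks the Poisson-mixture form of the non-central c.d.f. (truncated at an arbitrary 100 terms) and the regularized-gamma form of the central c.d.f. The parameter values are arbitrary.

```r
k <- 4         # df
lambda <- 1.5  # ncp
f <- function(x) dchisq(x, df = k, ncp = lambda)

# Mean and variance by numerical integration of the density
m <- integrate(function(x) x * f(x), 0, Inf)$value
v <- integrate(function(x) (x - m)^2 * f(x), 0, Inf)$value

# Non-central c.d.f. as a Poisson(lambda / 2) mixture of central chi-squared c.d.f.s;
# pchisq(x, k + 2j) equals the regularized gamma P(k/2 + j, x/2)
x <- 6
j <- 0:100
cdf_mixture <- sum(dpois(j, lambda / 2) * pchisq(x, df = k + 2 * j))
cdf_r <- pchisq(x, df = k, ncp = lambda)

# Central c.d.f. as the regularized gamma function P(k/2, x/2)
central_ok <- isTRUE(all.equal(pchisq(x, df = k), pgamma(x / 2, shape = k / 2)))
c(mean = m, variance = v, cdf_mixture = cdf_mixture, cdf_r = cdf_r)
```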
```r
library(distributional)
dist <- dist_chisq(df = c(1, 2, 3, 4, 6, 9))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The degenerate distribution takes a single value which is certain to be observed. It takes a single parameter, which is the value that is observed by the distribution.
```r
dist_degenerate(x)
```
| Argument | Description |
|---|---|
| `x` | The value of the distribution (location parameter). Can be any real number. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_degenerate.html
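In the following, let $X$ be a degenerate random variable with value
x = $k_0$.

- Support: $\{k_0\}$, a single point
- Mean: $k_0$
- Variance: $0$
- Probability density function (p.d.f): $f(x) = 1$ if $x = k_0$, and $0$ otherwise
- Cumulative distribution function (c.d.f): $F(x) = 0$ if $x < k_0$, and $1$ if $x \ge k_0$
- Moment generating function (m.g.f): $E(e^{tX}) = e^{k_0 t}$
- Skewness: Undefined (NA)
- Excess Kurtosis: Undefined (NA)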
```r
library(distributional)
dist <- dist_degenerate(x = 1:5)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The Dirichlet distribution is a multivariate generalisation of the Beta
distribution. It is the conjugate prior of the Categorical and Multinomial
distributions, and describes a probability distribution over the
$(K-1)$-simplex: the set of $K$-dimensional vectors whose
components are non-negative and sum to one.
```r
dist_dirichlet(alpha)
```
| Argument | Description |
|---|---|
| `alpha` | A list of positive numeric concentration vectors. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_dirichlet.html
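In the following, let $\mathbf{X} = (X_1, \ldots, X_K)$ be a
Dirichlet random variable with concentration parameter
alpha = $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_K)$,
where each $\alpha_i > 0$.

- Support: $\mathbf{x}$ on the $(K-1)$-simplex, i.e. $x_i \in [0, 1]$ and $\sum_{i=1}^{K} x_i = 1$.
- Mean: $E(X_i) = \frac{\alpha_i}{\alpha_0}$, where $\alpha_0 = \sum_{i=1}^{K} \alpha_i$.
- Variance: $\mathrm{Var}(X_i) = \frac{\alpha_i(\alpha_0 - \alpha_i)}{\alpha_0^2(\alpha_0 + 1)}$
- Covariance: $\mathrm{Cov}(X_i, X_j) = -\frac{\alpha_i \alpha_j}{\alpha_0^2(\alpha_0 + 1)}$ for $i \neq j$
- Probability density function (p.d.f): $f(\mathbf{x}) = \frac{1}{B(\boldsymbol{\alpha})}\prod_{i=1}^{K} x_i^{\alpha_i - 1}$, where $B(\boldsymbol{\alpha}) = \frac{\prod_{i=1}^{K}\Gamma(\alpha_i)}{\Gamma(\alpha_0)}$ is the multivariate Beta function.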
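To check the mean and covariance formulas by simulation, this base-R sketch draws Dirichlet samples by normalising independent Gamma draws. That is a standard construction, used here only for illustration; the package's own sampler may differ.

```r
set.seed(42)
alpha <- c(2, 5, 3)
a0 <- sum(alpha)
n <- 1e5
# Independent Gamma(alpha_i, 1) draws, one column per component
G <- sapply(alpha, function(a) rgamma(n, shape = a))
# Normalising each row gives a draw from Dirichlet(alpha)
X <- G / rowSums(G)
mean_closed <- alpha / a0
# Diagonal: alpha_i (a0 - alpha_i); off-diagonal: -alpha_i alpha_j; all over a0^2 (a0 + 1)
cov_closed <- (diag(alpha * a0) - outer(alpha, alpha)) / (a0^2 * (a0 + 1))
round(colMeans(X) - mean_closed, 3)
round(cov(X) - cov_closed, 4)
```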
See also: `LaplacesDemon::ddirichlet()`, `LaplacesDemon::rdirichlet()`
```r
library(distributional)
dist <- dist_dirichlet(alpha = list(c(2, 5, 3)))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, cbind(0.2, 0.5, 0.3))
density(dist, cbind(0.2, 0.5, 0.3), log = TRUE)
```
Exponential distributions are frequently used to model waiting times and the time between events in a Poisson process.
```r
dist_exponential(rate)
```
| Argument | Description |
|---|---|
| `rate` | vector of rates. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_exponential.html
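In the following, let $X$ be an Exponential random variable with
parameter rate = $\lambda$.

- Support: $x \in [0, \infty)$
- Mean: $\frac{1}{\lambda}$
- Variance: $\frac{1}{\lambda^2}$
- Probability density function (p.d.f): $f(x) = \lambda e^{-\lambda x}$
- Cumulative distribution function (c.d.f): $F(x) = 1 - e^{-\lambda x}$
- Moment generating function (m.g.f): $E(e^{tX}) = \frac{\lambda}{\lambda - t}$ for $t < \lambda$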
```r
library(distributional)
dist <- dist_exponential(rate = c(2, 1, 2/3))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The F distribution is commonly used in statistical inference, particularly in the analysis of variance (ANOVA), testing the equality of variances, and in regression analysis. It arises as the ratio of two independent chi-squared random variables, each divided by its degrees of freedom.
```r
dist_f(df1, df2, ncp = NULL)
```
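| Argument | Description |
|---|---|
| `df1` | Degrees of freedom for the numerator. Can be any positive number. |
| `df2` | Degrees of freedom for the denominator. Can be any positive number. |
| `ncp` | Non-centrality parameter. If `NULL` (the default), the central F distribution is used. |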
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_f.html
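In the following, let $X$ be an F random variable with numerator
degrees of freedom df1 = $d_1$ and denominator degrees of freedom
df2 = $d_2$.

- Support: $x \in [0, \infty)$
- Mean:
  - For the central F distribution (ncp = NULL): $\frac{d_2}{d_2 - 2}$ for $d_2 > 2$, otherwise undefined.
  - For the non-central F distribution with non-centrality parameter ncp = $\lambda$: $\frac{d_2 (d_1 + \lambda)}{d_1 (d_2 - 2)}$ for $d_2 > 2$, otherwise undefined.
- Variance:
  - For the central F distribution (ncp = NULL): $\frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}$ for $d_2 > 4$, otherwise undefined.
  - For the non-central F distribution with non-centrality parameter ncp = $\lambda$: $2\,\frac{(d_1 + \lambda)^2 + (d_1 + 2\lambda)(d_2 - 2)}{(d_2 - 2)^2 (d_2 - 4)}\left(\frac{d_2}{d_1}\right)^2$ for $d_2 > 4$, otherwise undefined.
- Skewness:
  - For the central F distribution (ncp = NULL): $\frac{(2 d_1 + d_2 - 2)\sqrt{8 (d_2 - 4)}}{(d_2 - 6)\sqrt{d_1 (d_1 + d_2 - 2)}}$ for $d_2 > 6$, otherwise undefined.
  - For the non-central F distribution, skewness has no simple closed form and is not computed.
- Excess Kurtosis:
  - For the central F distribution (ncp = NULL): $12\,\frac{d_1 (5 d_2 - 22)(d_1 + d_2 - 2) + (d_2 - 4)(d_2 - 2)^2}{d_1 (d_2 - 6)(d_2 - 8)(d_1 + d_2 - 2)}$ for $d_2 > 8$, otherwise undefined.
  - For the non-central F distribution, kurtosis has no simple closed form and is not computed.
- Probability density function (p.d.f):
  - For the central F distribution (ncp = NULL): $f(x) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}\left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1}\left(1 + \frac{d_1}{d_2}x\right)^{-(d_1 + d_2)/2}$, where $B$ is the beta function.
  - For the non-central F distribution, the density involves an infinite series and is approximated numerically.
- Cumulative distribution function (c.d.f): The c.d.f. does not have a simple closed form expression and is approximated numerically using regularized incomplete beta functions and related special functions.
- Moment generating function (m.g.f): The moment generating function for the F distribution does not exist in general (it diverges for $t > 0$).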
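This base-R sketch checks the central mean and variance formulas, and the non-central mean, by numerically integrating `stats::df()`. The values $d_1 = 5$, $d_2 = 12$ and ncp $= 2$ are arbitrary choices that satisfy $d_2 > 4$.

```r
d1 <- 5; d2 <- 12
f <- function(x) df(x, df1 = d1, df2 = d2)
m <- integrate(function(x) x * f(x), 0, Inf)$value
v <- integrate(function(x) (x - m)^2 * f(x), 0, Inf)$value
m_closed <- d2 / (d2 - 2)
v_closed <- 2 * d2^2 * (d1 + d2 - 2) / (d1 * (d2 - 2)^2 * (d2 - 4))

# Non-central mean with ncp = lambda
lambda <- 2
m_nc <- integrate(function(x) x * df(x, d1, d2, ncp = lambda), 0, Inf)$value
m_nc_closed <- d2 * (d1 + lambda) / (d1 * (d2 - 2))
rbind(numerical = c(m, v, m_nc), closed = c(m_closed, v_closed, m_nc_closed))
```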
```r
library(distributional)
dist <- dist_f(df1 = c(1, 2, 5, 10, 100), df2 = c(1, 1, 2, 1, 100))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
Several important distributions are special cases of the Gamma
distribution. Writing $\alpha$ for the shape and $\beta$ for the rate
(as below), when $\alpha = 1$ the Gamma is an exponential distribution
with rate $\beta$. When $\alpha = n/2$ and $\beta = 1/2$, the Gamma is
equivalent to a chi-squared distribution with $n$ degrees of freedom.
Moreover, if $X_1 \sim \text{Gamma}(\alpha_1, \beta)$ and
$X_2 \sim \text{Gamma}(\alpha_2, \beta)$ are independent, then
$\frac{X_1}{X_1 + X_2} \sim \text{Beta}(\alpha_1, \alpha_2)$.
This last property frequently appears in other distributions, and it
has been used extensively in multivariate methods. More about the Gamma
distribution will be added soon.
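These special cases can be checked in base R with the shape/rate arguments of `stats::pgamma()`. The Beta property is checked by simulation, comparing against the theoretical Beta(2, 3) mean of 0.4 and variance of 0.04. All parameter values are arbitrary.

```r
x <- c(0.5, 1, 3, 7)
b <- 0.8  # rate

# Shape 1: Gamma(1, rate = b) is Exponential(rate = b)
exp_ok <- isTRUE(all.equal(pgamma(x, shape = 1, rate = b), pexp(x, rate = b)))

# Shape n/2 and rate 1/2: Gamma is chi-squared with n degrees of freedom
n <- 5
chisq_ok <- isTRUE(all.equal(pgamma(x, shape = n / 2, rate = 1 / 2), pchisq(x, df = n)))

# Independent X1 ~ Gamma(2, b) and X2 ~ Gamma(3, b): X1 / (X1 + X2) ~ Beta(2, 3)
set.seed(7)
x1 <- rgamma(1e5, shape = 2, rate = b)
x2 <- rgamma(1e5, shape = 3, rate = b)
ratio <- x1 / (x1 + x2)
c(exp_ok = exp_ok, chisq_ok = chisq_ok, mean = mean(ratio), var = var(ratio))
```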
```r
dist_gamma(shape, rate = 1/scale, scale = 1/rate)
```
| Argument | Description |
|---|---|
| `shape`, `scale` | shape and scale parameters. Must be positive, `scale` strictly. |
| `rate` | an alternative way to specify the scale. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gamma.html
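In the following, let $X$ be a Gamma random variable
with parameters
shape = $\alpha$ and
rate = $\beta$.

- Support: $x \in (0, \infty)$
- Mean: $\frac{\alpha}{\beta}$
- Variance: $\frac{\alpha}{\beta^2}$
- Probability density function (p.d.f): $f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha - 1} e^{-\beta x}$
- Cumulative distribution function (c.d.f): $F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}$, where $\gamma(\cdot, \cdot)$ is the lower incomplete gamma function
- Moment generating function (m.g.f): $E(e^{tX}) = \left(\frac{\beta}{\beta - t}\right)^{\alpha}$ for $t < \beta$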
```r
library(distributional)
dist <- dist_gamma(shape = c(1, 2, 3, 5, 9, 7.5, 0.5), rate = c(0.5, 0.5, 0.5, 1, 2, 1, 1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The Geometric distribution can be thought of as a generalization
of the dist_bernoulli() distribution where we ask: "if I keep flipping a
coin with probability $p$ of heads, what is the probability I need
$k$ flips before I get my first heads?" The Geometric
distribution is a special case of the Negative Binomial distribution.
```r
dist_geometric(prob)
```
| Argument | Description |
|---|---|
| `prob` | probability of success in each trial. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_geometric.html
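In the following, let $X$ be a Geometric random variable with
success probability prob = $p$. Note that there are multiple
parameterizations of the Geometric distribution; here $X$ counts the
number of failures before the first success.

- Support: $\{0, 1, 2, 3, \ldots\}$
- Mean: $\frac{1 - p}{p}$
- Variance: $\frac{1 - p}{p^2}$
- Probability mass function (p.m.f): $P(X = k) = p(1 - p)^k$
- Cumulative distribution function (c.d.f): $P(X \le k) = 1 - (1 - p)^{k + 1}$
- Moment generating function (m.g.f): $E(e^{tX}) = \frac{p}{1 - (1 - p)e^t}$ for $t < -\log(1 - p)$
- Skewness: $\frac{2 - p}{\sqrt{1 - p}}$
- Excess Kurtosis: $6 + \frac{p^2}{1 - p}$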
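This base-R sketch checks the "failures before the first success" formulas against `stats::dgeom()` and `stats::pgeom()`, which use that parameterisation. The infinite support is truncated at an arbitrary 2000 terms, where the remaining probability mass is negligible.

```r
p <- 0.3
k <- 0:2000                 # truncated support
pmf <- p * (1 - p)^k        # P(X = k) = p (1 - p)^k
m <- sum(k * pmf)           # should be (1 - p) / p
v <- sum((k - m)^2 * pmf)   # should be (1 - p) / p^2
cdf4 <- 1 - (1 - p)^(4 + 1) # P(X <= 4)
c(mean = m, variance = v, cdf_at_4 = cdf4)
```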
```r
library(distributional)
dist <- dist_geometric(prob = c(0.2, 0.5, 0.8))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The GEV distribution is widely used in extreme value theory to model the distribution of maxima (or minima) of samples. The parametric form encompasses the Gumbel, Frechet, and reverse Weibull distributions.
```r
dist_gev(location, scale, shape)
```
| Argument | Description |
|---|---|
| `location` | the location parameter |
| `scale` | the scale parameter |
| `shape` | the shape parameter |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gev.html
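In the following, let $X$ be a GEV random variable with parameters
location = $\mu$, scale = $\sigma$, and shape = $\xi$.

- Support:
  - $x \in \mathbb{R}$ (all real numbers) if $\xi = 0$
  - $x \in [\mu - \sigma/\xi, \infty)$ if $\xi > 0$
  - $x \in (-\infty, \mu - \sigma/\xi]$ if $\xi < 0$
- Mean:
  - $\mu + \sigma\gamma$ if $\xi = 0$, where $\gamma$ is the Euler-Mascheroni constant
  - $\mu + \sigma\,\frac{\Gamma(1 - \xi) - 1}{\xi}$ if $\xi \neq 0$ and $\xi < 1$, where $\Gamma(\cdot)$ is the gamma function
  - $\infty$ if $\xi \ge 1$
- Median:
  - $\mu - \sigma\log(\log 2)$ if $\xi = 0$
  - $\mu + \sigma\,\frac{(\log 2)^{-\xi} - 1}{\xi}$ if $\xi \neq 0$
- Variance:
  - $\sigma^2\,\frac{\pi^2}{6}$ if $\xi = 0$
  - $\sigma^2\,\frac{g_2 - g_1^2}{\xi^2}$ if $\xi \neq 0$ and $\xi < 1/2$, where $g_k = \Gamma(1 - k\xi)$
  - $\infty$ if $\xi \ge 1/2$
- Probability density function (p.d.f):
  - For $\xi = 0$ (Gumbel): $f(x) = \frac{1}{\sigma}\exp\left(-\frac{x - \mu}{\sigma}\right)\exp\left[-\exp\left(-\frac{x - \mu}{\sigma}\right)\right]$
  - For $\xi \neq 0$: $f(x) = \frac{1}{\sigma}\left(1 + \xi z\right)^{-1/\xi - 1}\exp\left[-\left(1 + \xi z\right)^{-1/\xi}\right]$, where $z = \frac{x - \mu}{\sigma}$.
- Cumulative distribution function (c.d.f):
  - For $\xi = 0$ (Gumbel): $F(x) = \exp\left[-\exp\left(-\frac{x - \mu}{\sigma}\right)\right]$
  - For $\xi \neq 0$: $F(x) = \exp\left[-\left(1 + \xi z\right)^{-1/\xi}\right]$, where $z = \frac{x - \mu}{\sigma}$.
- Quantile function:
  - For $\xi = 0$ (Gumbel): $F^{-1}(p) = \mu - \sigma\log(-\log p)$
  - For $\xi \neq 0$: $F^{-1}(p) = \mu + \frac{\sigma}{\xi}\left[(-\log p)^{-\xi} - 1\right]$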
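Below is a base-R sketch of the GEV c.d.f. and quantile formulas. The helpers `pgev_sketch()` and `qgev_sketch()` are hypothetical and not package functions. The sketch checks that the quantile inverts the c.d.f. for all three shape regimes, and it checks the median and mean using $E(X) = \int_0^1 F^{-1}(p)\,dp$.

```r
# Hypothetical helpers implementing the GEV formulas above (not package functions)
pgev_sketch <- function(x, mu, sigma, xi) {
  z <- (x - mu) / sigma
  # pmax(..., 0) maps points outside the support to c.d.f. 0 or 1
  if (xi == 0) exp(-exp(-z)) else exp(-pmax(1 + xi * z, 0)^(-1 / xi))
}
qgev_sketch <- function(p, mu, sigma, xi) {
  if (xi == 0) mu - sigma * log(-log(p)) else mu + sigma * ((-log(p))^(-xi) - 1) / xi
}

mu <- 0; sigma <- 1
p <- c(0.05, 0.5, 0.95)
# Gumbel, Frechet-type and reverse-Weibull-type shapes
roundtrip_ok <- sapply(c(0, 0.3, -0.2), function(xi)
  isTRUE(all.equal(pgev_sketch(qgev_sketch(p, mu, sigma, xi), mu, sigma, xi), p)))

xi <- 0.3
median_closed <- mu + sigma * ((log(2))^(-xi) - 1) / xi
mean_closed <- mu + sigma * (gamma(1 - xi) - 1) / xi
mean_num <- integrate(function(u) qgev_sketch(u, mu, sigma, xi), 0, 1)$value
# Gumbel mean: mu + sigma * Euler-Mascheroni constant, which equals -digamma(1)
mean_gumbel <- integrate(function(u) qgev_sketch(u, mu, sigma, 0), 0, 1)$value
c(mean_closed = mean_closed, mean_num = mean_num, mean_gumbel = mean_gumbel)
```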
Jenkinson, A. F. (1955) The frequency distribution of the annual maximum (or minimum) of meteorological elements. Quart. J. R. Met. Soc., 81, 158–171.
```r
library(distributional)

# Create GEV distributions with different shape parameters
# Gumbel distribution (shape = 0)
gumbel <- dist_gev(location = 0, scale = 1, shape = 0)
# Frechet distribution (shape > 0, heavy-tailed)
frechet <- dist_gev(location = 0, scale = 1, shape = 0.3)
# Reverse Weibull distribution (shape < 0, bounded above)
weibull <- dist_gev(location = 0, scale = 1, shape = -0.2)
dist <- c(gumbel, frechet, weibull)
dist

# Statistical properties
mean(dist)
median(dist)
variance(dist)

# Generate random samples
generate(dist, 10)

# Evaluate density
density(dist, 2)
density(dist, 2, log = TRUE)

# Evaluate cumulative distribution
cdf(dist, 4)

# Calculate quantiles
quantile(dist, 0.95)
```
The generalised g-and-h distribution is a flexible distribution used to model univariate data, similar to the g-k distribution. It is known for its ability to handle skewness and heavy-tailed behavior.
```r
dist_gh(A, B, g, h, c = 0.8)
```
| Argument | Description |
|---|---|
| `A` | Vector of A (location) parameters. |
| `B` | Vector of B (scale) parameters. Must be positive. |
| `g` | Vector of g parameters. |
| `h` | Vector of h parameters. Must be non-negative. |
| `c` | Vector of c parameters (used for generalised g-and-h). Often fixed at 0.8, which is the default. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gh.html
In the following, let $X$ be a g-and-h random variable with parameters
A = $A$, B = $B$, g = $g$, h = $h$, and c = $c$.

- Support: $x \in \mathbb{R}$
- Mean: Does not have a closed-form expression. Approximated numerically.
- Variance: Does not have a closed-form expression. Approximated numerically.
- Probability density function (p.d.f): The g-and-h distribution does not have a closed-form expression for its density. The distribution is defined through its quantile function $Q(u)$, given under "Quantile function", and the density is approximated numerically from it.
- Cumulative distribution function (c.d.f): Does not have a closed-form expression. The cumulative distribution function is approximated numerically by inverting the quantile function.
- Quantile function: $Q(u) = A + B\left[1 + c\,\frac{1 - \exp(-g\,z(u))}{1 + \exp(-g\,z(u))}\right]\exp\left(\frac{h\,z(u)^2}{2}\right) z(u)$, where $z(u) = \Phi^{-1}(u)$ is the standard normal quantile function.
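The numerical c.d.f. described above can be sketched in base R with `stats::uniroot()`: solve $Q(u) = x$ for $u$, which works because $Q$ is increasing when $h \ge 0$. The helpers `qgh_sketch()` and `pgh_sketch()` are hypothetical, and this is not necessarily how distributional or the gk package perform the inversion.

```r
# Hypothetical helpers (not package functions): quantile function from the formula above
qgh_sketch <- function(u, A, B, g, h, c = 0.8) {
  z <- qnorm(u)
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * exp(h * z^2 / 2) * z
}
# c.d.f. by numerically solving Q(u) = x on (0, 1)
pgh_sketch <- function(x, ...) {
  uniroot(function(u) qgh_sketch(u, ...) - x,
          lower = 1e-12, upper = 1 - 1e-12, tol = 1e-12)$root
}

u <- pgh_sketch(2, A = 0, B = 1, g = 0, h = 0.5)
u
qgh_sketch(u, A = 0, B = 1, g = 0, h = 0.5)  # approximately 2
```

With `g = 0` and `h = 0` the quantile reduces to $A + Bz$, so the sketch should then reproduce `pnorm()`.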
See also: `gk::dgh()`, `gk::pgh()`, `gk::qgh()`, `gk::rgh()`, `dist_gk()`
```r
library(distributional)
dist <- dist_gh(A = 0, B = 1, g = 0, h = 0.5)
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The g-and-k distribution is a flexible distribution often used to model univariate data. It is particularly known for its ability to handle skewness and heavy-tailed behavior.
```r
dist_gk(A, B, g, k, c = 0.8)
```
| Argument | Description |
|---|---|
| `A` | Vector of A (location) parameters. |
| `B` | Vector of B (scale) parameters. Must be positive. |
| `g` | Vector of g parameters. |
| `k` | Vector of k parameters. Must be at least -0.5. |
| `c` | Vector of c parameters. Often fixed at 0.8, which is the default. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gk.html
In the following, let $X$ be a g-k random variable with parameters
A, B, g, k, and c.

- Support: $x \in \mathbb{R}$
- Mean: Not available in closed form.
- Variance: Not available in closed form.
- Probability density function (p.d.f): The g-k distribution does not have a closed-form expression for its density. Instead, it is defined through its quantile function $Q(u) = A + B\left[1 + c\,\frac{1 - \exp(-g\,z)}{1 + \exp(-g\,z)}\right]\left(1 + z^2\right)^{k} z$, where $z = \Phi^{-1}(u)$, the standard normal quantile of $u$.
- Cumulative distribution function (c.d.f): The cumulative distribution function is typically evaluated numerically due to the lack of a closed-form expression.
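Because only $Q(u)$ is available, the density at $x = Q(u)$ can be obtained from $f(Q(u)) = 1 / Q'(u)$. This base-R sketch approximates $Q'(u)$ with a central finite difference; the helpers `qgk_sketch()` and `dgk_at_u()` and the step size `eps` are hypothetical choices. With `g = 0` and `k = 0` the distribution reduces to a normal, which gives a check against `dnorm()`.

```r
# Hypothetical helpers (not package functions): g-k quantile function from the formula above
qgk_sketch <- function(u, A, B, g, k, c = 0.8) {
  z <- qnorm(u)
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
}
# Density at x = Q(u) via f(Q(u)) = 1 / Q'(u), with Q'(u) from a central difference
dgk_at_u <- function(u, ..., eps = 1e-6) {
  2 * eps / (qgk_sketch(u + eps, ...) - qgk_sketch(u - eps, ...))
}

u <- c(0.1, 0.5, 0.9)
x <- qgk_sketch(u, A = 0, B = 1, g = 0, k = 0)     # g = k = 0 reduces to N(A, B^2)
dens <- dgk_at_u(u, A = 0, B = 1, g = 0, k = 0)
cbind(x = x, density = dens, dnorm = dnorm(x))
```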
```r
library(distributional)
dist <- dist_gk(A = 0, B = 1, g = 0, k = 0.5)
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The GPD distribution is commonly used to model the tails of distributions, particularly in extreme value theory.
The Pickands–Balkema–De Haan theorem states that for a large class of distributions, the tail (above some threshold) can be approximated by a GPD.
```r
dist_gpd(location, scale, shape)
```
| Argument | Description |
|---|---|
| `location` | the location parameter |
| `scale` | the scale parameter |
| `shape` | the shape parameter |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gpd.html
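In the following, let $X$ be a Generalized Pareto random variable with
parameters location = $\mu$, scale = $\sigma$, and
shape = $\xi$.

- Support:
  - $x \ge \mu$ if $\xi \ge 0$
  - $\mu \le x \le \mu - \sigma/\xi$ if $\xi < 0$
- Mean: $\mu + \frac{\sigma}{1 - \xi}$ for $\xi < 1$
- Variance: $\frac{\sigma^2}{(1 - \xi)^2 (1 - 2\xi)}$ for $\xi < 1/2$
- Probability density function (p.d.f):
  - For $\xi = 0$: $f(x) = \frac{1}{\sigma}\exp\left(-\frac{x - \mu}{\sigma}\right)$
  - For $\xi \neq 0$: $f(x) = \frac{1}{\sigma}\left(1 + \xi z\right)^{-1/\xi - 1}$, where $z = \frac{x - \mu}{\sigma}$
- Cumulative distribution function (c.d.f):
  - For $\xi = 0$: $F(x) = 1 - \exp\left(-\frac{x - \mu}{\sigma}\right)$
  - For $\xi \neq 0$: $F(x) = 1 - \left(1 + \xi z\right)^{-1/\xi}$, where $z = \frac{x - \mu}{\sigma}$
- Quantile function:
  - For $\xi = 0$: $F^{-1}(p) = \mu - \sigma\log(1 - p)$
  - For $\xi \neq 0$: $F^{-1}(p) = \mu + \frac{\sigma}{\xi}\left[(1 - p)^{-\xi} - 1\right]$
- Median:
  - For $\xi = 0$: $\mu + \sigma\log 2$
  - For $\xi \neq 0$: $\mu + \frac{\sigma}{\xi}\left(2^{\xi} - 1\right)$
- Skewness and Kurtosis: No closed-form expressions; approximated numerically.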
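This base-R sketch implements the GPD c.d.f. and quantile formulas with hypothetical helpers (`pgpd_sketch()`, `qgpd_sketch()`, not package functions). It checks the round trip for all three shape regimes and the exponential special case. It also checks the mean and variance for $\xi = 0.2$ via integrals of the quantile function.

```r
# Hypothetical helpers implementing the GPD formulas above (not package functions)
pgpd_sketch <- function(x, mu, sigma, xi) {
  z <- pmax((x - mu) / sigma, 0)   # points below the location get c.d.f. 0
  if (xi == 0) 1 - exp(-z) else 1 - pmax(1 + xi * z, 0)^(-1 / xi)
}
qgpd_sketch <- function(p, mu, sigma, xi) {
  if (xi == 0) mu - sigma * log(1 - p) else mu + sigma * ((1 - p)^(-xi) - 1) / xi
}

p <- c(0.1, 0.5, 0.9)
roundtrip_ok <- sapply(c(0, 0.2, -0.3), function(xi)
  isTRUE(all.equal(pgpd_sketch(qgpd_sketch(p, 0, 1, xi), 0, 1, xi), p)))

# With location 0 and shape 0, the GPD is Exponential with rate 1 / scale
exp_ok <- isTRUE(all.equal(pgpd_sketch(c(0.5, 2), 0, 2, 0), pexp(c(0.5, 2), rate = 1 / 2)))

# Mean and variance for xi = 0.2 via integrals of the quantile function
xi <- 0.2
m <- integrate(function(u) qgpd_sketch(u, 0, 1, xi), 0, 1)$value
v <- integrate(function(u) (qgpd_sketch(u, 0, 1, xi) - m)^2, 0, 1)$value
c(mean = m, variance = v)
```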
```r
library(distributional)
dist <- dist_gpd(location = 0, scale = 1, shape = 0)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The Gumbel distribution is a special case of the Generalized Extreme Value
distribution, obtained when the GEV shape parameter is equal to 0.
It may be referred to as a type I extreme value distribution.
```r
dist_gumbel(alpha, scale)
```
| Argument | Description |
|---|---|
| `alpha` | location parameter. |
| `scale` | parameter. Must be strictly positive. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_gumbel.html
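In the following, let $X$ be a Gumbel random variable with location
parameter alpha = $\alpha$ and scale parameter scale = $\sigma$.

- Support: $\mathbb{R}$, the set of all real numbers.
- Mean: $\alpha + \sigma\gamma$, where $\gamma$ is the Euler-Mascheroni constant, approximately equal to 0.5772157.
- Variance: $\frac{\pi^2\sigma^2}{6}$
- Skewness: $\frac{12\sqrt{6}\,\zeta(3)}{\pi^3} \approx 1.1395$, where $\zeta(3)$ is Apéry's constant, approximately equal to 1.2020569. Note that skewness is independent of the distribution parameters.
- Kurtosis (excess): $\frac{12}{5} = 2.4$. Note that excess kurtosis is independent of the distribution parameters.
- Median: $\alpha - \sigma\log(\log 2)$
- Probability density function (p.d.f): $f(x) = \frac{1}{\sigma}\exp\left[-\frac{x - \alpha}{\sigma} - \exp\left(-\frac{x - \alpha}{\sigma}\right)\right]$ for $x$ in $\mathbb{R}$, the set of all real numbers.
- Cumulative distribution function (c.d.f): $F(x) = \exp\left[-\exp\left(-\frac{x - \alpha}{\sigma}\right)\right]$ for $x$ in $\mathbb{R}$, the set of all real numbers.
- Quantile function (inverse c.d.f): $F^{-1}(p) = \alpha - \sigma\log(-\log p)$ for $p$ in $(0, 1)$.
- Moment generating function (m.g.f): $E(e^{tX}) = e^{\alpha t}\,\Gamma(1 - \sigma t)$ for $\sigma t < 1$, where $\Gamma$ is the gamma function.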
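This base-R sketch checks the mean, variance and the parameter-free skewness by integrating powers of the quantile function, using $E[g(X)] = \int_0^1 g(F^{-1}(p))\,dp$. The parameter values are arbitrary. `qgumbel_sketch()` is a hypothetical helper, and $\zeta(3)$ is approximated by a truncated series.

```r
alpha <- 1; sigma <- 2
euler_gamma <- -digamma(1)       # Euler-Mascheroni constant, about 0.5772157
apery <- sum(1 / (1:1e6)^3)      # zeta(3), Apery's constant, about 1.2020569
qgumbel_sketch <- function(p) alpha - sigma * log(-log(p))

# Moments via integrals of the quantile function over (0, 1)
m <- integrate(qgumbel_sketch, 0, 1)$value
v <- integrate(function(p) (qgumbel_sketch(p) - m)^2, 0, 1)$value
s <- integrate(function(p) (qgumbel_sketch(p) - m)^3, 0, 1)$value / v^1.5
skew_closed <- 12 * sqrt(6) * apery / pi^3
c(mean = m, variance = v, skewness = s, skewness_closed = skew_closed)
```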
See also: `actuar::Gumbel`, `actuar::dgumbel()`, `actuar::pgumbel()`,
`actuar::qgumbel()`, `actuar::rgumbel()`, `actuar::mgumbel()`
```r
library(distributional)
dist <- dist_gumbel(alpha = c(0.5, 1, 1.5, 3), scale = c(2, 2, 3, 4))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
```
The horseshoe distribution (Carvalho et al., 2008) is a heavy-tailed continuous distribution defined as a scale mixture of normals. It is primarily used as a shrinkage prior in sparse Bayesian regression, where it concentrates mass near zero while retaining heavy tails that leave large signals unshrunk.
dist_horseshoe(lambda, tau)
lambda |
A positive numeric vector of local scale parameters
|
tau |
A positive scalar global scale parameter |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_horseshoe.html
In the following, let $X$ be a horseshoe random variable with local
scale parameter lambda = $\lambda$ and global scale parameter
tau = $\tau$.
Support: $x \in \mathbb{R}$, the set of all real numbers.
Mean: Not available in closed form.
Variance: Not available in closed form.
Probability density function (p.d.f):
The horseshoe density does not have a simple closed form but can be expressed as a scale mixture of normals:
$X \mid \lambda, \tau \sim \mathcal{N}(0, \lambda^2\tau^2), \quad \lambda \sim C^{+}(0, 1)$
where the half-Cauchy hyperprior $C^{+}(0, 1)$ on the local scale $\lambda$ induces the
characteristic horseshoe shrinkage behaviour.
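The scale-mixture representation can be simulated in base R. This sketch follows the standard hierarchical horseshoe: draw a half-Cauchy local scale, then draw a conditionally Normal value. It is an illustration of the mixture only and is not necessarily how the package's `generate()` method works. `rhorseshoe_sketch()` is a hypothetical helper. The output compares the mass near zero and in the tails with a standard Normal.

```r
# Sketch of the hierarchical horseshoe prior (not the package's sampler):
# X | lambda, tau ~ N(0, lambda^2 * tau^2), lambda ~ half-Cauchy(0, 1)
rhorseshoe_sketch <- function(n, tau = 1) {
  lambda <- abs(rcauchy(n))               # half-Cauchy local scales
  rnorm(n, mean = 0, sd = lambda * tau)   # conditionally Normal draws
}

set.seed(1)
x <- rhorseshoe_sketch(1e5, tau = 1)
z <- rnorm(1e5)

# Shrinkage near zero and heavy tails relative to a standard Normal
near_zero <- c(horseshoe = mean(abs(x) < 0.1), normal = mean(abs(z) < 0.1))
far_tail  <- c(horseshoe = mean(abs(x) > 5),   normal = mean(abs(z) > 5))
```

The horseshoe places noticeably more mass both very close to zero and far out in the tails. This combination is what lets it shrink noise while leaving large signals largely unshrunk.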
Carvalho, C.M., Polson, N.G., and Scott, J.G. (2008). "The Horseshoe Estimator for Sparse Signals". Discussion Paper 2008-31. Duke University Department of Statistical Science.
Carvalho, C.M., Polson, N.G., and Scott, J.G. (2009). "Handling Sparsity via the Horseshoe". Journal of Machine Learning Research, 5, p. 73–80.
LaplacesDemon::dhs(), LaplacesDemon::rhs()
dist <- dist_horseshoe(lambda = c(0.5, 1, 2), tau = 1)
dist
support(dist)
generate(dist, 10)
density(dist, 0)
density(dist, 0, log = TRUE)
To understand the HyperGeometric distribution, consider a set of
$r = m + n$ objects, of which $m$ are of type I and
$n$ are of type II. A sample of size $k$ ($k < r$)
is drawn at random without replacement. The number of
type I elements observed in this sample is our random
variable $X$.
dist_hypergeometric(m, n, k)
m |
The number of type I elements available. |
n |
The number of type II elements available. |
k |
The size of the sample taken. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_hypergeometric.html
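In the following, let $X$ be a HyperGeometric random variable with
success probability $p = m/(m+n)$.
Support: $x \in \{\max(0, k - n), \dots, \min(k, m)\}$
Mean: $E(X) = \frac{km}{m+n} = kp$
Variance: $\mathrm{Var}(X) = \frac{kmn(m+n-k)}{(m+n)^2(m+n-1)} = kp(1-p)\frac{m+n-k}{m+n-1}$
Probability mass function (p.m.f):
$P(X = x) = \frac{\binom{m}{x}\binom{n}{k-x}}{\binom{m+n}{k}}$
Cumulative distribution function (c.d.f):
$P(X \le x) = \sum_{i = \max(0, k-n)}^{\lfloor x \rfloor} P(X = i)$
Moment generating function (m.g.f):
$E(e^{tX}) = \frac{\binom{n}{k}}{\binom{m+n}{k}}\,{}_2F_1(-k, -m; n - k + 1; e^t)$
where ${}_2F_1$ is the hypergeometric function.
Skewness: $\frac{(n - m)\sqrt{m+n-1}\,(m+n-2k)}{\sqrt{kmn(m+n-k)}\,(m+n-2)}$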
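The closed-form mean and variance can be checked by summing over the p.m.f. The sketch below uses base R's `stats::dhyper()`, which takes the same `m`, `n` and `k` arguments (type I count, type II count, sample size). The helper `hyper_moments()` is hypothetical.

```r
# Parameters from the package example for dist_hypergeometric()
m <- rep(500, 3)
n <- c(50, 60, 70)
k <- c(100, 200, 300)

# Moments obtained by summing over the p.m.f. on the support
hyper_moments <- function(m, n, k) {
  x <- max(0, k - n):min(k, m)
  p <- dhyper(x, m, n, k)
  mu <- sum(x * p)
  c(total = sum(p), mean = mu, var = sum((x - mu)^2 * p))
}
numeric_moments <- t(mapply(hyper_moments, m, n, k))

# Closed-form moments with p = m / (m + n)
N <- m + n
p <- m / N
closed_mean <- k * p
closed_var  <- k * p * (1 - p) * (N - k) / (N - 1)
```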
dist <- dist_hypergeometric(m = rep(500, 3), n = c(50, 60, 70), k = c(100, 200, 300))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
Inflated distributions add extra probability mass at a specific value, most commonly zero (zero-inflation). These distributions are useful for modeling data with excess observations at a particular value compared to what the base distribution would predict. Common applications include zero-inflated Poisson or negative binomial models for count data with many zeros.
dist_inflated(dist, prob, x = 0)
dist |
The distribution(s) to inflate. |
prob |
The added probability of observing |
x |
The value to inflate. The default of |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inflated.html
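In the following, let $X$ be an inflated random variable based on
a base distribution $Y$, with inflation value x = $x_0$ and
inflation probability prob = $p$.
Support: Same as the base distribution, but with additional
probability mass at $x_0$.
Mean: $E(X) = p\,x_0 + (1 - p)E(Y)$ (when x is numeric)
Variance: $\mathrm{Var}(X) = (1 - p)\mathrm{Var}(Y) + p(1 - p)E(Y)^2$ (when x = 0)
For non-zero inflation values, the variance is not computed in closed form.
Probability mass/density function (p.m.f/p.d.f):
For discrete distributions:
$P(X = x) = p\,\mathbb{1}(x = x_0) + (1 - p)P(Y = x)$
For continuous distributions:
$f(x) = (1 - p)f_Y(x)$ for $x \neq x_0$, together with a point mass $P(X = x_0) = p$
Cumulative distribution function (c.d.f):
$F(x) = p\,\mathbb{1}(x \ge x_0) + (1 - p)F_Y(x)$
Quantile function: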
The quantile function is computed numerically by inverting the inflated CDF, accounting for the jump in probability at the inflation point.
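For the zero-inflated Poisson used in the package example, the p.m.f., c.d.f. and quantile can be computed by hand in base R. The sketch truncates the support at 100, which is an assumption that leaves negligible tail mass for $\lambda = 2$. It compares the summed moments with the closed forms for $x_0 = 0$. The median is found as the smallest support point whose c.d.f. reaches 0.5, which accounts for the jump at the inflation point.

```r
# Zero-inflated Poisson: base Poisson(lambda = 2), extra mass prob = 0.3 at x0 = 0
lambda <- 2
prob <- 0.3
x <- 0:100   # truncated support; the remaining tail mass is negligible

pmf <- prob * (x == 0) + (1 - prob) * dpois(x, lambda)
cdf <- cumsum(pmf)

zip_mean <- sum(x * pmf)
zip_var  <- sum((x - zip_mean)^2 * pmf)

# Closed forms for x0 = 0: E(X) = (1 - p) E(Y), Var(X) = (1 - p) Var(Y) + p (1 - p) E(Y)^2
closed_mean <- (1 - prob) * lambda
closed_var  <- (1 - prob) * lambda + prob * (1 - prob) * lambda^2

# Median: smallest support point whose c.d.f. reaches 0.5
zip_median <- min(x[cdf >= 0.5])
```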
# Zero-inflated Poisson
dist <- dist_inflated(dist_poisson(lambda = 2), prob = 0.3, x = 0)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 0)
density(dist, 1)
cdf(dist, 2)
quantile(dist, 0.5)
The Inverse Exponential distribution is used to model the reciprocal of exponentially distributed variables.
dist_inverse_exponential(rate)
rate |
The rate parameter. Must be strictly positive. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inverse_exponential.html
In the following, let $X$ be an Inverse Exponential random variable
with parameter rate = $\lambda$.
Support: $x \in (0, \infty)$
Mean: Does not exist, returns NA
Variance: Does not exist, returns NA
Probability density function (p.d.f):
Cumulative distribution function (c.d.f):
Quantile function (inverse c.d.f):
Moment generating function (m.g.f):
Does not exist (divergent integral).
dist <- dist_inverse_exponential(rate = 1:5)
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Inverse Gamma distribution is commonly used as a prior distribution in Bayesian statistics, particularly for variance parameters.
dist_inverse_gamma(shape, rate = 1/scale, scale)
shape, scale
|
parameters. Must be strictly positive. |
rate |
an alternative way to specify the scale. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inverse_gamma.html
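In the following, let $X$ be an Inverse Gamma random variable with
shape parameter shape = $\alpha$ and rate parameter
rate = $\beta$ (equivalently, scale = $\theta = 1/\beta$).
Support: $x \in (0, \infty)$
Mean: $E(X) = \frac{\theta}{\alpha - 1} = \frac{1}{\beta(\alpha - 1)}$ for $\alpha > 1$,
otherwise undefined
Variance: $\mathrm{Var}(X) = \frac{\theta^2}{(\alpha - 1)^2(\alpha - 2)}$
for $\alpha > 2$, otherwise undefined
Probability density function (p.d.f):
$f(x) = \frac{\theta^\alpha}{\Gamma(\alpha)}x^{-\alpha - 1}e^{-\theta/x}$
Cumulative distribution function (c.d.f):
$F(x) = \frac{\Gamma(\alpha, \theta/x)}{\Gamma(\alpha)} = Q(\alpha, \theta/x)$
where $\Gamma(\cdot, \cdot)$ is the upper incomplete gamma function and
$Q(\cdot, \cdot)$ is the regularized incomplete gamma function.
Moment generating function (m.g.f):
$E(e^{tX}) = \frac{2(-\theta t)^{\alpha/2}}{\Gamma(\alpha)}K_\alpha\left(\sqrt{-4\theta t}\right)$
for $t < 0$, where $K_\alpha$ is the modified Bessel function
of the second kind. The MGF does not exist for $t > 0$.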
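The inverse gamma mean, variance and c.d.f. can be checked numerically in base R. This sketch assumes the convention that scale $\theta = 1/\beta$ enters the density as $e^{-\theta/x}$, as in `actuar::InverseGamma`. The helpers `dinvgamma_sketch()` and `pinvgamma_sketch()` are hypothetical. The c.d.f. uses `pgamma(..., lower.tail = FALSE)`, which is the regularized upper incomplete gamma function $Q$.

```r
# Hypothetical helpers; theta = scale = 1 / rate enters the density as exp(-theta / x)
dinvgamma_sketch <- function(x, shape, rate) {
  theta <- 1 / rate
  ifelse(x > 0,
         exp(shape * log(theta) - lgamma(shape) - (shape + 1) * log(x) - theta / x),
         0)
}
pinvgamma_sketch <- function(q, shape, rate) {
  # Regularized upper incomplete gamma Q(shape, theta / q)
  pgamma((1 / rate) / q, shape = shape, lower.tail = FALSE)
}

shape <- 3
rate  <- 2
closed_mean <- 1 / (rate * (shape - 1))
closed_var  <- (1 / rate)^2 / ((shape - 1)^2 * (shape - 2))

num_mean <- integrate(function(x) x * dinvgamma_sketch(x, shape, rate), 0, Inf)$value
num_var  <- integrate(function(x) (x - num_mean)^2 * dinvgamma_sketch(x, shape, rate),
                      0, Inf)$value
num_cdf4 <- integrate(dinvgamma_sketch, 0, 4, shape = shape, rate = rate)$value
```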
dist <- dist_inverse_gamma(shape = c(1,2,3,3), rate = c(1,1,1,2))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
dist_inverse_gaussian(mean, shape)
mean, shape
|
parameters. Must be strictly positive. Infinite values are supported. |
The inverse Gaussian distribution (also known as the Wald distribution) is commonly used to model positive-valued data, particularly in contexts involving first passage times and reliability analysis.
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_inverse_gaussian.html
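In the following, let $X$ be an Inverse Gaussian random variable with
parameters mean = $\mu$ and shape = $\lambda$.
Support: $x \in (0, \infty)$
Mean: $E(X) = \mu$
Variance: $\mathrm{Var}(X) = \frac{\mu^3}{\lambda}$
Probability density function (p.d.f):
$f(x) = \sqrt{\frac{\lambda}{2\pi x^3}}\exp\left(-\frac{\lambda(x - \mu)^2}{2\mu^2 x}\right)$
Cumulative distribution function (c.d.f):
$F(x) = \Phi\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu} - 1\right)\right) + \exp\left(\frac{2\lambda}{\mu}\right)\Phi\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{\mu} + 1\right)\right)$
where $\Phi$ is the standard normal c.d.f.
Moment generating function (m.g.f):
$E(e^{tX}) = \exp\left(\frac{\lambda}{\mu}\left(1 - \sqrt{1 - \frac{2\mu^2 t}{\lambda}}\right)\right)$
for $t < \frac{\lambda}{2\mu^2}$.
Skewness: $3\sqrt{\frac{\mu}{\lambda}}$
Excess Kurtosis: $\frac{15\mu}{\lambda}$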
Quantiles: No closed-form expression, approximated numerically.
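The closed-form c.d.f. can be cross-checked against numerical integration of the density. The sketch uses hypothetical helpers `dinvgauss_sketch()` and `pinvgauss_sketch()` written from the formulas above. The second c.d.f. term is evaluated on the log scale so that $\exp(2\lambda/\mu)$ cannot overflow. This is a design choice of the sketch, not a claim about the package internals.

```r
# Hypothetical helpers written from the Inverse Gaussian formulas above
dinvgauss_sketch <- function(x, mean, shape) {
  sqrt(shape / (2 * pi * x^3)) * exp(-shape * (x - mean)^2 / (2 * mean^2 * x))
}
pinvgauss_sketch <- function(q, mean, shape) {
  r <- sqrt(shape / q)
  # exp(2 * shape / mean) * Phi(.) combined on the log scale to avoid overflow
  pnorm(r * (q / mean - 1)) +
    exp(2 * shape / mean + pnorm(-r * (q / mean + 1), log.p = TRUE))
}

cdf_closed  <- pinvgauss_sketch(2, mean = 1, shape = 0.2)
cdf_numeric <- integrate(dinvgauss_sketch, 0, 2, mean = 1, shape = 0.2)$value

# The mean of an Inverse Gaussian(mean = 1, shape = 3) should be 1
mean_numeric <- integrate(function(x) x * dinvgauss_sketch(x, 1, 3), 0, Inf)$value
```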
dist <- dist_inverse_gaussian(mean = c(1,1,1,3,3), shape = c(0.2, 1, 3, 0.2, 1))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Laplace distribution, also known as the double exponential distribution, is a continuous probability distribution that is symmetric around its location parameter.
dist_laplace(mu, sigma)
mu |
The location parameter (mean) of the Laplace distribution. |
sigma |
The positive scale parameter of the Laplace distribution. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_laplace.html
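In the following, let $X$ be a Laplace random variable with location
parameter mu = $\mu$ and scale parameter sigma = $\sigma$.
Support: $x \in \mathbb{R}$, the set of all real numbers
Mean: $E(X) = \mu$
Variance: $\mathrm{Var}(X) = 2\sigma^2$
Probability density function (p.d.f):
$f(x) = \frac{1}{2\sigma}\exp\left(-\frac{|x - \mu|}{\sigma}\right)$
Cumulative distribution function (c.d.f):
$F(x) = \frac{1}{2}e^{(x - \mu)/\sigma}$ for $x < \mu$, and $F(x) = 1 - \frac{1}{2}e^{-(x - \mu)/\sigma}$ for $x \ge \mu$
Moment generating function (m.g.f):
$E(e^{tX}) = \frac{e^{\mu t}}{1 - \sigma^2 t^2}$ for $|t| < 1/\sigma$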
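Base R has no Laplace functions, so the sketch below writes the c.d.f. and its inverse from the piecewise formula. It assumes `sigma` is the scale $b$ in $f(x) = e^{-|x-\mu|/b}/(2b)$. The helpers `plaplace_sketch()` and `qlaplace_sketch()` are hypothetical. Sampling uses the fact that the difference of two independent exponential draws with rate $1/\sigma$ is Laplace with scale $\sigma$, so the sample variance should be close to $2\sigma^2$.

```r
# Hypothetical helpers for Laplace(mu, sigma) with sigma as the scale
plaplace_sketch <- function(q, mu, sigma) {
  ifelse(q < mu, 0.5 * exp((q - mu) / sigma), 1 - 0.5 * exp(-(q - mu) / sigma))
}
qlaplace_sketch <- function(p, mu, sigma) {
  mu - sigma * sign(p - 0.5) * log(1 - 2 * abs(p - 0.5))
}

mu    <- c(0, 2, -1)
sigma <- c(1, 2, 0.5)
q70   <- qlaplace_sketch(0.7, mu, sigma)

# A Laplace(2, 2) draw is 2 plus the difference of two Exp(rate = 1/2) draws
set.seed(42)
draws <- 2 + rexp(1e5, rate = 1 / 2) - rexp(1e5, rate = 1 / 2)
```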
dist <- dist_laplace(mu = c(0, 2, -1), sigma = c(1, 2, 0.5))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 0)
density(dist, 0, log = TRUE)
cdf(dist, 1)
quantile(dist, 0.7)
The Logarithmic distribution is a discrete probability distribution derived from the logarithmic series. It is useful in modeling the abundance of species and other phenomena where the frequency of an event follows a logarithmic pattern.
dist_logarithmic(prob)
prob |
Parameter of the distribution, with 0 < prob < 1. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_logarithmic.html
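In the following, let $X$ be a Logarithmic random variable with
parameter prob = $p$.
Support: $\{1, 2, 3, \dots\}$
Mean: $E(X) = \frac{-1}{\ln(1 - p)}\frac{p}{1 - p}$
Variance: $\mathrm{Var}(X) = -\frac{p^2 + p\ln(1 - p)}{(1 - p)^2\ln^2(1 - p)}$
Probability mass function (p.m.f):
$P(X = k) = \frac{-1}{\ln(1 - p)}\frac{p^k}{k}$
for $k = 1, 2, 3, \dots$
Cumulative distribution function (c.d.f):
The c.d.f. does not have a simple closed form. It is computed
using the recurrence relationship
$P(X = k + 1) = \frac{p\,k}{k + 1}P(X = k)$, accumulating $F(k) = \sum_{i=1}^{k}P(X = i)$,
starting from $P(X = 1) = \frac{-p}{\ln(1 - p)}$.
Moment generating function (m.g.f):
$E(e^{tX}) = \frac{\ln(1 - pe^t)}{\ln(1 - p)}$
for $t < -\ln(p)$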
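The recurrence in the c.d.f. description can be written as a short loop. The sketch below builds the p.m.f. on a truncated support $1, \dots, 500$ and accumulates it into the c.d.f. The truncation is an assumption and is adequate for $p = 0.66$. The code compares the summed moments with the closed forms. `dlogarithmic_sketch()` is a hypothetical helper and is not the package's implementation.

```r
# Hypothetical helper: p.m.f. on 1..kmax via the recurrence
# P(X = k + 1) = P(X = k) * prob * k / (k + 1), starting at P(X = 1) = -prob / log(1 - prob)
dlogarithmic_sketch <- function(kmax, prob) {
  pmf <- numeric(kmax)
  pmf[1] <- -prob / log1p(-prob)
  for (k in seq_len(kmax - 1)) {
    pmf[k + 1] <- pmf[k] * prob * k / (k + 1)
  }
  pmf
}

prob <- 0.66
k <- 1:500
pmf <- dlogarithmic_sketch(length(k), prob)
cdf <- cumsum(pmf)   # F(k) accumulated from the recurrence

closed_mean <- -1 / log1p(-prob) * prob / (1 - prob)
closed_var  <- -(prob^2 + prob * log1p(-prob)) / ((1 - prob)^2 * log1p(-prob)^2)
num_mean <- sum(k * pmf)
num_var  <- sum((k - num_mean)^2 * pmf)
```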
dist <- dist_logarithmic(prob = c(0.33, 0.66, 0.99))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
A continuous distribution on the real line. For binary outcomes,
the model given by $P(y = 1 \mid x) = F(\theta^\top x)$, where $F$
is the Logistic cdf(), is called logistic regression.
dist_logistic(location, scale)
location, scale
|
location and scale parameters. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_logistic.html
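In the following, let $X$ be a Logistic random variable with
location = $\mu$ and scale = $s$.
Support: $x \in \mathbb{R}$, the set of all real numbers
Mean: $E(X) = \mu$
Variance: $\mathrm{Var}(X) = \frac{\pi^2 s^2}{3}$
Probability density function (p.d.f):
$f(x) = \frac{e^{-(x - \mu)/s}}{s\left(1 + e^{-(x - \mu)/s}\right)^2}$
Cumulative distribution function (c.d.f):
$F(x) = \frac{1}{1 + e^{-(x - \mu)/s}}$
Moment generating function (m.g.f):
$E(e^{tX}) = e^{\mu t}B(1 - st, 1 + st)$
for $-1/s < t < 1/s$, where $B(\cdot, \cdot)$ is the Beta function.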
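The closed-form c.d.f., the variance and the logistic-regression link can all be checked with base R's `stats::plogis()` and `stats::dlogis()`. These use the same location/scale parametrization. In the last step, the coefficient vector `theta` and covariate row `x_row` are hypothetical values chosen only for illustration.

```r
location <- c(5, 9, 9, 6, 2)
scale    <- c(2, 3, 4, 2, 1)

# c.d.f. at 4 from the closed form
closed_cdf <- 1 / (1 + exp(-(4 - location) / scale))

# Variance of the first component by numerical integration
num_var <- integrate(function(x) (x - location[1])^2 * dlogis(x, location[1], scale[1]),
                     -Inf, Inf)$value

# Logistic regression link: P(y = 1 | x) = F(theta' x) with a standard logistic F
theta <- c(-1, 0.5)   # hypothetical coefficients
x_row <- c(1, 3)      # intercept plus one hypothetical covariate
p_y1  <- plogis(sum(theta * x_row))
```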
dist <- dist_logistic(location = c(5,9,9,6,2), scale = c(2,3,4,2,1))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The log-normal distribution is a commonly used transformation of the Normal
distribution. If $X$ follows a log-normal distribution, then $\ln X$
follows a Normal distribution.
dist_lognormal(mu = 0, sigma = 1)
mu |
The location parameter of the distribution, which is the mean of the associated Normal distribution of $\ln X$ (not the mean of $X$ itself). Can be any real number. |
sigma |
The scale parameter of the distribution, which is the standard deviation of the associated Normal distribution of $\ln X$. Can be any positive number. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_lognormal.html
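In the following, let $X$ be a log-normal random variable with
mu = $\mu$ and sigma = $\sigma$.
Support: $x \in (0, \infty)$, the set of positive real numbers.
Mean: $E(X) = e^{\mu + \sigma^2/2}$
Variance: $\mathrm{Var}(X) = \left(e^{\sigma^2} - 1\right)e^{2\mu + \sigma^2}$
Skewness: $\left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}$
Excess Kurtosis: $e^{4\sigma^2} + 2e^{3\sigma^2} + 3e^{2\sigma^2} - 6$
Probability density function (p.d.f):
$f(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right)$
Cumulative distribution function (c.d.f):
$F(x) = \Phi\left(\frac{\ln x - \mu}{\sigma}\right)$
where $\Phi$ is the c.d.f. of the standard Normal distribution.
Moment generating function (m.g.f):
Does not exist: $E(e^{tX})$ is infinite for every $t > 0$; for $t \le 0$ it is finite but has no closed form.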
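The log-normal moment formulas and the link to the Normal distribution can be checked with base R's `dlnorm()`, `plnorm()` and `rlnorm()`. These take the same `meanlog`/`sdlog` values as `mu`/`sigma` here. The sketch integrates over $[0, 20]$ instead of $[0, \infty)$. This is an assumption that is only safe because the mass beyond 20 is negligible for these parameters.

```r
mu    <- 1:5
sigma <- 0.1

closed_mean <- exp(mu + sigma^2 / 2)
closed_var  <- (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)

# First component by numerical integration (mass beyond 20 is negligible here)
num_mean1 <- integrate(function(x) x * dlnorm(x, mu[1], sigma), 0, 20)$value
num_var1  <- integrate(function(x) (x - closed_mean[1])^2 * dlnorm(x, mu[1], sigma),
                       0, 20)$value

# c.d.f. through the Normal: F(4) = Phi((log(4) - mu) / sigma)
cdf_via_normal <- pnorm((log(4) - mu) / sigma)

# log(X) for simulated log-normal draws behaves like N(2, 0.1^2)
set.seed(7)
log_draws <- log(rlnorm(1e5, meanlog = 2, sdlog = sigma))
```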
dist <- dist_lognormal(mu = 1:5, sigma = 0.1)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
# A log-normal distribution X is exp(Y), where Y is a Normal distribution of
# the same parameters. So log(X) will produce the Normal distribution Y.
log(dist)
A placeholder distribution for handling missing values in a vector of distributions.
dist_missing(length = 1)
length |
The number of missing distributions |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_missing.html
The missing distribution represents the absence of distributional
information. It is used as a placeholder when distribution values are
not available or not applicable, similar to how NA is used for missing
scalar values.
Support: Undefined
Mean: NA
Variance: NA
Skewness: NA
Kurtosis: NA
Probability density function (p.d.f): Undefined
Cumulative distribution function (c.d.f): Undefined
Quantile function: Undefined
Moment generating function (m.g.f): Undefined
All statistical operations on missing distributions return NA values
of appropriate length, propagating the missingness through calculations.
dist <- dist_missing(3L)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
A mixture distribution combines multiple component distributions with specified weights. The resulting distribution can model complex, multimodal data by representing it as a weighted sum of simpler distributions.
dist_mixture(..., weights = numeric())
... |
Distributions to be used in the mixture. Can be any distributional objects. |
weights |
A numeric vector of non-negative weights that sum to 1.
The length must match the number of distributions passed to |
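In the following, let $X$ be a mixture random variable composed
of $K$ component distributions $F_1, F_2, \dots, F_K$ with
corresponding weights $w_1, w_2, \dots, w_K$ where $\sum_{i=1}^K w_i = 1$
and $w_i \ge 0$ for all $i$.
Support: The union of the supports of all component distributions
Mean:
For univariate mixtures:
$E(X) = \sum_{i=1}^K w_i\mu_i$
where $\mu_i$ is the mean of the $i$-th component distribution.
For multivariate mixtures:
$E(\mathbf{X}) = \sum_{i=1}^K w_i\boldsymbol{\mu}_i$
where $\boldsymbol{\mu}_i$ is the mean vector of the $i$-th
component distribution.
Variance:
For univariate mixtures:
$\mathrm{Var}(X) = \sum_{i=1}^K w_i\left(\sigma_i^2 + \mu_i^2\right) - \left(\sum_{i=1}^K w_i\mu_i\right)^2$
where $\sigma_i^2$ is the variance of the $i$-th component
distribution.
Covariance:
For multivariate mixtures:
$\mathrm{Cov}(\mathbf{X}) = \sum_{i=1}^K w_i\left[\Sigma_i + (\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})(\boldsymbol{\mu}_i - \bar{\boldsymbol{\mu}})^\top\right]$
where $\bar{\boldsymbol{\mu}} = \sum_{i=1}^K w_i\boldsymbol{\mu}_i$
is the overall mean vector and $\Sigma_i$ is the
covariance matrix of the $i$-th component distribution.
Probability density/mass function (p.d.f/p.m.f):
$f(x) = \sum_{i=1}^K w_i f_i(x)$
where $f_i(x)$ is the density or mass function of the $i$-th
component distribution.
Cumulative distribution function (c.d.f):
For univariate mixtures:
$F(x) = \sum_{i=1}^K w_i F_i(x)$
where $F_i(x)$ is the c.d.f. of the $i$-th component
distribution.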
For multivariate mixtures, the c.d.f. is approximated numerically.
Quantile function:
For univariate mixtures, the quantile function has no closed form
and is computed numerically by inverting the c.d.f. using root-finding
(stats::uniroot()).
For multivariate mixtures, quantiles are not yet implemented.
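For the two-component Normal mixture in the package example, the mean, variance and c.d.f. formulas can be evaluated directly. The quantile can be found by root-finding on $F(q) - p$ with `stats::uniroot()`, as the quantile description says. The helpers `mix_cdf()` and `mix_quantile()` are hypothetical. The search interval $[-50, 50]$ is an assumption that comfortably brackets this mixture.

```r
# Two-component Normal mixture: weights 0.3 / 0.7 on N(0, 1) and N(5, 2^2)
w     <- c(0.3, 0.7)
mu    <- c(0, 5)
sigma <- c(1, 2)

mix_mean <- sum(w * mu)
mix_var  <- sum(w * (sigma^2 + mu^2)) - mix_mean^2

# F(x) = sum_i w_i F_i(x)
mix_cdf <- function(q) sum(w * pnorm(q, mu, sigma))

# Quantile by root-finding on F(q) - p
mix_quantile <- function(p) {
  uniroot(function(q) mix_cdf(q) - p, lower = -50, upper = 50, tol = 1e-10)$root
}
q50 <- mix_quantile(0.5)
```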
stats::uniroot(), vctrs::vec_unique_count()
# Univariate mixture of two normal distributions
dist <- dist_mixture(dist_normal(0, 1), dist_normal(5, 2), weights = c(0.3, 0.7))
dist
mean(dist)
variance(dist)
density(dist, 2)
cdf(dist, 2)
quantile(dist, 0.5)
generate(dist, 10)
The multinomial distribution is a generalization of the binomial
distribution to multiple categories. It is perhaps easiest to think
of it in two steps: first extend a dist_bernoulli() distribution to include more
than two categories, resulting in a dist_categorical() distribution,
and then repeat the Categorical experiment several ($n$)
times.
dist_multinomial(size, prob)
size |
The number of draws from the Categorical distribution. |
prob |
The probability of each category on a single draw, given as a list of numeric vectors (one per distribution) whose elements sum to one. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_multinomial.html
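In the following, let $X = (X_1, \dots, X_k)$ be a Multinomial
random variable with success probability prob = $p$. Note that
$p$ is a vector with $k$ elements that sum to one. Assume
that we repeat the Categorical experiment size = $n$ times.
Support: Each $X_i$ is in $\{0, 1, 2, \dots, n\}$.
Mean: The mean of $X_i$ is $np_i$.
Variance: The variance of $X_i$ is $np_i(1 - p_i)$.
For $i \neq j$, the covariance of $X_i$ and
$X_j$ is $-np_ip_j$.
Probability mass function (p.m.f):
$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!x_2!\cdots x_k!}p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$
where $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k p_i = 1$.
Cumulative distribution function (c.d.f):
The c.d.f. is computed as a finite sum of the p.m.f. over all integer vectors in the support that satisfy the componentwise inequalities.
Moment generating function (m.g.f):
$E(e^{t^\top X}) = \left(\sum_{i=1}^k p_ie^{t_i}\right)^n$
where $t$ is a vector of the same dimension as $X$.
Skewness: The skewness of $X_i$ is $\frac{1 - 2p_i}{\sqrt{np_i(1 - p_i)}}$
Excess Kurtosis: The excess kurtosis of $X_i$ is $\frac{1 - 6p_i(1 - p_i)}{np_i(1 - p_i)}$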
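Because the multinomial support is finite, the mean vector and covariance matrix can be computed exactly by enumeration. The sketch below enumerates the support for `size = 4` and three categories and weights each point with base R's `stats::dmultinom()`. It then compares the result with $np_i$ and $n(\mathrm{diag}(p) - pp^\top)$.

```r
size <- 4
prob <- c(0.3, 0.5, 0.2)

# Enumerate the support: non-negative integer vectors summing to size
grid <- expand.grid(x1 = 0:size, x2 = 0:size)
grid <- grid[grid$x1 + grid$x2 <= size, ]
grid$x3 <- size - grid$x1 - grid$x2
X <- as.matrix(grid)

pmf <- apply(X, 1, dmultinom, size = size, prob = prob)

# Exact moments by summing over the support
EX <- colSums(X * pmf)
centered  <- sweep(X, 2, EX)
cov_exact <- t(centered) %*% (centered * pmf)

# Closed forms: E(X_i) = n p_i and Cov(X) = n (diag(p) - p p^T)
cov_closed <- size * (diag(prob) - tcrossprod(prob))
```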
stats::dmultinom(), stats::rmultinom()
dist <- dist_multinomial(size = c(4, 3), prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, list(d = rbind(cbind(1,2,1), cbind(0,2,1))))
density(dist, list(d = rbind(cbind(1,2,1), cbind(0,2,1))), log = TRUE)
cdf(dist, cbind(1,2,1))
The multivariate normal distribution is a generalization of the univariate normal distribution to higher dimensions. It is widely used in multivariate statistics and describes the joint distribution of multiple correlated continuous random variables.
dist_multivariate_normal(mu = 0, sigma = list(diag(1)))
mu |
A list of numeric vectors for the distribution's mean. |
sigma |
A list of matrices for the distribution's variance-covariance matrix. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_multivariate_normal.html
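In the following, let $\mathbf{X}$ be a $k$-dimensional multivariate
normal random variable with mean vector mu = $\boldsymbol{\mu}$ and
variance-covariance matrix sigma = $\Sigma$.
Support: $\mathbf{x} \in \mathbb{R}^k$
Mean: $E(\mathbf{X}) = \boldsymbol{\mu}$
Variance-covariance matrix: $\mathrm{Var}(\mathbf{X}) = \Sigma$
Probability density function (p.d.f):
$f(\mathbf{x}) = (2\pi)^{-k/2}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$
where $|\Sigma|$ is the determinant of
$\Sigma$.
Cumulative distribution function (c.d.f):
The c.d.f. does not have a closed-form expression and is computed numerically.
Moment generating function (m.g.f):
$E(e^{\mathbf{t}^\top\mathbf{X}}) = \exp\left(\boldsymbol{\mu}^\top\mathbf{t} + \frac{1}{2}\mathbf{t}^\top\Sigma\mathbf{t}\right)$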
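In practice, the p.d.f. above is usually evaluated through a Cholesky factor rather than an explicit inverse and determinant. The sketch below shows that approach in base R. It is not necessarily how mvtnorm or the package evaluates it. `dmvnorm_sketch()` is a hypothetical helper. With $\Sigma = R^\top R$, the quadratic form is $\lVert R^{-\top}(\mathbf{x} - \boldsymbol{\mu})\rVert^2$ and $\log|\Sigma|^{1/2} = \sum_i \log R_{ii}$.

```r
# Hypothetical helper: multivariate normal density via a Cholesky factor
dmvnorm_sketch <- function(x, mu, sigma, log = FALSE) {
  k <- length(mu)
  R <- chol(sigma)                              # upper triangular, t(R) %*% R == sigma
  z <- backsolve(R, x - mu, transpose = TRUE)   # solves t(R) %*% z == x - mu
  logdens <- -0.5 * k * log(2 * pi) - sum(log(diag(R))) - 0.5 * sum(z^2)
  if (log) logdens else exp(logdens)
}

mu    <- c(1, 2)
sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
x     <- c(2, 1)

# Direct evaluation of the p.d.f. formula for comparison (k = 2)
d <- x - mu
direct   <- exp(-0.5 * drop(t(d) %*% solve(sigma) %*% d)) / (2 * pi * sqrt(det(sigma)))
via_chol <- dmvnorm_sketch(x, mu, sigma)
```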
mvtnorm::dmvnorm(), mvtnorm::pmvnorm(), mvtnorm::qmvnorm(),
mvtnorm::rmvnorm()
dist <- dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2)))
dimnames(dist) <- c("x", "y")
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, cbind(2, 1))
density(dist, cbind(2, 1), log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7, kind = "equicoordinate")
quantile(dist, 0.7, kind = "marginal")
The multivariate t-distribution is a generalization of the univariate Student's t-distribution to multiple dimensions. It is commonly used for modeling heavy-tailed multivariate data and in robust statistics.
dist_multivariate_t(df = 1, mu = 0, sigma = diag(1))
df |
A numeric vector of degrees of freedom (must be positive). |
mu |
A list of numeric vectors for the distribution location parameter. |
sigma |
A list of matrices for the distribution scale matrix. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_multivariate_t.html
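In the following, let $\mathbf{X}$ be a multivariate t random vector
with degrees of freedom df = $\nu$, location parameter
mu = $\boldsymbol{\mu}$, and scale matrix
sigma = $\Sigma$.
Support: $\mathbf{x} \in \mathbb{R}^d$, where $d$ is the
dimension of the distribution
Mean: $E(\mathbf{X}) = \boldsymbol{\mu}$ for $\nu > 1$, undefined otherwise
Covariance matrix:
$\mathrm{Cov}(\mathbf{X}) = \frac{\nu}{\nu - 2}\Sigma$
for $\nu > 2$, undefined otherwise
Probability density function (p.d.f):
$f(\mathbf{x}) = \frac{\Gamma\left(\frac{\nu + d}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\nu\pi)^{d/2}|\Sigma|^{1/2}}\left[1 + \frac{1}{\nu}(\mathbf{x} - \boldsymbol{\mu})^\top\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]^{-(\nu + d)/2}$
where $d$ is the dimension of the distribution and $\Gamma(\cdot)$ is
the gamma function.
Cumulative distribution function (c.d.f):
$F(\mathbf{x}) = P(X_1 \le x_1, \dots, X_d \le x_d) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_d}f(\mathbf{u})\,d\mathbf{u}$
This integral does not have a closed form solution and is approximated numerically.
Quantile function:
The equicoordinate quantile function finds $q$ such that:
$P(X_1 \le q, \dots, X_d \le q) = p$
This does not have a closed form solution and is approximated numerically.
The marginal quantile function for each dimension is:
$Q_i(p) = \mu_i + \sqrt{\Sigma_{ii}}\,t_\nu^{-1}(p)$
where $t_\nu^{-1}$ is the quantile function of the univariate
Student's t-distribution with $\nu$ degrees of freedom, and
$\Sigma_{ii}$ is the $i$-th diagonal element of sigma.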
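The density and marginal quantile formulas can be sketched in base R using the same Cholesky approach as for the multivariate Normal. `dmvt_sketch()` and `marginal_quantile_sketch()` are hypothetical helpers, not mvtnorm or package functions. A useful check is that in one dimension the density reduces to a scaled Student's t density, $f(x) = t_\nu\left((x - \mu)/s\right)/s$.

```r
# Hypothetical helper: multivariate t density via a Cholesky factor of the scale matrix
dmvt_sketch <- function(x, df, mu, sigma, log = FALSE) {
  d <- length(mu)
  R <- chol(sigma)
  z <- backsolve(R, x - mu, transpose = TRUE)   # quadratic form equals sum(z^2)
  logdens <- lgamma((df + d) / 2) - lgamma(df / 2) - (d / 2) * log(df * pi) -
    sum(log(diag(R))) - ((df + d) / 2) * log1p(sum(z^2) / df)
  if (log) logdens else exp(logdens)
}

# Marginal quantiles: mu_i + sqrt(Sigma_ii) * qt(p, df)
marginal_quantile_sketch <- function(p, df, mu, sigma) {
  mu + sqrt(diag(sigma)) * qt(p, df)
}

mu    <- c(1, 2)
sigma <- matrix(c(4, 2, 2, 3), ncol = 2)
dens_2d <- dmvt_sketch(c(2, 1), df = 5, mu = mu, sigma = sigma)
q_marg  <- marginal_quantile_sketch(0.7, df = 5, mu = mu, sigma = sigma)
```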
mvtnorm::dmvt, mvtnorm::pmvt, mvtnorm::qmvt, mvtnorm::rmvt
dist <- dist_multivariate_t(
  df = 5,
  mu = list(c(1, 2)),
  sigma = list(matrix(c(4, 2, 2, 3), ncol = 2))
)
dimnames(dist) <- c("x", "y")
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, cbind(2, 1))
density(dist, cbind(2, 1), log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
quantile(dist, 0.7, kind = "marginal")
A generalization of the geometric distribution. It is the number
of failures in a sequence of i.i.d. Bernoulli trials before
a specified number of successes (size) occur. The probability of success in
each trial is given by prob.
dist_negative_binomial(size, prob)
size |
The number of successful trials (target number of successes). Must be a positive number. Also called the dispersion parameter. |
prob |
The probability of success in each trial. Must be between 0 and 1. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_negative_binomial.html
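In the following, let $X$ be a Negative Binomial random variable with
success probability prob = $p$ and the number of successes size =
$r$.
Support: $\{0, 1, 2, 3, \dots\}$
Mean: $E(X) = \frac{r(1 - p)}{p}$
Variance: $\mathrm{Var}(X) = \frac{r(1 - p)}{p^2}$
Probability mass function (p.m.f):
$P(X = k) = \binom{k + r - 1}{k}(1 - p)^kp^r$
Cumulative distribution function (c.d.f):
$F(k) = \sum_{i=0}^{\lfloor k \rfloor}\binom{i + r - 1}{i}(1 - p)^ip^r$
This can also be expressed in terms of the regularized incomplete beta function, $F(k) = I_p(r, \lfloor k \rfloor + 1)$, and is computed numerically.
Moment generating function (m.g.f):
$E(e^{tX}) = \left(\frac{p}{1 - (1 - p)e^t}\right)^r$ for $t < -\ln(1 - p)$
Skewness: $\frac{2 - p}{\sqrt{r(1 - p)}}$
Excess Kurtosis: $\frac{6}{r} + \frac{p^2}{r(1 - p)}$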
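The p.m.f., the finite-sum c.d.f. and its regularized incomplete beta form can be compared using base R. Here `stats::dnbinom()` counts failures before `size` successes with success probability `prob`, the same meaning as above, and `stats::pbeta()` gives $I_p(a, b)$. The mean is checked on a support truncated at 1000, which leaves negligible tail mass for these parameters.

```r
size <- 10
prob <- 0.5
k <- 0:4

# p.m.f. from the formula
pmf_formula <- choose(k + size - 1, k) * (1 - prob)^k * prob^size

# F(4) as a finite sum and as I_p(r, k + 1)
cdf_sum  <- sum(pmf_formula)
cdf_beta <- pbeta(prob, size, 4 + 1)

closed_mean <- size * (1 - prob) / prob
closed_var  <- size * (1 - prob) / prob^2
kk <- 0:1000
num_mean <- sum(kk * dnbinom(kk, size, prob))
```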
dist <- dist_negative_binomial(size = 10, prob = 0.5)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Normal distribution is ubiquitous in statistics, partly because of the central limit theorem, which states that suitably standardized sums of i.i.d. random variables with finite variance converge in distribution to a Normal distribution. Linear transformations of Normal random variables result in new random variables that are also Normal. If you are taking an intro stats course, you'll likely use the Normal distribution for Z-tests and in simple linear regression. Under regularity conditions, maximum likelihood estimators are asymptotically Normal. The Normal distribution is also called the Gaussian distribution.
dist_normal(mu = 0, sigma = 1, mean = mu, sd = sigma)
mu, mean
|
The mean (location parameter) of the distribution. Can be any real number. |
sigma, sd
|
The standard deviation (scale parameter) of the distribution.
Can be any positive number. If you would like a Normal distribution with
variance |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_normal.html
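In the following, let $X$ be a Normal random variable with mean
mu = $\mu$ and standard deviation sigma = $\sigma$.
Support: $x \in \mathbb{R}$, the set of all real numbers
Mean: $E(X) = \mu$
Variance: $\mathrm{Var}(X) = \sigma^2$
Probability density function (p.d.f):
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x - \mu)^2/(2\sigma^2)}$
Cumulative distribution function (c.d.f):
$F(x) = \int_{-\infty}^{x}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(u - \mu)^2/(2\sigma^2)}\,du$
This integral does not have a closed form solution and is
approximated numerically. The c.d.f. of a standard Normal is closely
related to the "error function", since $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$.
The notation $\Phi(z)$ stands for the c.d.f. of a standard Normal evaluated at $z$. Z-tables
list the value of $\Phi(z)$ for various $z$.
Moment generating function (m.g.f):
$E(e^{tX}) = e^{\mu t + \sigma^2t^2/2}$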
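Two facts from the c.d.f. description can be checked numerically. First, $\Phi$ is a rescaled error function. Second, any Normal c.d.f. reduces to $\Phi$ after standardization. Base R has no `erf()`, so the sketch defines a hypothetical `erf_sketch()` by integrating $\frac{2}{\sqrt{\pi}}e^{-u^2}$ with `stats::integrate()`.

```r
# Hypothetical helper: erf(x) = 2 / sqrt(pi) * integral_0^x exp(-u^2) du
erf_sketch <- function(x) {
  2 / sqrt(pi) * integrate(function(u) exp(-u^2), 0, x)$value
}

z <- 0.8
phi_via_erf <- (1 + erf_sketch(z / sqrt(2))) / 2

# Standardization: P(X <= 4) for X ~ N(1, 3^2) equals Phi((4 - 1) / 3)
p_standardized <- pnorm((4 - 1) / 3)
p_direct       <- pnorm(4, mean = 1, sd = 3)
```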
dist <- dist_normal(mu = 1:5, sigma = 3)
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Pareto distribution is a power-law probability distribution commonly used in actuarial science to model loss severity and in economics to model income distributions and firm sizes.
dist_pareto(shape, scale)
shape, scale
|
parameters. Must be strictly positive. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_pareto.html
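In the following, let $X$ be a Pareto random variable with parameters
shape = $\alpha$ and scale = $\theta$.
Support: $x \in (0, \infty)$
Mean: $E(X) = \frac{\theta}{\alpha - 1}$ for $\alpha > 1$,
undefined otherwise
Variance: $\mathrm{Var}(X) = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$
for $\alpha > 2$, undefined otherwise
Probability density function (p.d.f):
$f(x) = \frac{\alpha\theta^\alpha}{(x + \theta)^{\alpha + 1}}$
for $x > 0$, $\alpha > 0$ and $\theta > 0$.
Cumulative distribution function (c.d.f):
$F(x) = 1 - \left(\frac{\theta}{x + \theta}\right)^\alpha$
for $x > 0$.
Moment generating function (m.g.f):
Does not exist in closed form, but the $k$th raw moment $E(X^k) = \frac{\theta^k\Gamma(k + 1)\Gamma(\alpha - k)}{\Gamma(\alpha)}$ exists
for $-1 < k < \alpha$.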
There are many different definitions of the Pareto distribution in the literature; see Arnold (2015) or Kleiber and Kotz (2003). This implementation uses the Pareto distribution without a location parameter as described in actuar::Pareto.
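The sketch below assumes the form without a location parameter, $f(x) = \alpha\theta^\alpha/(x + \theta)^{\alpha + 1}$ for $x > 0$, which is the form described for `actuar::Pareto`. It checks the mean, variance and c.d.f. formulas numerically in base R. `dpareto_sketch()` and `ppareto_sketch()` are hypothetical helpers. With `shape = 3` both the mean and the variance exist.

```r
# Hypothetical helpers for the Pareto form without a location parameter
dpareto_sketch <- function(x, shape, scale) shape * scale^shape / (x + scale)^(shape + 1)
ppareto_sketch <- function(q, shape, scale) 1 - (scale / (q + scale))^shape

shape <- 3
scale <- 1
closed_mean <- scale / (shape - 1)
closed_var  <- shape * scale^2 / ((shape - 1)^2 * (shape - 2))

num_mean <- integrate(function(x) x * dpareto_sketch(x, shape, scale), 0, Inf)$value
num_var  <- integrate(function(x) (x - closed_mean)^2 * dpareto_sketch(x, shape, scale),
                      0, Inf)$value
num_cdf4 <- integrate(dpareto_sketch, 0, 4, shape = shape, scale = scale)$value
```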
Kleiber, C. and Kotz, S. (2003), Statistical Size Distributions in Economics and Actuarial Sciences, Wiley.
Klugman, S. A., Panjer, H. H. and Willmot, G. E. (2012), Loss Models, From Data to Decisions, Fourth Edition, Wiley.
dist <- dist_pareto(shape = c(10, 3, 2, 1), scale = rep(1, 4))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Percentile distribution is a non-parametric distribution defined by a set of quantiles at specified percentile values. This distribution is useful for representing empirical distributions or elicited expert knowledge when only percentile information is available. The distribution uses linear interpolation between percentiles and can be used to approximate complex distributions that may not have simple parametric forms.
dist_percentile(x, percentile)
x |
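A list of values, one numeric vector per distribution. |
percentile |
A list of percentiles corresponding to the values in x, one numeric vector per distribution (the example below supplies them on a 0 to 100 scale). |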
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_percentile.html
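In the following, let $X$ be a Percentile random variable defined by
values $x_1, x_2, \dots, x_n$ at percentiles $p_1, p_2, \dots, p_n$ (expressed as proportions)
where $0 \le p_1 < p_2 < \dots < p_n \le 1$.
Support: $[x_1, x_n]$ if $p_1 = 0$ and
$p_n = 1$, otherwise support is approximated from the
specified percentiles.
Mean: Approximated numerically using spline interpolation and numerical integration:
$E(X) \approx \int_0^1 s(u)\,du$
where $s(u)$ is a spline function interpolating the percentile values.
Variance: Approximated numerically.
Probability density function (p.d.f): Approximated numerically using kernel density estimation from generated samples.
Cumulative distribution function (c.d.f): Defined by linear interpolation:
$F(x) = p_i + \frac{x - x_i}{x_{i+1} - x_i}(p_{i+1} - p_i)$ for $x_i \le x \le x_{i+1}$
Quantile function: Defined by linear interpolation:
$Q(u) = x_i + \frac{u - p_i}{p_{i+1} - p_i}(x_{i+1} - x_i)$
for $p_i \le u \le p_{i+1}$.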
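The piecewise-linear c.d.f. and quantile function can be sketched with base R's `stats::approx()`. The same percentile grid of a standard Normal is used as in the package example, with percentiles kept as proportions. `percentile_cdf()` and `percentile_quantile()` are hypothetical helpers. Because the two functions interpolate the same points in opposite directions, they are exact inverses on the grid's range. Outside that range, `approx()` returns `NA` by default.

```r
# Values at percentiles 1%, ..., 99% of a standard Normal
percentiles <- seq(0.01, 0.99, by = 0.01)
x <- qnorm(percentiles)

# Hypothetical helpers: piecewise-linear c.d.f. and quantile via stats::approx()
percentile_cdf      <- function(q) approx(x, percentiles, xout = q)$y
percentile_quantile <- function(u) approx(percentiles, x, xout = u)$y

u <- 0.705
q_interp <- percentile_quantile(u)
```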
dist <- dist_normal()
percentiles <- seq(0.01, 0.99, by = 0.01)
x <- vapply(percentiles, quantile, double(1L), x = dist)
dist_percentile(list(x), list(percentiles*100))
Poisson distributions are frequently used to model counts, in particular the number of events occurring in a fixed interval of time or space when these events occur with a known constant mean rate and independently of the time since the last event. Examples include the number of emails received per hour, the number of decay events per second from a radioactive source, or the number of customers arriving at a store per day.
dist_poisson(lambda)
lambda |
The rate parameter (mean and variance) of the distribution. Can be any positive number. This represents the expected number of events in the given interval. |
We recommend reading this documentation on pkgdown which renders math nicely. https://pkg.mitchelloharawild.com/distributional/reference/dist_poisson.html
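In the following, let $X$ be a Poisson random variable with parameter
lambda = $\lambda$.
Support: $\{0, 1, 2, 3, \dots\}$
Mean: $E(X) = \lambda$
Variance: $\mathrm{Var}(X) = \lambda$
Probability mass function (p.m.f):
$P(X = k) = \frac{\lambda^ke^{-\lambda}}{k!}$
Cumulative distribution function (c.d.f):
$F(k) = e^{-\lambda}\sum_{i=0}^{\lfloor k \rfloor}\frac{\lambda^i}{i!}$
Moment generating function (m.g.f):
$E(e^{tX}) = e^{\lambda(e^t - 1)}$
Skewness: $\frac{1}{\sqrt{\lambda}}$
Excess kurtosis: $\frac{1}{\lambda}$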
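The finite-sum c.d.f. and the skewness and excess kurtosis formulas can be confirmed in base R. The sketch sums over a support truncated at 200, which is an assumption that leaves negligible tail mass for $\lambda \le 10$. It compares the results with `stats::ppois()`, $1/\sqrt{\lambda}$ and $1/\lambda$. `moments_by_sum()` is a hypothetical helper.

```r
lambda <- c(1, 4, 10)

# F(4) = exp(-lambda) * sum_{i=0}^{4} lambda^i / i!
cdf_sum <- sapply(lambda, function(l) exp(-l) * sum(l^(0:4) / factorial(0:4)))

# Skewness and excess kurtosis by summing over a truncated support
moments_by_sum <- function(l, kmax = 200) {
  k <- 0:kmax
  p <- dpois(k, l)
  m <- sum(k * p)
  v <- sum((k - m)^2 * p)
  c(skewness = sum((k - m)^3 * p) / v^1.5,
    kurtosis = sum((k - m)^4 * p) / v^2 - 3)
}
num <- t(sapply(lambda, moments_by_sum))
```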
dist <- dist_poisson(lambda = c(1, 4, 10))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
The Poisson-Inverse Gaussian distribution is a compound Poisson distribution where the rate parameter follows an Inverse Gaussian distribution. It is useful for modeling overdispersed count data.
dist_poisson_inverse_gaussian(mean, shape)

| Argument | Description |
|---|---|
| mean, shape | Parameters. Must be strictly positive. Infinite values are supported. |

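We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_poisson_inverse_gaussian.html
In the following, let $X$ be a Poisson-Inverse Gaussian random variable
with parameters mean = $\mu$ and shape = $\phi$.
Support: $\{0, 1, 2, 3, \ldots\}$
Mean: $\mu$
Variance: $\mu + \frac{\mu^3}{\phi}$
Probability mass function (p.m.f), written as a Poisson mixture over an inverse Gaussian rate:
$P(X = x) = \int_0^\infty \frac{\lambda^x e^{-\lambda}}{x!} \sqrt{\frac{\phi}{2 \pi \lambda^3}} \exp\left(-\frac{\phi (\lambda - \mu)^2}{2 \mu^2 \lambda}\right) d\lambda$
for $x = 0, 1, 2, \ldots$
Cumulative distribution function (c.d.f):
The c.d.f does not have a closed form and is approximated numerically.
Moment generating function (m.g.f):
$E(e^{tX}) = \exp\left(\frac{\phi}{\mu}\left(1 - \sqrt{1 - \frac{2 \mu^2 (e^t - 1)}{\phi}}\right)\right)$
for $t < \log\left(1 + \frac{\phi}{2 \mu^2}\right)$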
actuar::PoissonInverseGaussian, actuar::dpoisinvgauss(),
actuar::ppoisinvgauss(), actuar::qpoisinvgauss(), actuar::rpoisinvgauss()
dist <- dist_poisson_inverse_gaussian(mean = rep(0.1, 3), shape = c(0.4, 0.8, 1))
dist
mean(dist)
variance(dist)
support(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
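To make the compound construction concrete, this base R sketch recovers the mean and variance by integrating against the inverse Gaussian mixing density. The helpers dinvgauss_sketch() and pig_moments() are hypothetical, and the density is written out by hand assuming the mean/shape parameterisation used above.

```r
# Sketch (base R only): the Poisson-Inverse Gaussian is a Poisson whose rate
# Lambda follows an inverse Gaussian with mean mu and shape phi. By the laws of
# total expectation and variance, E(X) = E(Lambda) = mu and
# Var(X) = E(Lambda) + Var(Lambda) = mu + mu^3 / phi.
# The inverse Gaussian density is written by hand because base R has none.
dinvgauss_sketch <- function(y, mu, phi) {
  out <- numeric(length(y))
  pos <- y > 0                      # density is zero for y <= 0
  yp <- y[pos]
  out[pos] <- sqrt(phi / (2 * pi * yp^3)) *
    exp(-phi * (yp - mu)^2 / (2 * mu^2 * yp))
  out
}

pig_moments <- function(mu, phi) {
  # r-th raw moment of the mixing (rate) distribution
  raw <- function(r) {
    integrate(function(y) y^r * dinvgauss_sketch(y, mu, phi),
              0, Inf, rel.tol = 1e-8)$value
  }
  m1 <- raw(1)
  c(mean = m1, variance = m1 + (raw(2) - m1^2))
}

pig_moments(mu = 2, phi = 3)  # approximately 2 and 2 + 8/3
```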
The sampling distribution represents an empirical distribution based on observed samples. It is useful for bootstrapping, representing posterior distributions from Markov Chain Monte Carlo (MCMC) algorithms, or working with any empirical data where the parametric form is unknown. Unlike parametric distributions, the sampling distribution makes no assumptions about the underlying data-generating process and instead uses the sample itself to estimate distributional properties. The distribution can handle both univariate and multivariate samples.
dist_sample(x)

| Argument | Description |
|---|---|
| x | A list of sampled values. For univariate distributions, each element should be a numeric vector. For multivariate distributions, each element should be a matrix where columns represent variables and rows represent observations. |

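We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_sample.html
In the following, let $X$ be a random variable with sample
$x_1, \ldots, x_n$ of size $n$.
Support: The observed range of the sample, $[\min_i x_i, \max_i x_i]$
Mean (univariate): $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Mean (multivariate): Computed independently for each variable.
Variance (univariate): $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Covariance (multivariate): The sample covariance matrix.
Skewness (univariate): $\frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{\left(\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right)^{3/2}}$
Probability density function: Approximated numerically using kernel density estimation.
Cumulative distribution function (univariate): $F(q) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le q)$
where $I(\cdot)$ is the indicator function.
Cumulative distribution function (multivariate): $F(\mathbf{q}) = \frac{1}{n} \sum_{i=1}^{n} I(\mathbf{x}_i \le \mathbf{q})$
where the inequality is applied element-wise.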
Quantile function (univariate): The sample quantile, computed using
the specified quantile type (see stats::quantile()).
Quantile function (multivariate): Marginal quantiles are computed independently for each variable.
Random generation: Bootstrap sampling with replacement from the empirical sample.
stats::density(), stats::quantile(), stats::cov()
# Univariate numeric samples
dist <- dist_sample(x = list(rnorm(100), rnorm(100, 10)))
dist
mean(dist)
variance(dist)
skewness(dist)
generate(dist, 10)
density(dist, 1)
# Multivariate numeric samples
dist <- dist_sample(x = list(cbind(rnorm(100), rnorm(100, 10))))
dimnames(dist) <- c("x", "y")
dist
mean(dist)
variance(dist)
generate(dist, 10)
quantile(dist, 0.4) # Returns the marginal quantiles
cdf(dist, matrix(c(0.3, 9), nrow = 1))
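A minimal base R sketch of the empirical c.d.f. (univariate and multivariate) and of bootstrap generation as described above. The helper names are hypothetical and this is not the package's implementation.

```r
# Sketch (base R only) of the sample-based c.d.f. and bootstrap generation.
set.seed(1)
s <- rnorm(100)

# Univariate c.d.f.: proportion of sample values at or below each q
sample_cdf <- function(q, x) vapply(q, function(qi) mean(x <= qi), numeric(1))

# Multivariate c.d.f.: a row counts only if every coordinate is <= q
sample_cdf_mv <- function(q, X) {
  mean(apply(X <= matrix(q, nrow(X), ncol(X), byrow = TRUE), 1, all))
}

# Random generation: resample observed values with replacement
# (indexing via sample.int() avoids sample()'s special case for length-1 x)
sample_generate <- function(times, x) x[sample.int(length(x), times, replace = TRUE)]

sample_cdf(c(-1, 0, 1), s)                # matches ecdf(s)(c(-1, 0, 1))
sample_cdf_mv(c(2, 7), cbind(1:4, 5:8))   # 0.5: rows (1, 5) and (2, 6) qualify
draws <- sample_generate(10, s)
```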
The Student's T distribution is closely related to the Normal()
distribution, but has heavier tails. As the degrees of freedom $\nu$ increase to $\infty$,
the Student's T converges to a Normal. The T distribution appears
repeatedly throughout classic frequentist hypothesis testing when
comparing group means.
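dist_student_t(df, mu = 0, sigma = 1, ncp = NULL)

| Argument | Description |
|---|---|
| df | Degrees of freedom (> 0, possibly non-integer). |
| mu | The location parameter of the distribution. For the central distribution (ncp = 0 or NULL), this is the median. |
| sigma | The scale parameter of the distribution. |
| ncp | Non-centrality parameter. If NULL, the central t distribution is used. |
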
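We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_student_t.html
In the following, let $X$ be a location-scale Student's T random variable with
df = $\nu$, mu = $\mu$, sigma = $\sigma$, and
ncp = $\delta$ (non-centrality parameter).
If $T$ follows a standard Student's T distribution (with df = $\nu$
and ncp = $\delta$), then $X = \mu + \sigma T$.
Support: $\mathbb{R}$, the set of all real numbers
Mean:
For the central distribution (ncp = 0 or NULL):
$E(X) = \mu$ for $\nu > 1$, and undefined otherwise.
For the non-central distribution (ncp $\neq$ 0):
$E(X) = \mu + \sigma \delta \sqrt{\frac{\nu}{2}} \frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)}$ for $\nu > 1$, and undefined otherwise.
Variance:
For the central distribution (ncp = 0 or NULL):
$\text{Var}(X) = \sigma^2 \frac{\nu}{\nu - 2}$ for $\nu > 2$. Undefined if $\nu \le 1$, infinite when $1 < \nu \le 2$.
For the non-central distribution (ncp $\neq$ 0):
$\text{Var}(X) = \sigma^2 \left[\frac{\nu (1 + \delta^2)}{\nu - 2} - \frac{\nu \delta^2}{2} \left(\frac{\Gamma((\nu - 1)/2)}{\Gamma(\nu/2)}\right)^2\right]$ for $\nu > 2$. Undefined if $\nu \le 1$, infinite when $1 < \nu \le 2$.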
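Probability density function (p.d.f):
For the central distribution (ncp = 0 or NULL), the standard
t distribution with df = $\nu$ has density:
$f_T(t) = \frac{\Gamma((\nu + 1)/2)}{\sqrt{\nu \pi} \, \Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu + 1)/2}$
The location-scale version with mu = $\mu$ and sigma = $\sigma$
has density:
$f_X(x) = \frac{1}{\sigma} f_T\left(\frac{x - \mu}{\sigma}\right)$
For the non-central distribution (ncp $\neq$ 0), the density is
computed numerically via stats::dt().
Cumulative distribution function (c.d.f):
For the central distribution (ncp = 0 or NULL), the cumulative
distribution function is computed numerically via stats::pt(), which
uses the relationship to the regularized incomplete beta function:
$F_T(t) = 1 - \frac{1}{2} I_{x(t)}\left(\frac{\nu}{2}, \frac{1}{2}\right)$
for $t > 0$, where $x(t) = \frac{\nu}{\nu + t^2}$ and $I_x(a, b)$ is
the regularized incomplete beta function (stats::pbeta()). For $t \le 0$:
$F_T(t) = \frac{1}{2} I_{x(t)}\left(\frac{\nu}{2}, \frac{1}{2}\right)$
The location-scale version is: $F_X(x) = F_T\left(\frac{x - \mu}{\sigma}\right)$.
For the non-central distribution (ncp $\neq$ 0), the cumulative
distribution function is computed numerically via stats::pt().
Moment generating function (m.g.f):
Does not exist: the tails of the t distribution are too heavy for $E(e^{tX})$ to be finite at any $t \neq 0$. Moments are computed using the formulas for mean and variance above where available.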
dist <- dist_student_t(df = c(1, 2, 5), mu = c(0, 1, 2), sigma = c(1, 2, 3))
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
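The location-scale and incomplete beta relationships above can be checked with base R's dt(), pt() and pbeta(). This sketch covers the central case only, with arbitrary illustrative parameter values; d_ls() and p_ls() are hypothetical helpers.

```r
# Sketch (base R only): X = mu + sigma * T for a central t variable T.
mu <- 2; sigma <- 3; nu <- 5

# Density and c.d.f. by standardising, as in the formulas above
d_ls <- function(x) dt((x - mu) / sigma, df = nu) / sigma
p_ls <- function(x) pt((x - mu) / sigma, df = nu)

# Variance sigma^2 * nu / (nu - 2) = 15, checked by numerical integration
v_formula <- sigma^2 * nu / (nu - 2)
v_numeric <- integrate(function(x) (x - mu)^2 * d_ls(x), -Inf, Inf,
                       rel.tol = 1e-8)$value

# Incomplete beta relationship for t > 0
t0 <- 1.3
beta_form <- 1 - 0.5 * pbeta(nu / (nu + t0^2), nu / 2, 0.5)
c(pt(t0, df = nu), beta_form)  # equal up to rounding
```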
Tukey's studentized range distribution, used for Tukey's honestly significant differences test in ANOVA.
dist_studentized_range(nmeans, df, nranges)

| Argument | Description |
|---|---|
| nmeans | Sample size for range (same for each group). |
| df | Degrees of freedom for the independent standard deviation estimate used to studentize the range. |
| nranges | Number of groups whose maximum range is considered. |

We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_studentized_range.html
In the following, let $Q$ be a Studentized Range random variable with
parameters nmeans = $k$ (number of groups), df = $\nu$ (degrees
of freedom), and nranges = $r$ (number of ranges).
Support: $(0, \infty)$, the set of positive real numbers.
Mean: Approximated numerically.
Variance: Approximated numerically.
Probability density function (p.d.f): The density does not have a closed-form expression and is computed numerically.
Cumulative distribution function (c.d.f): The c.d.f does not have a
simple closed-form expression. For $r = 1$ (single range), it involves
integration over the joint distribution of the sample range and an
independent chi-square variable. The general form is computed numerically
using algorithms described in the references for stats::ptukey().
Moment generating function (m.g.f): Does not exist in closed form.
dist <- dist_studentized_range(nmeans = c(6, 2), df = c(5, 4), nranges = c(1, 1))
dist
cdf(dist, 4)
quantile(dist, 0.7)
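In Tukey's HSD test, the studentized range quantile supplies the critical value. The base R sketch below uses stats::qtukey() and stats::ptukey(); the ANOVA quantities mse and n_per_group are hypothetical values chosen for illustration.

```r
# Sketch (base R only): the studentized range quantile is the critical value
# of Tukey's HSD test.
k <- 4    # number of group means compared
nu <- 20  # residual degrees of freedom
q_crit <- qtukey(0.95, nmeans = k, df = nu)
ptukey(q_crit, nmeans = k, df = nu)  # approximately 0.95 (qtukey is iterative)

# Hypothetical ANOVA summary: mean squared error and common group size
mse <- 2.5
n_per_group <- 6
hsd <- q_crit * sqrt(mse / n_per_group)  # smallest significant mean difference
```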
A transformed distribution applies a monotonic transformation to an existing distribution. This is useful for creating derived distributions such as log-normal (exponential transformation of normal), or other custom transformations of base distributions.
The density(), mean(), and variance() methods are approximate as
they are based on numerical derivatives.
dist_transformed(dist, transform, inverse)

| Argument | Description |
|---|---|
| dist | A univariate distribution vector. |
| transform | A function used to transform the distribution. This transformation should be monotonic over the appropriate domain. |
| inverse | The inverse of the transform function. |

We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_transformed.html
Let $Y = g(X)$ where $X$ follows the base distribution, with
transformation function transform = $g$ and inverse = $g^{-1}$.
The transformation must be monotonic over the support of $X$.
Support: $g(S_X)$ where $S_X$ is the support of $X$
Mean: Approximated numerically using a second-order Taylor expansion:
$E(Y) \approx g(\mu_X) + \frac{1}{2} g''(\mu_X) \sigma_X^2$
where $\mu_X$ and $\sigma_X^2$ are the mean and variance of the
base distribution $X$, and $g''$ is the second derivative of the
transformation. The derivative is computed numerically using
numDeriv::hessian().
Variance: Approximated numerically using the delta method:
$\text{Var}(Y) \approx g'(\mu_X)^2 \sigma_X^2$
where $g'$ is the first derivative (Jacobian) computed numerically
using numDeriv::jacobian().
Probability density function (p.d.f): Using the change of variables formula:
$f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right|$
where $f_X$ is the p.d.f. of the base distribution and the Jacobian
$\frac{d}{dy} g^{-1}(y)$ is computed numerically using
numDeriv::jacobian().
Cumulative distribution function (c.d.f):
For monotonically increasing $g$: $F_Y(y) = F_X(g^{-1}(y))$
For monotonically decreasing $g$: $F_Y(y) = 1 - F_X(g^{-1}(y))$
where $F_X$ is the c.d.f. of the base distribution.
Quantile function: The inverse of the c.d.f.
For monotonically increasing $g$: $Q_Y(p) = g(Q_X(p))$
For monotonically decreasing $g$: $Q_Y(p) = g(Q_X(1 - p))$
where $Q_X$ is the quantile function of the base distribution.
numDeriv::jacobian(), numDeriv::hessian()
# Create a log normal distribution
dist <- dist_transformed(dist_normal(0, 0.5), exp, log)
density(dist, 1) # dlnorm(1, 0, 0.5)
cdf(dist, 4) # plnorm(4, 0, 0.5)
quantile(dist, 0.1) # qlnorm(0.1, 0, 0.5)
generate(dist, 10) # rlnorm(10, 0, 0.5)
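The change-of-variables density and the Taylor and delta-method approximations can be compared with exact log-normal results in base R. This sketch uses the analytic derivative of log() rather than numDeriv, so it illustrates the formulas, not the package's numerical implementation. It also shows that the approximate mean and variance can be noticeably off.

```r
# Sketch (base R only): Y = exp(X) with X ~ Normal(0, 0.5).
mu <- 0; sigma <- 0.5

# f_Y(y) = f_X(log(y)) * |d/dy log(y)| = dnorm(log(y), mu, sigma) / y
d_transformed <- function(y) dnorm(log(y), mu, sigma) / y
all.equal(d_transformed(c(0.5, 1, 2)), dlnorm(c(0.5, 1, 2), mu, sigma))  # TRUE

# Second-order Taylor mean g(mu) + g''(mu) * sigma^2 / 2 with g = exp
mean_taylor <- exp(mu) + exp(mu) * sigma^2 / 2  # 1.125
mean_exact <- exp(mu + sigma^2 / 2)             # about 1.1331 (log-normal mean)

# Delta-method variance g'(mu)^2 * sigma^2 versus the exact log-normal variance
var_delta <- exp(mu)^2 * sigma^2                          # 0.25
var_exact <- (exp(sigma^2) - 1) * exp(2 * mu + sigma^2)   # about 0.3647
```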
Note that the samples are generated using inverse transform sampling, and the means and variances are estimated from samples.
dist_truncated(dist, lower = -Inf, upper = Inf)

| Argument | Description |
|---|---|
| dist | The distribution(s) to truncate. |
| lower, upper | The range of values to keep from a distribution. |

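We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_truncated.html
In the following, let $X$ be a truncated random variable with
underlying distribution $Y$, truncation bounds lower = $a$ and
upper = $b$, where $F_Y(x)$ is the c.d.f. of $Y$ and
$f_Y(x)$ is the p.d.f. of $Y$.
Support: $[a, b]$
Mean: For the general case, the mean is approximated numerically.
For a truncated Normal distribution with underlying mean $\mu$ and
standard deviation $\sigma$, the mean is:
$E(X) = \mu + \sigma \frac{\phi(\alpha) - \phi(\beta)}{\Phi(\beta) - \Phi(\alpha)}$
where $\alpha = (a - \mu)/\sigma$, $\beta = (b - \mu)/\sigma$,
$\phi$ is the standard Normal p.d.f., and $\Phi$ is the
standard Normal c.d.f.
Variance: Approximated numerically for all distributions.
Probability density function (p.d.f): $f_X(x) = \frac{f_Y(x)}{F_Y(b) - F_Y(a)}$ for $a \le x \le b$, and $0$ otherwise.
Cumulative distribution function (c.d.f): $F_X(x) = \frac{F_Y(x) - F_Y(a)}{F_Y(b) - F_Y(a)}$ for $a \le x \le b$, with $F_X(x) = 0$ for $x < a$ and $F_X(x) = 1$ for $x > b$.
Quantile function: $Q_X(p) = F_Y^{-1}\left(F_Y(a) + p \left(F_Y(b) - F_Y(a)\right)\right)$
clamped to the interval $[a, b]$.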
dist <- dist_truncated(dist_normal(2, 1), lower = 0)
dist
mean(dist)
variance(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
if (requireNamespace("ggdist")) {
  library(ggplot2)
  ggplot() +
    ggdist::stat_dist_halfeye(
      aes(y = c("Normal", "Truncated"),
          dist = c(dist_normal(2, 1), dist_truncated(dist_normal(2, 1), lower = 0)))
    )
}
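The closed-form truncated Normal mean and the truncated quantile function can be checked in base R. This is a sketch for the example above (Normal(2, 1) truncated below at 0); d_trunc() and q_trunc() are hypothetical helpers, the latter showing inverse transform sampling.

```r
# Sketch (base R only): Normal(2, 1) truncated to [0, Inf).
mu <- 2; sigma <- 1; a <- 0; b <- Inf

alpha <- (a - mu) / sigma
beta <- (b - mu) / sigma
Z <- pnorm(beta) - pnorm(alpha)  # probability mass retained by truncation

# Closed-form truncated Normal mean
mean_formula <- mu + sigma * (dnorm(alpha) - dnorm(beta)) / Z

# Numerical check using the truncated density f(x) / (F(b) - F(a))
d_trunc <- function(x) dnorm(x, mu, sigma) / Z
mean_numeric <- integrate(function(x) x * d_trunc(x), a, b, rel.tol = 1e-8)$value
c(mean_formula, mean_numeric)  # both about 2.055

# Inverse transform sampling with the truncated quantile function
q_trunc <- function(p) qnorm(pnorm(alpha) + p * Z, mu, sigma)
set.seed(1)
draws <- q_trunc(runif(1000))
```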
A distribution with constant density on an interval.
dist_uniform(min, max)

| Argument | Description |
|---|---|
| min, max | Lower and upper limits of the distribution. Must be finite. |

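We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_uniform.html
In the following, let $X$ be a Uniform random variable with parameters
min = $a$ and max = $b$.
Support: $[a, b]$
Mean: $\frac{1}{2}(a + b)$
Variance: $\frac{1}{12}(b - a)^2$
Probability density function (p.d.f):
$f(x) = \frac{1}{b - a}$ for $x \in [a, b]$, and $f(x) = 0$ otherwise.
Cumulative distribution function (c.d.f):
$F(x) = \frac{x - a}{b - a}$ for $x \in [a, b]$, with $F(x) = 0$ for $x < a$
and $F(x) = 1$ for $x > b$.
Moment generating function (m.g.f):
$E(e^{tX}) = \frac{e^{tb} - e^{ta}}{t(b - a)}$ for $t \neq 0$, and $E(e^{tX}) = 1$ for $t = 0$.
Skewness: $0$
Excess Kurtosis: $-\frac{6}{5}$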
dist <- dist_uniform(min = c(3, -2), max = c(5, 4))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
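A quick numerical check of the Uniform formulas, integrating against stats::dunif() for the first example's parameters (min = 3, max = 5). The helper moment() is hypothetical.

```r
# Sketch (base R only): checking the Uniform formulas for min = 3, max = 5.
a <- 3; b <- 5
moment <- function(f) integrate(function(x) f(x) * dunif(x, a, b), a, b)$value

mean_u <- (a + b) / 2    # 4
var_u <- (b - a)^2 / 12  # 1/3
c(moment(identity), mean_u)
c(moment(function(x) (x - mean_u)^2), var_u)

# Excess kurtosis -6/5 = -1.2
moment(function(x) (x - mean_u)^4) / var_u^2 - 3

# m.g.f. for t != 0 against direct integration of exp(t * x)
mgf_u <- function(t) (exp(t * b) - exp(t * a)) / (t * (b - a))
c(mgf_u(0.3), moment(function(x) exp(0.3 * x)))
```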
A generalization of the exponential distribution (a Weibull with shape = 1 is exponential). Often used in survival and time-to-event analyses.
dist_weibull(shape, scale)

| Argument | Description |
|---|---|
| shape, scale | Shape and scale parameters. |

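We recommend reading this documentation on pkgdown, which renders math nicely: https://pkg.mitchelloharawild.com/distributional/reference/dist_weibull.html
In the following, let $X$ be a Weibull random variable with
shape parameter shape = $k$ and scale parameter scale = $\lambda$.
Support: $[0, \infty)$
Mean: $\lambda \Gamma(1 + 1/k)$
where $\Gamma$ is the gamma function.
Variance: $\lambda^2 \left[\Gamma(1 + 2/k) - \left(\Gamma(1 + 1/k)\right)^2\right]$
Probability density function (p.d.f): $f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k - 1} e^{-(x/\lambda)^k}$ for $x \ge 0$
Cumulative distribution function (c.d.f): $F(x) = 1 - e^{-(x/\lambda)^k}$ for $x \ge 0$
Moment generating function (m.g.f): $E(e^{tX}) = \sum_{n=0}^{\infty} \frac{t^n \lambda^n}{n!} \Gamma(1 + n/k)$, which converges for all $t$ when $k > 1$
Skewness: $\gamma_1 = \frac{\mu_3 - 3 \mu \sigma^2 - \mu^3}{\sigma^3}$
where $\mu = \lambda \Gamma(1 + 1/k)$, $\sigma^2 = \lambda^2 \left[\Gamma(1 + 2/k) - \left(\Gamma(1 + 1/k)\right)^2\right]$, and the third
raw moment is $\mu_3 = \lambda^3 \Gamma(1 + 3/k)$
Excess Kurtosis: $\frac{\mu_4 - 4 \gamma_1 \sigma^3 \mu - 6 \mu^2 \sigma^2 - \mu^4}{\sigma^4} - 3$
where the fourth raw moment is $\mu_4 = \lambda^4 \Gamma(1 + 4/k)$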
dist <- dist_weibull(shape = c(0.5, 1, 1.5, 5), scale = rep(1, 4))
dist
mean(dist)
variance(dist)
skewness(dist)
kurtosis(dist)
generate(dist, 10)
density(dist, 2)
density(dist, 2, log = TRUE)
cdf(dist, 4)
quantile(dist, 0.7)
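The gamma-function expressions for the Weibull mean, variance and skewness can be compared with numerical integration against stats::dweibull(). A minimal sketch with arbitrary values k = 1.5 and lambda = 2; central() is a hypothetical helper.

```r
# Sketch (base R only): Weibull moments from the formulas above.
k <- 1.5; lambda <- 2
mu <- lambda * gamma(1 + 1 / k)
s2 <- lambda^2 * (gamma(1 + 2 / k) - gamma(1 + 1 / k)^2)
m3 <- lambda^3 * gamma(1 + 3 / k)           # third raw moment
skew <- (m3 - 3 * mu * s2 - mu^3) / s2^1.5

# r-th central moment by numerical integration against the density
central <- function(r) {
  integrate(function(x) (x - mu)^r * dweibull(x, shape = k, scale = lambda),
            0, Inf, rel.tol = 1e-8)$value
}

c(mean = mu, variance = s2, skewness = skew)
c(central(2), central(3) / central(2)^1.5)  # should match variance and skewness
```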
If a distribution is not yet supported, you can vectorise its p/d/q/r functions
using this function. dist_wrap() stores the distribution's parameters and
provides wrappers that call the appropriate p/d/q/r functions.
Using this function to wrap a distribution should only be done if the distribution is not yet available in this package. If you need a distribution which isn't in the package yet, consider making a request at https://github.com/mitchelloharawild/distributional/issues.
dist_wrap(dist, ..., package = NULL)

| Argument | Description |
|---|---|
| dist | The name of the distribution used in the functions (the name that is prefixed by p/d/q/r). |
| ... | Named arguments used to parameterise the distribution. |
| package | The package from which the distribution is provided. If NULL, the calling environment's search path is used to find the distribution functions. Alternatively, an arbitrary environment can also be provided here. |

The dist_wrap() function provides a generic interface to create distribution
objects from any set of p/d/q/r style functions. The statistical properties
depend on the specific distribution being wrapped.
dist <- dist_wrap("norm", mean = 1:3, sd = c(3, 9, 2))
density(dist, 1) # dnorm()
cdf(dist, 4) # pnorm()
quantile(dist, 0.975) # qnorm()
generate(dist, 10) # rnorm()
library(actuar)
dist <- dist_wrap("invparalogis", package = "actuar", shape = 2, rate = 2)
density(dist, 1) # actuar::dinvparalogis()
cdf(dist, 4) # actuar::pinvparalogis()
quantile(dist, 0.975) # actuar::qinvparalogis()
generate(dist, 10) # actuar::rinvparalogis()
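To show the idea behind wrapping p/d/q/r functions, here is a hypothetical, non-vectorised sketch that looks the functions up by name and forwards stored parameters. wrap_sketch() is not part of the package and is not how dist_wrap() is implemented.

```r
# Hypothetical sketch: find d/p/q/r functions by name in an environment and
# forward the stored parameters. Handles a single parameter set only.
wrap_sketch <- function(dist, ..., env = asNamespace("stats")) {
  params <- list(...)
  fn <- function(prefix) get(paste0(prefix, dist), envir = env, mode = "function")
  list(
    density  = function(at) do.call(fn("d"), c(list(at), params)),
    cdf      = function(q)  do.call(fn("p"), c(list(q), params)),
    quantile = function(p)  do.call(fn("q"), c(list(p), params)),
    generate = function(n)  do.call(fn("r"), c(list(n), params))
  )
}

w <- wrap_sketch("norm", mean = 1, sd = 3)
w$cdf(4)           # same as pnorm(4, mean = 1, sd = 3)
w$quantile(0.975)  # same as qnorm(0.975, mean = 1, sd = 3)
w$generate(5)      # five draws via rnorm()
```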
## S3 method for class 'distribution'
family(object, ...)

| Argument | Description |
|---|---|
| object | The distribution(s). |
| ... | Additional arguments used by methods. |

dist <- c(
  dist_normal(1:2),
  dist_poisson(3),
  dist_multinomial(size = c(4, 3),
    prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
)
family(dist)
Generate random samples from probability distributions.
## S3 method for class 'distribution'
generate(x, times, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| times | The number of samples. |
| ... | Additional arguments used by methods. |

Determines whether a probability distribution is symmetric around its center.
has_symmetry(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

A logical value indicating whether the distribution is symmetric.
# Normal distribution is symmetric
has_symmetry(dist_normal(mu = 0, sigma = 1))
has_symmetry(dist_normal(mu = 5, sigma = 2))
# Beta distribution symmetry depends on parameters
has_symmetry(dist_beta(shape1 = 2, shape2 = 2)) # symmetric
has_symmetry(dist_beta(shape1 = 2, shape2 = 5)) # not symmetric
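As a purely numerical illustration of what symmetry means (not necessarily how has_symmetry() decides), this hypothetical helper checks whether a density satisfies f(c + h) = f(c - h) on a grid of offsets. The offset grid and tolerance are arbitrary assumptions.

```r
# Hypothetical numerical check: a density f is symmetric about `center`
# when f(center + h) equals f(center - h) for every offset h.
symmetric_about <- function(f, center, h = seq(0.01, 0.49, by = 0.01),
                            tol = 1e-10) {
  all(abs(f(center + h) - f(center - h)) < tol)
}

symmetric_about(function(x) dnorm(x, 5, 2), center = 5)                 # TRUE
symmetric_about(function(x) dbeta(x, 2, 2), center = 0.5)               # TRUE
symmetric_about(function(x) dbeta(x, 2, 5), center = qbeta(0.5, 2, 5))  # FALSE
```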
Used to extract a specified prediction interval at a particular confidence level from a distribution.
hdr(x, ...)

| Argument | Description |
|---|---|
| x | Object to create an hdr from. |
| ... | Additional arguments used by methods. |

This function is highly experimental and will change in the future. In particular, improved functionality for object classes and visualisation tools will be added in a future release.
Computes highest density regions: the smallest regions that contain the requested probability.
## S3 method for class 'distribution'
hdr(x, size = 95, n = 512, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| size | The size of the interval (between 0 and 100). |
| n | The resolution used to estimate the distribution's density. |
| ... | Additional arguments used by methods. |

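A highest density region can be approximated by evaluating the density on a grid and keeping the highest-density points until the requested probability is covered. The base R sketch below is a hypothetical illustration of that idea, loosely mirroring the size and n arguments; it is not the package's algorithm.

```r
# Hypothetical grid-based HDR sketch, assuming the density is evaluated on
# n equally spaced points over [lower, upper].
hdr_grid_sketch <- function(dens, lower, upper, size = 95, n = 512) {
  x <- seq(lower, upper, length.out = n)
  fx <- dens(x)
  mass <- fx * (x[2] - x[1])                 # approximate probability per cell
  ord <- order(fx, decreasing = TRUE)        # highest density first
  n_keep <- which(cumsum(mass[ord]) >= size / 100)[1]
  inside <- fx >= fx[ord[n_keep]]            # density threshold for the region
  runs <- rle(inside)                        # contiguous runs of kept points
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(lower = x[starts[runs$values]], upper = x[ends[runs$values]])
}

# Unimodal and symmetric: the 95% HDR is close to [-1.96, 1.96]
hdr_grid_sketch(dnorm, -5, 5)

# Bimodal mixture: the 80% HDR splits into two disjoint intervals
mix <- function(x) 0.5 * dnorm(x, -3) + 0.5 * dnorm(x, 3)
hdr_grid_sketch(mix, -8, 8, size = 80)
```

Unlike a central interval, the result can consist of several disjoint intervals, which is why it is returned as one row per interval.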
Used to extract a specified prediction interval at a particular confidence level from a distribution.
The numeric lower and upper bounds can be extracted from the interval using
<hilo>$lower and <hilo>$upper as shown in the examples below.
hilo(x, ...)

| Argument | Description |
|---|---|
| x | Object to create hilo from. |
| ... | Additional arguments used by methods. |

# 95% interval from a standard normal distribution
interval <- hilo(dist_normal(0, 1), 95)
interval
# Extract the individual quantities with `$lower`, `$upper`, and `$level`
interval$lower
interval$upper
interval$level
Returns a hilo central probability interval with probability coverage of
size percent. By default, the distribution's quantile() is used to compute
the lower and upper bounds of a centered interval.
## S3 method for class 'distribution'
hilo(x, size = 95, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| size | The size of the interval (between 0 and 100). |
| ... | Additional arguments used by methods. |

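A central interval leaves equal probability in each tail, so it can be computed from two quantiles. This base R sketch (central_interval() is a hypothetical helper) shows the arithmetic for size = 95 and size = 80.

```r
# Sketch (base R only): a central interval with coverage `size` percent puts
# (100 - size) / 2 percent in each tail, so its bounds are two quantiles.
central_interval <- function(qfun, size = 95) {
  alpha <- 1 - size / 100
  c(lower = qfun(alpha / 2), upper = qfun(1 - alpha / 2))
}

central_interval(qnorm, 95)                           # about -1.96 and 1.96
central_interval(function(p) qexp(p, rate = 1), 80)   # qexp(0.1) and qexp(0.9)
```

For a skewed distribution such as the exponential, this central interval differs from the highest density region, which starts at 0 because the exponential density is decreasing.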
This function returns TRUE for distributions and FALSE for all other objects.
is_distribution(x)

| Argument | Description |
|---|---|
| x | An object. |

TRUE if the object inherits from the distribution class.
dist <- dist_normal()
is_distribution(dist)
is_distribution("distributional")
Is the object a hdr
is_hdr(x)

| Argument | Description |
|---|---|
| x | An object. |

Is the object a hilo
is_hilo(x)

| Argument | Description |
|---|---|
| x | An object. |

kurtosis(x, ...)
## S3 method for class 'distribution'
kurtosis(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

likelihood(x, ...)
## S3 method for class 'distribution'
likelihood(x, sample, ..., log = FALSE)
log_likelihood(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |
| sample | A list of sampled values to compare to distribution(s). |
| log | If TRUE, the log-likelihood is computed. |

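Assuming the sampled values are independent, the likelihood is the product of the densities evaluated at them, and the log-likelihood is the sum of the log densities. The base R sketch below illustrates this and why summing log densities is numerically safer.

```r
# Sketch (base R only), assuming independent observations.
set.seed(42)
obs <- rnorm(20, mean = 1)

lik <- prod(dnorm(obs, mean = 1, sd = 1))
loglik <- sum(dnorm(obs, mean = 1, sd = 1, log = TRUE))
all.equal(log(lik), loglik)  # TRUE

# Summing logs avoids the underflow that the raw product hits for large samples
big <- rnorm(5000)
prod(dnorm(big))              # underflows to 0
sum(dnorm(big, log = TRUE))   # finite
```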
Returns the empirical mean of the probability distribution. If the method does not exist, the mean of a random sample will be returned.
## S3 method for class 'distribution'
mean(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

Returns the median (50th percentile) of a probability distribution. This is
equivalent to quantile(x, p=0.5).
## S3 method for class 'distribution'
median(x, na.rm = FALSE, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| na.rm | Unused, included for consistency with the generic function. |
| ... | Additional arguments used by methods. |

Allows extension package developers to define a new distribution class compatible with the distributional package.
new_dist(..., class = NULL, dimnames = NULL)

| Argument | Description |
|---|---|
| ... | Parameters of the distribution (named). |
| class | The class of the distribution for S3 dispatch. |
| dimnames | The names of the variables in the distribution (optional). |

Construct hdr intervals
new_hdr(
  lower = list_of(.ptype = double()),
  upper = list_of(.ptype = double()),
  size = double()
)

| Argument | Description |
|---|---|
| lower, upper | A list of numeric vectors specifying the region's lower and upper bounds. |
| size | A numeric vector specifying the coverage size of the region. |

A "hdr" vector
Mitchell O'Hara-Wild
new_hdr(lower = list(1, c(3, 6)), upper = list(10, c(5, 8)), size = c(80, 95))
Class constructor function to help with manually creating hilo interval objects.
new_hilo(lower = double(), upper = double(), size = double())

| Argument | Description |
|---|---|
| lower, upper | A numeric vector of values for lower and upper limits. |
| size | Size of the interval, between 0 and 100. |

A "hilo" vector
Earo Wang & Mitchell O'Hara-Wild
new_hilo(lower = rnorm(10), upper = rnorm(10) + 5, size = 95)
Construct support regions
new_support_region(x = numeric(), limits = list(), closed = list())

| Argument | Description |
|---|---|
| x | A list of prototype vectors defining the distribution type. |
| limits | A list of value limits for the distribution. |
| closed | A list of logical(2L) indicating whether the limits are closed. |

parameters(x, ...)
## S3 method for class 'distribution'
parameters(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

dist <- c(
  dist_normal(1:2),
  dist_poisson(3),
  dist_multinomial(size = c(4, 3),
    prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
)
parameters(dist)
Computes the quantiles of a distribution.
## S3 method for class 'distribution'
quantile(x, p, ..., log = FALSE)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| p | The probability of the quantile. |
| ... | Additional arguments passed to methods. |
| log | If TRUE, probabilities are given as log probabilities. |

skewness(x, ...)
## S3 method for class 'distribution'
skewness(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

support(x, ...)
## S3 method for class 'distribution'
support(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

A generic function for computing the variance of an object.
variance(x, ...)
## S3 method for class 'numeric'
variance(x, ...)
## S3 method for class 'matrix'
variance(x, ...)
## S3 method for class 'numeric'
covariance(x, ...)

| Argument | Description |
|---|---|
| x | An object. |
| ... | Additional arguments used by methods. |

The implementation of variance() for numeric variables coerces the input to
a vector then uses stats::var() to compute the variance. This means that,
unlike stats::var(), if variance() is passed a matrix or a 2-dimensional
array, it will still return the variance (stats::var() returns the
covariance matrix in that case).
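The difference described above can be seen with base R alone. This sketch does not call the package; it only contrasts stats::var() on a matrix with stats::var() after coercing the same values to a vector.

```r
# Base R illustration of vector coercion versus matrix input to stats::var()
m <- matrix(c(1, 2, 3, 4, 5, 7), ncol = 2)
var(m)             # 2 x 2 covariance matrix of the two columns
var(as.vector(m))  # single variance of all six values: 14/3
```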
variance.distribution(), covariance()
Returns the empirical variance of the probability distribution. If the method does not exist, the variance of a random sample will be returned.
## S3 method for class 'distribution'
variance(x, ...)

| Argument | Description |
|---|---|
| x | The distribution(s). |
| ... | Additional arguments used by methods. |

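When no closed-form variance is available, a sample-based fallback can be as simple as the variance of many random draws. This hypothetical sketch uses a gamma distribution, whose exact variance is known, to show the idea; mc_variance() and its sample size are assumptions, not the package's settings.

```r
# Hypothetical sketch of a sample-based variance fallback.
# The sample size `times` is an illustrative assumption.
mc_variance <- function(rfun, times = 1e5) var(rfun(times))

set.seed(123)
v <- mc_variance(function(n) rgamma(n, shape = 2, rate = 1))
v  # close to the exact gamma variance shape / rate^2 = 2
```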