Package 'hdrcde'

Title:	Highest Density Regions and Conditional Density Estimation
Description:	Computation of highest density regions in one and two dimensions, kernel estimation of univariate density functions conditional on one covariate,and multimodal regression.
Authors:	Rob Hyndman [aut, cre, cph] , Jochen Einbeck [aut] , Matthew Wand [aut] , Simon Carrignon [ctb] , Fan Cheng [ctb]
Maintainer:	Rob Hyndman <[email protected]>
License:	GPL-3
Version:	3.4
Built:	2025-03-23 04:23:23 UTC
Source:	https://github.com/robjhyndman/hdrcde

Help Index

Box Cox Transformation
Conditional Density Estimation
Bandwidth calculation for conditional density estimation
Highest Density Regions
Bivariate Highest Density Regions
Highest Density Region Boxplots
Calculate highest density regions continously over some conditioned variable.
Density plot with Highest Density Regions
Highest Density Region Bandwidth
HDRs with confidence intervals
Scatterplot showing bivariate highest density regions
Speed-Flow data for Californian Freeway
Daily maximum temperatures in Melbourne, Australia
Nonparametric Multimodal Regression
Plots conditional densities
Plot HDRs with confidence intervals

Box Cox Transformation

Description

BoxCox() returns a transformation of the input variable using a Box-Cox transformation. InvBoxCox() reverses the transformation.

Usage

BoxCox(x, lambda)
BoxCox(x, lambda)

Arguments

`x`	a numeric vector or time series
`lambda`	transformation parameter

Details

The Box-Cox transformation is given by

$f_\lambda(x) =\frac{x^\lambda - 1}{\lambda}$

if $\lambda\ne0$ . For $\lambda=0$ ,

$f_0(x) = \log(x).$

Value

a numeric vector of the same length as x.

Author(s)

Rob J Hyndman

References

Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. JRSS B 26 211–246.

Conditional Density Estimation

Description

Calculates kernel conditional density estimate using local polynomial estimation.

Usage

cde(
  x,
  y,
  deg = 0,
  link = "identity",
  a,
  b,
  mean = NULL,
  x.margin,
  y.margin,
  x.name,
  y.name,
  use.locfit = FALSE,
  fw = TRUE,
  rescale = TRUE,
  nxmargin = 15,
  nymargin = 100,
  a.nndefault = 0.3,
  ...
)
cde(
  x,
  y,
  deg = 0,
  link = "identity",
  a,
  b,
  mean = NULL,
  x.margin,
  y.margin,
  x.name,
  y.name,
  use.locfit = FALSE,
  fw = TRUE,
  rescale = TRUE,
  nxmargin = 15,
  nymargin = 100,
  a.nndefault = 0.3,
  ...
)

Arguments

`x`	Numerical vector or matrix: the conditioning variable(s).
`y`	Numerical vector: the response variable.
`deg`	Degree of local polynomial used in estimation.
`link`	Link function used in estimation. Default "identity". The other possibility is "log" which is recommended if degree > 0.
`a`	Optional bandwidth in x direction.
`b`	Optional bandwidth in y direction.
`mean`	Estimated mean of y\|x. If present, it will adjust conditional density to have this mean.
`x.margin`	Values in x-space on which conditional density is calculated. If not specified, an equi-spaced grid of `nxmargin` values over the range of x is used. If x is a matrix, x.margin should be a list of two numerical vectors.
`y.margin`	Values in y-space on which conditional density is calculated. If not specified, an equi-spaced grid of `nymargin` values over the range of y is used.
`x.name`	Optional name of x variable used in plots.
`y.name`	Optional name of y variable used in plots.
`use.locfit`	If TRUE, will use `locfit` for estimation. Otherwise `ksmooth` is used. `locfit` is used if degree>0 or link not the identity or the dimension of x is greater than 1 even if `use.locfit=FALSE`.
`fw`	If TRUE (default), will use fixed window width estimation. Otherwise nearest neighbourhood estimation is used. If the dimension of x is greater than 1, nearest neighbourhood must be used.
`rescale`	If TRUE (default), will rescale the conditional densities to integrate to one.
`nxmargin`	Number of values used in `x.margin` by default.
`nymargin`	Number of values used in `y.margin` by default.
`a.nndefault`	Default nearest neighbour bandwidth (used only if `fw=FALSE` and `a` is missing.).
`...`	Additional arguments are passed to locfit.

Details

If bandwidths are omitted, they are computed using normal reference rules described in Bashtannyk and Hyndman (2001) and Hyndman and Yao (2002). Bias adjustment uses the method described in Hyndman, Bashtannyk and Grunwald (1996). If deg>1 then estimation is based on the local parametric estimator of Hyndman and Yao (2002).

Value

A list with the following components:

`x`	grid in x direction on which density evaluated. Equal to x.margin if specified.
`y`	grid in y direction on which density is evaluated. Equal to y.margin if specified.
`z`	value of conditional density estimate returned as a matrix.
`a`	window width in x direction.
`b`	window width in y direction.
`x.name`	Name of x variable to be used in plots.
`y.name`	Name of y variable to be used in plots.

Author(s)

Rob J Hyndman

References

Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.

Bashtannyk, D.M., and Hyndman, R.J. (2001) "Bandwidth selection for kernel conditional density estimation". Computational statistics and data analysis, 36(3), 279-298.

Hyndman, R.J. and Yao, Q. (2002) "Nonparametric estimation and symmetry tests for conditional density functions". Journal of Nonparametric Statistics, 14(3), 259-278.

Examples

# Old faithful data
faithful.cde <- cde(faithful$waiting, faithful$eruptions,
  x.name="Waiting time", y.name="Duration time")
plot(faithful.cde)
plot(faithful.cde, plot.fn="hdr")

# Melbourne maximum temperatures with bias adjustment
x <- maxtemp[1:3649]
y <- maxtemp[2:3650]
maxtemp.cde <- cde(x, y,
  x.name="Today's max temperature", y.name="Tomorrow's max temperature")
# Assume linear mean
fit <- lm(y~x)
fit.mean <- list(x=6:45,y=fit$coef[1]+fit$coef[2]*(6:45))
maxtemp.cde2 <- cde(x, y, mean=fit.mean,
	 x.name="Today's max temperature", y.name="Tomorrow's max temperature")
plot(maxtemp.cde)
# Old faithful data
faithful.cde <- cde(faithful$waiting, faithful$eruptions,
  x.name="Waiting time", y.name="Duration time")
plot(faithful.cde)
plot(faithful.cde, plot.fn="hdr")

# Melbourne maximum temperatures with bias adjustment
x <- maxtemp[1:3649]
y <- maxtemp[2:3650]
maxtemp.cde <- cde(x, y,
  x.name="Today's max temperature", y.name="Tomorrow's max temperature")
# Assume linear mean
fit <- lm(y~x)
fit.mean <- list(x=6:45,y=fit$coef[1]+fit$coef[2]*(6:45))
maxtemp.cde2 <- cde(x, y, mean=fit.mean,
	 x.name="Today's max temperature", y.name="Tomorrow's max temperature")
plot(maxtemp.cde)

Bandwidth calculation for conditional density estimation

Description

Calculates bandwidths for kernel conditional density estimates. Methods described in Bashtannyk and Hyndman (2001) and Hyndman and Yao (2002).

Usage

cde.bandwidths(
  x,
  y,
  deg = 0,
  link = "identity",
  method = 1,
  y.margin,
  passes = 2,
  ngrid = 8,
  min.a = NULL,
  ny = 25,
  use.sample = FALSE,
  GCV = TRUE,
  b = NULL,
  ...
)
cde.bandwidths(
  x,
  y,
  deg = 0,
  link = "identity",
  method = 1,
  y.margin,
  passes = 2,
  ngrid = 8,
  min.a = NULL,
  ny = 25,
  use.sample = FALSE,
  GCV = TRUE,
  b = NULL,
  ...
)

Arguments

`x`	Numerical vector: the conditioning variable.
`y`	Numerical vector: the response variable.
`deg`	Degree of local polynomial used in estimation.
`link`	Link function used in estimation. Default "identity". The other possibility is "log" which is recommended if degree > 0.
`method`	method = 1: Hyndman-Yao algorithm if deg>0; Bashtannyk-Hyndman algorithm if deg=0; method = 2: Normal reference rules; method = 3: Bashtannyk-Hyndman regression method if deg=0; method = 4: Bashtannyk-Hyndman bootstrap method if deg=0.
`y.margin`	Values in y-space on which conditional density is calculated. If not specified, an equi-spaced grid of 50 values over the range of y is used.
`passes`	Number of passes through Bashtannyk-Hyndman algorithm.
`ngrid`	Number of values of smoothing parameter in grid.
`min.a`	Smallest value of a to consider if method=1.
`ny`	Number of values to use for y margin if `y.margin` is missing.
`use.sample`	Used when regression method (3) is chosen.
`GCV`	Generalized cross-validation. Used only if method=1 and deg>0. If GCV=FALSE, method=1 and deg=0, then the AIC is used instead. The argument is ignored if deg=0 or method>1.
`b`	Value of b can be specified only if method=1 and deg>0. For deg=0 or method>1, this argument is ignored.
`...`	Other arguments control details for individual methods.

Details

Details of the various algorithms are in Bashtannyk and Hyndman (2001) and Hyndman and Yao (2002).

Value

`a`	Window width in `x` direction.
`b`	Window width in `y` direction.

Author(s)

Rob J Hyndman

References

Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.

Bashtannyk, D.M., and Hyndman, R.J. (2001) "Bandwidth selection for kernel conditional density estimation". Computational statistics and data analysis, 36(3), 279-298.

Hyndman, R.J. and Yao, Q. (2002) "Nonparametric estimation and symmetry tests for conditional density functions". Journal of Nonparametric Statistics, 14(3), 259-278.

Examples

bands <- cde.bandwidths(faithful$waiting,faithful$eruptions,method=2)
plot(cde(faithful$waiting,faithful$eruptions,a=bands$a,b=bands$b))
bands <- cde.bandwidths(faithful$waiting,faithful$eruptions,method=2)
plot(cde(faithful$waiting,faithful$eruptions,a=bands$a,b=bands$b))

Highest Density Regions

Description

Calculates highest density regions in one dimension

Usage

hdr(
  x = NULL,
  prob = c(50, 95, 99),
  den = NULL,
  h = hdrbw(BoxCox(x, lambda), mean(prob)),
  lambda = 1,
  nn = 5000,
  all.modes = FALSE
)
hdr(
  x = NULL,
  prob = c(50, 95, 99),
  den = NULL,
  h = hdrbw(BoxCox(x, lambda), mean(prob)),
  lambda = 1,
  nn = 5000,
  all.modes = FALSE
)

Arguments

`x`	Numeric vector containing data. If `x` is missing then `den` must be provided, and the HDR is computed from the given density.
`prob`	Probability coverage required for HDRs
`den`	Density of data as list with components `x` and `y`. If omitted, the density is estimated from `x` using `density`.
`h`	Optional bandwidth for calculation of density.
`lambda`	Box-Cox transformation parameter where `0 <= lambda <= 1`.
`nn`	Number of random numbers used in computing f-alpha quantiles.
`all.modes`	Return all local modes or just the global mode?

Details

Either x or den must be provided. When x is provided, the density is estimated using kernel density estimation. A Box-Cox transformation is used if lambda!=1, as described in Wand, Marron and Ruppert (1991). This allows the density estimate to be non-zero only on the positive real line. The default kernel bandwidth h is selected using the algorithm of Samworth and Wand (2010).

Hyndman's (1996) density quantile algorithm is used for calculation.

Value

A list of three components:

`hdr`	The endpoints of each interval in each HDR
`mode`	The estimated mode of the density.
`falpha`	The value of the density at the boundaries of each HDR.

Author(s)

Rob J Hyndman

References

Hyndman, R.J. (1996) Computing and graphing highest density regions. American Statistician, 50, 120-126.

Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.

Wand, M.P., Marron, J S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association, 86, 343-353.

Examples

# Old faithful eruption duration times
hdr(faithful$eruptions)
# Old faithful eruption duration times
hdr(faithful$eruptions)

Bivariate Highest Density Regions

Description

Calculates and plots highest density regions in two dimensions, including the bivariate HDR boxplot.

Usage

hdr.2d(
  x,
  y,
  prob = c(50, 95, 99),
  den = NULL,
  kde.package = c("ash", "ks"),
  h = NULL,
  xextend = 0.15,
  yextend = 0.15
)

hdr.boxplot.2d(
  x,
  y,
  prob = c(50, 99),
  kde.package = c("ash", "ks"),
  h = NULL,
  xextend = 0.15,
  yextend = 0.15,
  xlab = "",
  ylab = "",
  shadecols = "darkgray",
  pointcol = 1,
  outside.points = TRUE,
  ...
)

## S3 method for class 'hdr2d'
plot(
  x,
  shaded = TRUE,
  show.points = FALSE,
  outside.points = FALSE,
  pch = 20,
  shadecols = gray((length(x$alpha):1)/(length(x$alpha) + 1)),
  pointcol = 1,
  ...
)
hdr.2d(
  x,
  y,
  prob = c(50, 95, 99),
  den = NULL,
  kde.package = c("ash", "ks"),
  h = NULL,
  xextend = 0.15,
  yextend = 0.15
)

hdr.boxplot.2d(
  x,
  y,
  prob = c(50, 99),
  kde.package = c("ash", "ks"),
  h = NULL,
  xextend = 0.15,
  yextend = 0.15,
  xlab = "",
  ylab = "",
  shadecols = "darkgray",
  pointcol = 1,
  outside.points = TRUE,
  ...
)

## S3 method for class 'hdr2d'
plot(
  x,
  shaded = TRUE,
  show.points = FALSE,
  outside.points = FALSE,
  pch = 20,
  shadecols = gray((length(x$alpha):1)/(length(x$alpha) + 1)),
  pointcol = 1,
  ...
)

Arguments

`x`	Numeric vector
`y`	Numeric vector of same length as `x`.
`prob`	Probability coverage required for HDRs
`den`	Bivariate density estimate (a list with elements x, y and z where x and y are grid values and z is a matrix of density values). If `NULL`, the density is estimated.
`kde.package`	Package to be used in calculating the kernel density estimate when `den=NULL`.
`h`	Pair of bandwidths passed to either `ash2` or `kde`. If NULL, a reasonable default is used. Ignored if `den` is not `NULL`.
`xextend`	Proportion of range of `x`. The density is estimated on a grid extended by `xextend` beyond the range of `x`.
`yextend`	Proportion of range of `y`. The density is estimated on a grid extended by `yextend` beyond the range of `y`.
`xlab`	Label for x-axis.
`ylab`	Label for y-axis.
`shadecols`	Colors for shaded regions
`pointcol`	Color for outliers and mode
`outside.points`	If `TRUE`, the observations lying outside the largest HDR are shown.
`...`	Other arguments to be passed to plot.
`shaded`	If `TRUE`, the HDR contours are shown as shaded regions.
`show.points`	If `TRUE`, the observations are plotted over the top of the HDR contours.
`pch`	The plotting character used for observations.

Details

The density is estimated using kernel density estimation. Either ash2 or kde is used to do the calculations. Then Hyndman's (1996) density quantile algorithm is used to compute the HDRs.

hdr.2d returns an object of class hdr2d containing all the information needed to compute the HDR contours. This object can be plotted using plot.hdr2d.

hdr.boxplot.2d produces a bivariate HDR boxplot. This is a special case of applying plot.hdr2d to an object computed using hdr.2d.

Value

Some information about the HDRs is returned. See code for details.

Author(s)

Rob J Hyndman

References

Hyndman, R.J. (1996) Computing and graphing highest density regions American Statistician, 50, 120-126.

Examples

x <- c(rnorm(200,0,1),rnorm(200,4,1))
y <- c(rnorm(200,0,1),rnorm(200,4,1))
hdr.boxplot.2d(x,y)

hdrinfo <- hdr.2d(x,y)
plot(hdrinfo, pointcol="red", show.points=TRUE, pch=3)
x <- c(rnorm(200,0,1),rnorm(200,4,1))
y <- c(rnorm(200,0,1),rnorm(200,4,1))
hdr.boxplot.2d(x,y)

hdrinfo <- hdr.2d(x,y)
plot(hdrinfo, pointcol="red", show.points=TRUE, pch=3)

Highest Density Region Boxplots

Description

Calculates and plots a univariate highest density regions boxplot.

Usage

hdr.boxplot(
  x,
  prob = c(99, 50),
  h = hdrbw(BoxCox(x, lambda), mean(prob)),
  lambda = 1,
  boxlabels = "",
  col = gray((9:1)/10),
  main = "",
  xlab = "",
  ylab = "",
  pch = 1,
  border = 1,
  outline = TRUE,
  space = 0.25,
  ...
)
hdr.boxplot(
  x,
  prob = c(99, 50),
  h = hdrbw(BoxCox(x, lambda), mean(prob)),
  lambda = 1,
  boxlabels = "",
  col = gray((9:1)/10),
  main = "",
  xlab = "",
  ylab = "",
  pch = 1,
  border = 1,
  outline = TRUE,
  space = 0.25,
  ...
)

Arguments

`x`	Numeric vector containing data or a list containing several vectors.
`prob`	Probability coverage required for HDRs `density`.
`h`	Optional bandwidth for calculation of density.
`lambda`	Box-Cox transformation parameter where `0 <= lambda <= 1`.
`boxlabels`	Label for each box plotted.
`col`	Colours for regions of each box.
`main`	Overall title for the plot.
`xlab`	Label for x-axis.
`ylab`	Label for y-axis.
`pch`	Plotting character.
`border`	Width of border of box.
`outline`	If not <code>TRUE</code>, the outliers are not drawn.
`space`	The space between each box, between 0 and 0.5.
`...`	Other arguments passed to plot.

Details

The density is estimated using kernel density estimation. A Box-Cox transformation is used if lambda!=1, as described in Wand, Marron and Ruppert (1991). This allows the density estimate to be non-zero only on the positive real line. The default kernel bandwidth h is selected using the algorithm of Samworth and Wand (2010).

Hyndman's (1996) density quantile algorithm is used for calculation.

Value

nothing.

Author(s)

Rob J Hyndman

References

Hyndman, R.J. (1996) Computing and graphing highest density regions. American Statistician, 50, 120-126.

Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.

Wand, M.P., Marron, J S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association, 86, 343-353.

Examples

# Old faithful eruption duration times
hdr.boxplot(faithful$eruptions)

# Simple bimodal example
x <- c(rnorm(100,0,1), rnorm(100,5,1))
par(mfrow=c(1,2))
boxplot(x)
hdr.boxplot(x)

# Highly skewed example
x <- exp(rnorm(100,0,1))
par(mfrow=c(1,2))
boxplot(x)
hdr.boxplot(x,lambda=0)

# Old faithful eruption duration times
hdr.boxplot(faithful$eruptions)

# Simple bimodal example
x <- c(rnorm(100,0,1), rnorm(100,5,1))
par(mfrow=c(1,2))
boxplot(x)
hdr.boxplot(x)

# Highly skewed example
x <- exp(rnorm(100,0,1))
par(mfrow=c(1,2))
boxplot(x)
hdr.boxplot(x,lambda=0)

Calculate highest density regions continously over some conditioned variable.

Description

Calculates and plots highest density regions for a conditional density estimate. Uses output from cde.

Usage

hdr.cde(
  den,
  prob = c(50, 95, 99),
  plot = TRUE,
  plot.modes = TRUE,
  mden = rep(1, length(den$x)),
  threshold = 0.05,
  nn = 1000,
  xlim,
  ylim,
  xlab,
  ylab,
  border = TRUE,
  font = 1,
  cex = 1,
  ...
)
hdr.cde(
  den,
  prob = c(50, 95, 99),
  plot = TRUE,
  plot.modes = TRUE,
  mden = rep(1, length(den$x)),
  threshold = 0.05,
  nn = 1000,
  xlim,
  ylim,
  xlab,
  ylab,
  border = TRUE,
  font = 1,
  cex = 1,
  ...
)

Arguments

`den`	Conditional density in the same format as the output from `cde`.
`prob`	Probability coverage level for HDRs
`plot`	Should HDRs be plotted? If FALSE, results are returned.
`plot.modes`	Should modes be plotted as well as HDRs?
`mden`	Marginal density in the `x` direction. When small, the HDRs won't be plotted. Default is uniform so all HDRs are plotted.
`threshold`	Threshold for margin density. HDRs are not plotted if the margin density `mden` is lower than this value.
`nn`	Number of points to be sampled from each density when estimating the HDRs.
`xlim`	Limits for x-axis.
`ylim`	Limits for y-axis.
`xlab`	Label for x-axis.
`ylab`	Label for y-axis.
`border`	Show border of polygons
`font`	Font to be used in plot.
`cex`	Size of characters.
`...`	Other arguments passed to plotting functions.

Value

`hdr`	array (a,b,c) where where a specifies conditioning value, b gives the HDR endpoints and c gives the probability coverage.
`modes`	estimated mode of each conditional density

Author(s)

Rob J Hyndman

References

Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.

Examples

faithful.cde <- cde(faithful$waiting,faithful$eruptions)
plot(faithful.cde,xlab="Waiting time",ylab="Duration time",plot.fn="hdr")
faithful.cde <- cde(faithful$waiting,faithful$eruptions)
plot(faithful.cde,xlab="Waiting time",ylab="Duration time",plot.fn="hdr")

Density plot with Highest Density Regions

Description

Plots univariate density with highest density regions displayed

Usage

hdr.den(
  x,
  prob = c(50, 95, 99),
  den,
  h = hdrbw(BoxCox(x, lambda), mean(prob)),
  lambda = 1,
  xlab = NULL,
  ylab = "Density",
  ylim = NULL,
  plot.lines = TRUE,
  col = 2:8,
  bgcol = "gray",
  legend = FALSE,
  ...
)
hdr.den(
  x,
  prob = c(50, 95, 99),
  den,
  h = hdrbw(BoxCox(x, lambda), mean(prob)),
  lambda = 1,
  xlab = NULL,
  ylab = "Density",
  ylim = NULL,
  plot.lines = TRUE,
  col = 2:8,
  bgcol = "gray",
  legend = FALSE,
  ...
)

Arguments

`x`	Numeric vector containing data. If `x` is missing then `den` must be provided, and the HDR is computed from the given density.
`prob`	Probability coverage required for HDRs
`den`	Density of data as list with components `x` and `y`. If omitted, the density is estimated from `x` using `density`.
`h`	Optional bandwidth for calculation of density.
`lambda`	Box-Cox transformation parameter where `0 <= lambda <= 1`.
`xlab`	Label for x-axis.
`ylab`	Label for y-axis.
`ylim`	Limits for y-axis.
`plot.lines`	If `TRUE`, will show how the HDRs are determined using lines.
`col`	Colours for regions.
`bgcol`	Colours for the background behind the boxes. Default `"gray"`, if `NULL` no box is drawn.
`legend`	If `TRUE` add a legend on the right of the boxes.
`...`	Other arguments passed to plot.

Details

Hyndman's (1996) density quantile algorithm is used for calculation.

Value

a list of three components:

`hdr`	The endpoints of each interval in each HDR
`mode`	The estimated mode of the density.
`falpha`	The value of the density at the boundaries of each HDR.

Author(s)

Rob J Hyndman

References

Hyndman, R.J. (1996) Computing and graphing highest density regions. American Statistician, 50, 120-126.

Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.

Wand, M.P., Marron, J S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association, 86, 343-353.

Examples

# Old faithful eruption duration times
hdr.den(faithful$eruptions)

# Simple bimodal example
x <- c(rnorm(100,0,1), rnorm(100,5,1))
hdr.den(x)
# Old faithful eruption duration times
hdr.den(faithful$eruptions)

# Simple bimodal example
x <- c(rnorm(100,0,1), rnorm(100,5,1))
hdr.den(x)

Highest Density Region Bandwidth

Description

Estimates the optimal bandwidth for 1-dimensional highest density regions

Usage

hdrbw(x, HDRlevel, gridsize = 801, nMChdr = 1e+06, graphProgress = FALSE)
hdrbw(x, HDRlevel, gridsize = 801, nMChdr = 1e+06, graphProgress = FALSE)

Arguments

`x`	Numerical vector containing data.
`HDRlevel`	HDR-level as defined in Hyndman (1996). Setting ‘HDRlevel’ equal to p (0<p<1) corresponds to a probability of 1-p of inclusion in the highest density region.
`gridsize`	the number of equally spaced points used for binned kernel density estimation.
`nMChdr`	the size of the Monte Carlo sample used for density quantile approximation of the highest density region, as described in Hyndman (1996).
`graphProgress`	logical flag: if ‘TRUE’ then plots showing the progress of the bandwidth selection algorithm are produced.

Details

This is a plug-in rule for bandwidth selection tailored to highest density region estimation

Value

A numerical vector of length 1.

Author(s)

Matt Wand

References

Hyndman, R.J. (1996). Computing and graphing highest density regions. The American Statistician, 50, 120-126.

Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.

Examples

HDRlevelVal <- 0.55
x <- faithful$eruptions
hHDR <- hdrbw(x,HDRlevelVal)
HDRhat <- hdr.den(x,prob=100*(1-HDRlevelVal),h=hHDR)
HDRlevelVal <- 0.55
x <- faithful$eruptions
hHDR <- hdrbw(x,HDRlevelVal)
HDRhat <- hdr.den(x,prob=100*(1-HDRlevelVal),h=hHDR)

HDRs with confidence intervals

Description

Calculates Highest Density Regions with confidence intervals.

Usage

hdrconf(x, den, prob = 95, conf = 95)
hdrconf(x, den, prob = 95, conf = 95)

Arguments

`x`	Numeric vector containing data.
`den`	Density of data as list with components `x` and `y`.
`prob`	Probability coverage for for HDRs.
`conf`	Confidence for limits on HDR.

Value

hdrconf returns list containing the following components:

`hdr`	Highest density regions
`hdr.lo`	Highest density regions corresponding to lower confidence limit.
`hdr.hi`	Highest density regions corresponding to upper confidence limit.
`falpha`	Values of $f_\alpha$ corresponding to HDRs.
`falpha.ci`	Values of $f_\alpha$ corresponding to lower and upper limits.

Author(s)

Rob J Hyndman

References

Hyndman, R.J. (1996) Computing and graphing highest density regions American Statistician, 50, 120-126.

Examples

x <- c(rnorm(100,0,1),rnorm(100,4,1))
den <- density(x,bw=hdrbw(x,50))
trueden <- den
trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi)
sortx <- sort(x)

par(mfcol=c(2,2))
for(conf in c(50,95))
{
  m <- hdrconf(sortx,trueden,conf=conf)
  plot(m,trueden,main=paste(conf,"% HDR from true density"))
  m <- hdrconf(sortx,den,conf=conf)
  plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)"))
}
x <- c(rnorm(100,0,1),rnorm(100,4,1))
den <- density(x,bw=hdrbw(x,50))
trueden <- den
trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi)
sortx <- sort(x)

par(mfcol=c(2,2))
for(conf in c(50,95))
{
  m <- hdrconf(sortx,trueden,conf=conf)
  plot(m,trueden,main=paste(conf,"% HDR from true density"))
  m <- hdrconf(sortx,den,conf=conf)
  plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)"))
}

Scatterplot showing bivariate highest density regions

Description

Produces a scatterplot where the points are coloured according to the bivariate HDRs in which they fall.

Usage

hdrscatterplot(
  x,
  y,
  levels = c(1, 50, 99),
  kde.package = c("ash", "ks"),
  noutliers = NULL,
  label = NULL
)
hdrscatterplot(
  x,
  y,
  levels = c(1, 50, 99),
  kde.package = c("ash", "ks"),
  noutliers = NULL,
  label = NULL
)

Arguments

`x`	Numeric vector or matrix with 2 columns.
`y`	Numeric vector of same length as `x`.
`levels`	Percentage coverage for HDRs
`kde.package`	Package to be used in calculating the kernel density estimate when `den=NULL`.
`noutliers`	Number of outliers to be labelled. By default, all points outside the largest HDR are labelled.
`label`	Label of outliers of same length as `x` and `y`. By default, all outliers are labelled as the row index of the point `(x, y)`.

Details

The bivariate density is estimated using kernel density estimation. Either ash2 or kde is used to do the calculations. Then Hyndman's (1996) density quantile algorithm is used to compute the HDRs. The scatterplot of (x,y) is created where the points are coloured according to which HDR they fall. A ggplot object is returned.

Author(s)

Rob J Hyndman

Examples

x <- c(rnorm(200, 0, 1), rnorm(200, 4, 1))
y <- c(rnorm(200, 0, 1), rnorm(200, 4, 1))
hdrscatterplot(x, y)
hdrscatterplot(x, y, label = paste0("p", 1:length(x)))
x <- c(rnorm(200, 0, 1), rnorm(200, 4, 1))
y <- c(rnorm(200, 0, 1), rnorm(200, 4, 1))
hdrscatterplot(x, y)
hdrscatterplot(x, y, label = paste0("p", 1:length(x)))

Speed-Flow data for Californian Freeway

Description

These are two data sets collected in 1993 on two individual lanes (lane 2 and lane 3) of the 4-lane Californian freeway I-880. The data were collected by loop detectors, and the time units are 30 seconds per observation (see Petty et al., 1996, for details).

Usage

lane2; lane3
lane2; lane3

Format

Two data frames (lane2 and lane3) each with 1318 observations on the following two variables:

flow: a numeric vector giving the traffic flow in vehicles per lane per hour.
speed: a numeric vector giving the speed in miles per hour.

Details

The data is examined in Einbeck and Tutz (2006), using a nonparametric approach to multi-valued regression based on conditional mean shift.

Source

Petty, K.F., Noeimi, H., Sanwal, K., Rydzewski, D., Skabardonis, A., Varaiya, P., and Al-Deek, H. (1996). “The Freeway Service Patrol Evaluation Project: Database Support Programs, and Accessibility”. Transportation Research Part C: Emerging Technologies, 4, 71-85.

The data is provided by courtesy of CALIFORNIA PATH, Institute of Transportation Studies, University of California, Berkeley.

References

Einbeck, J., and Tutz, G. (2006). “Modelling beyond regression functions: an application of multimodal regression to speed-flow data”. Journal of the Royal Statistical Society, Series C (Applied Statistics), 55, 461-475.

Examples

plot(lane2)
plot(lane3)
plot(lane2)
plot(lane3)

Daily maximum temperatures in Melbourne, Australia

Description

Daily maximum temperatures in Melbourne, Australia, from 1981-1990. Leap days have been omitted.

Usage

maxtemp
maxtemp

Format

Time series of frequency 365.

Source

Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.

Examples

plot(maxtemp)
plot(maxtemp)

Nonparametric Multimodal Regression

Description

Nonparametric multi-valued regression based on the modes of conditional density estimates.

Usage

modalreg(
  x,
  y,
  xfix = seq(min(x), max(x), l = 50),
  a,
  b,
  deg = 0,
  iter = 30,
  P = 2,
  start = "e",
  prun = TRUE,
  prun.const = 10,
  plot.type = c("p", 1),
  labels = c("", "x", "y"),
  pch = 20,
  ...
)
modalreg(
  x,
  y,
  xfix = seq(min(x), max(x), l = 50),
  a,
  b,
  deg = 0,
  iter = 30,
  P = 2,
  start = "e",
  prun = TRUE,
  prun.const = 10,
  plot.type = c("p", 1),
  labels = c("", "x", "y"),
  pch = 20,
  ...
)

Arguments

`x`	Numerical vector: the conditioning variable.
`y`	Numerical vector: the response variable.
`xfix`	Numerical vector corresponding to the input values of which the fitted values shall be calculated.
`a`	Optional bandwidth in $x$ -direction.
`b`	Optional bandwidth in $y$ -direction.
`deg`	Degree of local polynomial used in estimation (0 or 1).
`iter`	Positive integer giving the number of mean shift iterations per point and branch.
`P`	Maximal number of branches.
`start`	Character determining how the starting points are selected. `"q"`: proportional to quantiles; `"e"`: equidistant; `"r"`: random. All, `"q"`, `"e"`, and `"r"`, give starting points which are constant over `x`. As an alternative, the choice `"v"` gives variable starting points, which are equal to `"q"` for the smallest `x`, and equal to the previously fitted values for all subsequent `x`.
`prun`	Boolean. If TRUE, parts of branches are dismissed (in the plotted output) where their associated kernel density value falls below the threshold `1/(prun.const(max(x)-min(x))(max(y)-min(y)))`.
`prun.const`	Numerical value giving the constant used above (the higher, the less pruning)
`plot.type`	Vector with two elements. The first one is character-valued, with possible values `"p"`, `"l"`, and `"n"`. If equal to `"n"`, no plotted output is given at all. If equal to `"p"`, fitted curves are symbolized as points in the graphical output, otherwise as lines. The second vector component is a numerical value either being 0 or 1. If 1, the position of the starting points is depicted in the plot, otherwise omitted.
`labels`	Vector of three character strings. The first one is the "main" title of the graphical output, the second one is the label of the $x$ axis, and the third one the label of the $y$ axis.
`pch`	Plotting character. The default corresponds to small bullets.
`...`	Other arguments passed to `cde.bandwidths`.

Details

Computes multi-modal nonparametric regression curves based on the maxima of conditional density estimates. The tool for the estimation is the conditional mean shift as outlined in Einbeck and Tutz (2006). Estimates of the conditional modes might fluctuate highly if deg=1. Hence, deg=0 is recommended. For bandwidth selection, the hybrid rule introduced by Bashtannyk and Hyndman (2001) is employed if deg=0. This corresponds to the setting method=1 in function cde.bandwidths. For deg=1 automatic bandwidth selection is not supported.

Value

A list with the following components:

`xfix`	Grid of predictor values at which the fitted values are calculated.
`fitted.values`	A `[P x length(xfix)]`- matrix with fitted j-th branch in the j-th row ( $1 \le j \le P$ )
`bandwidths`	A vector with bandwidths `a` and `b`.
`density`	A `[P x length(xfix)]`- matrix with estimated kernel densities. This will only be computed if `prun=TRUE`.
`threshold`	The pruning threshold.

Author(s)

Jochen Einbeck (2007)

References

Einbeck, J., and Tutz, G. (2006) "Modelling beyond regression functions: an application of multimodal regression to speed-flow data". Journal of the Royal Statistical Society, Series C (Applied Statistics), 55, 461-475.

Bashtannyk, D.M., and Hyndman, R.J. (2001) "Bandwidth selection for kernel conditional density estimation". Computational Statistics and Data Analysis, 36(3), 279-298.

Examples

lane2.fit <- modalreg(lane2$flow, lane2$speed, xfix=(1:55)*40, a=100, b=4)
lane2.fit <- modalreg(lane2$flow, lane2$speed, xfix=(1:55)*40, a=100, b=4)

Plots conditional densities

Description

Produces stacked density plots or highest density region plots for a univariate density conditional on one covariate.

Usage

## S3 method for class 'cde'
plot(
  x,
  firstvar = 1,
  mfrow = n2mfrow(dim(x$z)[firstvar]),
  plot.fn = "stacked",
  x.name,
  margin = NULL,
  ...
)
## S3 method for class 'cde'
plot(
  x,
  firstvar = 1,
  mfrow = n2mfrow(dim(x$z)[firstvar]),
  plot.fn = "stacked",
  x.name,
  margin = NULL,
  ...
)

Arguments

`x`	Output from `cde`.
`firstvar`	If there is more than one conditioning variable, `firstvar` specifies which variable to fix first.
`mfrow`	If there is more than one conditioning variable, `mfrow` is passed to `par` before plotting.
`plot.fn`	Specifies which plotting function to use: "stacked" results in stacked conditional densities and "hdr" results in highest density regions.
`x.name`	Name of x (conditioning) variable for use on x-axis.
`margin`	Marginal density of conditioning variable. If present, only conditional densities corresponding to non-negligible marginal densities will be plotted.
`...`	Additional arguments to plot.

Value

If plot.fn=="stacked" and there is only one conditioning variable, the function returns the output from persp. If plot.fn=="hdr" and there is only one conditioning variable, the function returns the output from hdr.cde. When there is more than one conditioning variable, nothing is returned.

Author(s)

Rob J Hyndman

References

Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.

Examples

faithful.cde <- cde(faithful$waiting,faithful$eruptions,
	 x.name="Waiting time", y.name="Duration time")
plot(faithful.cde)
plot(faithful.cde,plot.fn="hdr")
faithful.cde <- cde(faithful$waiting,faithful$eruptions,
	 x.name="Waiting time", y.name="Duration time")
plot(faithful.cde)
plot(faithful.cde,plot.fn="hdr")

Plot HDRs with confidence intervals

Description

Plots Highest Density Regions with confidence intervals.

Usage

## S3 method for class 'hdrconf'
plot(x, den, ...)
## S3 method for class 'hdrconf'
plot(x, den, ...)

Arguments

`x`	Output from `hdrconf`.
`den`	Density of data as list with components `x` and `y`.
`...`	Other arguments are passed to plot.

Value

None

Author(s)

Rob J Hyndman

References

Hyndman, R.J. (1996) Computing and graphing highest density regions American Statistician, 50, 120-126.

Examples

x <- c(rnorm(100,0,1),rnorm(100,4,1))
den <- density(x,bw=bw.SJ(x))
trueden <- den
trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi)
sortx <- sort(x)

par(mfcol=c(2,2))
for(conf in c(50,95))
{
  m <- hdrconf(sortx,trueden,conf=conf)
  plot(m,trueden,main=paste(conf,"% HDR from true density"))
  m <- hdrconf(sortx,den,conf=conf)
  plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)"))
}
x <- c(rnorm(100,0,1),rnorm(100,4,1))
den <- density(x,bw=bw.SJ(x))
trueden <- den
trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi)
sortx <- sort(x)

par(mfcol=c(2,2))
for(conf in c(50,95))
{
  m <- hdrconf(sortx,trueden,conf=conf)
  plot(m,trueden,main=paste(conf,"% HDR from true density"))
  m <- hdrconf(sortx,den,conf=conf)
  plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)"))
}

Package 'hdrcde'

Help Index

Box Cox Transformation

Description

Usage

Arguments

Details

Value

Author(s)

References

Conditional Density Estimation

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Bandwidth calculation for conditional density estimation

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Highest Density Regions

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Bivariate Highest Density Regions

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Highest Density Region Boxplots

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Calculate highest density regions continously over some conditioned variable.

Description

Usage

Arguments

Value

Author(s)

References

See Also

Examples

Density plot with Highest Density Regions

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Highest Density Region Bandwidth