Title: | Highest Density Regions and Conditional Density Estimation |
---|---|
Description: | Computation of highest density regions in one and two dimensions, kernel estimation of univariate density functions conditional on one covariate,and multimodal regression. |
Authors: | Rob Hyndman [aut, cre, cph] , Jochen Einbeck [aut] , Matthew Wand [aut] , Simon Carrignon [ctb] , Fan Cheng [ctb] |
Maintainer: | Rob Hyndman <[email protected]> |
License: | GPL-3 |
Version: | 3.4 |
Built: | 2024-11-23 04:02:42 UTC |
Source: | https://github.com/robjhyndman/hdrcde |
BoxCox() returns a transformation of the input variable using a Box-Cox transformation. InvBoxCox() reverses the transformation.
BoxCox(x, lambda)
BoxCox(x, lambda)
x |
a numeric vector or time series |
lambda |
transformation parameter |
The Box-Cox transformation is given by
if
. For
,
a numeric vector of the same length as x.
Rob J Hyndman
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations. JRSS B 26 211–246.
Calculates kernel conditional density estimate using local polynomial estimation.
cde( x, y, deg = 0, link = "identity", a, b, mean = NULL, x.margin, y.margin, x.name, y.name, use.locfit = FALSE, fw = TRUE, rescale = TRUE, nxmargin = 15, nymargin = 100, a.nndefault = 0.3, ... )
cde( x, y, deg = 0, link = "identity", a, b, mean = NULL, x.margin, y.margin, x.name, y.name, use.locfit = FALSE, fw = TRUE, rescale = TRUE, nxmargin = 15, nymargin = 100, a.nndefault = 0.3, ... )
x |
Numerical vector or matrix: the conditioning variable(s). |
y |
Numerical vector: the response variable. |
deg |
Degree of local polynomial used in estimation. |
link |
Link function used in estimation. Default "identity". The other possibility is "log" which is recommended if degree > 0. |
a |
Optional bandwidth in x direction. |
b |
Optional bandwidth in y direction. |
mean |
Estimated mean of y|x. If present, it will adjust conditional density to have this mean. |
x.margin |
Values in x-space on which conditional density is
calculated. If not specified, an equi-spaced grid of |
y.margin |
Values in y-space on which conditional density is
calculated. If not specified, an equi-spaced grid of |
x.name |
Optional name of x variable used in plots. |
y.name |
Optional name of y variable used in plots. |
use.locfit |
If TRUE, will use |
fw |
If TRUE (default), will use fixed window width estimation. Otherwise nearest neighbourhood estimation is used. If the dimension of x is greater than 1, nearest neighbourhood must be used. |
rescale |
If TRUE (default), will rescale the conditional densities to integrate to one. |
nxmargin |
Number of values used in |
nymargin |
Number of values used in |
a.nndefault |
Default nearest neighbour bandwidth (used only if
|
... |
Additional arguments are passed to locfit. |
If bandwidths are omitted, they are computed using normal reference rules described in Bashtannyk and Hyndman (2001) and Hyndman and Yao (2002). Bias adjustment uses the method described in Hyndman, Bashtannyk and Grunwald (1996). If deg>1 then estimation is based on the local parametric estimator of Hyndman and Yao (2002).
A list with the following components:
x |
grid in x direction on which density evaluated. Equal to x.margin if specified. |
y |
grid in y direction on which density is evaluated. Equal to y.margin if specified. |
z |
value of conditional density estimate returned as a matrix. |
a |
window width in x direction. |
b |
window width in y direction. |
x.name |
Name of x variable to be used in plots. |
y.name |
Name of y variable to be used in plots. |
Rob J Hyndman
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.
Bashtannyk, D.M., and Hyndman, R.J. (2001) "Bandwidth selection for kernel conditional density estimation". Computational statistics and data analysis, 36(3), 279-298.
Hyndman, R.J. and Yao, Q. (2002) "Nonparametric estimation and symmetry tests for conditional density functions". Journal of Nonparametric Statistics, 14(3), 259-278.
# Old faithful data faithful.cde <- cde(faithful$waiting, faithful$eruptions, x.name="Waiting time", y.name="Duration time") plot(faithful.cde) plot(faithful.cde, plot.fn="hdr") # Melbourne maximum temperatures with bias adjustment x <- maxtemp[1:3649] y <- maxtemp[2:3650] maxtemp.cde <- cde(x, y, x.name="Today's max temperature", y.name="Tomorrow's max temperature") # Assume linear mean fit <- lm(y~x) fit.mean <- list(x=6:45,y=fit$coef[1]+fit$coef[2]*(6:45)) maxtemp.cde2 <- cde(x, y, mean=fit.mean, x.name="Today's max temperature", y.name="Tomorrow's max temperature") plot(maxtemp.cde)
# Old faithful data faithful.cde <- cde(faithful$waiting, faithful$eruptions, x.name="Waiting time", y.name="Duration time") plot(faithful.cde) plot(faithful.cde, plot.fn="hdr") # Melbourne maximum temperatures with bias adjustment x <- maxtemp[1:3649] y <- maxtemp[2:3650] maxtemp.cde <- cde(x, y, x.name="Today's max temperature", y.name="Tomorrow's max temperature") # Assume linear mean fit <- lm(y~x) fit.mean <- list(x=6:45,y=fit$coef[1]+fit$coef[2]*(6:45)) maxtemp.cde2 <- cde(x, y, mean=fit.mean, x.name="Today's max temperature", y.name="Tomorrow's max temperature") plot(maxtemp.cde)
Calculates bandwidths for kernel conditional density estimates. Methods described in Bashtannyk and Hyndman (2001) and Hyndman and Yao (2002).
cde.bandwidths( x, y, deg = 0, link = "identity", method = 1, y.margin, passes = 2, ngrid = 8, min.a = NULL, ny = 25, use.sample = FALSE, GCV = TRUE, b = NULL, ... )
cde.bandwidths( x, y, deg = 0, link = "identity", method = 1, y.margin, passes = 2, ngrid = 8, min.a = NULL, ny = 25, use.sample = FALSE, GCV = TRUE, b = NULL, ... )
x |
Numerical vector: the conditioning variable. |
y |
Numerical vector: the response variable. |
deg |
Degree of local polynomial used in estimation. |
link |
Link function used in estimation. Default "identity". The other possibility is "log" which is recommended if degree > 0. |
method |
|
y.margin |
Values in y-space on which conditional density is calculated. If not specified, an equi-spaced grid of 50 values over the range of y is used. |
passes |
Number of passes through Bashtannyk-Hyndman algorithm. |
ngrid |
Number of values of smoothing parameter in grid. |
min.a |
Smallest value of a to consider if method=1. |
ny |
Number of values to use for y margin if |
use.sample |
Used when regression method (3) is chosen. |
GCV |
Generalized cross-validation. Used only if method=1 and deg>0. If GCV=FALSE, method=1 and deg=0, then the AIC is used instead. The argument is ignored if deg=0 or method>1. |
b |
Value of b can be specified only if method=1 and deg>0. For deg=0 or method>1, this argument is ignored. |
... |
Other arguments control details for individual methods. |
Details of the various algorithms are in Bashtannyk and Hyndman (2001) and Hyndman and Yao (2002).
a |
Window width in |
b |
Window width
in |
Rob J Hyndman
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.
Bashtannyk, D.M., and Hyndman, R.J. (2001) "Bandwidth selection for kernel conditional density estimation". Computational statistics and data analysis, 36(3), 279-298.
Hyndman, R.J. and Yao, Q. (2002) "Nonparametric estimation and symmetry tests for conditional density functions". Journal of Nonparametric Statistics, 14(3), 259-278.
bands <- cde.bandwidths(faithful$waiting,faithful$eruptions,method=2) plot(cde(faithful$waiting,faithful$eruptions,a=bands$a,b=bands$b))
bands <- cde.bandwidths(faithful$waiting,faithful$eruptions,method=2) plot(cde(faithful$waiting,faithful$eruptions,a=bands$a,b=bands$b))
Calculates highest density regions in one dimension
hdr( x = NULL, prob = c(50, 95, 99), den = NULL, h = hdrbw(BoxCox(x, lambda), mean(prob)), lambda = 1, nn = 5000, all.modes = FALSE )
hdr( x = NULL, prob = c(50, 95, 99), den = NULL, h = hdrbw(BoxCox(x, lambda), mean(prob)), lambda = 1, nn = 5000, all.modes = FALSE )
x |
Numeric vector containing data. If |
prob |
Probability coverage required for HDRs |
den |
Density of data as list with components |
h |
Optional bandwidth for calculation of density. |
lambda |
Box-Cox transformation parameter where |
nn |
Number of random numbers used in computing f-alpha quantiles. |
all.modes |
Return all local modes or just the global mode? |
Either x
or den
must be provided. When x
is provided,
the density is estimated using kernel density estimation. A Box-Cox
transformation is used if lambda!=1
, as described in Wand, Marron and
Ruppert (1991). This allows the density estimate to be non-zero only on the
positive real line. The default kernel bandwidth h
is selected using
the algorithm of Samworth and Wand (2010).
Hyndman's (1996) density quantile algorithm is used for calculation.
A list of three components:
hdr |
The endpoints of each interval in each HDR |
mode |
The estimated mode of the density. |
falpha |
The value of the density at the boundaries of each HDR. |
Rob J Hyndman
Hyndman, R.J. (1996) Computing and graphing highest density regions. American Statistician, 50, 120-126.
Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.
Wand, M.P., Marron, J S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association, 86, 343-353.
# Old faithful eruption duration times hdr(faithful$eruptions)
# Old faithful eruption duration times hdr(faithful$eruptions)
Calculates and plots highest density regions in two dimensions, including the bivariate HDR boxplot.
hdr.2d( x, y, prob = c(50, 95, 99), den = NULL, kde.package = c("ash", "ks"), h = NULL, xextend = 0.15, yextend = 0.15 ) hdr.boxplot.2d( x, y, prob = c(50, 99), kde.package = c("ash", "ks"), h = NULL, xextend = 0.15, yextend = 0.15, xlab = "", ylab = "", shadecols = "darkgray", pointcol = 1, outside.points = TRUE, ... ) ## S3 method for class 'hdr2d' plot( x, shaded = TRUE, show.points = FALSE, outside.points = FALSE, pch = 20, shadecols = gray((length(x$alpha):1)/(length(x$alpha) + 1)), pointcol = 1, ... )
hdr.2d( x, y, prob = c(50, 95, 99), den = NULL, kde.package = c("ash", "ks"), h = NULL, xextend = 0.15, yextend = 0.15 ) hdr.boxplot.2d( x, y, prob = c(50, 99), kde.package = c("ash", "ks"), h = NULL, xextend = 0.15, yextend = 0.15, xlab = "", ylab = "", shadecols = "darkgray", pointcol = 1, outside.points = TRUE, ... ) ## S3 method for class 'hdr2d' plot( x, shaded = TRUE, show.points = FALSE, outside.points = FALSE, pch = 20, shadecols = gray((length(x$alpha):1)/(length(x$alpha) + 1)), pointcol = 1, ... )
x |
Numeric vector |
y |
Numeric vector of same length as |
prob |
Probability coverage required for HDRs |
den |
Bivariate density estimate (a list with elements x, y and z where
x and y are grid values and z is a matrix of density values). If
|
kde.package |
Package to be used in calculating the kernel density
estimate when |
h |
Pair of bandwidths passed to either |
xextend |
Proportion of range of |
yextend |
Proportion of range of |
xlab |
Label for x-axis. |
ylab |
Label for y-axis. |
shadecols |
Colors for shaded regions |
pointcol |
Color for outliers and mode |
outside.points |
If |
... |
Other arguments to be passed to plot. |
shaded |
If |
show.points |
If |
pch |
The plotting character used for observations. |
The density is estimated using kernel density estimation. Either
ash2
or kde
is used to do the
calculations. Then Hyndman's (1996) density quantile algorithm is used to
compute the HDRs.
hdr.2d
returns an object of class hdr2d
containing all the
information needed to compute the HDR contours. This object can be plotted
using plot.hdr2d
.
hdr.boxplot.2d
produces a bivariate HDR boxplot. This is a special
case of applying plot.hdr2d
to an object computed using
hdr.2d
.
Some information about the HDRs is returned. See code for details.
Rob J Hyndman
Hyndman, R.J. (1996) Computing and graphing highest density regions American Statistician, 50, 120-126.
x <- c(rnorm(200,0,1),rnorm(200,4,1)) y <- c(rnorm(200,0,1),rnorm(200,4,1)) hdr.boxplot.2d(x,y) hdrinfo <- hdr.2d(x,y) plot(hdrinfo, pointcol="red", show.points=TRUE, pch=3)
x <- c(rnorm(200,0,1),rnorm(200,4,1)) y <- c(rnorm(200,0,1),rnorm(200,4,1)) hdr.boxplot.2d(x,y) hdrinfo <- hdr.2d(x,y) plot(hdrinfo, pointcol="red", show.points=TRUE, pch=3)
Calculates and plots a univariate highest density regions boxplot.
hdr.boxplot( x, prob = c(99, 50), h = hdrbw(BoxCox(x, lambda), mean(prob)), lambda = 1, boxlabels = "", col = gray((9:1)/10), main = "", xlab = "", ylab = "", pch = 1, border = 1, outline = TRUE, space = 0.25, ... )
hdr.boxplot( x, prob = c(99, 50), h = hdrbw(BoxCox(x, lambda), mean(prob)), lambda = 1, boxlabels = "", col = gray((9:1)/10), main = "", xlab = "", ylab = "", pch = 1, border = 1, outline = TRUE, space = 0.25, ... )
x |
Numeric vector containing data or a list containing several vectors. |
prob |
Probability coverage required for HDRs
|
h |
Optional bandwidth for calculation of density. |
lambda |
Box-Cox transformation parameter where |
boxlabels |
Label for each box plotted. |
col |
Colours for regions of each box. |
main |
Overall title for the plot. |
xlab |
Label for x-axis. |
ylab |
Label for y-axis. |
pch |
Plotting character. |
border |
Width of border of box. |
outline |
If not <code>TRUE</code>, the outliers are not drawn. |
space |
The space between each box, between 0 and 0.5. |
... |
Other arguments passed to plot. |
The density is estimated using kernel density estimation. A Box-Cox
transformation is used if lambda!=1
, as described in Wand, Marron and
Ruppert (1991). This allows the density estimate to be non-zero only on the
positive real line. The default kernel bandwidth h
is selected using
the algorithm of Samworth and Wand (2010).
Hyndman's (1996) density quantile algorithm is used for calculation.
nothing.
Rob J Hyndman
Hyndman, R.J. (1996) Computing and graphing highest density regions. American Statistician, 50, 120-126.
Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.
Wand, M.P., Marron, J S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association, 86, 343-353.
# Old faithful eruption duration times hdr.boxplot(faithful$eruptions) # Simple bimodal example x <- c(rnorm(100,0,1), rnorm(100,5,1)) par(mfrow=c(1,2)) boxplot(x) hdr.boxplot(x) # Highly skewed example x <- exp(rnorm(100,0,1)) par(mfrow=c(1,2)) boxplot(x) hdr.boxplot(x,lambda=0)
# Old faithful eruption duration times hdr.boxplot(faithful$eruptions) # Simple bimodal example x <- c(rnorm(100,0,1), rnorm(100,5,1)) par(mfrow=c(1,2)) boxplot(x) hdr.boxplot(x) # Highly skewed example x <- exp(rnorm(100,0,1)) par(mfrow=c(1,2)) boxplot(x) hdr.boxplot(x,lambda=0)
Calculates and plots highest density regions for a conditional density
estimate. Uses output from cde
.
hdr.cde( den, prob = c(50, 95, 99), plot = TRUE, plot.modes = TRUE, mden = rep(1, length(den$x)), threshold = 0.05, nn = 1000, xlim, ylim, xlab, ylab, border = TRUE, font = 1, cex = 1, ... )
hdr.cde( den, prob = c(50, 95, 99), plot = TRUE, plot.modes = TRUE, mden = rep(1, length(den$x)), threshold = 0.05, nn = 1000, xlim, ylim, xlab, ylab, border = TRUE, font = 1, cex = 1, ... )
den |
Conditional density in the same format as the output from
|
prob |
Probability coverage level for HDRs |
plot |
Should HDRs be plotted? If FALSE, results are returned. |
plot.modes |
Should modes be plotted as well as HDRs? |
mden |
Marginal density in the |
threshold |
Threshold for margin density. HDRs are not plotted if the
margin density |
nn |
Number of points to be sampled from each density when estimating the HDRs. |
xlim |
Limits for x-axis. |
ylim |
Limits for y-axis. |
xlab |
Label for x-axis. |
ylab |
Label for y-axis. |
border |
Show border of polygons |
font |
Font to be used in plot. |
cex |
Size of characters. |
... |
Other arguments passed to plotting functions. |
hdr |
array (a,b,c) where where a specifies conditioning value, b gives the HDR endpoints and c gives the probability coverage. |
modes |
estimated mode of each conditional density |
Rob J Hyndman
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.
faithful.cde <- cde(faithful$waiting,faithful$eruptions) plot(faithful.cde,xlab="Waiting time",ylab="Duration time",plot.fn="hdr")
faithful.cde <- cde(faithful$waiting,faithful$eruptions) plot(faithful.cde,xlab="Waiting time",ylab="Duration time",plot.fn="hdr")
Plots univariate density with highest density regions displayed
hdr.den( x, prob = c(50, 95, 99), den, h = hdrbw(BoxCox(x, lambda), mean(prob)), lambda = 1, xlab = NULL, ylab = "Density", ylim = NULL, plot.lines = TRUE, col = 2:8, bgcol = "gray", legend = FALSE, ... )
hdr.den( x, prob = c(50, 95, 99), den, h = hdrbw(BoxCox(x, lambda), mean(prob)), lambda = 1, xlab = NULL, ylab = "Density", ylim = NULL, plot.lines = TRUE, col = 2:8, bgcol = "gray", legend = FALSE, ... )
x |
Numeric vector containing data. If |
prob |
Probability coverage required for HDRs |
den |
Density of data as list with components |
h |
Optional bandwidth for calculation of density. |
lambda |
Box-Cox transformation parameter where |
xlab |
Label for x-axis. |
ylab |
Label for y-axis. |
ylim |
Limits for y-axis. |
plot.lines |
If |
col |
Colours for regions. |
bgcol |
Colours for the background behind the boxes. Default |
legend |
If |
... |
Other arguments passed to plot. |
Either x
or den
must be provided. When x
is provided,
the density is estimated using kernel density estimation. A Box-Cox
transformation is used if lambda!=1
, as described in Wand, Marron and
Ruppert (1991). This allows the density estimate to be non-zero only on the
positive real line. The default kernel bandwidth h
is selected using
the algorithm of Samworth and Wand (2010).
Hyndman's (1996) density quantile algorithm is used for calculation.
a list of three components:
hdr |
The endpoints of each interval in each HDR |
mode |
The estimated mode of the density. |
falpha |
The value of the density at the boundaries of each HDR. |
Rob J Hyndman
Hyndman, R.J. (1996) Computing and graphing highest density regions. American Statistician, 50, 120-126.
Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.
Wand, M.P., Marron, J S., Ruppert, D. (1991) Transformations in density estimation. Journal of the American Statistical Association, 86, 343-353.
# Old faithful eruption duration times hdr.den(faithful$eruptions) # Simple bimodal example x <- c(rnorm(100,0,1), rnorm(100,5,1)) hdr.den(x)
# Old faithful eruption duration times hdr.den(faithful$eruptions) # Simple bimodal example x <- c(rnorm(100,0,1), rnorm(100,5,1)) hdr.den(x)
Estimates the optimal bandwidth for 1-dimensional highest density regions
hdrbw(x, HDRlevel, gridsize = 801, nMChdr = 1e+06, graphProgress = FALSE)
hdrbw(x, HDRlevel, gridsize = 801, nMChdr = 1e+06, graphProgress = FALSE)
x |
Numerical vector containing data. |
HDRlevel |
HDR-level as defined in Hyndman (1996). Setting ‘HDRlevel’ equal to p (0<p<1) corresponds to a probability of 1-p of inclusion in the highest density region. |
gridsize |
the number of equally spaced points used for binned kernel density estimation. |
nMChdr |
the size of the Monte Carlo sample used for density quantile approximation of the highest density region, as described in Hyndman (1996). |
graphProgress |
logical flag: if ‘TRUE’ then plots showing the progress of the bandwidth selection algorithm are produced. |
This is a plug-in rule for bandwidth selection tailored to highest density region estimation
A numerical vector of length 1.
Matt Wand
Hyndman, R.J. (1996). Computing and graphing highest density regions. The American Statistician, 50, 120-126.
Samworth, R.J. and Wand, M.P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. The Annals of Statistics, 38, 1767-1792.
HDRlevelVal <- 0.55 x <- faithful$eruptions hHDR <- hdrbw(x,HDRlevelVal) HDRhat <- hdr.den(x,prob=100*(1-HDRlevelVal),h=hHDR)
HDRlevelVal <- 0.55 x <- faithful$eruptions hHDR <- hdrbw(x,HDRlevelVal) HDRhat <- hdr.den(x,prob=100*(1-HDRlevelVal),h=hHDR)
Calculates Highest Density Regions with confidence intervals.
hdrconf(x, den, prob = 95, conf = 95)
hdrconf(x, den, prob = 95, conf = 95)
x |
Numeric vector containing data. |
den |
Density of data as list with components |
prob |
Probability coverage for for HDRs. |
conf |
Confidence for limits on HDR. |
hdrconf
returns list containing the following components:
hdr |
Highest density regions |
hdr.lo |
Highest density regions corresponding to lower confidence limit. |
hdr.hi |
Highest density regions corresponding to upper confidence limit. |
falpha |
Values of
|
falpha.ci |
Values of
|
Rob J Hyndman
Hyndman, R.J. (1996) Computing and graphing highest density regions American Statistician, 50, 120-126.
x <- c(rnorm(100,0,1),rnorm(100,4,1)) den <- density(x,bw=hdrbw(x,50)) trueden <- den trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi) sortx <- sort(x) par(mfcol=c(2,2)) for(conf in c(50,95)) { m <- hdrconf(sortx,trueden,conf=conf) plot(m,trueden,main=paste(conf,"% HDR from true density")) m <- hdrconf(sortx,den,conf=conf) plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)")) }
x <- c(rnorm(100,0,1),rnorm(100,4,1)) den <- density(x,bw=hdrbw(x,50)) trueden <- den trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi) sortx <- sort(x) par(mfcol=c(2,2)) for(conf in c(50,95)) { m <- hdrconf(sortx,trueden,conf=conf) plot(m,trueden,main=paste(conf,"% HDR from true density")) m <- hdrconf(sortx,den,conf=conf) plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)")) }
Produces a scatterplot where the points are coloured according to the bivariate HDRs in which they fall.
hdrscatterplot( x, y, levels = c(1, 50, 99), kde.package = c("ash", "ks"), noutliers = NULL, label = NULL )
hdrscatterplot( x, y, levels = c(1, 50, 99), kde.package = c("ash", "ks"), noutliers = NULL, label = NULL )
x |
Numeric vector or matrix with 2 columns. |
y |
Numeric vector of same length as |
levels |
Percentage coverage for HDRs |
kde.package |
Package to be used in calculating the kernel density
estimate when |
noutliers |
Number of outliers to be labelled. By default, all points outside the largest HDR are labelled. |
label |
Label of outliers of same length as |
The bivariate density is estimated using kernel density estimation. Either
ash2
or kde
is used to do the
calculations. Then Hyndman's (1996) density quantile algorithm is used to
compute the HDRs. The scatterplot of (x,y) is created where the points are
coloured according to which HDR they fall. A ggplot object is returned.
Rob J Hyndman
x <- c(rnorm(200, 0, 1), rnorm(200, 4, 1)) y <- c(rnorm(200, 0, 1), rnorm(200, 4, 1)) hdrscatterplot(x, y) hdrscatterplot(x, y, label = paste0("p", 1:length(x)))
x <- c(rnorm(200, 0, 1), rnorm(200, 4, 1)) y <- c(rnorm(200, 0, 1), rnorm(200, 4, 1)) hdrscatterplot(x, y) hdrscatterplot(x, y, label = paste0("p", 1:length(x)))
These are two data sets collected in 1993 on two individual lanes (lane 2 and lane 3) of the 4-lane Californian freeway I-880. The data were collected by loop detectors, and the time units are 30 seconds per observation (see Petty et al., 1996, for details).
lane2; lane3
lane2; lane3
Two data frames (lane2
and lane3
) each with 1318
observations on the following two variables:
a numeric vector giving the traffic flow in vehicles per lane per hour.
a numeric vector giving the speed in miles per hour.
The data is examined in Einbeck and Tutz (2006), using a nonparametric approach to multi-valued regression based on conditional mean shift.
Petty, K.F., Noeimi, H., Sanwal, K., Rydzewski, D., Skabardonis, A., Varaiya, P., and Al-Deek, H. (1996). “The Freeway Service Patrol Evaluation Project: Database Support Programs, and Accessibility”. Transportation Research Part C: Emerging Technologies, 4, 71-85.
The data is provided by courtesy of CALIFORNIA PATH, Institute of Transportation Studies, University of California, Berkeley.
Einbeck, J., and Tutz, G. (2006). “Modelling beyond regression functions: an application of multimodal regression to speed-flow data”. Journal of the Royal Statistical Society, Series C (Applied Statistics), 55, 461-475.
plot(lane2) plot(lane3)
plot(lane2) plot(lane3)
Daily maximum temperatures in Melbourne, Australia, from 1981-1990. Leap days have been omitted.
maxtemp
maxtemp
Time series of frequency 365.
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.
plot(maxtemp)
plot(maxtemp)
Nonparametric multi-valued regression based on the modes of conditional density estimates.
modalreg( x, y, xfix = seq(min(x), max(x), l = 50), a, b, deg = 0, iter = 30, P = 2, start = "e", prun = TRUE, prun.const = 10, plot.type = c("p", 1), labels = c("", "x", "y"), pch = 20, ... )
modalreg( x, y, xfix = seq(min(x), max(x), l = 50), a, b, deg = 0, iter = 30, P = 2, start = "e", prun = TRUE, prun.const = 10, plot.type = c("p", 1), labels = c("", "x", "y"), pch = 20, ... )
x |
Numerical vector: the conditioning variable. |
y |
Numerical vector: the response variable. |
xfix |
Numerical vector corresponding to the input values of which the fitted values shall be calculated. |
a |
Optional bandwidth in |
b |
Optional bandwidth in |
deg |
Degree of local polynomial used in estimation (0 or 1). |
iter |
Positive integer giving the number of mean shift iterations per point and branch. |
P |
Maximal number of branches. |
start |
Character determining how the starting points are selected.
|
prun |
Boolean. If TRUE, parts of branches are dismissed (in the
plotted output) where their associated kernel density value falls below the
threshold |
prun.const |
Numerical value giving the constant used above (the higher, the less pruning) |
plot.type |
Vector with two elements. The first one is
character-valued, with possible values |
labels |
Vector of three character strings. The first one is the
"main" title of the graphical output, the second one is the label of the
|
pch |
Plotting character. The default corresponds to small bullets. |
... |
Other arguments passed to |
Computes multi-modal nonparametric regression curves based on the maxima of
conditional density estimates. The tool for the estimation is the
conditional mean shift as outlined in Einbeck and Tutz (2006). Estimates of
the conditional modes might fluctuate highly if deg=1
. Hence,
deg=0
is recommended. For bandwidth selection, the hybrid rule
introduced by Bashtannyk and Hyndman (2001) is employed if deg=0
.
This corresponds to the setting method=1
in function
cde.bandwidths
. For deg=1
automatic bandwidth selection is not
supported.
A list with the following components:
xfix |
Grid of predictor values at which the fitted values are calculated. |
fitted.values |
A
|
bandwidths |
A vector with
bandwidths |
density |
A |
threshold |
The pruning threshold. |
Jochen Einbeck (2007)
Einbeck, J., and Tutz, G. (2006) "Modelling beyond regression functions: an application of multimodal regression to speed-flow data". Journal of the Royal Statistical Society, Series C (Applied Statistics), 55, 461-475.
Bashtannyk, D.M., and Hyndman, R.J. (2001) "Bandwidth selection for kernel conditional density estimation". Computational Statistics and Data Analysis, 36(3), 279-298.
lane2.fit <- modalreg(lane2$flow, lane2$speed, xfix=(1:55)*40, a=100, b=4)
lane2.fit <- modalreg(lane2$flow, lane2$speed, xfix=(1:55)*40, a=100, b=4)
Produces stacked density plots or highest density region plots for a univariate density conditional on one covariate.
## S3 method for class 'cde' plot( x, firstvar = 1, mfrow = n2mfrow(dim(x$z)[firstvar]), plot.fn = "stacked", x.name, margin = NULL, ... )
## S3 method for class 'cde' plot( x, firstvar = 1, mfrow = n2mfrow(dim(x$z)[firstvar]), plot.fn = "stacked", x.name, margin = NULL, ... )
x |
Output from |
firstvar |
If there is more than one conditioning variable,
|
mfrow |
If there is more than one conditioning variable, |
plot.fn |
Specifies which plotting function to use: "stacked" results in stacked conditional densities and "hdr" results in highest density regions. |
x.name |
Name of x (conditioning) variable for use on x-axis. |
margin |
Marginal density of conditioning variable. If present, only conditional densities corresponding to non-negligible marginal densities will be plotted. |
... |
Additional arguments to plot. |
If plot.fn=="stacked"
and there is only one conditioning
variable, the function returns the output from
persp
. If plot.fn=="hdr"
and there is only
one conditioning variable, the function returns the output from
hdr.cde
. When there is more than one conditioning variable,
nothing is returned.
Rob J Hyndman
Hyndman, R.J., Bashtannyk, D.M. and Grunwald, G.K. (1996) "Estimating and visualizing conditional densities". Journal of Computational and Graphical Statistics, 5, 315-336.
faithful.cde <- cde(faithful$waiting,faithful$eruptions, x.name="Waiting time", y.name="Duration time") plot(faithful.cde) plot(faithful.cde,plot.fn="hdr")
faithful.cde <- cde(faithful$waiting,faithful$eruptions, x.name="Waiting time", y.name="Duration time") plot(faithful.cde) plot(faithful.cde,plot.fn="hdr")
Plots Highest Density Regions with confidence intervals.
## S3 method for class 'hdrconf' plot(x, den, ...)
## S3 method for class 'hdrconf' plot(x, den, ...)
x |
Output from |
den |
Density of data as list with components |
... |
Other arguments are passed to plot. |
None
Rob J Hyndman
Hyndman, R.J. (1996) Computing and graphing highest density regions American Statistician, 50, 120-126.
x <- c(rnorm(100,0,1),rnorm(100,4,1)) den <- density(x,bw=bw.SJ(x)) trueden <- den trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi) sortx <- sort(x) par(mfcol=c(2,2)) for(conf in c(50,95)) { m <- hdrconf(sortx,trueden,conf=conf) plot(m,trueden,main=paste(conf,"% HDR from true density")) m <- hdrconf(sortx,den,conf=conf) plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)")) }
x <- c(rnorm(100,0,1),rnorm(100,4,1)) den <- density(x,bw=bw.SJ(x)) trueden <- den trueden$y <- 0.5*(exp(-0.5*(den$x*den$x)) + exp(-0.5*(den$x-4)^2))/sqrt(2*pi) sortx <- sort(x) par(mfcol=c(2,2)) for(conf in c(50,95)) { m <- hdrconf(sortx,trueden,conf=conf) plot(m,trueden,main=paste(conf,"% HDR from true density")) m <- hdrconf(sortx,den,conf=conf) plot(m,den,main=paste(conf,"% HDR from empirical density\n(n=200)")) }