Title: | Feature-Based Forecast Model Selection |
---|---|
Description: | A novel meta-learning framework for forecast model selection using time series features. Many applications require a large number of time series to be forecast. Providing better forecasts for these time series is important in decision and policy making. We propose a classification framework which selects forecast models based on features calculated from the time series. We call this framework FFORMS (Feature-based FORecast Model Selection). FFORMS builds a mapping that relates the features of time series to the best forecast model using a random forest. The 'seer' package is the implementation of the FFORMS algorithm. For more details see our paper at <https://www.monash.edu/business/econometrics-and-business-statistics/research/publications/ebs/wp06-2018.pdf>. |
Authors: | Thiyanga Talagala [aut, cre] , Rob J Hyndman [ths, aut] , George Athanasopoulos [ths, aut] |
Maintainer: | Thiyanga Talagala <[email protected]> |
License: | GPL-3 |
Version: | 1.1.8 |
Built: | 2024-10-31 21:13:19 UTC |
Source: | https://github.com/thiyangt/seer |
Calculate accuracy measure based on ARIMA models
accuracy_arima(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
a list which contains the accuracy and name of the specific ARIMA model.
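The `function_name` argument expects a user-supplied accuracy function. As a minimal sketch, assuming the function is called with the arguments `training`, `test` and `forecast` (the form used by the exported `cal_MASE` signature), a hypothetical mean-absolute-error measure could look like the following. The name `my_mae` and the toy data are illustrative and not part of the package.

```r
# Hypothetical accuracy function: mean absolute error on the test period.
# `training` is unused here but kept so the signature has the expected form.
my_mae <- function(training, test, forecast) {
  mean(abs(as.numeric(test) - as.numeric(forecast)))
}

# Toy data (assumed, for illustration only)
train_part <- ts(c(10, 12, 11, 13, 12, 14))
test_part <- ts(c(15, 13), start = 7)
fc <- c(14, 14)

my_mae(train_part, test_part, fc)
```

A function of this shape returns a single number, so `length_out` would be 1 under this assumption.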
Calculate accuracy measure based on ETS models
accuracy_ets(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
a list which contains the accuracy and name of the specific ETS model.
Calculate accuracy based on MSTL
accuracy_mstl(ts_info, function_name, length_out, mtd)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
mtd |
Method to use for forecasting the seasonally adjusted series |
accuracy measure calculated based on multiple seasonal decomposition
Calculate accuracy measure based on neural network forecasts
accuracy_nn(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
accuracy measure calculated based on neural network forecasts
Calculate accuracy measure based on random walk models
accuracy_rw(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
returns accuracy measure calculated based on random walk model
Calculate accuracy measure based on random walk with drift
accuracy_rwd(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
accuracy measure calculated based on random walk with drift model
Calculate accuracy measure based on snaive method
accuracy_snaive(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
accuracy measure calculated based on snaive method
Calculate accuracy measure based on STL-AR method
accuracy_stlar(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
accuracy measure calculated based on stlar method
Calculate accuracy measure based on TBATS
accuracy_tbats(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
accuracy measure calculated based on TBATS models
Calculate accuracy measure based on Theta method
accuracy_theta(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
returns accuracy measure calculated based on theta method
Calculate accuracy measure based on white noise process
accuracy_wn(ts_info, function_name, length_out)
ts_info |
list containing training and test part of a time series |
function_name |
function to calculate the accuracy measure; its arguments should be the forecast, training set and test set of the time series |
length_out |
number of measures calculated by the function |
returns accuracy measure calculated based on white noise process
Autocorrelation coefficients based on seasonally differenced series
acf_seasonalDiff(y, m, lagmax)
y |
a univariate time series |
m |
frequency of the time series |
lagmax |
maximum lag at which to calculate the acf |
A vector of 3 values: first ACF value of seasonally-differenced series, ACF value at the first seasonal lag of seasonally-differenced series, sum of squares of first 5 autocorrelation coefficients of seasonally-differenced series.
Thiyanga Talagala
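The three values described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper name `seasonal_diff_acf` and the element names are hypothetical, and `lagmax` is assumed to be at least `max(m, 5)` so that all three quantities exist.

```r
# Sketch: ACF-based features of the seasonally differenced series.
seasonal_diff_acf <- function(y, m, lagmax) {
  sdiff <- diff(y, lag = m)  # seasonal differencing at lag m
  # Autocorrelations at lags 1..lagmax (the lag-0 value is dropped)
  r <- stats::acf(sdiff, lag.max = lagmax, plot = FALSE)$acf[-1]
  c(first_acf = r[1],               # ACF at lag 1
    seasonal_acf = r[m],            # ACF at the first seasonal lag
    sum_sq_first5 = sum(r[1:5]^2))  # sum of squares of the first 5 ACF values
}

# Monthly example from the datasets package shipped with R
feat <- seasonal_diff_acf(AirPassengers, m = 12, lagmax = 13L)
feat
```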
Computes various measures based on autocorrelation coefficients of the original series, first-differenced series and second-differenced series
acf5(y)
y |
a univariate time series |
A vector of 3 values: sum of squares of the first five autocorrelation coefficients of the original series, the first-differenced series, and the twice-differenced series.
Thiyanga Talagala
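As a rough base-R sketch of the quantities described above (not the package's own code), the sum of squares of the first five autocorrelation coefficients can be computed for the original, first-differenced and twice-differenced series. The helper names `sum_sq_acf5` and `acf5_sketch` and the element names are assumptions made for illustration.

```r
# Sum of squares of the autocorrelations at lags 1..5
sum_sq_acf5 <- function(x) {
  r <- stats::acf(x, lag.max = 5, plot = FALSE)$acf[-1]  # drop lag 0
  sum(r^2)
}

acf5_sketch <- function(y) {
  c(y_acf5 = sum_sq_acf5(y),                              # original series
    diff1y_acf5 = sum_sq_acf5(diff(y)),                   # first differences
    diff2y_acf5 = sum_sq_acf5(diff(y, differences = 2)))  # second differences
}

acf5_values <- acf5_sketch(AirPassengers)
acf5_values
```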
train a random forest model and predict forecast-models for new series
build_rf( training_set, testset = FALSE, rf_type = c("ru", "rcp"), ntree, seed, import = FALSE, mtry = 8 )
training_set |
data frame of features and class labels |
testset |
features of new time series, default FALSE if a testset is not available |
rf_type |
whether ru (random forest based on an unbiased sample) or rcp (random forest based on class priors) |
ntree |
number of trees in the forest |
seed |
a value for seed |
import |
Should the importance of predictors be assessed? TRUE or FALSE |
mtry |
number of features to be selected at each node |
a list containing the random forest and forecast-models for new series
Computes relevant time series features before applying them to the model
cal_features( tslist, seasonal = FALSE, m = 1, lagmax = 2L, database, h, highfreq )
tslist |
a list of univariate time series |
seasonal |
if FALSE, restricts to features suitable for non-seasonal data |
m |
frequency of the time series or minimum frequency in the case of msts objects |
lagmax |
maximum lag at which to calculate the acf (quarterly series-5L, monthly-13L, weekly-53L, daily-8L, hourly-25L) |
database |
whether the time series is from mcomp or other |
h |
forecast horizon |
highfreq |
whether the time series is weekly, daily or hourly |
data frame: each column represents a feature and each row represents a time series
Thiyanga Talagala
Calculate MASE and sMAPE for an individual time series
cal_m4measures(training, test, forecast)
training |
training period of a time series |
test |
test period of a time series |
forecast |
forecast obtained from a model fitted to the training period |
returns a single value: mean of MASE and sMAPE
Thiyanga Talagala
library(seer)
library(Mcomp)
library(forecast)  # provides auto.arima() and forecast()
library(magrittr)
# Fit an ARIMA model to the training part of the first M3 series
y <- M3[[1]]$x
fcast_arima <- auto.arima(y) %>% forecast(h = 6)
# Mean of MASE and sMAPE on the test part
cal_m4measures(M3[[1]]$x, M3[[1]]$xx, fcast_arima$mean)
Calculation of mean absolute scaled error
cal_MASE(training, test, forecast)
training |
training period of the time series |
test |
test period of the time series |
forecast |
forecast values of the series |
returns a single value
Thiyanga Talagala
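For orientation, a common definition of the mean absolute scaled error divides the mean absolute forecast error on the test period by the in-sample mean absolute error of the seasonal naive method. The sketch below follows that convention; whether `cal_MASE` scales in exactly this way is an assumption, and `mase_sketch` is a hypothetical name.

```r
# Sketch of MASE: out-of-sample MAE scaled by the in-sample MAE of the
# seasonal naive forecast (lag = frequency of the training series).
mase_sketch <- function(training, test, forecast) {
  m <- stats::frequency(training)
  scale <- mean(abs(diff(as.numeric(training), lag = m)))
  mean(abs(as.numeric(test) - as.numeric(forecast))) / scale
}

# Toy data (assumed): frequency 1, so the scaling uses one-step naive errors
train_part <- ts(c(1, 3, 2, 4, 3, 5))
test_part <- c(4, 6)
fc <- c(5, 5)

mase_value <- mase_sketch(train_part, test_part, fc)
mase_value
```

Here the in-sample naive errors average 1.6 and the test errors average 1, so the sketch yields 0.625.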
Given a matrix of MASE and sMAPE values for each forecasting method, scale each measure by its median and take the mean of the median-scaled MASE and the median-scaled sMAPE as the forecast accuracy measure used to identify the class labels
cal_medianscaled(x)
x |
output from the function fcast_accuracy, where the parameter accuracyFun = cal_m4measures |
a list with the accuracy matrix, a vector of ARIMA models and a vector of ETS models. The accuracy for each forecast method is the average of scaled MASE and scaled sMAPE. The medians of MASE and sMAPE are calculated from the forecasts produced by the different models for a given series.
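The median scaling described above can be illustrated with two small matrices. This is a sketch under assumptions: the matrices `mase` and `smape`, their values, and the helper `scale_by_row_median` are hypothetical, and the actual layout of the `fcast_accuracy` output may differ.

```r
# Hypothetical MASE and sMAPE values: rows = series, columns = methods
methods <- c("arima", "ets", "rw")
mase <- matrix(c(1, 2, 3,
                 2, 4, 6), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, methods))
smape <- matrix(c(10, 20, 30,
                  5, 10, 15), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, methods))

# Divide each row (series) by its median across methods; the vector of row
# medians is recycled down the columns, so row i is divided by median i
scale_by_row_median <- function(mat) mat / apply(mat, 1, median)

# Average of the two scaled measures, one value per series and method
accuracy <- (scale_by_row_median(mase) + scale_by_row_median(smape)) / 2
accuracy
```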
Calculation of symmetric mean absolute percentage error
cal_sMAPE(training, test, forecast)
training |
training period of the time series |
test |
test period of the time series |
forecast |
forecast values of the series |
returns a single value
Thiyanga Talagala
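A widely used form of the symmetric mean absolute percentage error is the mean of 200 * |y - f| / (|y| + |f|) over the test period. The sketch below uses that form; whether `cal_sMAPE` reports the value on this percentage scale is an assumption, and `smape_sketch` is a hypothetical name.

```r
# Sketch of sMAPE on a percentage scale; `training` is unused but kept so the
# signature has the same form as the package functions.
smape_sketch <- function(training, test, forecast) {
  y <- as.numeric(test)
  f <- as.numeric(forecast)
  mean(200 * abs(y - f) / (abs(y) + abs(f)))
}

# Toy data (assumed)
smape_value <- smape_sketch(training = NULL,
                            test = c(100, 200),
                            forecast = c(110, 180))
smape_value
```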
Weighted Average(WA) calculated based on MASE, sMAPE for an individual time series
cal_WA(training, test, forecast)
training |
training period of a time series |
test |
test period of a time series |
forecast |
forecast obtained from a model fitted to the training period |
returns a single value: WA based on MASE and sMAPE
Thiyanga Talagala
This function further classifies class labels as in the FFORMS framework
classify_labels(df_final)
df_final |
a dataframe: output from split_names function |
a vector of class labels in the FFORMS framework
Identify the best forecasting method according to the forecast accuracy measure
classlabel(accuracy_mat)
accuracy_mat |
matrix of forecast accuracy measures (rows: time series, columns: forecasting method) |
a vector: best forecasting method for each series corresponding to the rows of accuracy_mat
Thiyanga Talagala
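Conceptually, labelling amounts to picking, for each row, the method with the best accuracy value. The sketch below assumes that smaller values are better (as with MASE and sMAPE) and uses a made-up accuracy matrix; it is not the package's own code.

```r
# Hypothetical accuracy matrix: rows = time series, columns = methods
accuracy_mat <- matrix(c(0.8, 1.1, 0.9,
                         1.3, 0.7, 1.0), nrow = 2, byrow = TRUE,
                       dimnames = list(NULL, c("arima", "ets", "rw")))

# For each series, take the column name with the smallest error
best <- colnames(accuracy_mat)[apply(accuracy_mat, 1, which.min)]
best
```

With these values the first series is labelled "arima" and the second "ets".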
Given weights and a time series in two separate vectors, calculate the combination forecast
combination_forecast_inside(x, y, h)
x |
weights and names of models (output based on fforms.ensemble) |
y |
time series values |
h |
forecast horizon |
a list of combination forecasts corresponding to the point forecast and the lower and upper bounds
Thiyanga Talagala
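A combination forecast is typically a weighted average of the forecasts from the selected models. The following sketch shows that arithmetic on made-up numbers, assuming the weights are normalised to sum to one; how the package derives and normalises its weights is not stated here, so the names and values below are purely illustrative.

```r
# Hypothetical weights for two models and their point forecasts (h = 3)
weights <- c(arima = 0.6, ets = 0.4)
fc <- rbind(arima = c(10, 11, 12),
            ets = c(20, 21, 22))

# Normalise the weights, then average the forecasts horizon by horizon;
# the weight vector is recycled down the columns, so row i is scaled by w[i]
w <- weights / sum(weights)
combined <- colSums(fc * w[rownames(fc)])
combined
```

The same weighting could be applied to the lower and upper bounds to obtain a combined interval, again as an assumption rather than a description of the package.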
Convert multiple-frequency (daily, hourly, half-hourly, minutes, seconds) time series into an msts object.
convert_msts(y, category)
y |
univariate time series |
category |
frequency at which the data have been collected |
a ts object or msts object
Computes the first order autocorrelation of the residual series of the deterministic trend model
e_acf1(y)
y |
a univariate time series |
A numeric value.
Thiyanga Talagala
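One way to picture this feature is to regress the series on a time index and take the lag-1 autocorrelation of the residuals. The sketch below assumes a linear deterministic trend, which may differ from the trend model used by the package; `e_acf1_sketch` is a hypothetical name.

```r
# Sketch: lag-1 autocorrelation of residuals from a linear trend fit
e_acf1_sketch <- function(y) {
  time_index <- seq_along(y)
  fit <- stats::lm(as.numeric(y) ~ time_index)
  res <- stats::residuals(fit)
  stats::acf(res, lag.max = 1, plot = FALSE)$acf[2]  # element 1 is lag 0
}

e1 <- e_acf1_sketch(AirPassengers)
e1
```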
Calculate forecast accuracy on test set according to a specified criterion
fcast_accuracy( tslist, models = c("ets", "arima", "rw", "rwd", "wn", "theta", "stlar", "nn", "snaive", "mstlarima", "mstlets", "tbats"), database, accuracyFun, h, length_out, fcast_save )
tslist |
a list of time series |
models |
a vector of models to compute |
database |
whether the time series is from mcomp or other |
accuracyFun |
function to calculate the accuracy measure, the arguments for the accuracy function should be training, test and forecast |
h |
forecast horizon |
length_out |
number of measures calculated by a single function |
fcast_save |
if the argument is TRUE, forecasts from each series are saved |
a list with accuracy matrix, vector of arima models and vector of ets models
Thiyanga Talagala
Compute combination forecast based on the vote matrix probabilities
fforms_combinationforecast( fforms.ensemble, tslist, database, h, holdout = TRUE, parallel = FALSE, multiprocess = future::multisession )
fforms.ensemble |
a list output from fforms_ensemble function |
tslist |
list of new time series |
database |
whether the time series is from mcomp or other |
h |
length of the forecast horizon |
holdout |
if holdout = TRUE, a holdout sample is taken from your data to calculate the forecast accuracy measure; if FALSE, all of the data will be used for forecasting. Default is TRUE |
parallel |
If TRUE, multiple cores (or multiple sessions) will be used. This only speeds things up when there are a large number of time series. |
multiprocess |
The function from the |
a list containing, point forecast, confidence interval, accuracy measure
Thiyanga Talagala
This function identifies the models to be used in producing the combination forecast
fforms_ensemble(votematrix, threshold = 0.5)
votematrix |
a matrix of vote probabilities based on the FFORMS random forest classifier |
threshold |
threshold value for sum of probabilities of votes, default is 0.5 |
a list containing the names of the forecast models
Thiyanga Talagala
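One plausible reading of the threshold rule, given only the argument descriptions above, is: for each series, sort the vote probabilities in decreasing order and keep the smallest set of models whose cumulative probability reaches `threshold`. The sketch below implements that reading for a single series; it is an assumption about the selection logic, not the package's code, and `select_models` is a hypothetical helper.

```r
# Sketch: keep the top-voted models until their cumulative vote probability
# reaches the threshold (assumes the threshold does not exceed the total).
select_models <- function(votes, threshold = 0.5) {
  sorted <- sort(votes, decreasing = TRUE)
  k <- which(cumsum(sorted) >= threshold)[1]
  sorted[seq_len(k)]
}

# Hypothetical vote probabilities for one time series
votes <- c(arima = 0.35, ets = 0.30, rw = 0.20, theta = 0.15)
selected <- select_models(votes, threshold = 0.5)
selected
```

With these votes, "arima" alone (0.35) falls short of 0.5, so "ets" is added and the selection stops at two models.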
Estimate the smoothing parameter for the level-alpha and the smoothing parameter for the trend-beta, and seasonality-gamma
holtWinter_parameters(y)
y |
a univariate time series |
A vector of 3 values: alpha, beta, gamma
Thiyanga Talagala
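For a feel of what these three parameters look like, base R's `stats::HoltWinters()` estimates level, trend and seasonal smoothing parameters for a seasonal series. Using it here is an assumption made for illustration; the package may estimate the parameters with a different fitting routine.

```r
# Sketch: estimate Holt-Winters smoothing parameters with base R, using the
# monthly co2 series from the datasets package
hw_fit <- stats::HoltWinters(co2)

hw_params <- c(alpha = unname(hw_fit$alpha),  # level
               beta = unname(hw_fit$beta),    # trend
               gamma = unname(hw_fit$gamma))  # seasonality
hw_params
```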
Preparation of a training set for random forest training
prepare_trainingset(accuracy_set, feature_set)
accuracy_set |
output from the fcast_accuracy |
feature_set |
output from the cal_features |
a data frame consisting of features and class labels
Given the prediction results of random forest calculate point forecast, 95% confidence intervals, forecast-accuracy for the test set
rf_forecast( predictions, tslist, database, function_name, h, accuracy, holdout = TRUE )
predictions |
prediction results obtained from random forest classifier |
tslist |
list of new time series |
database |
whether the time series is from mcomp or other |
function_name |
the name of the accuracy function (e.g., cal_MASE) used to calculate the accuracy measure (for a user-written function, the arguments of the accuracy function should be training period, test period and forecast) |
h |
length of the forecast horizon |
accuracy |
if TRUE, an accuracy measure will be calculated |
holdout |
if holdout = TRUE, a holdout sample is taken from your data to calculate the forecast accuracy measure; if FALSE, all of the data will be used for forecasting. Default is TRUE |
a list containing, point forecast, confidence interval, accuracy measure
Thiyanga Talagala
simulate multiple time series for a given series based on ARIMA models
sim_arimabased( y, Nsim, Combine = TRUE, M = TRUE, Future = FALSE, Length = NA, extralength = NA )
y |
a time series or M-competition data time series (Mcomp) |
Nsim |
number of time series to simulate |
Combine |
if TRUE, the training and test data in the M-competition data are combined to generate a time series corresponding to the full length of the series. Otherwise, a time series is generated based on the training period of the series. |
M |
if TRUE, y is considered to be a Mcomp data object |
Future |
if Future = TRUE, the simulated observations are conditional on the historical observations. In other words, they are possible future sample paths of the time series. But if Future = FALSE, the historical data are ignored, and the simulations are possible realizations of the time series model that are not connected to the original data. |
Length |
length of the simulated time series. If Future = FALSE, the Length argument should be NA. |
extralength |
extra length to be added to the simulated time series |
A list of time series.
Thiyanga Talagala
simulate multiple time series for a given series based on ETS models
sim_etsbased( y, Nsim, Combine = TRUE, M = TRUE, Future = FALSE, Length = NA, extralength = NA )
y |
a time series or M-competition data time series (Mcomp) |
Nsim |
number of time series to simulate |
Combine |
if TRUE, the training and test data in the M-competition data are combined to generate a time series corresponding to the full length of the series. Otherwise, a time series is generated based on the training period of the series. |
M |
if TRUE, y is considered to be a Mcomp data object |
Future |
if Future = TRUE, the simulated observations are conditional on the historical observations. In other words, they are possible future sample paths of the time series. But if Future = FALSE, the historical data are ignored, and the simulations are possible realizations of the time series model that are not connected to the original data. |
Length |
length of the simulated time series. If Future = FALSE, the Length argument should be NA. |
extralength |
extra length to be added to the simulated time series |
A list of time series.
Thiyanga Talagala
Simulate multiple time series based on a given series using multiple seasonal decomposition
sim_mstlbased( y, Nsim, Combine = TRUE, M = TRUE, Future = FALSE, Length = NA, extralength = NA, mtd = "ets" )
y |
a time series or M-competition data time series (Mcomp object) |
Nsim |
number of time series to simulate |
Combine |
if TRUE, the training and test data in the M-competition data are combined to generate a time series corresponding to the full length of the series. Otherwise, a time series is generated based on the training period of the series. |
M |
if TRUE, y is considered to be a Mcomp data object |
Future |
if Future = TRUE, the simulated observations are conditional on the historical observations. In other words, they are possible future sample paths of the time series. But if Future = FALSE, the historical data are ignored, and the simulations are possible realizations of the time series model that are not connected to the original data. |
Length |
length of the simulated time series. If Future = FALSE, the Length argument should be NA. |
extralength |
extra length to be added to the simulated time series |
mtd |
method to use for forecasting seasonally adjusted time series |
A list of time series.
Thiyanga Talagala
Split the names of ARIMA and ETS models into the model name and the different numbers of parameters in each case.
split_names(models)
models |
vector of model names |
a data frame whose columns give the description of the model components
STL decomposition method applied to the time series, then an AR model is used to forecast seasonally adjusted data, while the seasonal naive method is used to forecast the seasonal component
stlar(y, h = 10, s.window = 11, robust = FALSE)
y |
a univariate time series |
h |
forecast horizon |
s.window |
Either the character string “periodic” or the span (in lags) of the loess window for seasonal extraction |
robust |
logical indicating if robust fitting should be used in the loess procedure |
return object of class forecast
Thiyanga Talagala
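The STL-AR idea described above can be sketched with base R alone: decompose with `stl()`, forecast the seasonally adjusted series with an AR model, forecast the seasonal component by repeating its last full cycle, and add the two. This is an illustration under assumptions (AR order chosen by `stats::ar()` defaults, point forecasts only) and returns a plain numeric vector rather than an object of class forecast; `stlar_sketch` is a hypothetical name.

```r
# Sketch of STL + AR forecasting using only the stats package
stlar_sketch <- function(y, h = 10, s.window = 11, robust = FALSE) {
  m <- stats::frequency(y)
  fit <- stats::stl(y, s.window = s.window, robust = robust)
  seasonal <- fit$time.series[, "seasonal"]

  # AR model on the seasonally adjusted series (order selected by AIC)
  sa <- y - seasonal
  ar_fit <- stats::ar(sa)
  sa_fc <- as.numeric(stats::predict(ar_fit, newdata = sa, n.ahead = h)$pred)

  # Seasonal naive forecast of the seasonal component: repeat the last cycle
  last_cycle <- as.numeric(utils::tail(seasonal, m))
  seasonal_fc <- rep(last_cycle, length.out = h)

  sa_fc + seasonal_fc
}

stlar_fc <- stlar_sketch(AirPassengers, h = 12)
stlar_fc
```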
Computes the test statistics based on unit root tests Phillips–Perron test and KPSS test
unitroot(y)
y |
a univariate time series |
A vector of 3 values: test statistic based on PP-test and KPSS-test
Thiyanga Talagala