What is an ensemble technique?
In a nutshell, ensemble techniques combine multiple individual models to build a more powerful one, which lets us make better predictions than any single model would. In ML courses, you may have read about bagging and boosting, where many decision trees are combined to build powerful Random Forest and XGBoost models.
In those examples, the combined models are all of the same kind: multiple decision trees form one stronger model. But what if we want to combine different kinds of models, such as UCM, GAM, NN, Random Forest, etc.? One way is to assign a fixed weight to each of them, but that offers little flexibility.
To overcome this gap, we introduce the OPERA package, which can be useful in many such scenarios.
OPERA – How does it work?
- Experts: the set of individual techniques (models) whose predictions are combined
- Mixture: builds the aggregation algorithm object
- Predict: makes predictions using the algorithm
- Oracle: evaluates the performance of the experts and provides a benchmark against which to compare the combining algorithm
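The four building blocks above can be sketched end to end on simulated data. This is only an illustration under stated assumptions: the series, the two toy experts (`smooth` and `naive`) and the train/test split are made up for the example, and the opera package is assumed to be installed.

```r
library(opera)  # assumes the opera package is installed from CRAN

set.seed(42)
n <- 200
t <- seq_len(n)
# Simulated target: a periodic signal plus noise (illustrative data only)
Y <- sin(2 * pi * t / 20) + rnorm(n, sd = 0.2)

# Experts: a matrix with one column per model's prediction of Y
experts <- cbind(
  smooth = sin(2 * pi * t / 20),  # hypothetical expert that knows the signal
  naive  = c(0, Y[-n])            # hypothetical expert: previous observation
)

train <- 1:150
test  <- 151:n

# Mixture: learn the aggregation rule on the training part
mix <- mixture(Y = Y[train], experts = experts[train, ],
               model = "MLpol", loss.type = "square")

# Predict: combine the experts on new data, keeping the learned weights fixed
pred <- predict(mix, newexperts = experts[test, ],
                online = FALSE, type = "response")

# Oracle: best fixed convex combination in hindsight, used as a benchmark
orc <- oracle(Y = Y[train], experts = experts[train, ],
              model = "convex", loss.type = "square")

print(mix$coefficients)  # weights the mixture would use for the next step
print(orc$coefficients)  # weights of the convex oracle
```

Comparing the two sets of weights gives a rough idea of how close the sequentially learned mixture gets to the best fixed combination chosen in hindsight.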
mixture( Y = NULL, experts = NULL, model = "MLpol", loss.type = "square", loss.gradient = TRUE, coefficients = "Uniform", awake = NULL, parameters = list() )
• Y: the data stream to predict
• experts: the experts' predictions (a matrix with one column per expert)
• model: the aggregation method (EWA, FS, Ridge, MLpol, OGD)
• loss.type: the loss function (absolute, square, percentage,…)
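To give an intuition for what an aggregation method does, here is a hand-rolled sketch of exponentially weighted averaging (EWA) with square loss in base R. It is not opera's implementation, which differs in details such as how the learning rate is chosen; the function `ewa_by_hand`, the fixed learning rate `eta` and the two constant experts are hypothetical names and values chosen for the illustration.

```r
# Hand-rolled EWA with square loss (illustrative sketch, not opera's code).
# eta is a hypothetical fixed learning rate.
ewa_by_hand <- function(Y, experts, eta = 1) {
  n <- nrow(experts)
  k <- ncol(experts)
  w <- setNames(rep(1 / k, k), colnames(experts))  # uniform starting weights
  weights <- matrix(NA_real_, n, k, dimnames = list(NULL, colnames(experts)))
  pred <- numeric(n)
  for (t in seq_len(n)) {
    weights[t, ] <- w                  # weights used to predict step t
    pred[t] <- sum(w * experts[t, ])   # aggregated forecast
    loss <- (experts[t, ] - Y[t])^2    # each expert's loss once Y[t] is seen
    w <- w * exp(-eta * loss)          # down-weight experts that did badly
    w <- w / sum(w)                    # renormalise so the weights sum to 1
  }
  list(prediction = pred, weights = weights, final = w)
}

set.seed(1)
Y <- rnorm(100, mean = 5, sd = 0.1)
experts <- cbind(good = rep(5, 100), bad = rep(6, 100))
fit <- ewa_by_hand(Y, experts)
round(fit$final, 3)
```

Because the `bad` expert is off by about 1 at every step, its weight shrinks quickly and the aggregated forecast ends up relying almost entirely on `good`.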
oracle( Y, experts, model = "convex", loss.type = "square", awake = NULL, lambda = NULL, niter = NULL, ... )
The model argument selects the type of oracle:
• ‘expert’: The best fixed (constant over time) expert.
• ‘convex’: The best fixed convex combination of experts (a vector of non-negative weights that sum to 1).
• ‘linear’: The best fixed linear combination of experts.
• ‘shifting’: For every number $m$ of switches, it computes the sequence of experts with at most $m$ shifts that would have performed best at predicting the sequence of observations in Y.
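The first three oracles can be reproduced by hand for two experts in base R, which makes the differences between them concrete. This is a sketch on simulated data, not opera's code: the series, the noisy experts `a` and `b`, and the helper `mse` are assumptions made for the example.

```r
set.seed(7)
n <- 120
Y <- cumsum(rnorm(n))                          # illustrative series
experts <- cbind(a = Y + rnorm(n, sd = 0.5),   # two noisy hypothetical experts
                 b = Y + rnorm(n, sd = 1.0))

mse <- function(pred) mean((Y - pred)^2)

# 'expert': the single best expert in hindsight
loss_expert <- min(apply(experts, 2, mse))

# 'convex': best weights (alpha, 1 - alpha) with 0 <= alpha <= 1
convex_fit <- optimize(
  function(alpha) mse(alpha * experts[, "a"] + (1 - alpha) * experts[, "b"]),
  interval = c(0, 1)
)
loss_convex <- convex_fit$objective

# 'linear': unconstrained least-squares weights (no intercept)
lin_fit <- lm(Y ~ experts - 1)
loss_linear <- mean(residuals(lin_fit)^2)

c(expert = loss_expert, convex = loss_convex, linear = loss_linear)
```

Each oracle searches a larger set of combinations than the previous one, so the losses are ordered: linear is no worse than convex, which is no worse than the best single expert.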
The image below shows how the importance (weight) given to each expert varies over time.
OPERA Implementation using Real Data
## Using Tourist data
dt <- read.csv("ts_visitors.csv")
dt <- ts(dt$United.Kingdom, start = c(1998, 4), frequency = 4)
train <- window(dt, start = c(1999, 1), end = c(2009, 4))
test <- window(dt, start = c(2010, 1))
forecast.period <- length(test)
Refer to the attached R code for the full details; here I only walk through the important steps. We are using the Tourist data, which you can download from here.
########### Example 2: COMPLETE MODEL - TRAINING WITH 3 EXPERTS
library(forecast)  # provides auto.arima(), ets(), tbats() and forecast()
expert.1 <- forecast(auto.arima(train), h = forecast.period)
expert.2 <- forecast(ets(train), h = forecast.period)
expert.3 <- forecast(tbats(train), h = forecast.period)
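########### Building the mixture and the oracle
library(opera)  # provides mixture() and oracle()
# train.experts and test.experts are not defined in this excerpt (see the attached R code);
# mixture() and oracle() expect a matrix with one column per expert
MLpol <- mixture(Y = train, experts = train.experts, loss.type = "square", model = "MLpol")
oracle.convex <- oracle(Y = train, experts = train.experts, loss.type = "square", model = "convex")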
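########### Predicting using the MLpol mixture trained above
# online = FALSE keeps the trained weights fixed, so no new observations (newY) are needed
z <- ts(predict(MLpol, newexperts = test.experts, online = FALSE, type = "response"), start = c(2010, 1), frequency = 4)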
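########### Calculating MAPE
# qc.data is not defined in this excerpt; it is assumed to be a data frame holding the forecasts (z) and the actuals (test)
# Note: this is the absolute percentage error of the totals, not a per-period mean
MAPE <- abs(sum(qc.data$z) - sum(qc.data$test)) / sum(qc.data$test)
MAPE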
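The error measure above compares the total forecast with the total actual, which can differ a lot from a MAPE computed period by period. A small worked example in base R, with made-up numbers (`actual` and `predicted` are hypothetical values), shows the difference:

```r
actual    <- c(100, 200, 400)   # made-up actuals
predicted <- c(120, 180, 400)   # made-up forecasts

# MAPE: average of the per-period absolute percentage errors
mape <- mean(abs(predicted - actual) / abs(actual))

# Error of the totals: over- and under-forecasts can cancel out
total_ape <- abs(sum(predicted) - sum(actual)) / sum(actual)

c(MAPE = mape, total = total_ape)
```

Here the per-period errors are 20%, 10% and 0%, giving a MAPE of 10%, while the error of the totals is 0 because the over-forecast and the under-forecast cancel out. Which measure is appropriate depends on whether the forecast is used per period or only in aggregate.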
Limitations and drawbacks
1. As this is an ensemble technique, determining feature importance is challenging
2. The model tends to overfit the training data and might not perform well on test data
3. The model weights are learned sequentially on the training data and then held fixed, so they might not suit the test data
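Regarding the third point, opera's predict function also has an online mode that keeps updating the weights as new observations arrive, instead of freezing them after training. The sketch below contrasts the two modes on simulated data with a regime change; the series, the constant experts `low` and `high`, and the split are assumptions made for the illustration, and the opera package is assumed to be installed.

```r
library(opera)  # assumes the opera package is installed from CRAN

set.seed(3)
n <- 200
# Simulated series whose level jumps from 0 to 1 halfway through
Y <- c(rep(0, 100), rep(1, 100)) + rnorm(n, sd = 0.1)
experts <- cbind(low = rep(0, n), high = rep(1, n))  # two hypothetical experts

train <- 1:100
test  <- 101:200

mix <- mixture(Y = Y[train], experts = experts[train, ],
               model = "MLpol", loss.type = "square")

# Static: weights learned on the training part are kept fixed
static <- predict(mix, newexperts = experts[test, ],
                  online = FALSE, type = "response")

# Online: weights are updated after each new observation in newY
online <- predict(mix, newexperts = experts[test, ], newY = Y[test],
                  online = TRUE, type = "response")

# Mean squared error of each mode on the test part
c(static = mean((Y[test] - static)^2),
  online = mean((Y[test] - online)^2))
```

In a setting like this one, where the best expert changes between training and testing, one would expect the online predictions to adapt while the static ones keep following the expert that was best during training. Online mode requires the actual values to become available as the forecast horizon unfolds.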
Git Link: Download code from here.