
Monday, September 15, 2014

What is the difference between bagging, boosting and stacking?

These are different approaches to improve the performance of your model (so-called meta-algorithms):
  1. Bagging (stands for Bootstrap Aggregation) is a way to decrease the variance of your prediction by generating additional data for training from your original dataset, using combinations with repetitions (sampling with replacement) to produce multisets of the same cardinality/size as your original data. Because these resampled sets carry no new information, increasing the size of your training set this way can't improve the model's predictive force; it only decreases the variance, narrowly tuning the prediction to the expected outcome. (A minimal sketch appears after this list.)
  2. Bagging reduces variance by averaging, but it has little effect on bias. Boosting is a way to reduce both variance and bias.
  3. Boosting (a sketch follows this list):
    • Train a model on the training set
    • Compute the model's error on the training set
    • Increase the weights on the training cases the model gets wrong
    • Train a new model on the re-weighted training set (or draw a bootstrap sample from the data, with the probability of drawing each example proportional to its weight; resampling is often easier to implement than reweighting)
    • Re-compute errors on the weighted training set
    • Increase weights again on the cases the model gets wrong
    • Repeat until tired (100+ iterations)
    • Final model: a weighted prediction of each model
  4. Boosting can hurt with noisy datasets, but bagging won't hurt with noisy datasets.
  5. As a rule of thumb, bagging almost always helps; boosting often helps more than bagging, but it can hurt as well. So you may have to remove the noise before you can use boosting, whereas with bagging you can keep the noise and still get results.

  6. Stacking is similar to boosting: you also apply several models to your original data. The difference, however, is that you don't have just an empirical formula for your weight function; rather, you introduce a meta-level and use another model/approach that takes the input together with the outputs of every model to estimate the weights, or, in other words, to determine which models perform well and which perform badly given these input data. (A sketch appears after this list.)
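
To make point 1 concrete, here is a minimal bagging sketch in Python. It assumes scikit-learn and a toy dataset; the base learner, the number of models, and all parameters are illustrative choices, not anything prescribed above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (hypothetical; any dataset would do).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

rng = np.random.default_rng(0)
n_models = 50
models = []
for _ in range(n_models):
    # Bootstrap: sample with replacement, same size as the original data.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor()  # a high-variance base learner
    tree.fit(X[idx], y[idx])
    models.append(tree)

# Aggregate: averaging the individual predictions is what cuts the variance.
y_hat = np.mean([m.predict(X) for m in models], axis=0)
```

Each tree alone overfits its bootstrap sample; the average is a much steadier predictor, which is exactly the variance reduction described in point 1.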
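The boosting loop in point 3 can be sketched the same way, here in its resampling variant. Decision stumps as the weak learner and AdaBoost-style weight updates are my assumptions for illustration; the post itself doesn't fix a particular boosting algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
y_pm = np.where(y == 1, 1, -1)   # labels in {-1, +1} for AdaBoost-style updates

rng = np.random.default_rng(0)
n, n_rounds = len(X), 100        # "repeat until tired (100+ iterations)"
w = np.full(n, 1.0 / n)          # start with uniform weights
models, alphas = [], []

for _ in range(n_rounds):
    # Resampling variant: draw a bootstrap sample where the probability of
    # drawing each example is proportional to its current weight.
    idx = rng.choice(n, size=n, p=w)
    stump = DecisionTreeClassifier(max_depth=1).fit(X[idx], y_pm[idx])
    pred = stump.predict(X)

    # Weighted training error and this round's vote weight.
    err = np.sum(w[pred != y_pm])
    if err >= 0.5:               # no better than chance: stop early
        break
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))

    # Increase weights on the cases this model got wrong.
    w *= np.exp(-alpha * y_pm * pred)
    w /= w.sum()
    models.append(stump)
    alphas.append(alpha)

# Final model: a weighted vote over all rounds.
final = np.sign(sum(a * m.predict(X) for a, m in zip(alphas, models)))
```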
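And a stacking sketch for point 6. The meta-level here is a logistic regression trained on out-of-fold predictions of two base models; the base models and the use of cross_val_predict are illustrative assumptions. This is also where K-fold cross-validation enters stacking, as noted in the short examples below.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
base_models = [DecisionTreeClassifier(max_depth=3), KNeighborsClassifier()]

# Level 0: out-of-fold predictions, so the meta-model never sees a base
# model's output on the same data that base model was trained on.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Level 1 (the meta-level): learn how much to trust each base model.
meta_model = LogisticRegression().fit(meta_features, y)

# At prediction time: refit the base models on all data, then stack.
stacked = np.column_stack([m.fit(X, y).predict_proba(X)[:, 1] for m in base_models])
print(meta_model.predict(stacked)[:5])
```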

As you see, these are all different approaches to combining several models into a better one, and there is no single winner here: everything depends on your domain and what you're going to do. You can still treat stacking as a sort of more advanced boosting; however, the difficulty of finding a good approach for your meta-level makes it hard to apply this approach in practice.
Short examples of each:
  1. Bagging: Ozone data.
  2. Boosting: used to improve optical character recognition (OCR) accuracy.
  3. Stacking: usually built on K-fold cross-validation, which generates the out-of-fold predictions the meta-model is trained on (see the sketch above).
