Disadvantages
- Ensembles of decision trees (such as Random Forests, which is a trademarked term for one particular implementation) are very fast to train but quite slow to make predictions once trained: every input must be run through every tree, so more accurate ensembles, which need more trees, are slower to use. In most practical situations this is fast enough, but there are certainly situations where run-time performance matters and other approaches would be preferred (see the timing sketch after this list).
- The results of learning are hard to interpret: compared to a single decision tree, or to a set of rules, an ensemble of trees gives you little insight into why it makes the predictions it does.
- They are hard to make incremental. It can be done, but there is no natural algorithm for it, in the way that it is easy to update the counts of a naive Bayes model after adding an instance, or to add a new instance to a nearest-neighbour classifier (see the sketch after this list).
- Random Forests are usually less accurate than Boosting/GBM on a wide range of tasks, and usually slower at run time.
- Overfitting: broadly, the reason is the same as for any algorithm: fitting noise instead of signal. In decision trees this happens when the trees are too deep. If you just kept splitting, eventually the tree would have a node for every distinct point and would turn into a form of 1-nearest-neighbour classifier, which fits the training data too closely and is unlikely to generalize. This is why there is usually a stopping or pruning criterion: it could be a minimum node size, such that nodes with <= N examples are not split further, or a minimum information gain, such that nodes where no split decreases entropy by more than a trivial amount are not split (a sketch of both follows this list).
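To make the first point concrete, here is a minimal sketch of how prediction time grows with the number of trees. It assumes scikit-learn's RandomForestClassifier (one common implementation, not the trademarked one), and the dataset sizes and tree counts are arbitrary illustration values:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# A synthetic dataset; the shapes here are just illustrative.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for n_trees in (10, 100, 500):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X, y)
    start = time.perf_counter()
    clf.predict(X)  # every tree must be evaluated for every sample
    elapsed = time.perf_counter() - start
    print(f"{n_trees:4d} trees: predict took {elapsed:.3f}s")
```

And here is a minimal sketch of the incremental-update contrast: a categorical naive Bayes model kept as raw counts can absorb a new labelled instance by bumping a few counters, whereas a random forest has no comparably natural per-instance update and would normally be refit. The TinyNaiveBayes class is a hypothetical toy, not a real library API:

```python
from collections import defaultdict

class TinyNaiveBayes:
    """Categorical naive Bayes stored as raw counts,
    so adding an instance is O(number of features)."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # (class, feature index, feature value) -> count
        self.feature_counts = defaultdict(int)

    def add_instance(self, x, label):
        # Incremental update: just bump the relevant counters.
        self.class_counts[label] += 1
        for i, value in enumerate(x):
            self.feature_counts[(label, i, value)] += 1

nb = TinyNaiveBayes()
nb.add_instance(("sunny", "hot"), "no")
nb.add_instance(("rainy", "mild"), "yes")  # each new instance is a cheap count update
```

Finally, a minimal sketch of the two stopping criteria described in the overfitting point, again assuming scikit-learn (here DecisionTreeClassifier, with min_samples_split as the minimum node size and min_impurity_decrease as the minimum gain; flip_y injects label noise so the memorization is visible). Parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy labels (flip_y) make an unpruned tree fit noise instead of signal.
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fully grown tree: splits until leaves are (nearly) pure, 1-NN-like behaviour.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Constrained tree, using both stopping criteria from the text.
pruned = DecisionTreeClassifier(
    min_samples_split=40,        # minimum node size: small nodes are not split
    min_impurity_decrease=1e-3,  # minimum gain: skip near-useless splits
    random_state=0,
).fit(X_tr, y_tr)

print("unpruned: train %.2f, test %.2f"
      % (full.score(X_tr, y_tr), full.score(X_te, y_te)))
print("pruned:   train %.2f, test %.2f"
      % (pruned.score(X_tr, y_tr), pruned.score(X_te, y_te)))
```

Typically the unpruned tree scores near-perfectly on training data but worse on test data than the constrained one, which is the overfitting gap the pruning criteria exist to close.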
Advantages of Random Forests:
- Random forests are robust to outliers.
- A large ensemble of trees is very likely to contain some good classifiers, since each tree searches a different random subset of the features and some trees will find a decent one (see the sketch below).
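A minimal sketch of that last point, assuming scikit-learn: max_features controls how many randomly chosen features each split considers, so individual trees see different feature subsets and the ensemble is likely to include trees that found an informative one. The dataset and parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Many features, few of them informative: feature subsampling has to find them.
X, y = make_classification(n_samples=1000, n_features=50, n_informative=5,
                           random_state=0)

for max_features in ("sqrt", 0.5, None):  # None = every split sees all features
    clf = RandomForestClassifier(n_estimators=200, max_features=max_features,
                                 random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"max_features={max_features!r}: CV accuracy {score:.3f}")
```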