- Often works better than logistic regression on high-dimensional data (features >> samples)
- Creates a maximum-margin classifier, so it is sensitive to outliers
- Outlier removal can give a better model
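- C works as the inverse of lambda as a regularization parameter: a larger C means weaker regularization
- A high value of C penalizes training errors so heavily that the error term is driven toward zero and only the regularization (margin) term is left to minimize, so the model fits the training data closely and has high variance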
- On extremely high-dimensional data, PCA followed by SVM sometimes works well.
- My rank-57 entry in the African Soil Challenge competition on Kaggle was an ensemble of a PCA + SVM model and an SVM on the original data. I used a Box-Cox (log) transformation for the P target: P was the hardest value to predict and its distribution was heavily skewed, so a log transform helped. This log-transformed version was also used as an input to the blend.
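
The inverse relationship between C and lambda can be checked with a little arithmetic. The sketch below assumes the common textbook forms of the two objectives, which these notes do not spell out: the soft-margin form 0.5 * ||w||^2 + C * sum(hinge) and the penalized-loss form sum(hinge) + lambda * ||w||^2. The toy points, weights and function names are made up for illustration. With lambda = 1 / (2C) the two objectives differ only by the constant factor C, so they share the same minimizer.

```python
import math

def hinge_losses(w, b, X, y):
    """Hinge loss max(0, 1 - y * (w.x + b)) for each sample."""
    return [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
            for xi, yi in zip(X, y)]

def svm_objective(w, b, X, y, C):
    """Soft-margin form: 0.5 * ||w||^2 + C * sum(hinge)."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(hinge_losses(w, b, X, y))

def lambda_objective(w, b, X, y, lam):
    """Penalized-loss form: sum(hinge) + lam * ||w||^2."""
    return sum(hinge_losses(w, b, X, y)) + lam * sum(wj * wj for wj in w)

# Toy data and weights, invented purely for illustration.
X = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (0.5, -0.5)]
y = [1, 1, -1, -1]
w, b = (0.4, 0.3), -0.1

for C in (0.01, 1.0, 100.0):
    lam = 1.0 / (2.0 * C)  # large C corresponds to small lambda
    lhs = svm_objective(w, b, X, y, C)
    rhs = C * lambda_objective(w, b, X, y, lam)
    print("C =", C, "lambda =", lam, "equal up to scale:", math.isclose(lhs, rhs))
```

As C grows, lambda shrinks toward zero, which is the "weak regularization, high variance" end of the range.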
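
The PCA-then-SVM idea and the effect of C can be tried on synthetic data. The following is a minimal sketch assuming scikit-learn is installed; the `make_classification` dataset, the 20 components, the RBF kernel, the two C values and the helper name `fit_pca_svm` are arbitrary illustrative choices, not the settings used in the competition entry.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic "features >> samples" data: 120 samples, 1000 features.
X, y = make_classification(n_samples=120, n_features=1000, n_informative=15,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

def fit_pca_svm(C, n_components=20):
    """Scale, project onto the leading principal components, then fit an SVM."""
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components, random_state=0),
                          SVC(kernel="rbf", C=C))
    return model.fit(X_train, y_train)

# Compare a strongly regularized model (small C) with a weakly regularized one.
results = {}
for C in (0.01, 100.0):
    model = fit_pca_svm(C)
    results[C] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"C={C}: train accuracy={results[C][0]:.2f}, "
          f"test accuracy={results[C][1]:.2f}")
```

A large gap between training and test accuracy at high C is the high-variance behaviour noted in the tips; picking C by cross-validation (for example with `GridSearchCV`) is the usual remedy.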
Saturday, September 27, 2014
Support Vector Machine tips