Saturday, September 27, 2014

Support Vector Machine tips


  1. Often works better than logistic regression on high-dimensional data (features >> samples)
  2. Creates a maximum margin classifier, hence it is sensitive to outliers
  3. Outlier removal can give a better model
  4. C works as the inverse of lambda as a regularization parameter (roughly C = 1/lambda)
  5. A high value of C forces the model's error term toward zero, so only the regularization part is left to minimize; the model then fits the training data very closely and has high variance
  6. On extremely high-dimensional data, PCA followed by an SVM sometimes works well.
  7. My rank 57 in the African Soil Challenge competition on Kaggle came from an ensemble of PCA + SVM and an SVM on the original data, with a Box-Cox (log) transformation applied to the P target. P was the hardest to predict, and its distribution was right-skewed (most of the mass on the left, with a long right tail), so a log transform helped. This was also used as an input to the blend.
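
Tips 4 and 5 can be checked with a small sketch. It assumes the usual soft-margin formulation with hinge loss, where the SVM minimizes C * (error term) + 0.5 * ||w||^2, while a lambda-regularized model minimizes (error term) + 0.5 * lambda * ||w||^2. The toy data, the weights, and all function names below are made up for illustration and are not from any competition code.

```python
import math

def hinge_loss_sum(w, b, X, y):
    """Error term: sum of hinge losses max(0, 1 - y_i * (w . x_i + b))."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_i)) + b
        total += max(0.0, 1.0 - y_i * score)
    return total

def squared_norm(w):
    """Regularization term without its coefficient: ||w||^2."""
    return sum(w_j * w_j for w_j in w)

def objective_c(w, b, X, y, C):
    """SVM form: C scales the error term."""
    return C * hinge_loss_sum(w, b, X, y) + 0.5 * squared_norm(w)

def objective_lambda(w, b, X, y, lam):
    """Lambda form: lambda scales the regularization term."""
    return hinge_loss_sum(w, b, X, y) + 0.5 * lam * squared_norm(w)

# Hypothetical toy data and an arbitrary (not fitted) weight vector.
X = [(2.0, 1.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, 0.5)]
y = [1, 1, -1, -1]
w, b = (0.3, -0.2), 0.1

error_shares = []
for C in (0.01, 1.0, 100.0):
    lam = 1.0 / C
    c_form = objective_c(w, b, X, y, C)
    lam_form = objective_lambda(w, b, X, y, lam)
    # With lambda = 1/C the C form is exactly C times the lambda form.
    same_up_to_scale = math.isclose(c_form, C * lam_form)
    # Fraction of the C-form objective that comes from the error term.
    error_share = C * hinge_loss_sum(w, b, X, y) / c_form
    error_shares.append(error_share)
    print(f"C={C:g} lambda={lam:g} same_up_to_scale={same_up_to_scale} "
          f"error_share={error_share:.3f}")
```

Because the two objectives differ only by the positive factor C when lambda = 1/C, they have the same minimizer, which is the sense in which C is the inverse of lambda. The error share grows with C, so at high C the optimizer is pushed to drive the training error toward zero first.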
