
Tuesday, September 30, 2014

Criteo Labs Kaggle challenge


  1. Take a random sample of the training data.
  2. Decision trees perform worse than even a random solution in this case.
  3. Logistic regression using only the independent variables works better.
  4. The data is huge, so we can't load all of it into memory, and we probably don't need all of it to learn a model; what we do need is more insight into the categorical variables.
  5. Perl script to compute statistics of the categorical variables: https://github.com/novieq/kaggle/blob/master/test/stats.pl
  6. Either convert unknown values to NA so that they are treated as missing values, or convert all the categorical variables to CTR (click-through rate) values.
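Taking a random sample of a training file that does not fit in memory can be done in a single pass with reservoir sampling (Algorithm R). The sketch below shows that standard technique; it is not necessarily how the sample for this post was drawn, and the function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep row i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = row
    return sample
```

Only k rows are ever held in memory, so an open file object can be passed directly as `rows` (skip any header line first); for example `reservoir_sample(f, 1_000_000)` keeps one million lines, where the sample size is a placeholder. If an approximate sample size is acceptable, keeping each line independently with probability p is simpler still; reservoir sampling is shown because it returns exactly k rows.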

Categorical Variables
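The categorical-variable ideas from the list (per-column statistics, unknown values as NA, and CTR encoding) can be sketched together in Python. This is a rough analogue, not a port of the linked Perl script, whose exact output is not described here. Assumptions: rows arrive already parsed as `(label, categorical values)` with `""` marking a missing value; values that are missing, unseen, or seen fewer than `min_count` times become `None` (NA); every other value is replaced by its CTR, smoothed toward the global CTR with a hypothetical `prior_weight`. The class and parameter names are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Assumed row shape (not the raw file format): (label, categorical values),
# where label is 1 for a click and 0 otherwise, and "" marks a missing value.
Row = Tuple[int, List[str]]


class CategoricalStats:
    """Per-column impression and click counts for categorical features."""

    def __init__(self, n_cols: int) -> None:
        self.n_rows = 0
        self.n_clicks = 0
        self.impressions: List[Counter] = [Counter() for _ in range(n_cols)]
        self.clicks: List[Counter] = [Counter() for _ in range(n_cols)]

    def update(self, rows: Iterable[Row]) -> "CategoricalStats":
        # Single streaming pass; only the counters are held in memory.
        for label, cats in rows:
            self.n_rows += 1
            self.n_clicks += label
            for col, value in enumerate(cats):
                self.impressions[col][value] += 1
                self.clicks[col][value] += label
        return self

    def ctr(self, col: int, value: str, min_count: int = 10,
            prior_weight: float = 10.0) -> Optional[float]:
        """Smoothed CTR for `value` in column `col`, or None (NA) if unknown."""
        seen = self.impressions[col][value]  # Counter returns 0 for unseen keys
        if value == "" or seen < max(min_count, 1):
            return None  # missing, unseen, or too rare: treat as NA
        global_ctr = self.n_clicks / self.n_rows
        # Additive smoothing toward the global CTR (an assumption, not from
        # the post) keeps rarely seen values from getting extreme estimates.
        return (self.clicks[col][value] + prior_weight * global_ctr) / (seen + prior_weight)

    def encode(self, cats: List[str], min_count: int = 10,
               prior_weight: float = 10.0) -> List[Optional[float]]:
        """Replace each categorical value in a row by its CTR (or None)."""
        return [self.ctr(col, value, min_count, prior_weight)
                for col, value in enumerate(cats)]
```

After `update`, `len(stats.impressions[col])` gives the number of distinct values in a column and `stats.impressions[col].most_common(10)` its most frequent ones, which helps when choosing `min_count`. Computing the CTRs on the same rows the model is trained on leaks the label into the features; counting on one sample and training on another avoids that.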



