- Take a random sample of the training data
- Decision trees performed worse than even a random baseline in this case
- Logistic regression using only the independent variables works better
- The data is too large to load into memory, and we probably don't need all of it to learn a model; what we need is more insight into the categorical variables
- Perl script to compute statistics for the categorical variables: https://github.com/novieq/kaggle/blob/master/test/stats.pl
- Either convert unknown values to NA so that they are treated as missing values, or convert all the categorical variables to CTR (click-through rate) values
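Because the data does not fit in memory, one way to take a uniform random sample is to stream the file once with reservoir sampling (Algorithm R). This is a minimal sketch, not the author's code. It assumes the training file has one example per line, and it works on any iterable, so the whole file is never loaded. The function name `reservoir_sample` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `with open("train.txt") as f: sample = reservoir_sample(f, 100_000, seed=42)`, where the file name and sample size are placeholders. If the file has a header line, skip it with `next(f)` first. Memory use is proportional to `k`, not to the file size.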
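The notes point to a Perl script for categorical-variable statistics. A rough Python equivalent of that kind of pass is sketched below. It is not a port of `stats.pl`, whose exact output isn't shown here. It assumes rows are already split into string fields (for example with `csv.reader(f, delimiter="\t")`), that an empty field means a missing value, and that you know which column indices are categorical. The function name and the chosen statistics (distinct count, missing fraction, most frequent values) are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def categorical_stats(
    rows: Iterable[Sequence[str]], cat_columns: Sequence[int], top_n: int = 5
) -> Dict[int, dict]:
    """Single streaming pass: per-column distinct count, missing fraction, top values."""
    counters = {c: Counter() for c in cat_columns}
    missing = {c: 0 for c in cat_columns}
    n_rows = 0
    for row in rows:
        n_rows += 1
        for c in cat_columns:
            value = row[c]
            if value == "":
                # Assumption: an empty field marks a missing value.
                missing[c] += 1
            else:
                counters[c][value] += 1
    return {
        c: {
            "distinct": len(counters[c]),
            "missing_fraction": missing[c] / n_rows if n_rows else 0.0,
            "top": counters[c].most_common(top_n),
        }
        for c in cat_columns
    }
```

Only the per-value counts are kept in memory. This is usually far smaller than the raw data, but it can still be large for very high-cardinality columns, so running it on a sample first is a reasonable precaution.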
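Converting a categorical variable to CTR values means replacing each category with the fraction of its rows that were clicked. This is a minimal sketch of that idea, assuming binary labels (1 = click) and string category values. The optional `prior_weight` adds smoothing toward the global CTR so that rare categories are not assigned extreme values. That smoothing is an addition, not something stated in the notes, and the default of 0 gives the plain CTR. Missing (empty) or unseen values fall back to the global CTR, which treats them like NA. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def fit_ctr(
    pairs: Iterable[Tuple[str, int]], prior_weight: float = 0.0
) -> Tuple[Dict[str, float], float]:
    """Build a value -> CTR table from (category value, click label) pairs."""
    clicks: Dict[str, int] = defaultdict(int)
    shows: Dict[str, int] = defaultdict(int)
    total_clicks = 0
    total_shows = 0
    for value, label in pairs:
        total_clicks += label
        total_shows += 1
        if value == "":
            # Missing values only contribute to the global CTR.
            continue
        clicks[value] += label
        shows[value] += 1
    global_ctr = total_clicks / total_shows if total_shows else 0.0
    # Smoothed CTR: (clicks + w * global) / (shows + w); w = 0 gives the raw CTR.
    table = {
        v: (clicks[v] + prior_weight * global_ctr) / (shows[v] + prior_weight)
        for v in shows
    }
    return table, global_ctr


def encode_ctr(value: Optional[str], table: Dict[str, float], global_ctr: float) -> float:
    """Map a category value to its CTR; missing or unseen values get the global CTR."""
    if value is None or value == "" or value not in table:
        return global_ctr
    return table[value]
```

Fit the table on a different split from the one the model is trained on. Otherwise each row's own label leaks into its encoded feature and the CTR values look more predictive than they really are.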