- If variable's distribution has a long tail(left skewed distribution), apply Box-Cox transformation (taking log() is a quick & dirty way). Standardize all variables when in doubt (does not hurt anyway)
- We want to turn categorical features into count feature because one-hot-encoding would curse us with dimensionality rendering tree-based models unmanageable. We store the counts from the train set for every unique hash inside a dictionary, and use this to replace hashes with their count occurrence.
References :
Thank you for the nice article here. Really nice and keep update to explore more ideas.
ReplyDeleteGame QA Company
Game Functionality Testing
Game Compatibility Testing
Game Compliance Testing