Pages

Wednesday, October 29, 2014

Feature Hashing

Adaptive Learning rate in gradient descent

Depending on the cost function F that we select, we might face different problems. When the Sum of Squared Errors is selected as our cost function, the value of ∂F(Wj)/∂Wj grows larger and larger as we increase the size of the training dataset. Thus the learning rate λ must be adapted to significantly smaller values.
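This growth is easy to see with a small sketch. Assuming a one-parameter linear model (the function and variable names below are illustrative, not from the original post), the SSE gradient scales linearly with the number of examples:

```python
import numpy as np

# For a linear model y_hat = w * x, the SSE cost is F(w) = sum((w*x_i - y_i)^2)
# and its gradient dF/dw = 2 * sum(x_i * (w*x_i - y_i)), a sum over all N examples.
rng = np.random.default_rng(0)

def sse_gradient(w, x, y):
    return 2.0 * np.sum(x * (w * x - y))

x_small = rng.normal(size=100)
y_small = 3.0 * x_small + rng.normal(scale=0.1, size=100)
x_large = np.tile(x_small, 10)   # 10x the data, same distribution
y_large = np.tile(y_small, 10)

w = 0.0
g_small = abs(sse_gradient(w, x_small, y_small))
g_large = abs(sse_gradient(w, x_large, y_large))
print(g_large / g_small)  # ~10: the gradient grew with the dataset size
```

A λ that was stable for the small dataset can therefore make the update step diverge on the large one.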
One way to resolve this problem is to multiply λ by 1/N, where N is the size of the training dataset. The update step of the algorithm can then be rewritten as:
Wj = Wj - (λ/N)*∂F(Wj)/∂Wj
You can read more about this in the paper by Wilson et al., “The general inefficiency of batch training for gradient descent learning”.
Finally, another way to resolve this problem is to select a cost function that is not affected by the number of training examples that we use, such as the Mean Squared Error.
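The two fixes are in fact the same thing: scaling λ by 1/N under SSE produces exactly the step that plain λ produces under MSE. A minimal sketch, again assuming a one-parameter linear model (names are illustrative):

```python
import numpy as np

def sse_grad(w, x, y):
    # gradient of sum((w*x_i - y_i)^2)
    return 2.0 * np.sum(x * (w * x - y))

def mse_grad(w, x, y):
    # gradient of mean((w*x_i - y_i)^2) = SSE gradient / N
    return 2.0 * np.mean(x * (w * x - y))

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 3.0 * x + rng.normal(scale=0.1, size=500)
lam, w, N = 0.05, 0.0, len(x)

step_scaled_sse = (lam / N) * sse_grad(w, x, y)  # λ/N with SSE
step_mse = lam * mse_grad(w, x, y)               # plain λ with MSE
print(np.isclose(step_scaled_sse, step_mse))     # True
```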
This technique was used in the online gradient descent code by tingrtu in the Criteo Ad Click Competition organized by Kaggle.
Reference : http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/

Tuesday, October 28, 2014

How to score your model using different scoring functions in Python

The scoring parameter can be a callable that takes model predictions and ground truth.
However, if you want to use a scoring function that takes additional parameters, such as fbeta_score, you need to generate an appropriate scoring object. The simplest way to generate a callable object for scoring is by using make_scorer. That function converts score functions (discussed below in Function for prediction-error metrics) into callables that can be used for model evaluation.
One typical use case is to wrap an existing scoring function from the library with a non-default value for one of its parameters, such as the beta parameter of the fbeta_score function:
>>> from sklearn.metrics import fbeta_score, make_scorer
>>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
The second use case is to build a completely new and custom scorer object from a simple python function:
>>> import numpy as np
>>> def my_custom_loss_func(ground_truth, predictions):
...     diff = np.abs(ground_truth - predictions).max()
...     return np.log(1 + diff)
...
>>> my_custom_scorer = make_scorer(my_custom_loss_func, greater_is_better=False)
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=my_custom_scorer)
make_scorer takes as parameters:
  • the function you want to use,
  • whether it is a score (greater_is_better=True) or a loss (greater_is_better=False),
  • whether the function you provided takes predictions as input (needs_threshold=False) or needs confidence scores (needs_threshold=True),
  • any additional parameters, such as beta in fbeta_score.
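One detail worth noting about greater_is_better=False: the scorer negates the loss so that higher is always better, which is what GridSearchCV expects when ranking candidates. A minimal sketch (the loss function and DummyRegressor here are illustrative, not from the snippet above):

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.dummy import DummyRegressor

def max_log_error(y_true, y_pred):
    # a loss: log of 1 plus the worst absolute error
    return np.log(1 + np.abs(y_true - y_pred).max())

scorer = make_scorer(max_log_error, greater_is_better=False)

X = np.arange(10).reshape(-1, 1)
y = np.arange(10, dtype=float)
model = DummyRegressor(strategy="mean").fit(X, y)  # always predicts 4.5

# the scorer returns the NEGATED loss, so lower error => higher score
print(scorer(model, X, y))  # negative value: -log(1 + 4.5)
```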