Instacart - Model Evaluation


In this notebook we will load the Random Forest model fitted on all the training data and evaluate its F1-measure performance on the validation set, using a probability threshold of 0.7 as determined in the Instacart Model Fitting notebook.

The validation set was obtained by splitting the training data of the Instacart Market Basket Analysis dataset provided by Kaggle.
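The split itself is done elsewhere and is not shown in this notebook. As an illustration only, one common way to carve a validation set out of per-user data is to hold out whole users, so that no user's rows appear in both sets. The helper `split_by_user`, its `val_fraction` and `seed` parameters, and the user-level grouping are all assumptions for this sketch, not the procedure the original split necessarily used.

```python
import numpy as np
import pandas as pd

def split_by_user(df, val_fraction=0.2, seed=0):
    """Hypothetical helper: hold out a random subset of users.

    Assumes `df` has a 'user_id' column. Every row of a held-out user
    goes to the validation frame, so no user appears in both outputs.
    """
    rng = np.random.default_rng(seed)
    users = df['user_id'].unique()
    n_val = int(round(len(users) * val_fraction))
    val_users = rng.choice(users, size=n_val, replace=False)
    is_val = df['user_id'].isin(val_users)
    return df[~is_val], df[is_val]

# Toy frame standing in for the real feature table (5 users, 8 rows).
toy = pd.DataFrame({'user_id':   [1, 1, 2, 2, 3, 4, 5, 5],
                    'reordered': [0, 1, 0, 0, 1, 0, 1, 1]})
train, val = split_by_user(toy, val_fraction=0.4)
print(sorted(val['user_id'].unique()))
```

Splitting by user rather than by row avoids leaking a user's reorder habits from training into validation, at the cost of a less predictable validation size.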

Support Functions

from IPython.display import Markdown, display
%matplotlib inline
import pandas as pd
import numpy as np
#-------------------------------------------------------------------------
# Functions to load datasets into memory using space efficient data types.

def load_validation_data(path):
    # Load the six probability features and the 'reordered' label.
    # 'user_id' and 'product_id' are typed here but skipped by usecols.
    validation = pd.read_csv(path, dtype={'user_id': np.uint32,
                                          'product_id': np.uint32,
                                          'prod_prob': np.float16,
                                          'aisle_prob': np.float16,
                                          'department_prob': np.float16,
                                          'reorder_prob': np.float16,
                                          'recency_prob': np.float16,
                                          'DTOP_prob': np.float16,
                                          'reordered': np.uint8},
                             usecols=['prod_prob', 'aisle_prob', 'department_prob',
                                      'reorder_prob', 'recency_prob', 'DTOP_prob',
                                      'reordered'])

    return validation

#----------------------------------------------------------------------
# Function to generate markdown output
# Ref: https://stackoverflow.com/a/32035217
def printmd(string):
    display(Markdown(string))
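The loader above stores probabilities as `np.float16` and the label as `np.uint8` to save memory. The sketch below, using simulated values rather than the real dataset, shows the size of that saving and the rounding error it costs: for values in [0, 1), half precision introduces at most 2^-12 (about 0.00024) of absolute error.

```python
import numpy as np

rng = np.random.default_rng(42)

# One million simulated probabilities in [0, 1).
probs64 = rng.random(1_000_000)           # float64 by default
probs16 = probs64.astype(np.float16)      # half precision copy
print(probs64.nbytes, probs16.nbytes)     # 8000000 2000000

# Largest absolute rounding error introduced by the float16 cast.
max_err = np.max(np.abs(probs64 - probs16.astype(np.float64)))

# Binary labels: int64 by default versus uint8.
labels = rng.integers(0, 2, size=1_000_000)
print(labels.nbytes, labels.astype(np.uint8).nbytes)  # 8000000 1000000
```

A 4x reduction for the features (8x for the label) matters on a table with millions of user/product rows, and three significant digits are enough for probability features fed to a tree model.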

Load Validation Data

validation_data = load_validation_data('validation.csv')
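# The model uses only two of the loaded features.
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it.
X_val = validation_data.loc[:, ['reorder_prob', 'recency_prob']].to_numpy()
y_val = validation_data.loc[:, 'reordered'].to_numpy()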

Load Random Forest Model

from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

clf = joblib.load('randomforest-all-training-data.pkl')
printmd("Baseline accuracy if predicting all zeros: **{0:.2f}%**".format((1 - (np.sum(y_val) / len(y_val)))*100))

Baseline accuracy if predicting all zeros: 90.18%
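The baseline above is simply the share of negative labels: a classifier that always predicts 0 is right exactly on the rows where `reordered` is 0. A minimal sketch on toy labels (not the real validation set) makes the arithmetic explicit; `majority_zero_baseline` is a name introduced here for illustration.

```python
import numpy as np

def majority_zero_baseline(y):
    """Accuracy of a classifier that always predicts 0 (not reordered)."""
    y = np.asarray(y)
    return 1.0 - y.sum() / len(y)

# Toy labels: 1 positive out of 10, i.e. a 10% reorder rate.
y_toy = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
print("{0:.2f}%".format(majority_zero_baseline(y_toy) * 100))  # 90.00%
```

This is the same quantity as `accuracy_score(y, np.zeros_like(y))`, so any model has to beat roughly 90% accuracy on this data before its accuracy says anything.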

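If the model predicted only zeros, its F1-measure would be 0: it would produce no True Positives (TP), so recall would be zero and precision undefined. High accuracy alone therefore says little on this imbalanced data.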
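To see why, F1 can be written directly in terms of confusion counts as 2·TP / (2·TP + FP + FN), which equals the harmonic mean of precision and recall whenever TP > 0 and sidesteps the 0/0 precision when nothing is predicted positive. The sketch below is written for illustration (`f1_from_counts` is a hypothetical name); on binary labels it should agree with scikit-learn's `f1_score`, which also reports 0 in the all-zeros case.

```python
import numpy as np

def f1_from_counts(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is 0."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)     # predicted 1, actually 1
    fp = np.sum(~y_true & y_pred)    # predicted 1, actually 0
    fn = np.sum(y_true & ~y_pred)    # predicted 0, actually 1
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
all_zeros = np.zeros_like(y_true)
print(f1_from_counts(y_true, all_zeros))  # 0.0 (TP = 0, FN = 1)
```

The all-zeros predictor scores 90% accuracy on these toy labels yet 0 F1, which is the gap the paragraph above describes.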

y_hat_probs = clf.predict_proba(X_val)
# Predict "reordered" only when the positive-class probability exceeds 0.7.
y_hat = (y_hat_probs[:, 1] > 0.7).astype(int)
print("F1: {0:.4f} - Acc: {1:.4f}".format(f1_score(y_val, y_hat), accuracy_score(y_val, y_hat)))
[Parallel(n_jobs=2)]: Done 140 out of 140 | elapsed:   23.7s finished


F1: 0.4125 - Acc: 0.8659
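The 0.7 cut-off comes from the Model Fitting notebook, whose procedure is not shown here. One simple way such a threshold can be chosen is a grid sweep that keeps the value maximizing F1; the sketch below assumes that approach, uses toy probabilities, and introduces the hypothetical helpers `f1_binary` and `best_threshold`. It applies the same strict `probs > t` rule as the evaluation code above.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 when the denominator is zero."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    denom = 2 * tp + np.sum(y_true ^ y_pred)  # XOR counts FP + FN
    return 0.0 if denom == 0 else float(2 * tp / denom)

def best_threshold(y_true, probs, thresholds):
    """Return (threshold, F1) maximizing F1 of `probs > t`.

    Ties go to the earliest threshold in `thresholds` (np.argmax).
    """
    probs = np.asarray(probs)
    scores = [f1_binary(y_true, probs > t) for t in thresholds]
    best = int(np.argmax(scores))
    return thresholds[best], scores[best]

# Toy labels and positive-class probabilities (not real model output).
y_toy = np.array([0, 0, 0, 1, 1, 0, 1, 0])
p_toy = np.array([0.2, 0.6, 0.55, 0.9, 0.75, 0.65, 0.8, 0.1])
t, f1 = best_threshold(y_toy, p_toy, [0.5, 0.7, 0.9])
print(t, f1)  # 0.7 1.0
```

The sweep should run on data other than the set used for the final report; choosing the threshold on this validation set and then scoring on it would give an optimistic F1.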