Stefan Fiott

Instacart - Model Evaluation

First published: 28 Aug 2017
Last updated: 28 Aug 2017


In this notebook we will load the Random Forest model fitted on all the training data and evaluate its F1-measure performance on the validation set, using a probability threshold of 0.7 as determined in the Instacart Model Fitting notebook.

The validation set was extracted by splitting the Instacart training data as available in the Instacart Market Basket Analysis dataset provided by Kaggle.

Support Functions

In [1]:
from IPython.display import Markdown, display
%matplotlib inline
import pandas as pd
import numpy as np
In [2]:
# Functions to load datasets into memory using space efficient data types.

def load_validation_data(path):
    # Read the CSV with compact dtypes to keep the memory footprint small.
    validation = pd.read_csv(path, dtype={'user_id': np.uint32,
                                          'product_id': np.uint32,
                                          'prod_prob': np.float16,
                                          'aisle_prob': np.float16,
                                          'department_prob': np.float16,
                                          'reorder_prob': np.float16,
                                          'recency_prob': np.float16,
                                          'DTOP_prob': np.float16,
                                          'reordered': np.uint8})

    return validation

# Function to generate markdown output
# Ref:
def printmd(string):
    # Render the string as Markdown in the notebook output.
    display(Markdown(string))

Load Validation Data

In [3]:
validation_data = load_validation_data('validation.csv')
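The loader above asks for compact dtypes (`np.uint32`, `np.float16`, `np.uint8`) instead of the 64-bit defaults. The sketch below is only an illustration of that trade-off on made-up arrays, not part of the original notebook: the names `probs64`, `probs16`, `ids64` and `ids32` are hypothetical, and the array size is arbitrary. It compares memory use and shows the rounding error introduced when probabilities are stored as `np.float16`.

```python
import numpy as np

n_rows = 1000  # arbitrary size, chosen only for illustration

# Hypothetical probability column stored with the default 64-bit floats.
probs64 = np.linspace(0.0, 1.0, n_rows)
# The same values stored as 16-bit floats, as the loader requests.
probs16 = probs64.astype(np.float16)

# Hypothetical identifier column: 64-bit integers versus 32-bit unsigned.
ids64 = np.arange(n_rows, dtype=np.int64)
ids32 = ids64.astype(np.uint32)

# Largest absolute difference caused by the reduced float precision.
max_rounding_error = float(np.max(np.abs(probs64 - probs16.astype(np.float64))))

print("float64 bytes:", probs64.nbytes, "- float16 bytes:", probs16.nbytes)
print("int64 bytes:", ids64.nbytes, "- uint32 bytes:", ids32.nbytes)
print("max rounding error in float16: {0:.6f}".format(max_rounding_error))
```

The float16 column uses a quarter of the memory, at the cost of roughly three significant decimal digits of precision, which is usually acceptable for probability features but worth keeping in mind when comparing against a fixed threshold.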
In [5]:
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same NumPy arrays.
X_val = validation_data.loc[:,['reorder_prob','recency_prob']].to_numpy()
y_val = validation_data.loc[:,'reordered'].to_numpy()

Load Random Forest Model

In [4]:
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
# sklearn.externals.joblib was removed in scikit-learn 0.23; joblib is now a standalone package.
import joblib

clf = joblib.load('randomforest-all-training-data.pkl')
In [7]:
printmd("Baseline accuracy if predicting all zeros: ** {0:.2f}% **".format((1 - (np.sum(y_val) / len(y_val)))*100))

Baseline accuracy if predicting all zeros: 90.18%

If the model predicted only zeros, the F1-measure would be 0, since there would be no True Positives (TP): recall would be zero and precision would be undefined.
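To make the contrast between accuracy and F1 concrete, the sketch below computes both by hand for an all-zeros prediction. It is an illustration only: the label vector `y_true_demo` is made up (1 positive in 10 rows, close to the class balance above), the helper `f1_from_counts` is hypothetical, and returning 0 when there are no true positives is an assumed convention.

```python
import numpy as np

def f1_from_counts(y_true, y_pred):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    if tp == 0:
        # Assumed convention: with no true positives, F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 1 reordered product out of 10 rows.
y_true_demo = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0])
y_all_zeros = np.zeros_like(y_true_demo)

baseline_acc = float(np.mean(y_true_demo == y_all_zeros))
baseline_f1 = f1_from_counts(y_true_demo, y_all_zeros)
print("Acc: {0:.2f} - F1: {1:.2f}".format(baseline_acc, baseline_f1))
# prints "Acc: 0.90 - F1: 0.00"
```

High accuracy with zero F1 is why accuracy alone is a poor yardstick on a dataset this imbalanced.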

In [9]:
y_hat_probs = clf.predict_proba(X_val)
y_hat = (y_hat_probs[:,1] > 0.7).astype(int)
print("F1: {0:.4f} - Acc: {1:.4f}".format(f1_score(y_val, y_hat), accuracy_score(y_val, y_hat)))
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.4s
[Parallel(n_jobs=2)]: Done   4 tasks      | elapsed:    0.8s
[Parallel(n_jobs=2)]: Done   9 tasks      | elapsed:    1.7s
[Parallel(n_jobs=2)]: Done  14 tasks      | elapsed:    2.3s
[Parallel(n_jobs=2)]: Done  21 tasks      | elapsed:    3.5s
[Parallel(n_jobs=2)]: Done  28 tasks      | elapsed:    4.5s
[Parallel(n_jobs=2)]: Done  37 tasks      | elapsed:    6.0s
[Parallel(n_jobs=2)]: Done  46 tasks      | elapsed:    7.2s
[Parallel(n_jobs=2)]: Done  57 tasks      | elapsed:    9.0s
[Parallel(n_jobs=2)]: Done  68 tasks      | elapsed:   10.6s
[Parallel(n_jobs=2)]: Done  81 tasks      | elapsed:   12.8s
[Parallel(n_jobs=2)]: Done  94 tasks      | elapsed:   14.8s
[Parallel(n_jobs=2)]: Done 109 tasks      | elapsed:   17.1s
[Parallel(n_jobs=2)]: Done 124 tasks      | elapsed:   19.4s
[Parallel(n_jobs=2)]: Done 140 out of 140 | elapsed:   23.7s finished
F1: 0.4125 - Acc: 0.8659
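The final cell labels a row as reordered only when its class-1 probability exceeds 0.7. The sketch below shows how that cut-off changes predictions and F1 on a toy example. It is not the notebook's evaluation: `probs_demo` and `y_true_demo` are made-up values, and `predict_with_threshold` and `f1_binary` are hypothetical helpers written for this illustration.

```python
import numpy as np

def predict_with_threshold(probs_pos, threshold):
    """Predict 1 where the positive-class probability is strictly above the threshold."""
    return (np.asarray(probs_pos) > threshold).astype(int)

def f1_binary(y_true, y_pred):
    """F1 for the positive class, computed as 2TP / (2TP + FP + FN)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Hypothetical positive-class probabilities and true labels.
probs_demo = np.array([0.10, 0.40, 0.65, 0.72, 0.80, 0.95])
y_true_demo = np.array([0, 0, 1, 0, 1, 1])

f1_by_threshold = {}
for threshold in (0.5, 0.7, 0.9):
    y_pred_demo = predict_with_threshold(probs_demo, threshold)
    f1_by_threshold[threshold] = f1_binary(y_true_demo, y_pred_demo)
    print("threshold {0:.1f}: predictions {1} - F1: {2:.4f}".format(
        threshold, y_pred_demo.tolist(), f1_by_threshold[threshold]))
```

Raising the threshold trades recall for precision, so the threshold that maximises F1 depends on the data. On this toy data the lowest threshold happens to score best, which says nothing about the 0.7 chosen for the real model.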