Concrete Compressive Strength Regression Model Using Keras

10 minute read

In this notebook, we will build a simple three-layer feed-forward neural network using Keras, running on top of TensorFlow. The sequential model will be trained on the Concrete Compressive Strength Data Set to predict the compressive strength of concrete samples from the mixture of materials used to make them.

Loading the Required Libraries

import pandas as pd
from pathlib import Path
from sklearn import model_selection
from sklearn import preprocessing
import matplotlib.pyplot as plt
from keras import models, layers, metrics

# seed NumPy so NumPy-based randomness is repeatable between notebook runs;
# note that with the TensorFlow backend the network weights are initialised
# by TensorFlow, so fully reproducible runs would also need TensorFlow's own
# seed to be set (tf.set_random_seed in TF 1.x)
import numpy as np
np.random.seed(22) 
Using TensorFlow backend.

Loading the Concrete Data Set

concrete_data_file = Path('concrete_data.csv')
if concrete_data_file.is_file():
    print('Reading concrete_data.csv...')
    concrete_data = pd.read_csv('concrete_data.csv')
    print('Done.')
else:
    print('Downloading concrete data CSV file...')
    # Original Source: http://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength
    root_path = 'https://www.stefanfiott.com/machine-learning/'
    folder = 'concrete-compressive-strength-regression-model-using-keras/'
    concrete_data = pd.read_csv(root_path + folder + 'concrete_data.csv')
    print('Done.')
    print('Saving to concrete_data.csv file...')
    concrete_data.to_csv('concrete_data.csv', index=False)
    print('Done.')
    
print(concrete_data.head())
Reading concrete_data.csv...
Done.
   Cement  Blast Furnace Slag  Fly Ash  Water  Superplasticizer  \
0   540.0                 0.0      0.0  162.0               2.5   
1   540.0                 0.0      0.0  162.0               2.5   
2   332.5               142.5      0.0  228.0               0.0   
3   332.5               142.5      0.0  228.0               0.0   
4   198.6               132.4      0.0  192.0               0.0   

   Coarse Aggregate  Fine Aggregate  Age  Strength  
0            1040.0           676.0   28     79.99  
1            1055.0           676.0   28     61.89  
2             932.0           594.0  270     40.27  
3             932.0           594.0  365     41.05  
4             978.4           825.5  360     44.30  

Preparing the Training and Testing Data Sets

We will now separate the data into the independent (predictor) variables and the dependent (outcome) variable.

predictors = concrete_data.iloc[:,0:8].values
outcomes = concrete_data.iloc[:,8].values

Next, we apply min-max scaling to the predictor variables, mapping each feature to the [0,1] interval via x' = (x − x_min) / (x_max − x_min). Putting all features on a common scale should help the model train faster and more reliably.

min_max_scaler = preprocessing.MinMaxScaler()
predictors_scaled = min_max_scaler.fit_transform(predictors)
predictors_scaled[:5,]
array([[1.        , 0.        , 0.        , 0.32108626, 0.07763975,
        0.69476744, 0.20572002, 0.07417582],
       [1.        , 0.        , 0.        , 0.32108626, 0.07763975,
        0.73837209, 0.20572002, 0.07417582],
       [0.52625571, 0.39649416, 0.        , 0.84824281, 0.        ,
        0.38081395, 0.        , 0.73901099],
       [0.52625571, 0.39649416, 0.        , 0.84824281, 0.        ,
        0.38081395, 0.        , 1.        ],
       [0.22054795, 0.36839176, 0.        , 0.56070288, 0.        ,
        0.51569767, 0.58078274, 0.98626374]])
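
As a quick sanity check, the fitted scaler should map each column's minimum to exactly 0 and its maximum to exactly 1:

print(predictors_scaled.min(axis=0))  # expect all zeros
print(predictors_scaled.max(axis=0))  # expect all ones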

Finally, we split the data into a training set and a testing set. These will be used to train and to evaluate the regression model, respectively.

X_train, X_test, y_train, y_test = model_selection.train_test_split(predictors_scaled, outcomes, test_size=0.33, random_state=22)
print('X_train {0}, y_train {1}'.format(X_train.shape, y_train.shape))
print('X_test {0}, y_test {1}'.format(X_test.shape, y_test.shape))
X_train (690, 8), y_train (690,)
X_test (340, 8), y_test (340,)

Network Architecture

The network consists of three fully connected (dense) layers: the first and second use ReLU activations, while the third (output) layer has no activation, so its output is a linear combination of its inputs.

network = models.Sequential()
network.add(layers.Dense(10, activation='relu', input_shape=(X_train.shape[1], )))
network.add(layers.Dense(5, activation='relu'))
network.add(layers.Dense(1))
network.compile(optimizer='adam',
                loss='mean_squared_error')
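
We can sanity-check the architecture with a model summary. With 8 input features, the three layers contribute 8×10+10 = 90, 10×5+5 = 55 and 5×1+1 = 6 trainable parameters, 151 in total:

network.summary()  # prints layer output shapes and parameter counts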

Training and Testing the Network

We will now fit the model to the training data using 50 epochs and a batch size of 128, then evaluate the performance of the trained network against the testing data and draw a scatter plot of predicted values against actual values. The better the predictions, the tighter the points will cluster; in a perfect model they would line up along the diagonal y = x, i.e., a line from bottom left to top right. To observe how the model fits the data better with more training, we will repeat this process 10 times, plotting a new predictions scatter plot each time.

By the end of the training the model will have gone through 500 epochs.

fig, axes = plt.subplots(2, 5, figsize=(16,8), sharex=True, sharey=True)
losses = []
for i in range(2):
    for j in range(5):
        # train for another 50 epochs, then evaluate the loss on the test set
        network.fit(X_train, y_train, epochs=50, batch_size=128, verbose=0)
        pred_loss = network.evaluate(X_test, y_test, verbose=0)
        losses.append(pred_loss)
        preds = network.predict(X_test)
        axes[i,j].scatter(preds, y_test, alpha=0.2)
        axes[i,j].set_title('{0} epochs'.format((5*i+j+1)*50))
        axes[i,j].set_ylabel('Actual')
        axes[i,j].set_xlabel('Predicted')

[Figure: predicted vs. actual compressive strength after each 50-epoch training increment, from 50 to 500 epochs]

fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.plot(losses)
ax.set_title('Concrete Compressive Strength Regression Model Loss')
epochs = [str(i*50) for i in range(1, len(losses)+1)]
ax.set_xticks(range(len(losses)))
ax.set_xticklabels(epochs)
ax.set_xlabel('Epochs')
ax.set_ylabel('Mean Squared Error')
ax.text(len(losses)-2, losses[-1]+10, 'Final MSE: {0:.2f}'.format(losses[-1]));

[Figure: test-set mean squared error of the network after each 50-epoch training increment]

Comparing Performance to Ordinary Least Squares (OLS) Regression Model

We will now fit an ordinary least squares (OLS) regression model to the same training data with Scikit-learn, then evaluate it on the same testing data set we used before.

from sklearn import linear_model
from sklearn.metrics import mean_squared_error

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
ols_y_pred = regr.predict(X_test)

fig, ax = plt.subplots(1, 1, figsize=(3,4))
ax.scatter(ols_y_pred, y_test, alpha=0.2)
ax.set_title('OLS Model')
ax.set_ylabel('Actual')
ax.set_xlabel('Predicted')

print("Mean squared error: {0:.2f}".format(mean_squared_error(y_test, ols_y_pred)))
Mean squared error: 105.37

[Figure: OLS model predicted vs. actual compressive strength]
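
Since OLS fits a single linear function of the eight mixture variables, the learned model is easy to inspect:

print(regr.coef_)                                     # one coefficient per predictor column
print('intercept: {0:.2f}'.format(regr.intercept_))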

Comparing Performance to Elastic Net Model

We will now fit an Elastic Net model to the same training data with Scikit-learn. Elastic Net combines the lasso (L1) and ridge regression (L2) penalties, encouraging a sparser model while still shrinking large coefficients. Finally, we evaluate it on the same testing data set we used with the other models.
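
Concretely, scikit-learn's elastic net minimises the following objective, where $\rho$ is the l1_ratio (0.5 by default in ElasticNetCV) and the regularisation strength $\alpha$ is selected by cross-validation:

$$\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 \;+\; \alpha \rho \lVert w \rVert_1 \;+\; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2$$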

from sklearn.linear_model import ElasticNetCV

regr = ElasticNetCV(cv=5)
regr.fit(X_train, y_train) 

elastic_y_pred = regr.predict(X_test)

fig, ax = plt.subplots(1, 1, figsize=(3,4))
ax.scatter(elastic_y_pred, y_test, alpha=0.2)
ax.set_title('Elastic Net Model')
ax.set_ylabel('Actual')
ax.set_xlabel('Predicted')

print("Mean squared error: {0:.2f}".format(mean_squared_error(y_test, elastic_y_pred)))
Mean squared error: 105.63

[Figure: Elastic Net model predicted vs. actual compressive strength]
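
To see how strongly the cross-validated model ends up regularised, the fitted ElasticNetCV object exposes the selected penalty settings:

print('alpha: {0:.4f}'.format(regr.alpha_))    # regularisation strength chosen by CV
print('l1_ratio: {0}'.format(regr.l1_ratio_))  # L1/L2 mix (0.5 by default)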

Comparing Performance to Support Vector Regression (SVR) Model

We will now fit a support vector regression (SVR) model to the same training data with Scikit-learn. First, we tune the hyper-parameters using grid search and cross-validation; then we evaluate the best fitted estimator on the same testing data set we used so far.

from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

svr = GridSearchCV(SVR(kernel='rbf'),  # C, gamma and epsilon come from the grid below
                   cv=5,
                   param_grid={'C': [1e0, 1e1, 1e2, 1e3],
                               'gamma': np.logspace(-2, 2, 5),
                               'epsilon': np.arange(0.1, 0.5, 0.1)})
svr.fit(X_train, y_train)
svr_y_pred = svr.best_estimator_.predict(X_test)
fig, ax = plt.subplots(1, 1, figsize=(3,4))
ax.scatter(svr_y_pred, y_test, alpha=0.2)
ax.set_title('SVR Model')
ax.set_ylabel('Actual')
ax.set_xlabel('Predicted')

print("Mean squared error: {0:.2f}".format(mean_squared_error(y_test, svr_y_pred)))
Mean squared error: 190.75

[Figure: SVR model predicted vs. actual compressive strength]
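
To check which hyper-parameter combination the grid search settled on, we can query the fitted GridSearchCV object:

print(svr.best_params_)                            # best C, gamma and epsilon found
print('CV R^2: {0:.3f}'.format(svr.best_score_))   # mean cross-validated R^2 of the best estimator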

Conclusion

In this notebook we built various models to predict the compressive strength of concrete, in megapascals (MPa), from its mixture properties. We first created a simple three-layer dense sequential network, achieving a mean squared error of 49.62. Next, we built an ordinary least squares (OLS) model, an Elastic Net model and finally an SVR model, achieving mean squared errors of 105.37, 105.63 and 190.75 respectively.

Without any tuning, the simple neural network achieves the best mean squared error. The other models, even after tuning, cannot get below a mean squared error of 100, which suggests the relationship between mixture and strength is highly non-linear. The neural network's stacked non-linear layers are therefore better at transforming the data into a representation that maps more accurately to the compressive strength.