Debugging a Keras Neural Network

Learning outcomes:

  • How to get the weights and bias values of the layers.
  • How to get the values between the hidden layers (before and after the activation function).

The goal of this post is to learn how to debug a neural network in Keras. This skill is important for a variety of reasons:

  1. Knowing how to debug increases the understanding of the underlying structure of the network and its theoretical background.
  2. Learning what’s going on at each level of the network translates into a better understanding of the outcome.
  3. Knowing about each layer’s outcome can be valuable for research purposes.
  4. Carefully analyzing and splitting the network makes it easy to replace and experiment with individual parts of it.

Obtaining general information

Obtaining general information gives us an overview of the model so we can check whether its components are the ones we initially planned to add. We can simply print the layers of the model or retrieve a more human-friendly summary. Note that the layers of the neural network (input, hidden, output) are not the same as the layers of the Keras model: our model’s layers are more abstract operations, such as transformations, convolutions, activations, etc.

print(model.layers)

Output:

[<keras.layers.convolutional.Conv2D at 0x7faf0c4c9c90>,
 <keras.layers.convolutional.Conv2D at 0x7faf0c4de050>,
 <keras.layers.pooling.MaxPooling2D at 0x7faf0c46bc10>,
 <keras.layers.core.Flatten at 0x7faf0c4de450>,
 <keras.layers.core.Dense at 0x7faf0c46b690>,
 <keras.layers.core.Dense at 0x7faf0e3cf710>]

model.summary()

Output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0        
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0        
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1179776  
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________

We can also retrieve each layer’s input and output shape.

for layer in model.layers:
    print("Input shape: "+str(layer.input_shape)+". Output shape: "+str(layer.output_shape))

Output:

Input shape: (None, 28, 28, 1). Output shape: (None, 26, 26, 32)
Input shape: (None, 26, 26, 32). Output shape: (None, 24, 24, 64)
Input shape: (None, 24, 24, 64). Output shape: (None, 12, 12, 64)
Input shape: (None, 12, 12, 64). Output shape: (None, 9216)
Input shape: (None, 9216). Output shape: (None, 128)
Input shape: (None, 128). Output shape: (None, 10)
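
The first learning outcome, retrieving the weights and bias values of the layers, follows the same pattern. As a minimal sketch: get_weights() returns a list holding the kernel and the bias for layers that have parameters, and an empty list for the others.

for layer in model.layers:
    weights = layer.get_weights()  # [kernel, bias] for Conv2D/Dense; [] for pooling/flatten
    if weights:
        print(layer.name, weights[0].shape, weights[1].shape)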

Obtaining the output of a specific layer after its activation function

This model is a modified example from the original Keras repository.

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

input_shape = (28, 28, 1)  # e.g., MNIST images
num_classes = 10

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape,
                 kernel_initializer=keras.initializers.Ones()))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer=keras.initializers.Ones()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer=keras.initializers.Ones()))
model.add(Dense(num_classes, activation='softmax', kernel_initializer=keras.initializers.Ones()))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

This model consists of six layers which, as we can see in the code, carry extra configuration in their parameters. It’s important to note that here the activation function is specified within each layer; alternatively, we could add a separate Activation layer right after the first convolution, as the sketch below shows.
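
As a quick illustrative sketch (not part of the model above), the two formulations are equivalent:

from keras.layers import Activation

# These two options produce the same computation; you would use one or the other.
# Option 1: activation fused into the layer
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
# Option 2: a separate Activation layer right after
model.add(Conv2D(32, kernel_size=(3, 3)))
model.add(Activation('relu'))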

We can imagine our model as a tunnel in which each layer is a different section. To obtain the output of a specific layer, we need to carve out a subtunnel. Since we are interested in the output of the first convolutional layer after its activation function, our subtunnel is bounded by the input of the first layer and the output of the first layer (which includes the activation function, because it was specified within the layer). We will use the backend function K.function to create this subtunnel, specifying its beginning and end.

from keras import backend as K
fun = K.function([model.layers[0].input],[model.layers[0].output])

After that, we simply have to reshape the input accordingly and pass it to that function.

import numpy as np

# x is a single 28x28 grayscale image (e.g., one MNIST sample)
x_inp = np.reshape(x, (1, 28, 28, 1))
layer_output = fun([x_inp])[0]
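
We can check that the shape of this output matches the first row of the model summary above:

print(layer_output.shape)  # (1, 26, 26, 32), matching conv2d_1 in the summary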

In the Source code section, the script called debugging1.py shows how subtunnels were created from the beginning of the network to each of its layers. In addition, it shows an alternative way to obtain the same results, providing a good understanding of what’s going on in the network, and both outcomes are compared to check that they are the same.
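
As a rough sketch of what that alternative computation can look like (hypothetical code, not the script itself), the output of the first layer can be recomputed by hand from its parameters: get_weights() returns the 3x3 kernel and the bias, and the valid convolution plus the ReLU can be applied with plain NumPy.

W, b = model.layers[0].get_weights()  # W: (3, 3, 1, 32), b: (32,)
manual = np.zeros((26, 26, 32))
for i in range(26):
    for j in range(26):
        patch = x_inp[0, i:i+3, j:j+3, :]  # 3x3 input window
        manual[i, j, :] = np.tensordot(patch, W, axes=([0, 1, 2], [0, 1, 2])) + b
manual = np.maximum(manual, 0)  # ReLU, as specified in the layer

print(np.allclose(manual, layer_output[0]))  # True, up to floating-point tolerance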

Obtaining the output of a specific layer before its activation function

The only difference with regard to the previous section is that this time the model needs to be modified so that its activation functions are separate from the layers, as we can see below.

from keras.layers import Activation

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 input_shape=input_shape,
                 kernel_initializer=keras.initializers.Ones()))
model.add(Activation("sigmoid"))
model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer=keras.initializers.Ones()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, kernel_initializer=keras.initializers.Ones()))
model.add(Activation("sigmoid"))
model.add(Dense(num_classes, kernel_initializer=keras.initializers.Ones()))
model.add(Activation("softmax"))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

Obtaining the output values is done in a similar way to the previous section. Here we show that obtaining the values before and after the activation is just a matter of changing the layer whose output we request.

# Output before (layer 0) and after (layer 1) the activation
fun_without = K.function([model.layers[0].input], [model.layers[0].output])
fun_with = K.function([model.layers[0].input], [model.layers[1].output])
# Input: a single 28x28 grayscale image
x_inp = np.reshape(x, (1, 28, 28, 1))
# Outputs
layer_output_without = fun_without([x_inp])[0]
layer_output_with = fun_with([x_inp])[0]
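
As a quick sanity check (a sketch, assuming the first Activation is the sigmoid defined above), applying the sigmoid to the pre-activation values should reproduce the post-activation values:

# sigmoid(pre-activation) should equal the post-activation output
print(np.allclose(1.0 / (1.0 + np.exp(-layer_output_without)), layer_output_with))  # True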

In the Source code section, the script called debugging2.py shows this and, as in debugging1.py, it also recreates the solution in an alternative way and compares both results.

What if the behavior is different during training and testing?

Extracted from the Keras website:

Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you will need to pass the learning phase flag to your function:

get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])

# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]

# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]

Note how the created function now receives both the input and a flag indicating whether it’s the training or the testing phase.
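
For instance, here is a hypothetical sketch with a Dropout layer (drop_model, x_small, and the 0.5 rate are illustrative assumptions) where the two phases genuinely differ:

from keras.layers import Dropout

# A tiny model whose behavior depends on the learning phase
drop_model = Sequential()
drop_model.add(Dense(4, input_shape=(4,), kernel_initializer=keras.initializers.Ones()))
drop_model.add(Dropout(0.5))

f = K.function([drop_model.layers[0].input, K.learning_phase()],
               [drop_model.layers[1].output])
x_small = np.ones((1, 4))
print(f([x_small, 0])[0])  # test mode: dropout inactive, all values identical
print(f([x_small, 1])[0])  # train mode: about half the values zeroed, the rest scaled up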