Debugging a Keras Neural Network

Learning outcomes:

  • How to get the weights and bias values of the layers.
  • How to get the values between the hidden layers (before and after the activation function).

The goal of this post is to learn how to debug a neural network in Keras. This is important for several reasons:

  1. Knowing how to debug increases the understanding of the underlying structure of the network and its theoretical background.
  2. Learning what’s going on at each level of the network translates into a better understanding of the outcome.
  3. Knowing about each layer’s outcome can be valuable for research purposes.
  4. Meticulous analyses and splits of the network allow us to easily replace and experiment with some parts of it.

Obtaining general information

Obtaining general information gives us an overview of the model, so we can check whether its components are the ones we initially planned to add. We can simply print the layers of the model or retrieve a more human-friendly summary. Note that the layers of the neural network (input, hidden, output) are not the same as the layers of the Keras model. Our model’s layers are more abstract operations such as transformations, convolutions, activations, etc.

print(model.layers)

Output:

[<keras.layers.convolutional.Conv2D at 0x7faf0c4c9c90>,
 <keras.layers.convolutional.Conv2D at 0x7faf0c4de050>,
 <keras.layers.pooling.MaxPooling2D at 0x7faf0c46bc10>,
 <keras.layers.core.Flatten at 0x7faf0c4de450>,
 <keras.layers.core.Dense at 0x7faf0c46b690>,
 <keras.layers.core.Dense at 0x7faf0e3cf710>]

model.summary()

Output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #  
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320      
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496    
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0        
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0        
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1179776  
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290      
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________

We can also retrieve each layer’s input and output size.

for layer in model.layers:
    print("Input shape: "+str(layer.input_shape)+". Output shape: "+str(layer.output_shape))

Output:

Input shape: (None, 28, 28, 1). Output shape: (None, 26, 26, 32)
Input shape: (None, 26, 26, 32). Output shape: (None, 24, 24, 64)
Input shape: (None, 24, 24, 64). Output shape: (None, 12, 12, 64)
Input shape: (None, 12, 12, 64). Output shape: (None, 9216)
Input shape: (None, 9216). Output shape: (None, 128)
Input shape: (None, 128). Output shape: (None, 10)
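
The learning outcomes also mention weights and bias values. A minimal sketch of how they can be read with get_weights(), assuming a small hypothetical toy model (toy_model is not the CNN used in this post) so that the snippet stays self-contained:

```python
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy model, used only to illustrate get_weights().
toy_model = Sequential()
toy_model.add(Dense(2, input_shape=(3,), kernel_initializer="ones"))

for layer in toy_model.layers:
    # For a Dense layer, get_weights() returns [kernel, bias] as NumPy arrays.
    kernel, bias = layer.get_weights()
    print(layer.name, "kernel shape:", kernel.shape, "bias shape:", bias.shape)
```

Layers without parameters (such as MaxPooling2D or Flatten) return an empty list, so when looping over the CNN of this post it is safer to check the length of get_weights() before unpacking it.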

Obtaining the output of a specific layer after its activation function

This model is a modified example from the original Keras repository.

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

input_shape = (28, 28, 1)  # single-channel 28x28 images
num_classes = 10

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape,
                 kernel_initializer=keras.initializers.Ones()))

model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer=keras.initializers.Ones()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu', kernel_initializer=keras.initializers.Ones()))
model.add(Dense(num_classes, activation='softmax', kernel_initializer=keras.initializers.Ones()))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

This model consists of 6 layers which, as we can see in the code, receive additional settings as parameters. It’s important to note that the activation function is specified inside each layer; alternatively, we could add a separate activation layer right after the first convolution.

We can imagine our model as a tunnel in which each layer is a different section of the tunnel. In order to obtain the output of a specific layer we need to carve out a subtunnel. As we are interested in the output of the first convolutional layer after the activation function, our subtunnel runs from the input of the first layer to the output of the first layer (which includes the activation function because it was specified inside the layer). We will use the backend function “function” to create this subtunnel, specifying its beginning and end.

from keras import backend as K
fun = K.function([model.layers[0].input],[model.layers[0].output])

After that we simply have to reshape the input into a batch of one image and pass it to that function.

import numpy as np
# x is a single 28x28 image
x_inp = np.reshape(x,(1,28,28,1))
layer_output = fun([x_inp])[0]

In the Source code section, the script called debugging1.py shows how subtunnels were created from the beginning to each layer of the network. In addition, it shows an alternative way to obtain the same results, which gives a good understanding of what’s going on in the network, and both outcomes are compared to check that they are the same.
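
One possible way to double-check the first layer by hand is to recompute it with NumPy. The sketch below is only an illustration and not necessarily what debugging1.py does. It assumes all-ones kernels (as in the model above), a zero bias (the Keras default initializer) and a single-channel 28x28 input; the function name first_conv_relu is hypothetical.

```python
import numpy as np

def first_conv_relu(image, n_filters=32, k=3):
    """Valid k x k convolution with all-ones kernels and zero bias, then ReLU."""
    h, w = image.shape
    out_h, out_w = h - k + 1, w - k + 1
    window_sums = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # With an all-ones kernel, the convolution is just the window sum.
            window_sums[i, j] = image[i:i + k, j:j + k].sum()
    activated = np.maximum(window_sums, 0.0)  # ReLU
    # Every filter has identical weights, so all output channels are equal.
    return np.repeat(activated[np.newaxis, :, :, np.newaxis], n_filters, axis=3)

x = np.random.rand(28, 28)  # stand-in for one 28x28 image
manual_output = first_conv_relu(x)
print(manual_output.shape)  # (1, 26, 26, 32)
```

With the same image, manual_output could then be compared against layer_output using np.allclose and a small tolerance, since Keras typically computes in float32.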

Obtaining the output of a specific layer before its activation function

The only difference with regard to the previous section is that this time the model needs to be modified so that its activation functions are separate layers, as we can see below.

from keras.layers import Activation

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 input_shape=input_shape,
                 kernel_initializer=keras.initializers.Ones()))
model.add(Activation("sigmoid"))  # activation as a separate layer

model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer=keras.initializers.Ones()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, kernel_initializer=keras.initializers.Ones()))
model.add(Activation("sigmoid"))
model.add(Dense(num_classes, kernel_initializer=keras.initializers.Ones()))
model.add(Activation("softmax"))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

Obtaining the output values is done in a similar way to the previous section. Here we show that obtaining the values before or after the activation is just a matter of changing the output layer.

# With and without the activation
fun_without = K.function([model.layers[0].input],[model.layers[0].output])
fun_with = K.function([model.layers[0].input],[model.layers[1].output])
# Input
x_inp = np.reshape(x,(1,28,28,1))
# Output
layer_output_without = fun_without([x_inp])[0]
layer_output_with = fun_with([x_inp])[0]

In the Source code section, the script called debugging2.py shows this and, as in debugging1.py, it also recreates the solution in an alternative way and compares both results.
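
The relation between the two outputs can be checked by applying the activation by hand. A minimal NumPy sketch, assuming a sigmoid activation as in the first layer above; the array pre_activation is a made-up stand-in for layer_output_without:

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up stand-in for layer_output_without (values before the activation).
pre_activation = np.array([[-2.0, 0.0, 3.0]])
post_activation = sigmoid(pre_activation)
print(post_activation)
```

With the real arrays, np.allclose(sigmoid(layer_output_without), layer_output_with, atol=1e-6) should hold if the subtunnels were built correctly.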

What if the training and testing behaviors are different?

Extracted from the Keras website:

Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout, BatchNormalization, etc.), you will need to pass the learning phase flag to your function:

get_3rd_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[3].output])

# output in test mode = 0
layer_output = get_3rd_layer_output([x, 0])[0]

# output in train mode = 1
layer_output = get_3rd_layer_output([x, 1])[0]

Note how the created function now receives both the input and a flag indicating whether it is running in training or testing mode.
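
To see why the flag matters, consider dropout. The following is a NumPy-only sketch of inverted dropout and is not Keras's actual implementation; the function name dropout and its parameters are assumptions made for illustration.

```python
import numpy as np

def dropout(values, rate, training, rng):
    """Inverted dropout: random masking when training, identity otherwise."""
    if not training:
        return values
    keep_mask = rng.random(values.shape) >= rate
    # Rescale the kept units so the expected activation stays the same.
    return values * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1, 8))
test_output = dropout(activations, rate=0.5, training=False, rng=rng)
train_output = dropout(activations, rate=0.5, training=True, rng=rng)
print(test_output)   # unchanged activations
print(train_output)  # a random subset is zeroed, the rest is rescaled
```

The same input therefore produces different layer outputs depending on the phase, which is why the phase has to be stated explicitly when debugging such layers.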

Homography estimation explanation and python implementation

Homographies are transformations of images from one planar surface to another (image registration). Homographies are used for tasks such as camera calibration, 3D reconstruction and image rectification. There are multiple methods to calculate a homography and this post explains one of the simplest.

Given a 2D point in homogeneous coordinates [latex]x=(x_1,y_1,1)[/latex] and a matrix H, multiplying them returns the new location of that point, [latex]x' = (x_2,y_2,1)[/latex], such that:

[latex]x' = Hx[/latex]

Due to the dimensions of [latex]x[/latex] and [latex]x'[/latex] we know that H will be a 3×3 matrix. Even though the matrix has 9 elements, it has just 8 degrees of freedom, because H is only defined up to a scale factor. The presentation in [1] gives an intuition of where this comes from.

So we have that:

[latex]
\begin{bmatrix}
u \\
v \\
1
\end{bmatrix}
=
\begin{bmatrix}
h_1 & h_2 & h_3 \\
h_4 & h_5 & h_6 \\
h_7 & h_8 & h_9
\end{bmatrix}
\begin{bmatrix}
x \\
y \\
1
\end{bmatrix}
[/latex]

Where [latex]u[/latex] and [latex]v[/latex] are the new coordinates. Therefore, we have:

[latex]
\\
u = x h_1 + y h_2 + h_3 \\
v = x h_4 + y h_5 + h_6 \\
1 = x h_7 + y h_8 + h_9 \\
[/latex]
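
The mapping of a single point can be sketched in a few lines of NumPy. In general the product Hx has to be divided by its third coordinate so that the last entry becomes 1 again. The function name apply_homography and the example matrix are assumptions made for illustration.

```python
import numpy as np

def apply_homography(H, x, y):
    """Map (x, y) through H and normalize so the last coordinate is 1."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Hypothetical H: a pure translation by (2, 3).
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
u, v = apply_homography(H, 5.0, 7.0)
print(u, v)  # 7.0 10.0
```

Because of the normalization, H and any non-zero multiple of H map every point to the same location, which is why only 8 of the 9 entries are free.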

So for each point we have:

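[latex]
\\
x h_1 + y h_2 + h_3 - u (x h_7 + y h_8 + h_9) = 0 \\
x h_4 + y h_5 + h_6 - v (x h_7 + y h_8 + h_9) = 0
[/latex]

Written in matrix form with [latex]h = (h_1, \dots, h_9)^T[/latex], these two equations are [latex]A_i h = 0[/latex], where:

[latex]
A_i =
\begin{bmatrix}
x & y & 1 & 0 & 0 & 0 & -ux & -uy & -u \\
0 & 0 & 0 & x & y & 1 & -vx & -vy & -v
\end{bmatrix}
[/latex]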

Since we have 8 degrees of freedom, we need at least 4 point correspondences to obtain H (each correspondence contributes two equations, one for u and one for v). We just need to stack [latex]A_1, A_2, A_3, A_4[/latex] to obtain an 8×9 matrix that we will call A. We are interested in solving the following equation while avoiding the trivial solution h=0:

[latex]
Ah=0;
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -u_1 x_1 & -u_1 y_1 & -u_1 \\
0 & 0 & 0 & x_1 & y_1 & 1 & -v_1 x_1 & -v_1 y_1 & -v_1 \\
& & & & \vdots
\end{bmatrix}
\begin{bmatrix}
h_1 \\ h_2 \\ h_3 \\ \vdots
\end{bmatrix}
= 0
[/latex]

We will solve this as a least squares problem using singular value decomposition (SVD).
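
A minimal sketch of how A could be assembled with NumPy. Each correspondence adds two rows that follow directly from the two per-point equations (one for u, one for v). The function name build_A and the example points are assumptions made for illustration.

```python
import numpy as np

def build_A(src_pts, dst_pts):
    """Stack two rows per correspondence (x, y) -> (u, v)."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    return np.array(rows, dtype=float)

src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (2, 2), (0, 2)]  # made-up example: scaling by 2
A = build_A(src, dst)
print(A.shape)  # (8, 9)
```

For this example the true homography is a scaling by 2, so the vector h = (2, 0, 0, 0, 2, 0, 0, 0, 1) satisfies Ah = 0.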

Least squares and SVD

This method (explained very clearly in [2]) is used when we want to approximate a function given different observations. For instance, we have that:

[latex]
\\
c + d x_1 = y_1 \\
c + d x_2 = y_2 \\
c + d x_3 = y_3
[/latex]

If there were no errors those equations would be true, but since our measurements might have noise we want to minimize those errors by minimizing:

[latex]
(c + d x_1 - y_1)^2 + (c + d x_2 - y_2)^2 + (c + d x_3 - y_3)^2
[/latex]

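In general, for systems like [latex]Ax=b[/latex] we want to minimize [latex]|| Ax-b ||^2[/latex]. In our homogeneous case [latex]b=0[/latex], so we minimize [latex]|| Ah ||^2[/latex] subject to [latex]|| h || = 1[/latex], which rules out the trivial solution.

We will apply SVD to our matrix A: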
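
The line-fitting example can be sketched with NumPy's least squares solver. The observations below are made up for illustration.

```python
import numpy as np

x_obs = np.array([0.0, 1.0, 2.0])
y_obs = np.array([1.0, 3.1, 4.9])  # made-up noisy observations

# Each row of A is [1, x_i], so that A @ [c, d] approximates y.
A = np.column_stack([np.ones_like(x_obs), x_obs])
(c, d), residuals, rank, _ = np.linalg.lstsq(A, y_obs, rcond=None)
print(c, d)
```

The returned c and d are the intercept and slope that minimize the sum of squared errors written above.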

[latex]
[U,S,V] = SVD(A)
[/latex]

The columns of V are the eigenvectors of [latex]A^T A[/latex], and the squared singular values (the diagonal of S) are the corresponding eigenvalues. The solution is therefore the last eigenvector, the one with the smallest singular value, because its eigenvalue will be zero, or close to zero in case of noise. More intuitively, the eigenvectors with the largest eigenvalues depict the largest variance across the data; since we are interested in minimizing, we pick the eigenvector with the smallest eigenvalue.

When performing a homography, the resulting image will probably have different dimensions from the original one since it might be stretched, rotated, and so on. This will result in having many “empty pixels” that must be filled by performing an interpolation.

One of the nicest properties of the homography is that H has an inverse, which means that we can map all points back to their origin by multiplying them by the inverse of H. In order to fill an empty pixel we multiply its coordinates by [latex]H^{-1}[/latex] to get the original coordinates, which will be floating point numbers. Those “original coordinates” must be interpolated (for instance, you can round them) to get the closest pixel (nearest neighbor), whose value is then copied into the empty pixel.
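
A minimal sketch of the whole estimation with NumPy, assuming exactly four correspondences with no three points on a line. The function name estimate_homography and the example points are made up for illustration; the code provided with this post may differ.

```python
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate H from point correspondences (x, y) -> (u, v) using SVD."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.array(rows, dtype=float)
    # np.linalg.svd returns V transposed; its last row is the right singular
    # vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale (assumes h_9 is not zero)

src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(1, 2), (3, 2), (3, 4), (1, 4)]  # scale by 2, then translate by (1, 2)
H = estimate_homography(src, dst)
print(np.round(H, 6))
```

Dividing by the last entry is just one way of fixing the free scale factor; the unit-norm vector returned by the SVD is an equally valid representation of the same homography.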
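
The backward mapping described above can be sketched as follows. This is a slow, loop-based illustration for a grayscale image, assuming a hypothetical function name warp_nearest and a made-up H; pixels whose source falls outside the original image are left at zero.

```python
import numpy as np

def warp_nearest(image, H, out_shape):
    """Backward warping: for each output pixel, look up its source via H^-1."""
    H_inv = np.linalg.inv(H)
    out = np.zeros(out_shape, dtype=image.dtype)
    for v in range(out_shape[0]):        # v: row (y) in the output
        for u in range(out_shape[1]):    # u: column (x) in the output
            x, y, w = H_inv @ np.array([u, v, 1.0])
            xi, yi = int(round(x / w)), int(round(y / w))  # nearest neighbor
            if 0 <= yi < image.shape[0] and 0 <= xi < image.shape[1]:
                out[v, u] = image[yi, xi]
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
H = np.array([[2.0, 0.0, 0.0],   # hypothetical H: scale by 2
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
warped = warp_nearest(image, H, (8, 8))
print(warped)
```

Iterating over the output pixels rather than the input pixels is what guarantees that every output position inside the mapped region receives a value.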

Example

In the following example the label of this small notebook will be placed horizontally. For this, the location of the 4 pixels corresponding to the four corners is used, and a new location drawing a rectangle is calculated as well. The red dots correspond to the points we want to transform and the green dots their target location.

[Figure: original image, with the source points in red and the target points in green]

This first approximation is obtained by calculating the new location of each pixel. However, it will leave plenty of empty pixels that can be interpolated after the inverse matrix of the homography transformation matrix is calculated.

[Figure: first approximation, before interpolation]

After the interpolation, the final result will not contain any empty pixels.

[Figure: final result, after interpolation]

The code is provided in the Source code section.

References

1. https://courses.cs.washington.edu/courses/csep576/11sp/pdf/Transformations.pdf (Accessed on 8-8-2017)
2. http://www.sci.utah.edu/~gerig/CS6640-F2012/Materials/pseudoinverse-cis61009sl10.pdf (Accessed on 8-8-2017)