Title: Maxout Networks
Authors: Ian J. Goodfellow, David Warde-Farley, Mehdi Mirza, Aaron Courville, Yoshua Bengio
Link: https://arxiv.org/abs/1302.4389
Quick summary:
Maxout is an activation function that takes the maximum value over a group of linear units. In one sense it can be related to dropout: dropout discards some neurons and passes the rest forward, whereas maxout passes forward only the maximum value within each group of units. In essence, maxout behaves like max pooling across feature channels, since it reduces dimensionality by keeping only the maximum values.
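As a rough illustration (a minimal NumPy sketch with made-up sizes, not the paper's code): the pre-activations are reshaped into groups of k candidates and the maximum is taken within each group.

import numpy as np

# Toy pre-activations: batch of 2 examples, 6 linear features,
# grouped into 3 maxout units of size k = 2 (illustrative numbers).
z = np.array([[1.0, -0.5, 2.0, 0.3, -1.0, 4.0],
              [0.2,  0.1, -3.0, 5.0,  0.0, 0.7]])
k = 2
h = z.reshape(z.shape[0], -1, k).max(axis=-1)  # shape (2, 3)
print(h)  # each output is the max of its group of k features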
It is well explained in the following post: http://www.simon-hohberg.de/blog/2015-07-19-maxout
Goodfellow’s PhD defence (talking about maxout): https://www.youtube.com/watch?v=ckoD_bE8Bhs&t=28m
Nowadays it is also implemented as tf.contrib.layers.maxout, but here is a very simple implementation:
import tensorflow as tf

def maxout(inputs, num_units, axis=None):
    # Collapse groups of features along `axis` by keeping only their maximum.
    shape = inputs.get_shape().as_list()
    if axis is None:
        # Assume that the channel is the last dimension
        axis = -1
    num_channels = shape[axis]
    if num_channels % num_units:
        raise ValueError('number of features({}) is not a multiple of num_units({})'
                         .format(num_channels, num_units))
    shape[axis] = -1
    shape += [num_channels // num_units]
    # Replace any unknown (None) dimension, e.g. the batch size, with its dynamic size
    for i in range(len(shape)):
        if shape[i] is None:
            shape[i] = tf.shape(inputs)[i]
    outputs = tf.reduce_max(tf.reshape(inputs, shape), -1, keep_dims=False)
    return outputs
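As a hypothetical usage sketch (made-up sizes, TF 1.x graph API as above): mapping 256 linear pre-activations down to 64 maxout units, i.e. a max over k = 4 candidates per unit.

x = tf.placeholder(tf.float32, shape=[None, 256])  # hypothetical input with 256 features
z = tf.layers.dense(x, 256)                        # linear pre-activations, no nonlinearity
h = maxout(z, num_units=64)                        # shape [None, 64]; each unit is a max over 4 candidates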