Convolutional Autoencoder for Dummies

Each day, I become a bigger fan of Lasagne. Recently, after seeing some cool stuff with a Variational Autoencoder trained on Blade Runner, I tried to implement a much simpler Convolutional Autoencoder, trained on a much simpler dataset: MNIST. The task turned out to be really easy, thanks to two layers that already exist in Lasagne: Deconv2DLayer and Upscale2DLayer. My Convolutional Autoencoder consists of two stages:

  1. Coding, which consists of convolutions and max-poolings.
  2. Decoding, which consists of upscalings and deconvolutions.

[Figure: Outline of the Convolutional Autoencoder]
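
To make the two stages a bit more tangible, here is a minimal NumPy sketch of the two resizing operations on a single feature map: max-pooling on the coding side and upscaling on the decoding side. The function names `max_pool_2d` and `upscale_2d` are mine, made up for illustration, and the upscaling assumes plain repetition of each value, which is my understanding of how Upscale2DLayer behaves by default.

```python
import numpy as np


def max_pool_2d(x, factor=2):
    """Non-overlapping max-pooling of a 2D array (height and width must divide by factor)."""
    h, w = x.shape
    blocks = x.reshape(h // factor, factor, w // factor, factor)
    return blocks.max(axis=(1, 3))


def upscale_2d(x, factor=2):
    """Upscaling by repeating every value factor times along both axes (assumed behaviour)."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)


feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2d(feature_map)    # shape (2, 2): the coding side shrinks the map
restored = upscale_2d(pooled)        # shape (4, 4): the decoding side grows it back
print(pooled)
print(restored)
```

Note that upscaling only restores the size, not the values thrown away by the pooling. Filling in the details is the job of the deconvolutions that follow.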

The one thing you need to realize to see how easy this is: deconvolutions are just convolutions! What is more, if you have read my post Convolutional Neural Networks backpropagation: from intuition to derivation, then you have already seen this concept in the backpropagation phase!

Quoting myself (I feel really embarrassed now about this didactic tone …):

Yeah, it is a bit different convolution than in previous (forward) case. There we did so called valid convolution, while here we do a full convolution (more about nomenclature here). What is more, we rotate our kernel by 180 degrees. But still, we are talking about convolution!
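
If you want to convince yourself of this, here is a small NumPy sketch (not Lasagne code). It assumes the forward pass is a valid cross-correlation, and checks that the operation going in the opposite direction (its transpose, which is what backpropagation and the deconvolution apply) is a full convolution, i.e. a valid pass over a zero-padded input with the kernel rotated by 180 degrees. The helper names `correlate_valid` and `convolve_full` are made up for this example.

```python
import numpy as np


def correlate_valid(x, k):
    """Valid cross-correlation: slide k over x without any padding."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def convolve_full(y, k):
    """Full convolution: zero-pad y by (kernel size - 1) per side, then slide the rotated kernel."""
    kh, kw = k.shape
    padded = np.pad(y, ((kh - 1, kh - 1), (kw - 1, kw - 1)), mode="constant")
    return correlate_valid(padded, k[::-1, ::-1])  # k rotated by 180 degrees


rng = np.random.RandomState(0)
x = rng.randn(6, 6)   # input of the forward pass
k = rng.randn(3, 3)   # kernel
y = rng.randn(4, 4)   # something living in the output space, e.g. a gradient

# Transpose check: <forward(x), y> must equal <x, backward(y)>.
lhs = np.sum(correlate_valid(x, k) * y)
rhs = np.sum(x * convolve_full(y, k))
print(np.allclose(lhs, rhs))
```

The full convolution also maps the 4 x 4 array back to 6 x 6, which is exactly the size change the decoder needs.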

The Upscale operation is straightforward (it simply enlarges the feature map by a scale factor), so I think there is no magic left and we can go into the code. As you can see, the Autoencoder is a very symmetric beast. I tried to show that in the snippet:

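import lasagne


def build_convolutional_autoencoder(input_var=None):
    # Input: batches of single-channel 28x28 MNIST images.
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # Coding stage. The trailing arguments of the Conv2DLayer and DenseLayer
    # calls were cut off in this listing; the nonlinearities (rectify, which is
    # also Lasagne's default) and the bottleneck size are assumptions.
    auto_conv1A = lasagne.layers.Conv2DLayer(
            l_in, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)  # 32 x 24 x 24

    auto_maxpool1A = lasagne.layers.MaxPool2DLayer(auto_conv1A, pool_size=(2, 2))  # 32 x 12 x 12

    auto_conv2A = lasagne.layers.Conv2DLayer(
            auto_maxpool1A, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)  # 32 x 8 x 8
    auto_maxpool2A = lasagne.layers.MaxPool2DLayer(auto_conv2A, pool_size=(2, 2))  # 32 x 4 x 4

    auto_dense1A = lasagne.layers.FlattenLayer(auto_maxpool2A)  # 512 neurons

    auto_dense2 = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(auto_dense1A, p=.5),
            num_units=256,  # assumed bottleneck size
            nonlinearity=lasagne.nonlinearities.rectify)

    # Decoding stage: first go back to 512 = 32 * 4 * 4 units, then reshape.
    auto_dense1B = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(auto_dense2, p=.5),
            num_units=512,
            nonlinearity=lasagne.nonlinearities.rectify)

    auto_dense1B = lasagne.layers.ReshapeLayer(auto_dense1B, ([0], 32, 4, 4))

    auto_maxpool2B = lasagne.layers.Upscale2DLayer(auto_dense1B, 2)  # 2 - scale factor, 32 x 8 x 8

    # Deconvolution mirroring auto_conv2A and sharing its weights W.
    auto_conv2B = lasagne.layers.Deconv2DLayer(
            auto_maxpool2B, auto_conv2A.input_shape[1], auto_conv2A.filter_size,
            stride=auto_conv2A.stride, crop=auto_conv2A.pad,
            W=auto_conv2A.W, flip_filters=not auto_conv2A.flip_filters)  # 32 x 12 x 12

    auto_maxpool1B = lasagne.layers.Upscale2DLayer(auto_conv2B, 2)  # 2 - scale factor, 32 x 24 x 24

    # Deconvolution mirroring auto_conv1A and sharing its weights W.
    auto_conv1B = lasagne.layers.Deconv2DLayer(
            auto_maxpool1B, auto_conv1A.input_shape[1], auto_conv1A.filter_size,
            stride=auto_conv1A.stride, crop=auto_conv1A.pad,
            W=auto_conv1A.W, flip_filters=not auto_conv1A.flip_filters)  # 1 x 28 x 28

    return auto_conv1B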
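
The symmetry is easiest to see in the spatial sizes. Below is a small plain-Python sketch that traces the width of a 28 x 28 image through the layers, assuming unpadded 5 x 5 convolutions with stride 1, 2 x 2 poolings and a scale factor of 2, as in the snippet. The helper `trace_sizes` is hypothetical and exists only for this illustration.

```python
def trace_sizes(size=28, kernel=5, factor=2, stages=2):
    """Trace the spatial size through a symmetric conv/pool ... upscale/deconv stack."""
    sizes = [size]
    # Coding: valid convolution shrinks by (kernel - 1), pooling divides by factor.
    for _ in range(stages):
        size = size - kernel + 1
        sizes.append(size)
        size = size // factor
        sizes.append(size)
    # Decoding: upscaling multiplies by factor, full convolution grows by (kernel - 1).
    for _ in range(stages):
        size = size * factor
        sizes.append(size)
        size = size + kernel - 1
        sizes.append(size)
    return sizes


print(trace_sizes())  # [28, 24, 12, 8, 4, 8, 12, 24, 28]
```

The list reads the same forwards and backwards, and the 4 in the middle (times 32 filters, so 32 * 4 * 4) is where the 512 neurons come from.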


And here are some reconstructions (the pictures on the right):

I hope you like it. I know it is not a Variational Autoencoder (work in progress), and it is not even a Denoising Autoencoder, but the results seem to be quite fine. What’s more, it took only 75 minutes on a laptop to get them 🙂