Convolutional Autoencoder for Dummies

Each day, I become a bigger fan of Lasagne. Recently, after seeing some cool stuff with a Variational Autoencoder trained on Blade Runner, I tried to implement a much simpler Convolutional Autoencoder, trained on a much simpler dataset – mnist. The task turned out to be a really easy one, thanks to two layers that already exist in Lasagne: Deconv2DLayer and Upscale2DLayer. My Convolutional Autoencoder consists of two stages:

  1. Coding, which consists of convolutions and max-poolings
  2. Decoding, which consists of upscalings and deconvolutions.
[Figure: outline of the Convolutional Autoencoder]

A small thought experiment that makes you realize how easy this is: deconvolutions are just convolutions! What is more, if you have read my post Convolutional Neural Networks backpropagation: from intuition to derivation, you have already seen this concept in the backpropagation phase!

Quoting myself (I feel really embarrassed now about this didactic tone …):

Yeah, it is a bit different convolution than in the previous (forward) case. There we did a so-called valid convolution, while here we do a full convolution (more about the nomenclature here). What is more, we rotate our kernel by 180 degrees. But still, we are talking about a convolution!
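To convince yourself, here is a tiny NumPy/SciPy sketch (just an illustration of the idea, not code from the autoencoder itself): a valid convolution shrinks a 28x28 image to 24x24, and a full convolution with the 180-degree-rotated kernel – still nothing but a convolution – brings it back to 28x28.

import numpy as np
from scipy.signal import convolve2d

x = np.random.rand(28, 28)  # a toy "mnist" image
w = np.random.rand(5, 5)    # a 5x5 kernel

# forward pass: a *valid* convolution, 28x28 -> 24x24
feature_map = convolve2d(x, w, mode='valid')

# "deconvolution": a *full* convolution with the kernel rotated by 180 degrees,
# which restores the original 28x28 spatial size
reconstruction = convolve2d(feature_map, np.rot90(w, 2), mode='full')

print(feature_map.shape, reconstruction.shape)  # (24, 24) (28, 28)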

The Upscale operation seems obvious, so I think there is no magic left and we can go into the code. As you can see, the Autoencoder is a very symmetric beast. I tried to show that in the snippet:

def build_convolutional_autoencoder(input_var=None):

    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # coding part: convolutions and max-poolings
    auto_conv1A = lasagne.layers.Conv2DLayer(
            l_in, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    auto_maxpool1A = lasagne.layers.MaxPool2DLayer(auto_conv1A, pool_size=(2, 2))

    auto_conv2A = lasagne.layers.Conv2DLayer(
            auto_maxpool1A, num_filters=32, filter_size=(5, 5),
            nonlinearity=lasagne.nonlinearities.rectify)

    auto_maxpool2A = lasagne.layers.MaxPool2DLayer(auto_conv2A, pool_size=(2, 2))

    auto_dense1A = lasagne.layers.FlattenLayer(auto_maxpool2A)  # 32 * 4 * 4 = 512 neurons

    auto_dense2 = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(auto_dense1A, p=.5),
            num_units=256,  # size of the code (bottleneck) layer - assumed value
            nonlinearity=lasagne.nonlinearities.rectify)

    # decoding part: a mirror image of the coding part
    auto_dense1B = lasagne.layers.DenseLayer(
            lasagne.layers.dropout(auto_dense2, p=.5),
            num_units=512,  # 32 * 4 * 4, so it can be reshaped back into feature maps
            nonlinearity=lasagne.nonlinearities.rectify)

    auto_dense1B = lasagne.layers.ReshapeLayer(auto_dense1B, ([0], 32, 4, 4))

    auto_maxpool2B = lasagne.layers.Upscale2DLayer(auto_dense1B, 2)  # 2 - scale factor

    # the "deconvolution" reuses the encoder's weights, with the filters flipped
    auto_conv2B = lasagne.layers.Deconv2DLayer(
            auto_maxpool2B, auto_conv2A.input_shape[1], auto_conv2A.filter_size,
            stride=auto_conv2A.stride, crop=auto_conv2A.pad,
            W=auto_conv2A.W, flip_filters=not auto_conv2A.flip_filters)

    auto_maxpool1B = lasagne.layers.Upscale2DLayer(auto_conv2B, 2)  # 2 - scale factor

    auto_conv1B = lasagne.layers.Deconv2DLayer(
            auto_maxpool1B, auto_conv1A.input_shape[1], auto_conv1A.filter_size,
            stride=auto_conv1A.stride, crop=auto_conv1A.pad,
            W=auto_conv1A.W, flip_filters=not auto_conv1A.flip_filters)

    return auto_conv1B
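I do not show the whole training script here, but wiring the autoencoder up follows the standard Lasagne/Theano recipe from the mnist example. A minimal sketch (assuming a plain squared-error reconstruction loss against the input itself and Nesterov-momentum updates):

import theano
import theano.tensor as T
import lasagne

input_var = T.tensor4('inputs')
network = build_convolutional_autoencoder(input_var)

# the reconstruction target is simply the input itself
reconstruction = lasagne.layers.get_output(network)
loss = lasagne.objectives.squared_error(reconstruction, input_var).mean()

params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params,
                                            learning_rate=0.01, momentum=0.9)

train_fn = theano.function([input_var], loss, updates=updates)

# deterministic pass (dropout switched off) used to produce the reconstructions below
reconstruct_fn = theano.function(
        [input_var], lasagne.layers.get_output(network, deterministic=True))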


And here are some reconstructions (the right-hand picture in each pair):

[Figure: MNIST digits (left) and their reconstructions (right)]
Hope you like it. I know it is not a Variational Autoencoder (that one is a work in progress), and it is not even a Denoising Autoencoder, but the results look quite good. What's more, it took only 75 minutes on a laptop to get them 🙂


A simple example of Theano and Lasagne's superpowers

I mentioned in my initial post "Deep Learning Frameworks Overview" that my Deep Learning library of choice is (at least for now) the Theano and Lasagne combination. However, in all my posts I have not yet used the most important word: experiment. So, let's assume that you have some idea and want to test it quickly. For example, what if we add to a standard CNN (max-pooling omitted for clarity):

[Figure: a standard CNN]
some extra "convolutional branch" that is concatenated with the last-but-one layer:

[Figure: the same CNN with an extra convolutional branch]
This experiment is really easy to do in (Theano-based) Lasagne. I just added a build_modified_cnn method to the mnist example (the layers with the 1A suffix, together with the ConcatLayer, form my "convolutional branch"; the rest is the same as the standard build_cnn method):

def build_modified_cnn(input_var=None):
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    l_conv1 = lasagne.layers.Conv2DLayer(
        l_in, num_filters=32, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.GlorotUniform())

    # the extra "convolutional branch": conv + maxpool taken straight from the input
    l_conv1A = lasagne.layers.Conv2DLayer(
        l_in, num_filters=32, filter_size=(10, 10),
        nonlinearity=lasagne.nonlinearities.rectify)

    l_maxpool1A = lasagne.layers.MaxPool2DLayer(l_conv1A,
        pool_size=(5, 5))

    l_dense1A = lasagne.layers.FlattenLayer(l_maxpool1A)

    l_maxpool1 = lasagne.layers.MaxPool2DLayer(l_conv1,
        pool_size=(2, 2))

    l_conv2 = lasagne.layers.Conv2DLayer(
        l_maxpool1, num_filters=32, filter_size=(5, 5),
        nonlinearity=lasagne.nonlinearities.rectify)
    l_maxpool2 = lasagne.layers.MaxPool2DLayer(l_conv2,
        pool_size=(2, 2))

    l_dense1 = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(l_maxpool2, p=.5),
        num_units=256,
        nonlinearity=lasagne.nonlinearities.rectify)

    # the branch joins the main path just before the output layer
    l_concat = lasagne.layers.ConcatLayer([l_dense1, l_dense1A])

    l_dense2 = lasagne.layers.DenseLayer(
        lasagne.layers.dropout(l_concat, p=.5),
        num_units=10,
        nonlinearity=lasagne.nonlinearities.softmax)

    return l_dense2
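Running the experiment then boils down to calling build_modified_cnn instead of build_cnn in the mnist example's training script. A rough sketch of that setup (the standard cross-entropy loss and Nesterov-momentum updates from the example):

import theano
import theano.tensor as T
import lasagne

input_var = T.tensor4('inputs')
target_var = T.ivector('targets')

network = build_modified_cnn(input_var)

prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()

params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(loss, params,
                                            learning_rate=0.01, momentum=0.9)

train_fn = theano.function([input_var, target_var], loss, updates=updates)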

In this network we see some standard layers, such as Conv2DLayer and MaxPool2DLayer, and some less standard but self-explanatory ones: FlattenLayer and ConcatLayer. Some results of the test on the mnist dataset:

Type           Epoch   Accuracy   Time
Standard CNN   1       56.75 %    6.767s
Standard CNN   30      96.76 %    6.923s
Modified CNN   1       72.51 %    9.552s
Modified CNN   30      96.03 %    9.592s

Not this time. My modification makes significant progress if we look only at the first epoch; in general, however, it learns more slowly than the standard network. But to see that, I needed only a few minutes of coding – and this is the true power of Theano and Theano-based libraries!