Thanks for the great explanation. I am working with my professor to apply a CNN to a gesture recognition problem. In this post, for simplicity I assume, you mentioned doing the backpropogation with an input that was only 2 dimensions (1 layer deep instead of multiple) as well as a filter that was only 2 dimensions (1 layer deep). However, in our code, as forward propagate, your inputs to the convolution become increasingly deep, and your filters likewise grow in the z (depth) dimension. I understand the whole idea of backpropogating with the flipped filter, however, how do you work with differing depths during backprop? If I have an input that is 10 deep, and I convolve it with a filter that is 10 deep, I get out a 1 deep feature map. So when I backprop the error to that 1 deep feature map, I can’t just convolve that 1 deep feature map delta with the 10 deep filter that I used in the forward propogation. Or can I? Any help in this area or pointing towards help would be greatly appreciated. Thank you.

LikeLike

]]>LikeLike

]]>LikeLike

]]>LikeLike

]]>LikeLike

]]>1. I do not see any “z_{x,y}^l=w_{x,y}^l*\sigma(z_{x,y}^l)+b_{x,y}^l”. However, there is an eq. “z_{x,y}^(l+1)=w_{x,y}^l*\sigma(z_{x,y}^l)+b_{x,y}^(l+1)” so it seems legit.

2. I tried to be consistent with Michael Nielsen as much as possible. While he has z_j and b_j I have two-dimensional case so I have z_{x,y} and b_{x,y}. Of course, you can use matrix notation without any x,y positions.

3. In this blog post I simplified a problem, because we do not have here many feature maps at the same level – only one. ^l defines index of layer.

LikeLike

]]>You use the notation a_{x,y}^l for the activation of some filter at some (x,y) in an image. Why would you use the same notation for bias and weight terms? They don’t change based on positions in an input image; they ‘belong’ to the filter. Also, I assume that in the feedforward step, compute z_{x,y}^l for each layer means compute z_{x,y}^l for each pixel in each layer. Is that correct?

LikeLike

]]>Second: This is literally the best article on conv nets I have ever read. Thank you.

LikeLike

]]>If the original convolution (during forward pass) was applied with a stride > 1, does that mean that during the back propagation phase the reverse convolution (by the rotated kernel) also needs to be applied with the same stride?

It seems that that should be true, given this equation (shown above using math symbols):

delta(layer,x,y) = [ delta(layer+1,x,y) conv ROT180(kernel(layer +1)) ] * activationDerivative(output(layer,x,y)) <<< output matrix size depends on stride

Thanks a lot!

LikeLike

]]>