CNNs Convolution

Jul 14, 2024

A convolution is performed between a filter (also called a convolutional kernel) and the input data. Both the filter and the input data can be represented as matrices.

First, the kernel is placed over a region of the input matrix that has the same shape as the kernel. The Hadamard (element-wise) product of the kernel and that region gives a product matrix. All elements of the product matrix are then summed, and a bias is added, to produce one output number. The kernel then slides over the input matrix from left to right and from top to bottom, and the Hadamard product and summation are repeated at each position to produce more output numbers. Finally, all output numbers are arranged into a new output matrix.
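The steps above can be sketched in plain Python for a single input matrix. This is a minimal illustration, not a library implementation: the function name `conv2d_single` and its `stride` parameter are assumptions made for this example, and no padding is applied.

```python
def conv2d_single(image, kernel, bias=0, stride=1):
    """Slide `kernel` over `image` (both lists of lists), without padding."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):          # move from top to bottom
        row = []
        for j in range(out_w):      # move from left to right
            top, left = i * stride, j * stride
            # Hadamard product of the kernel and the region under it, summed.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total + bias)  # add the bias to get one output number
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_single(image, kernel, bias=1)
print(result)  # [[7, 9], [13, 15]]
```

Each output number here is the sum of the main diagonal of a 2x2 region plus the bias, for example 1 + 5 + 1 = 7 for the top-left position.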

Example

graph LR
  subgraph layer1 [Input Layer]
    i1((i1))
    i2((i2))
    i3((i3))
  end
  subgraph layer2 [Convolution Layer]
    direction TB
    b1
    f1((f1))
    f2((f2))
    b2
  end
  layer1 ~~~ layer2
  b1 --> f1
  r[R] --> i1
  i1 --> |W11| f1 --> output1
  i1 --> |W12| f2
  g[G] --> i2 --> |W22| f2 --> output2
  i2 --> |W21| f1
  b[B] --> i3 --> |W31| f1
  i3 --> |W32| f2
  b2 --> f2

(Flowcharts - Basic Syntax)

Input data R, G, B are 5x5 matrices.

R:             G:             B:
[[1,0,1,0,1],  [[0,1,2,3,4],  [[3,3,3,3,3],
 [0,1,0,1,0],   [5,6,7,8,9],   [3,2,2,2,3],
 [1,0,1,0,1],   [9,8,7,6,5],   [3,2,1,2,3],
 [0,1,0,1,0],   [0,1,2,3,4],   [3,2,2,2,3],
 [1,0,1,0,1]]   [5,6,7,8,9]]   [3,3,3,3,3]]

Filters (or convolutional kernels) are 3x3 matrices

W11=W21=W31:    W12=W22=W32:
[[1,1,0],       [[0,0,1],
 [1,1,0],        [0,0,2],
 [0,0,2]]        [0,1,2]]

The biases are b1=1 and b2=2.

The output matrices are then computed as follows:

output1 = (R conv W11) + (G conv W21) + (B conv W31) + b1
output2 = (R conv W12) + (G conv W22) + (B conv W32) + b2

This can be computed with torch.nn.functional.conv2d:

import torch
import torch.nn.functional as F

R = [[1,0,1,0,1],
 [0,1,0,1,0],
 [1,0,1,0,1],
 [0,1,0,1,0],
 [1,0,1,0,1]]
G = [[0,1,2,3,4],
 [5,6,7,8,9],
 [9,8,7,6,5],
 [0,1,2,3,4],
 [5,6,7,8,9]]
B = [[3,3,3,3,3],
 [3,2,2,2,3],
 [3,2,1,2,3],
 [3,2,2,2,3],
 [3,3,3,3,3]]

W11 = W21 = W31 = [[1,1,0],
 [1,1,0],
 [0,0,2]]
W12 = W22 = W32 = [[0,0,1],
 [0,0,2],
 [0,1,2]]

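# input shape: (batch=1, in_channels=3, height=5, width=5)
input = torch.tensor([[R, G, B]])
# weight shape: (out_channels=2, in_channels=3, kernel_height=3, kernel_width=3)
weight = torch.tensor([[W11, W21, W31],[W12, W22, W32]])
# one bias value per output channel
bias = torch.tensor([1,2])
F.conv2d(input, weight, groups=1, stride=1, bias=bias)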

# output
# tensor([[[[44, 45, 51],
#           [49, 50, 52],
#           [53, 50, 54]],
# 
#          [[54, 55, 60],
#           [41, 45, 52],
#           [50, 55, 62]]]])
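As a cross-check that does not depend on PyTorch, the formulas for output1 and output2 can also be evaluated directly: convolve each channel with its kernel, add the three results element-wise, then add the bias. The sketch below assumes square matrices, stride 1, and no padding; the helper names `conv2d_single` and `conv2d_multi` are made up for this example.

```python
def conv2d_single(image, kernel):
    """Stride-1, no-padding sliding sum of element-wise products (square inputs)."""
    k = len(kernel)
    size = len(image) - k + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(k) for b in range(k))
            for j in range(size)
        ]
        for i in range(size)
    ]


def conv2d_multi(channels, kernels, bias):
    """Add the per-channel results element-wise, then add the bias."""
    partials = [conv2d_single(c, w) for c, w in zip(channels, kernels)]
    size = len(partials[0])
    return [
        [sum(p[i][j] for p in partials) + bias for j in range(size)]
        for i in range(size)
    ]


R = [[1,0,1,0,1], [0,1,0,1,0], [1,0,1,0,1], [0,1,0,1,0], [1,0,1,0,1]]
G = [[0,1,2,3,4], [5,6,7,8,9], [9,8,7,6,5], [0,1,2,3,4], [5,6,7,8,9]]
B = [[3,3,3,3,3], [3,2,2,2,3], [3,2,1,2,3], [3,2,2,2,3], [3,3,3,3,3]]

W1 = [[1,1,0], [1,1,0], [0,0,2]]  # W11 = W21 = W31
W2 = [[0,0,1], [0,0,2], [0,1,2]]  # W12 = W22 = W32

output1 = conv2d_multi([R, G, B], [W1, W1, W1], bias=1)
output2 = conv2d_multi([R, G, B], [W2, W2, W2], bias=2)
print(output1)  # [[44, 45, 51], [49, 50, 52], [53, 50, 54]]
print(output2)  # [[54, 55, 60], [41, 45, 52], [50, 55, 62]]
```

For instance, the top-left entry of output1 is 4 (from R) + 26 (from G) + 13 (from B) + 1 (bias) = 44, which agrees with the conv2d result.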

Calculate the output data shape

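⌊(W - K + 2P) / S⌋ + 1

W: input size
K: filter (kernel) size
S: stride
P: padding

The brackets denote rounding down (floor). For the example above, W=5, K=3, S=1 and P=0, so each output matrix is 3x3.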
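The formula can be written as a small helper for one spatial dimension. This is only a sketch: the function name `conv_output_size` is an assumption for this example, and it assumes the same padding on both sides and no dilation.

```python
def conv_output_size(w, k, s=1, p=0):
    """Output size along one dimension: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1


print(conv_output_size(5, 3))            # 3, the 5x5 input and 3x3 kernel used above
print(conv_output_size(5, 3, s=2))       # 2
print(conv_output_size(5, 3, s=1, p=1))  # 5, padding of 1 keeps the input size
```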