A convolution is performed between a filter (also called a convolutional kernel) and the input data. Both the filter and the input data can be represented as matrices.
First, the kernel is placed over the input data matrix to select a window (submatrix) with the same shape as the kernel. Next, the Hadamard (element-wise) product of the kernel and that window is computed to obtain a product matrix. Then, all elements of the product matrix are summed and a bias is added to obtain one output number. The kernel keeps sliding over the input data matrix from left to right and then from top to bottom, and the Hadamard product and summation are repeated at each position to obtain more output numbers. Finally, all output numbers are combined into a new output matrix.
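The steps above can be sketched in plain Python for a single input matrix. This is a minimal illustration, not a library implementation: the function name `conv2d_single`, its parameters, and the small 3x3 input are assumptions made for the example, and no padding is applied.

```python
def conv2d_single(x, kernel, bias=0, stride=1):
    """Slide `kernel` over `x` (both lists of lists), without padding.

    The kernel is not flipped; each output number is the sum of the
    Hadamard product of the kernel and the current window, plus the bias.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):          # top to bottom
        row = []
        for j in range(out_w):      # left to right
            top, left = i * stride, j * stride
            # Hadamard product of kernel and window, summed up
            total = sum(
                x[top + a][left + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total + bias)
        out.append(row)
    return out


# Hypothetical toy input, chosen only to keep the arithmetic easy to follow
x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_single(x, kernel, bias=1)
print(result)  # [[7, 9], [13, 15]]
```

For example, the top-left output is 1*1 + 2*0 + 4*0 + 5*1 + 1 = 7.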

    graph LR
        subgraph layer1 [Input Layer]
            i1((i1))
            i2((i2))
            i3((i3))
        end
        subgraph layer2 [Convolution Layer]
            direction TB
            b1
            f1((f1))
            f2((f2))
            b2
        end
        layer1 ~~~ layer2
        b1 --> f1
        r[R] --> i1
        i1 --> |W11| f1 --> output1
        i1 --> |W12| f2
        g[G] --> i2 --> |W22| f2 --> output2
        i2 --> |W21| f1
        b[B] --> i3 --> |W31| f1
        i3 --> |W32| f2
        b2 --> f2

Input data R, G, B are 5x5 matrices.

    R:               G:               B:
    [[1,0,1,0,1],    [[0,1,2,3,4],    [[3,3,3,3,3],
     [0,1,0,1,0],     [5,6,7,8,9],     [3,2,2,2,3],
     [1,0,1,0,1],     [9,8,7,6,5],     [3,2,1,2,3],
     [0,1,0,1,0],     [0,1,2,3,4],     [3,2,2,2,3],
     [1,0,1,0,1]]     [5,6,7,8,9]]     [3,3,3,3,3]]

Filters (or convolutional kernels) are 3x3 matrices.

    W11=W21=W31:    W12=W22=W32:
    [[1,1,0],       [[0,0,1],
     [1,1,0],        [0,0,2],
     [0,0,2]]        [0,1,2]]

The output data matrices are then obtained as follows:

    output1 = (R conv W11) + (G conv W21) + (B conv W31) + b1
    output2 = (R conv W12) + (G conv W22) + (B conv W32) + b2

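To make the two sums concrete, here is a minimal plain-Python sketch that evaluates them by hand. The helper names `correlate2d` and `conv_output` are hypothetical, and the biases b1 = 1 and b2 = 2 are assumed to match the values passed to `torch.nn.functional.conv2d` in this document's PyTorch example. Stride 1 and no padding are assumed.

```python
def correlate2d(x, k):
    """Stride-1, no-padding sliding sum of element-wise products."""
    kh, kw = len(k), len(k[0])
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(len(x[0]) - kw + 1)
        ]
        for i in range(len(x) - kh + 1)
    ]


def conv_output(channels, kernels, bias):
    """Sum the per-channel results element-wise, then add the bias."""
    maps = [correlate2d(x, k) for x, k in zip(channels, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[i][j] for m in maps) + bias for j in range(cols)]
        for i in range(rows)
    ]


R = [[1,0,1,0,1], [0,1,0,1,0], [1,0,1,0,1], [0,1,0,1,0], [1,0,1,0,1]]
G = [[0,1,2,3,4], [5,6,7,8,9], [9,8,7,6,5], [0,1,2,3,4], [5,6,7,8,9]]
B = [[3,3,3,3,3], [3,2,2,2,3], [3,2,1,2,3], [3,2,2,2,3], [3,3,3,3,3]]
W11 = W21 = W31 = [[1,1,0], [1,1,0], [0,0,2]]
W12 = W22 = W32 = [[0,0,1], [0,0,2], [0,1,2]]

# b1 = 1 and b2 = 2 are assumed here (same values as the PyTorch example)
output1 = conv_output([R, G, B], [W11, W21, W31], bias=1)
output2 = conv_output([R, G, B], [W12, W22, W32], bias=2)

print(output1)  # [[44, 45, 51], [49, 50, 52], [53, 50, 54]]
print(output2)  # [[54, 55, 60], [41, 45, 52], [50, 55, 62]]
```

Each output matrix is 3x3 because a 3x3 kernel fits into a 5x5 input at three positions per axis.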
The same result can be computed with `torch.nn.functional.conv2d`:

    import torch
    import torch.nn.functional as F

    # Input channels, 5x5 each
    R = [[1,0,1,0,1],
         [0,1,0,1,0],
         [1,0,1,0,1],
         [0,1,0,1,0],
         [1,0,1,0,1]]
    G = [[0,1,2,3,4],
         [5,6,7,8,9],
         [9,8,7,6,5],
         [0,1,2,3,4],
         [5,6,7,8,9]]
    B = [[3,3,3,3,3],
         [3,2,2,2,3],
         [3,2,1,2,3],
         [3,2,2,2,3],
         [3,3,3,3,3]]

    # 3x3 kernels: W1x feed output1, W2x... feed output2 (see the diagram)
    W11 = W21 = W31 = [[1,1,0],
                       [1,1,0],
                       [0,0,2]]
    W12 = W22 = W32 = [[0,0,1],
                       [0,0,2],
                       [0,1,2]]

    # input shape: (batch=1, in_channels=3, 5, 5)
    input = torch.tensor([[R, G, B]])
    # weight shape: (out_channels=2, in_channels=3, 3, 3)
    weight = torch.tensor([[W11, W21, W31], [W12, W22, W32]])
    # one bias per output channel: b1=1, b2=2
    bias = torch.tensor([1, 2])

    F.conv2d(input, weight, groups=1, stride=1, bias=bias)

    # output
    # tensor([[[[44, 45, 51],
    #           [49, 50, 52],
    #           [53, 50, 54]],
    #
    #          [[54, 55, 60],
    #           [41, 45, 52],
    #           [50, 55, 62]]]])

Calculate the output data shape
- W: input size
- K: filter (kernel) size
- S: stride
- P: padding
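
As a rough illustration of how these four quantities combine, the sketch below assumes the commonly used relation floor((W - K + 2P) / S) + 1 for one spatial dimension. That formula is an assumption of this example (it ignores dilation), and the function name `conv_output_size` is hypothetical.

```python
def conv_output_size(w, k, s=1, p=0):
    """Output size along one dimension.

    Assumes the standard relation floor((W - K + 2P) / S) + 1,
    with no dilation.
    """
    return (w - k + 2 * p) // s + 1


# The 5x5 inputs and 3x3 kernels used earlier, with stride 1 and no padding
size = conv_output_size(w=5, k=3, s=1, p=0)
print(size)  # 3
```

With W = 5, K = 3, S = 1 and P = 0 this gives 3, which agrees with the 3x3 output matrices returned by the `conv2d` example.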