1 minute read

Recall convolution operation, where we add each element of the feature matrix to its local neighbours, weighted by the kernel. An illustration is at this webpage.

As one can immediately notice, as the kernel “slides” through the feature matrix, same feature matrix elements are repeatedly accessed, thus would cause frequent cache misses. Transforming image convolution to GEMM (General Matrix Multiply) alleviates this problem, while taking advantage of the linear algebra library, or the acceleration provided by the domain specific architecture.

Algorithm

The algorithm can be described with a single image below - (image source). The top sub-figure presents a traditional convolution operation, where both the input feature and the convolution kernel consists of 3 channels, while the bottom sub-figure presents the transformed matrix mulitplication operation. Below sections illustrates the algorithm with the image.

algo

Transform Input Feature to Input Matrix

The first step is transforming input feature to matrix.

Take the example of the left-most input channel of size 3⨉3. Each step as the kernel “slides” through the input channel, a 2⨉2 input sub-matrix is exposed to the kernel, and we flatten the input sub-matrix (in NZ data format) to a row vector of size 1⨉4. Concatenate the transformed row vectors as the input matrix of one channel, of size 4⨉4. For more than one channel, concatenate the individual input matrix by row, of size 4⨉12.

Transform Convolution Kernel to Kernel Matirx

The second step is transforming the convolution kernel to kernel matrix.

Take the example of the left-most convolution kernel of size 2⨉2. We flatten the convolution kernel (in NZ data format) to a of size 4⨉1. For more than one channel, concatenate the transformed column vectors as the kernel matrix of one kernel, of size 12⨉1. For more than one kernel, (here assuming 2 kernels) concatenate the individual kernel matrix by row, of size 12⨉2.

Output Matrix = Input Matrix * Kernel Matirx

Output matrix has a size of 4⨉2. To transform the output matrix back to output feature, reshape each row to size of output matrix 2⨉2 (in NZ data format).