Efficient Convolution

$\mathbf{x}\in \mathbb{R}^n$ and $\mathbf{w}\in \mathbb{R}^m$

Convolution $$y[i] = \sum_k x[i-k] w[k]$$

Output size: $m+n-1$
Order of complexity: $\mathbf{O}(mn)$

$$\left[y_1\\y_2..y_{m+n-1}\begin{array}{c}\end{array}\right] = \left[\begin{array}{cccc}w_1&\end{array}\right] \left[\begin{array}{c}\end{array}\right]$$

Linear and circular convolution are equivalent when $N\ge m + n -1$
- We are performing a N-point DFT: so assuming the signal is repeating with period of N
  - compute the convolution of this repeating signal $X$, with another repeating signal $W$
  - Original signal size: $n$ ($N = n$)
  - Non-zero elements: $m$
  - W is filled with $N - m$ zero elements
  - right at the edge, in linear convolution, signal is filled with zeros, but in circular convolution, some part is wrong.
- But if $N = n + m -1$, then
  - signal has $N - n$ zerofs filled
  - and at the edge, the same thing happens as in linear convolution

Aliasing: when $N < m + n -1$



In [ ]:



In [ ]:



In [ ]:

Long Singnal, Short Filter

Overlap Add (OA)

Break it up to $L$ blocks
perform the convolution
Combine the outputs of each block
- In this case, DFT size is $N = L + m - 1$
- Much smaller, and therefore more efficient
- Since blocks are independent, it can also be performed in parallel

Overlap Save (OS)

Add a few zero in front
Process blocks of size $L$
DFT size is $N = L$ (same as block size)
- as a result, we will have aliasing
Combine the result of each block
- discard $m-1$ eleents due to aliasing

Other tricks

DFT of a signal is Hermitian Symmetric
- Assume we are computing a $N$ point DFT
  - k=0 : DC value
  - k=1 : ..
  - ..
  - k = N-1 : ..
  $$Y[k] = Y[N-k]^* \ \ \ (\text{complex conjugate of }Y)$$
- Advantages:
  - We only need to compute half of the values ($\floor{\frac{N}{2}} + 1$) , not all
  - We do not need to store the other half
- If we have a 2d signal: $N\times N$
  - We need to compute the lower triangular part of the $N\times N$ matrix $N\left(\floor{\frac{N}{2}}+1\right)$
Complex multiplcation: needs 3 real multiplication (as opposed to a naive version which needs 4) $$(a + jb) (c+jd) = ac - bd + j(ad + bc)\\=u_c v_a + u_av_c + j(u_a v_c - u_b v_b)$$

where $u_a = a, v_a = c, u_b = ..$

Winograd Convolution

Minimal filtering algorithm

Is good for only small filters
Not using DFT
$m$ outputs with $r$-tap filter, the optimal solution needs $F(m,r)=m+r-1$ multiplcations
- Original convolution, needs $(n+m-1)\times m$
- input size $n$, filter size $m$, $\Longrightarrow$ outputsize: $n+m-1$
- For each output, we need $m$ multiplcation
- so the original method needs $(n+m-1)\times m$ multiplcations
- with this setting, the Winograd method needs $(n+m-1) + (m-1) = n+2m-2$ multiplcations
Example: output size $m=2$ and $3\times 3$ filter size $r=3$

$$F(2,3) = \left[\begin{array}{ccc}d_0 & d_1 & d_2\\d_1 & d_2 & d_3\end{array}\right] \left[\begin{array}{c}g_0\\g_1\\g_2\end{array}\right] = \left[\begin{array}{c}m_1+m_2+m_3\\m_2-m_3-m_4\end{array}\right]$$

$$\begin{array}{ccc}m_1 = ()g_0 & & m_2=(d_1+d_2)\frac{}{2}\\m_4=..& & m_3=\end{array}$$



In [ ]: