Efficient Convolution

$\mathbf{x}\in \mathbb{R}^n$ and $\mathbf{w}\in \mathbb{R}^m$

  • Convolution $$y[i] = \sum_k x[i-k] w[k]$$
  • Output size: $m+n-1$
  • Order of complexity: $\mathbf{O}(mn)$
$$\left[y_1\\y_2..y_{m+n-1}\begin{array}{c}\end{array}\right] = \left[\begin{array}{cccc}w_1&\end{array}\right] \left[\begin{array}{c}\end{array}\right]$$
  • Linear and circular convolution are equivalent when $N\ge m + n -1$
    • We are performing a N-point DFT: so assuming the signal is repeating with period of N
      • compute the convolution of this repeating signal $X$, with another repeating signal $W$
      • Original signal size: $n$ ($N = n$)
      • Non-zero elements: $m$
      • W is filled with $N - m$ zero elements
      • right at the edge, in linear convolution, signal is filled with zeros, but in circular convolution, some part is wrong.
    • But if $N = n + m -1$, then
      • signal has $N - n$ zerofs filled
      • and at the edge, the same thing happens as in linear convolution

  • Aliasing: when $N < m + n -1$

In [ ]:


In [ ]:


In [ ]:

Long Singnal, Short Filter

Overlap Add (OA)

  • Break it up to $L$ blocks
  • perform the convolution
  • Combine the outputs of each block

    • In this case, DFT size is $N = L + m - 1$
    • Much smaller, and therefore more efficient
    • Since blocks are independent, it can also be performed in parallel

Overlap Save (OS)

  • Add a few zero in front
  • Process blocks of size $L$
  • DFT size is $N = L$ (same as block size)

    • as a result, we will have aliasing
  • Combine the result of each block

    • discard $m-1$ eleents due to aliasing

Other tricks

  • DFT of a signal is Hermitian Symmetric

    • Assume we are computing a $N$ point DFT

      • k=0 : DC value
      • k=1 : ..
      • ..
      • k = N-1 : ..

      $$Y[k] = Y[N-k]^* \ \ \ (\text{complex conjugate of }Y)$$

    • Advantages:

      • We only need to compute half of the values ($\floor{\frac{N}{2}} + 1$) , not all
      • We do not need to store the other half
    • If we have a 2d signal: $N\times N$

      • We need to compute the lower triangular part of the $N\times N$ matrix $N\left(\floor{\frac{N}{2}}+1\right)$
  • Complex multiplcation: needs 3 real multiplication (as opposed to a naive version which needs 4) $$(a + jb) (c+jd) = ac - bd + j(ad + bc)\\=u_c v_a + u_av_c + j(u_a v_c - u_b v_b)$$

    where $u_a = a, v_a = c, u_b = ..$

Winograd Convolution

Minimal filtering algorithm

  • Is good for only small filters
  • Not using DFT

  • $m$ outputs with $r$-tap filter, the optimal solution needs $F(m,r)=m+r-1$ multiplcations

    • Original convolution, needs $(n+m-1)\times m$
    • input size $n$, filter size $m$, $\Longrightarrow$ outputsize: $n+m-1$
    • For each output, we need $m$ multiplcation
    • so the original method needs $(n+m-1)\times m$ multiplcations
    • with this setting, the Winograd method needs $(n+m-1) + (m-1) = n+2m-2$ multiplcations
  • Example: output size $m=2$ and $3\times 3$ filter size $r=3$

    $$F(2,3) = \left[\begin{array}{ccc}d_0 & d_1 & d_2\\d_1 & d_2 & d_3\end{array}\right] \left[\begin{array}{c}g_0\\g_1\\g_2\end{array}\right] = \left[\begin{array}{c}m_1+m_2+m_3\\m_2-m_3-m_4\end{array}\right]$$

$$\begin{array}{ccc}m_1 = ()g_0 & & m_2=(d_1+d_2)\frac{}{2}\\m_4=..& & m_3=\end{array}$$


In [ ]: