Problem set 1

Strassen Algorithm

Let $C = AB$, where $A$ and $B$ are squared matrices of the same size. Direct computation of $C$ requires $\mathcal{O}(n^3)$ arithmetic operations. Fortunately, this complexity can be reduced even for arbitrary matrices $A$ and $B$. The following approach which has $\mathcal{O}(n^{\log_2 7})$ is called Strassen algorithm. Its idea is based on the fact that elements of $2\times 2$ matrix $$ \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} $$ can be computed using only 7 multiplications: \begin{equation}\begin{split} c_{11} &= f_1 + f_4 - f_5 + f_7, \\ c_{12} &= f_3 + f_5, \\ c_{21} &= f_2 + f_4, \\ c_{22} &= f_1 - f_2 + f_3 + f_6, \end{split}\end{equation} where \begin{equation}\begin{split} f_1 &= (a_{11} + a_{22}) (b_{11} + b_{22}),\\ f_2 &= (a_{21} + a_{22}) b_{11},\\ f_3 &= a_{11} (b_{12} - b_{22}),\\ f_4 &= a_{22} (b_{21} - b_{11}),\\ f_5 &= (a_{11} + a_{12}) b_{22},\\ f_6 &= (a_{21} - a_{11}) (b_{11} + b_{12}),\\ f_7 &= (a_{12} - a_{22}) (b_{21} + b_{22}). \end{split}\end{equation}

Formulas above hold for the case when $a_{ij}, b_{ij}, c_{ij}$ ($i$ and $j=1,2$) are blocks. Therefore, spliting matrices $A$ and $B$ of abitrary sizes into 4 blocks and applying described procedure recursively for blocks one will get $\mathcal{O}(n^{\log_2 7})$ complexity.

Tasks

  • (4 pts) Prove that Strassen alogorithm has $\mathcal{O}(n^{\log_2 7})$ complexity
  • (4 pts) Implement Strassen algorithm in Python. Note: for simplicity consider that $n$ is a power of 2
  • (3 pts) Compare the result with direct matrix-by-matrix multiplication and $\verb|numpy.dot|$ procedure by ploting timings as a function of $n$. Note: use logarithmic scale

In [ ]:

Fast Fourier Transform

Let $y = Ax$ (matvec operation), where $A \in \mathbb{C}^{m\times n}$ and $x \in \mathbb{C}^{n\times 1}$. Direct computation of $y$ requires $\mathcal{O}(n^2)$. Since $A$ contains $n^2$ elements, this complexity can not be reduced for an arbitrary matrix $A$. There are certain classes of matrices for which matvec requires less operations. For instance, sparse, Toeplitz, lowrank, etc. Another important example of structured matrix which arises in a huge amount of applications (signal and image processing, fast PDE solvers) is Fourier matrix $$ F_n = \{ \omega^{kl} \}_{k,l=0}^{n-1}, \quad \text{where} \quad \omega = e^{-\frac{2\pi i}{n}}. $$ Matvec operation with Fourier matrix is called discrete Fourier transform (DFT) and has $\mathcal{O}(n \log n)$ complexity. The simplest way to get this complexity is to spilt odd and even rows in Fourier matrix: \begin{equation} P_n F_n = \begin{bmatrix} F_{n/2} & F_{n/2} \\ F_{n/2} W_{n/2} & -F_{n/2} W_{n/2} \end{bmatrix}, \quad (1) \end{equation} where $P_n$ is a permutaion matrix which permutes odd and even rows, and $W_{n/2}=\text{diag}(1,\omega,\omega^2,\dots,\omega^{n/2-1})$. Thus, multiplication by $F_n$ is reduced to several multiplications by $F_{n/2}$ and linear operations such as multiplication by the diagonal matrix $W_{n/2}$. Continuing this procedure recursively for $F_{n/2}$ we will get $\mathcal{O}(n \log n)$ operations.

Tasks

  • (4 pts) Prove expression (1)
  • (4 pts) Implement the described fft algorithm in Python. Note: for simplicity consider that $n$ is a power of 2
  • (3 pts) Compare the result with $\verb|numpy.dot|$ and $\verb|numpy.fft.fft|$ procedures by ploting timings as a function of $n$. Note: use logarithmic scale

In [ ]: