In [8]:
nA = 3
nB = 5
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [13]:
A = np.random.randn(100,nA)
B = np.random.randn(100,nB)

In [14]:
# (uncentred) sample covariances of the standard-normal features
covA = A.T.dot(A)/float(len(A))
covB = B.T.dot(B)/float(len(B))
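
Since A and B are drawn i.i.d. from a standard normal distribution, both sample covariances should be close to the identity. A quick sanity check (an illustrative sketch, not a cell from the original notebook):

In [ ]:
# deviation of each sample covariance from the identity;
# with 100 samples the entries fluctuate on the order of 1/sqrt(100)
print(np.abs(covA - np.eye(nA)).max())
print(np.abs(covB - np.eye(nB)).max())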

In [62]:
C = np.random.randn(100,nA)
D = np.random.randn(100,nB)
covC = C.T.dot(C)/float(len(C))
covD = D.T.dot(D)/float(len(D))

In [63]:
plt.imshow((np.kron(np.array([[2,0.5],[0.5,1]]),(covD))), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0, vmax=2)
plt.axis('off');
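
The 2x2 matrix passed to np.kron above is an arbitrary example; np.kron(M, N) replaces each entry of M by that entry times a copy of N, so the figure is a 2x2 grid of scaled copies of covD. A minimal illustration of this block structure (a sketch, not part of the original notebook):

In [ ]:
# each entry of the 2x2 matrix scales one 2x2 block of the 4x4 result
np.kron(np.array([[2, 0.5], [0.5, 1]]), np.eye(2))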



In [34]:
plt.imshow(((covD)), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0, vmax=2)
plt.axis('off');



In [35]:
plt.imshow((np.kron(np.array([[2,0.5],[0.5,1]]),(covD))), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0, vmax=2)
plt.axis('off');



In [ ]:


In [215]:
# stack of outer products: dw[i] = A[i] B[i]^T for each of the 100 samples
dw = np.zeros((100, A.shape[1], B.shape[1]))
for i in range(len(A)):
    dw[i] = A[i].reshape(-1, 1).dot(B[i].reshape(1, -1))
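
The same stack of outer products can be built without the explicit Python loop; a minimal equivalent sketch using np.einsum (dw_alt is a name introduced here only for illustration):

In [ ]:
# dw_alt[i] = outer(A[i], B[i]) for every sample i, vectorized
dw_alt = np.einsum('ni,nj->nij', A, B)
print(np.allclose(dw, dw_alt))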

In [216]:
# flatten each outer product and form the (uncentred) sample covariance of the vectors
C = dw.reshape(100, A.shape[1] * B.shape[1])
cov = (C.T).dot(C)/100.

In [217]:
B[1]


Out[217]:
array([-0.85689131, -0.21136736,  0.26149773,  0.91899861,  0.51308311])

In [218]:
plt.imshow(B[1].reshape(1,-1), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-3, vmax=3)
plt.axis('off');



In [219]:
plt.imshow(A[1].reshape(-1,1), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-3, vmax=3)
plt.axis('off');



In [220]:
plt.imshow(dw[1], interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-3, vmax=3)
plt.axis('off');



In [221]:
plt.imshow(cov, interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0.5, vmax=2)
plt.axis('off');



In [227]:
cov[:5,:5]


Out[227]:
array([[ 1.10210971,  0.01166719, -0.01992309, -0.14738296, -0.06147655],
       [ 0.01166719,  1.06828835,  0.21011534,  0.00222657,  0.05147025],
       [-0.01992309,  0.21011534,  1.28568938, -0.01604122,  0.03993652],
       [-0.14738296,  0.00222657, -0.01604122,  1.16503805,  0.23498166],
       [-0.06147655,  0.05147025,  0.03993652,  0.23498166,  0.88355273]])

In [226]:
kcov[:5,:5]


Out[226]:
array([[ 0.87546366,  0.0346847 ,  0.08157241, -0.14938068, -0.03037467],
       [ 0.0346847 ,  0.85001178,  0.13108533,  0.07074102,  0.00921527],
       [ 0.08157241,  0.13108533,  1.10315624, -0.16093695,  0.01351656],
       [-0.14938068,  0.07074102, -0.16093695,  1.02271925,  0.06881449],
       [-0.03037467,  0.00921527,  0.01351656,  0.06881449,  1.02163011]])

In [224]:
# Kronecker-factored approximation to the covariance of the vectorized outer products
kcov = np.kron(covA, covB)
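
Because each row of C is the vectorized outer product of two independent zero-mean Gaussian vectors, its covariance should be approximately the Kronecker product of the two marginal covariances, i.e. cov ≈ kcov. A quick check of how close the empirical and Kronecker-factored estimates are (a sketch, not a cell from the original notebook):

In [ ]:
# elementwise and relative Frobenius-norm discrepancy between the two estimates
print(np.abs(cov - kcov).max())
print(np.linalg.norm(cov - kcov) / np.linalg.norm(kcov))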

In [222]:
plt.imshow(np.kron(covA, covB), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0.5, vmax=2)
plt.axis('off');



In [223]:
plt.imshow(np.diag(np.diag(np.kron(covA, covB))), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0.5, vmax=2)
plt.axis('off');



In [212]:
plt.imshow(((covB)), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0, vmax=2)
plt.axis('off');



In [213]:
plt.imshow(((covA)), interpolation='nearest', cmap=plt.get_cmap('gray'),vmin=-0, vmax=2)
plt.axis('off');



In [ ]:


In [ ]:


In [ ]:


In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

In [ ]:


In [9]:
plt.plot(np.linspace(-10, 10),0.5*(1+np.tanh(0.5*np.linspace(-10, 10))))


Out[9]:
[<matplotlib.lines.Line2D at 0x110a93450>]
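
The curve above is the logistic sigmoid written in terms of tanh, using the identity sigm(a) = (1 + tanh(a/2)) / 2. A quick numerical check of that identity (an illustrative sketch):

In [ ]:
a = np.linspace(-10, 10)
sigm = 1.0 / (1.0 + np.exp(-a))
print(np.allclose(sigm, 0.5 * (1.0 + np.tanh(0.5 * a))))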

In [ ]:

---
title: "ECE521 Midterm Examination"
author: "University of Toronto Faculty of Applied Science and Engineering: Department of Electrical and Computer Engineering"
date: "16 February 2017"
---

$$ $$

Name: __

$$ $$

Student Number: _ Section: $~$ L0101 $~$ or $~$ L0102 $~$ (circle one)

$$ $$

$$ $$

Instructions:

  • Time allowed: 90 minutes
  • Answer all questions. Page xx has space for overflow
  • Questions answered in pencil rather than pen may not be eligible for remarking, even if there is a marking error
  • Aids allowed: one double-sided $8.5'' \times 11''$ aid sheet and a non-programmable calculator
$$ $$
| Question | Value | Mark |
|----------|-------|------|
| 1        | xx    |      |
| 2        | xx    |      |
| 3        | xx    |      |
| 4        | xx    |      |
| 5        | xx    |      |
| 6        | xx    |      |
| Total    | xx    |      |
\begin{center}This test should have xx pages including this page \end{center}

Question 1: True or false

Circle one answer (2 marks each, 20 marks total):


True False \hspace{0.5cm} When learning a logistic regression model using SGD, generally it's helpful to centre (zero-mean) the input features

True False \hspace{0.5cm} When using a kNN model based on squared $\ell_2$ distance, generally it's helpful to centre the input features

\vspace{10cm}

Question 2: Classification

[a] (3 marks) When faced with a multi-class classification problem, describe a disadvantage of using multiple binary classifiers instead of a single multi-class classifier.

(Jimmy: I didn't talk about balance, voting issues, etc in class. Did you? Again maybe we can mention this in the second half and then keep this question for the exam?)

\vspace{2 cm}

[b] (4 marks) In binary classification, suppose $t_m \in \{-1, 1\}$ instead of the usual $\{0, 1\}$. The cross-entropy loss function here is:

$$ \mathcal{L} = - \frac{1}{2}\sum_{m=1}^M \left\{ \log \left[ \text{sigm}(\mathbf{w}^T\mathbf{x}_m)\right] (1+t_m) + \log \left[ 1 - \text{sigm}(\mathbf{w}^T\mathbf{x}_m) \right] (1-t_m) \right\}$$

where $M$ is the number of observations of input-target pairs $(\mathbf{x}_m,t_m)$, and $\text{sigm}(\cdot)$ is the sigmoid function. Note that the first term is zero when $t_m=-1$ and the second term is zero when $t_m=1$. Show that we can rewrite the above loss function as:

$$ \mathcal{L} = \sum_{m=1}^M \log \left[ 1 + \exp(-t_m \mathbf{w}^T\mathbf{x}_m) \right].$$

\vspace{4 cm}

[c] (2 marks) Consider the two expressions for $\mathcal{L}$ stated in part [b]. Besides the obvious simplicity of the latter expression for $\mathcal{L}$, what other computational advantage does it have over the first expression?

\vspace{2 cm}

[d] (3 marks) Explain the reject option of a classifier. Why is it useful?

\vspace{4 cm}

Question 3: Neural networks

[a] (2 marks) When training a large neural network, we commonly determine whether learning has converged by checking (circle all that apply):

\vspace{-.4cm}

\hspace{1cm} [i] $(y_{nk} - t_{nk})^2 \rightarrow 0$ \hspace{1cm} [ii] $\partial E_n(\mathbf{w}) / \partial w_{ji} \rightarrow 0$ \hspace{1cm} [iii] $\text{tanh}(a_j) \rightarrow 0$ \hspace{1cm} [iv] $\delta_j \rightarrow 0$

Here, $y_{nk}$ are model outputs, $t_{nk}$ are targets, $E_n$ is the error for the $n$th data point, $a_j$ is the activation (weighted input) of unit $j$, and $\delta_j = \partial E_n / \partial a_j$.

(Jimmy: when did you cover this? I don't see slides or an assignment covering it for example. Maybe we could mention it in writing in the second half of the course and then save this question till the exam?) \vspace{1cm}

[b] (3 marks) When designing neural networks, a problem with the sigmoid function's derivative is that it is very small when its input $a \gg 0$ or $a \ll 0$. Suppose we decide to reparameterize a neural network using tanh$(a)$ instead of sigm$(a)$ as the activation functions. Calculate $dh/da$ for $$ h(a) = \text{tanh}(a) = \frac{e^a - e^{-a}}{e^a+e^{-a}}$$ and comment on whether it exhibits a similar problem to the sigmoid function. \vspace{4cm}

[c] (4 marks) Show that there exists a tanh$(a)$ activation function that outputs the same results as the sigmoid function. \vspace{4cm}

[d] Consider the binary threshold neuron, $y = \text{sgn}(\mathbf{w}^T\mathbf{x})$, with no bias $b$ or $w_0$. Consider the set of points $$\{x\} = \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1) \}$$

[i] Find a three-dimensional parameter vector $\mathbf{w}$ such that the neuron memorizes $\{t\} = \{1, 1, 1, 1\}$. \vspace{1cm}

[ii] Find a three-dimensional parameter vector $\mathbf{w}$ such that the neuron memorizes $\{t\} = \{1, 1, 0, 0\}$. \vspace{1cm}

[iii] Find an unrealizable labelling $\{t\}$. \vspace{1cm}