The grading scheme below was followed for the correction of the third assignment, out of a total of 100 points plus 10 bonus points. You'll also find some comments and common mistakes.
Thanks for your work! It was quite good overall. Some of you did very well, and I read quite a few interesting comments throughout.
First, a general remark: use vectorized code (via numpy or pandas) instead of loops. It's much more efficient, because the loop is either executed in optimized C (or Fortran) code, or the operation is carried out by the CPU's single instruction, multiple data (SIMD) unit (cf. the MMX, SSE, and AVX instructions of x86 CPUs from Intel and AMD).
Below are some examples from your submissions:
err = np.sum(np.abs(labels - genres))
is better than
err = len([1 for i in range(len(labels)) if labels[i] != genres[i]])

np.mean(mfcc, axis=1)
is better than
[np.mean(x) for x in mfcc]

weights = np.exp(-distances**2 / kernel_width**2)
is better than
for i in range(2000):
    for j in range(i, 2000):
        weights[i, j] = math.exp(-distances[i, j]**2 / kernel_width**2)
If, for some reason, you cannot vectorize your code, consider using numba or Cython.
If you wrote any loops in your submission, please look at my solution for ways to avoid them. The vectorized version is both faster and easier to understand.
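To make the comparison concrete, here is a small sketch (with random, made-up data) checking that the vectorized kernel expression computes the same weights as its loop equivalent:

```python
import math

import numpy as np

rng = np.random.default_rng(0)

# Made-up data: a matrix of pairwise distances and a kernel width.
distances = rng.random((200, 200))
kernel_width = 0.5

# Vectorized: one expression, the loop runs in optimized C code.
weights = np.exp(-distances**2 / kernel_width**2)

# Explicit Python loop: same result, computed one entry at a time.
weights_loop = np.empty_like(distances)
for i in range(distances.shape[0]):
    for j in range(distances.shape[1]):
        weights_loop[i, j] = math.exp(-distances[i, j]**2 / kernel_width**2)

assert np.allclose(weights, weights_loop)
```

On a few thousand nodes, the vectorized version is typically orders of magnitude faster (try %timeit in a notebook).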
Convert to an integer with int(), to a string with str(), etc.
The apply or map functions are handy.
Use .apply(get_genre) instead of .apply(lambda x: get_genre(x)). The anonymous function is useless if you don't alter the argument.
To compute the degrees, W.sum(0) will do.
Don't use np.linalg.inv(D)
to invert the degree matrix. Inverting a diagonal matrix is straightforward: for the normalized Laplacian, np.diag(1 / np.sqrt(degrees)) directly gives D^(-1/2).
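As a sketch on a toy graph (the 3-node weight matrix below is made up), the degrees and the normalized Laplacian follow without any call to np.linalg.inv:

```python
import numpy as np

# Made-up symmetric weight matrix of a 3-node graph.
W = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])

degrees = W.sum(0)  # degree of each node, no loop needed

# D^(-1/2) without np.linalg.inv: invert the diagonal entries directly.
D_inv_sqrt = np.diag(1 / np.sqrt(degrees))

# Normalized Laplacian L = I - D^(-1/2) W D^(-1/2).
laplacian = np.identity(len(degrees)) - D_inv_sqrt @ W @ D_inv_sqrt
```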
We don't use the eigensolvers from scipy.sparse merely because they allow us to choose the number of eigenvectors to return; we use them because they implement efficient algorithms for the partial eigendecomposition of sparse matrices.
Calling eigsh
with which='SM', which='SA', or sigma=0 are all correct approaches. See the [solution] for more information.
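As a sketch (the path graph below is made up for illustration), eigsh with which='SM' computes only the k smallest-magnitude eigenpairs instead of the full eigendecomposition:

```python
import numpy as np
import scipy.sparse as sparse
from scipy.sparse.linalg import eigsh

# Made-up graph: a path on 10 nodes, as a sparse weight matrix.
n = 10
W = sparse.diags([np.ones(n - 1), np.ones(n - 1)], [-1, 1])

# Combinatorial Laplacian L = D - W.
degrees = np.asarray(W.sum(0)).ravel()
L = sparse.diags(degrees) - W

# Partial eigendecomposition: only the k=3 smallest-magnitude eigenpairs.
eigenvalues, eigenvectors = eigsh(L.tocsc(), k=3, which='SM')

# A connected graph's Laplacian always has eigenvalue 0.
assert abs(eigenvalues).min() < 1e-8
```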