Question repository

A list of open questions and potentially ambiguous points encountered throughout the material.

TODO: Tag exam-related ones appropriately, to differentiate them from (exclusively) curiosity-related ones.

Note: An alternative design would consist of adding a questions section to every notebook, tagging it appropriately using IPython metadata, and then using something like a Python/shell script to print all open questions in a centralized way. However...

2. Approximate retrieval

Why perform the first step of hashing if we only have a small number of features (e.g. 100)? And if there are many features, why not just do a PCA first?
- might be because we want the shingle representation (0s and 1s) for the nice properties that Jaccard similarity offers us
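To make the "nice properties" concrete, here is a minimal sketch of the one that min-hashing relies on: for a random permutation of the shingle universe, two sets share the same minimum with probability equal to their Jaccard similarity. The toy documents, `universe_size`, and `num_hashes` are made-up values for illustration, and explicit permutations are used for clarity rather than the hash functions a real implementation would use.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two non-empty shingle sets."""
    return len(a & b) / len(a | b)


def minhash_signature(shingles, permutations):
    """For each permutation, keep the smallest permuted index among the shingles."""
    return [min(perm[s] for s in shingles) for perm in permutations]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two documents agree."""
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)


rng = random.Random(0)   # fixed seed so the run is repeatable
universe_size = 200      # hypothetical number of distinct shingles
num_hashes = 500         # hypothetical signature length

permutations = []
for _ in range(num_hashes):
    perm = list(range(universe_size))
    rng.shuffle(perm)
    permutations.append(perm)

doc_a = set(range(0, 80))     # toy document: shingle ids 0..79
doc_b = set(range(40, 120))   # toy document: shingle ids 40..119

exact = jaccard(doc_a, doc_b)  # 40 shared shingles out of 120 in the union
approx = estimate_jaccard(
    minhash_signature(doc_a, permutations),
    minhash_signature(doc_b, permutations),
)
print(f"exact={exact:.3f}  minhash estimate={approx:.3f}")
```

The estimate should land close to the exact value of 1/3. A PCA projection offers no comparable collision guarantee for set overlap, which may be part of the answer to the question above.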

3. Classification

When transitioning from the first SVM formulation (with slack variables) to the second one, aren't we loosening any constraint by fixing $\xi$?
- (tentative) It seems we're not, since we're taking multiple cases into consideration and merging them together into a single formulation using max.
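- (sketch, assuming the first formulation is the standard soft-margin one with constraints $y_i w^T x_i \geq 1 - \xi_i$ and $\xi_i \geq 0$; the exact form on the slides may differ) For any fixed $w$, the objective only grows with $\xi_i$, so the best feasible choice is the smallest value that satisfies both constraints:
\begin{equation} \xi_i^* = \max(0, 1 - y_i w^T x_i) \end{equation}
Substituting $\xi_i^*$ removes the constraints without changing the minimum, so $\xi$ is optimized out rather than fixed arbitrarily.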
Slide 04:18: Is the first (primal) SVM formulation a (ii)-type one (since it has a minimization and its constraint as a separate equation), or is it not eligible for this categorization?
Slide 06:15: How do we go from step 1 to 2? Isn't the $\lambda \| w \|_2^2$ term outside the sum?
- yes, it is, but the sum has a convenient $\frac{1}{T}$ in front of it, so we can safely move the regularization term inside the sum.
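- (worked out, writing $\ell_t(w)$ for the per-example loss, a symbol assumed here and not taken from the slide) Since $\frac{1}{T} \sum_{t=1}^{T} c = c$ for any term $c$ that does not depend on $t$:
\begin{equation} \lambda \| w \|_2^2 + \frac{1}{T} \sum_{t=1}^{T} \ell_t(w) = \frac{1}{T} \sum_{t=1}^{T} \left( \lambda \| w \|_2^2 + \ell_t(w) \right) \end{equation}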

Why do some SVM OCP implementations always regularize, even when the model is not updated at that stage?
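One possible reading, sketched below under the assumption that each round's objective is $f_t(w) = \lambda \| w \|_2^2 + \max(0, 1 - y_t w^T x_t)$: the regularizer is part of every $f_t$, so its gradient $2 \lambda w$ is nonzero even when the hinge term contributes nothing. The function name `ocp_svm_step` and the numbers are hypothetical, and this is not necessarily what any particular implementation does (some project onto a norm ball instead).

```python
def ocp_svm_step(w, x, y, lam, eta):
    """One online subgradient step on f_t(w) = lam * ||w||^2 + max(0, 1 - y * <w, x>)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Gradient of the regularizer: present at every step, whatever the margin.
    grad = [2.0 * lam * wi for wi in w]
    if margin < 1.0:
        # Hinge loss is active: add its subgradient -y * x.
        grad = [g - y * xi for g, xi in zip(grad, x)]
    return [wi - eta * g for wi, g in zip(w, grad)]


# The point is classified with margin 2 >= 1, so the hinge term is inactive,
# yet w still shrinks toward the origin because of the regularizer.
w = [2.0, 0.0]
w_next = ocp_svm_step(w, x=[1.0, 0.0], y=1, lam=0.1, eta=0.5)
print(w_next)
```

Under this reading, "always regularize" is simply the subgradient of the full per-round objective, while "update only on a margin violation" refers to the hinge part alone.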

4. Non-linear classification

How exactly is the Lagrangian dual reformulation step (SVMs) different from the first time we reformulated the SVM problem statement to get rid of the slack variables?
- it's different because we changed the objective! We no longer have $\min_w$ or $\min_{w, \xi}$; it's now a maximization over the Lagrange multipliers: $\max_\alpha$. It's not a reformulation, but an equivalent problem.

5. Active learning

When doing active learning based on uncertainty sampling, how exactly do we know when we can safely infer some labels?

6. Clustering

Homework 5 solution, 2.2: Why is:

\begin{equation} \operatorname{Var}_{\hat{x}_i \sim q}\left [ \frac{1}{m} \sum_{i=1}^{m} \frac{d(\hat{x}_i; \mu)}{q(\hat{x}_i)} \right ] = \frac{1}{m^2} \sum_{i=1}^m \operatorname{Var}_{x_i \sim q} \left[ \frac{d(x_i; \mu)}{q(x_i)} \right] \end{equation}

And why do we still have the $i$ subscript in the variance formulation? Can't we just write $x \sim q$?
Have to discuss this with friends!
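- (sketch, assuming the $\hat{x}_i$ are drawn independently from $q$, which the problem statement would need to confirm) Write $Z_i = \frac{d(\hat{x}_i; \mu)}{q(\hat{x}_i)}$. Then
\begin{equation} \operatorname{Var}\left[ \frac{1}{m} \sum_{i=1}^{m} Z_i \right] = \frac{1}{m^2} \operatorname{Var}\left[ \sum_{i=1}^{m} Z_i \right] = \frac{1}{m^2} \sum_{i=1}^{m} \operatorname{Var}\left[ Z_i \right] = \frac{1}{m} \operatorname{Var}_{x \sim q}\left[ \frac{d(x; \mu)}{q(x)} \right] \end{equation}
The first step uses $\operatorname{Var}[aX] = a^2 \operatorname{Var}[X]$, the second uses independence (the covariance terms vanish), and the third uses the fact that all $Z_i$ have the same distribution. So the subscript $i$ inside the variance is only bookkeeping, and writing $x \sim q$ is fine.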

7. Bandits

8. Exam-specific (and/or for review session on Jan 20)

Exam 2014 Problem 6 (Submodular functions)
- solved by Syd in The Notes. Yay!


