# Question repository

A list of open questions and possibly ambiguous stuff encountered throughout the material.

TODO: Tag exam-related ones appropriately, to differentiate them from (exclusively) curiosity-related ones.

Note: An alternative design would consist of adding a questions section to every notebook, tagging it appropriately using IPython metadata, and then using something like a Python/shell script to print all open questions in a centralized way. However...

## 2. Approximate retrieval

• Why perform the first step of hashing if we only have a small number of features (e.g. 100)? If there are many features, why not just do a PCA first?
• might be because we want the shingle representation (0s and 1s) for the nice properties that Jaccard similarity offers us
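
The "nice property" mentioned above is presumably the min-hashing one: under a random permutation of the shingle universe, two documents receive the same min-hash value with probability equal to their Jaccard similarity. A minimal sketch of that, assuming documents are already given as non-empty sets of integer shingle IDs (all function names are made up for illustration):

```python
import random

def jaccard(a, b):
    """Exact Jaccard similarity of two shingle sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def make_permutations(universe_size, num_hashes, seed=0):
    """Explicit random permutations of the shingle IDs 0..universe_size-1."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        p = list(range(universe_size))
        rng.shuffle(p)
        perms.append(p)
    return perms

def minhash_signature(shingles, perms):
    """One min-hash value per permutation (shingles must be non-empty)."""
    return [min(p[s] for s in shingles) for p in perms]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of permutations on which the two min-hash values agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

doc_a = {0, 1, 2}
doc_b = {1, 2, 3}
perms = make_permutations(universe_size=10, num_hashes=500)
estimate = estimate_jaccard(minhash_signature(doc_a, perms),
                            minhash_signature(doc_b, perms))
print(jaccard(doc_a, doc_b), estimate)
```

The estimate only approximates the exact value; more permutations tighten it. A PCA projection of real-valued features would not come with this collision-probability guarantee, which may be part of the answer to the question above.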

## 3. Classification

• When transitioning from the first SVM formulation (with slack variables) to the second one, aren't we loosening a constraint by fixing $\xi$?
• (tentative) It seems we're not, since we're taking multiple cases into consideration and merging them together into a single formulation using max.
• Slide 04:18: Is the first (primal) SVM formulation a (ii)-type one (since it has a minimization and its constraint as a separate equation), or is it not eligible for this categorization?
• Slide 06:15: How do we go from step 1 to 2? Isn't the $\lambda \| w \|_2^2$ term outside the sum?
• yes it is, but the sum has a convenient $\frac{1}{T}$ in front of it, so we're safe to add the regularization term into the sum.
• Why do some SVM OCP implementations always regularize, even when the model was not updated at that stage?
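
A small numeric check of the $\frac{1}{T}$ answer above, assuming the hinge loss and a per-round objective $f_t(w) = \lambda \| w \|_2^2 + \max(0, 1 - y_t w^T x_t)$ (the slides' exact notation may differ, and the helper names are made up). It also hints at one possible reason for the "always regularize" behaviour: a plain subgradient step on $f_t$ shrinks $w$ even when the hinge term is inactive.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hinge(w, x, y):
    # max(0, 1 - y * w^T x)
    return max(0.0, 1.0 - y * dot(w, x))

def batch_objective(w, data, lam):
    # Regularizer written outside the average over the T examples.
    T = len(data)
    return lam * dot(w, w) + sum(hinge(w, x, y) for x, y in data) / T

def per_round_objective(w, x, y, lam):
    # Regularizer pulled inside the sum: one copy per round.
    return lam * dot(w, w) + hinge(w, x, y)

def subgradient_step(w, x, y, lam, eta):
    # One subgradient step on per_round_objective.
    g = [2.0 * lam * wi for wi in w]
    if y * dot(w, x) < 1.0:  # hinge term is active
        g = [gi - y * xi for gi, xi in zip(g, x)]
    return [wi - eta * gi for wi, gi in zip(w, g)]

data = [([1.0, 2.0], 1), ([-1.0, 0.5], -1), ([0.3, -0.2], 1)]
w = [0.5, 0.25]
lam = 0.1

# Averaging T copies of the regularizer gives back a single copy.
averaged = sum(per_round_objective(w, x, y, lam) for x, y in data) / len(data)
print(batch_objective(w, data, lam), averaged)

# Margin is 2 >= 1, so the hinge is inactive, yet w still shrinks.
print(subgradient_step([2.0, 0.0], [1.0, 0.0], 1, lam, eta=0.5))
```

Implementations that instead enforce the regularization through a projection step may behave differently, so this is only one reading of the question.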

## 4. Non-linear classification

• How exactly is the Lagrangian dual reformulation step (SVMs) different from the first time we reformulated the SVM problem statement to get rid of the slack variables?
• it's different because we changed the objective! We no longer have $\min_w$ or $\min_{w, \xi}$, it's now a maximization over the Lagrange multipliers: $\max_\alpha$; it's not a reformulation, but an equivalent problem
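
For reference, the two steps side by side, assuming the standard soft-margin SVM without a bias term and with a trade-off parameter $C$ (the lecture's parametrization may differ). Eliminating the slack variables keeps the same optimization variable $w$, because for any fixed $w$ the cheapest feasible slack is $\xi_i = \max(0, 1 - y_i w^T x_i)$:

$$\min_{w, \xi} \; \frac{1}{2} \| w \|_2^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i w^T x_i \geq 1 - \xi_i, \; \xi_i \geq 0 \quad \Longleftrightarrow \quad \min_{w} \; \frac{1}{2} \| w \|_2^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i w^T x_i)$$

The Lagrangian dual, in contrast, optimizes over one multiplier $\alpha_i$ per training example, and $w$ is only recovered afterwards as $w = \sum_{i} \alpha_i y_i x_i$:

$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C$$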

## 5. Active learning

• When doing active learning based on uncertainty sampling, how exactly do we know when we can safely infer some labels?
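
One setting where labels can be inferred safely is the noise-free 1-D threshold classifier: if a point is labeled positive, every point to its right must be positive too. The sketch below assumes exactly that setting (distinct points, a true threshold exists, the oracle never errs); the function and parameter names are hypothetical. Querying the midpoint of the still-undetermined range corresponds to picking the most uncertain point.

```python
def active_learn_threshold(points, oracle):
    """Label 1-D points assuming label = +1 iff x >= t for some unknown t.

    Returns (labeled_points, number_of_oracle_queries).
    """
    xs = sorted(points)
    lo, hi = 0, len(xs)  # invariant: xs[:lo] are -1, xs[hi:] are +1
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2  # most uncertain remaining point
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid       # xs[mid:] are now known to be +1
        else:
            lo = mid + 1   # xs[:mid + 1] are now known to be -1
    labeled = [(x, 1 if i >= lo else -1) for i, x in enumerate(xs)]
    return labeled, queries

# Hypothetical oracle with true threshold 5.
true_label = lambda x: 1 if x >= 5 else -1
labeled, queries = active_learn_threshold(range(16), true_label)
print(queries, labeled)
```

With label noise or a richer hypothesis class this inference is no longer guaranteed, which is presumably where the question above becomes hard.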

## 6. Clustering

• Homework 5 solution, 2.2: Why is:
$$\operatorname{Var}_{\hat{x}_i \sim q}\left [ \frac{1}{m} \sum_{i=1}^{m} \frac{d(\hat{x}_i; \mu)}{q(\hat{x}_i)} \right ] = \frac{1}{m^2} \sum_{i=1}^m \operatorname{Var}_{x_i \sim q} \left[ \frac{d(x_i; \mu)}{q(x_i)} \right]$$
• And why do we still have the $i$ subscript in the variance formulation? Can't we just write $x \sim q$?
• Have to discuss this with friends!
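
A possible explanation, assuming the $\hat{x}_i$ are drawn independently from $q$ (the homework's setup is not restated here). Writing $Z_i = \frac{d(\hat{x}_i; \mu)}{q(\hat{x}_i)}$:

$$\operatorname{Var}\left[ \frac{1}{m} \sum_{i=1}^{m} Z_i \right] = \frac{1}{m^2} \operatorname{Var}\left[ \sum_{i=1}^{m} Z_i \right] = \frac{1}{m^2} \sum_{i=1}^{m} \operatorname{Var}\left[ Z_i \right] = \frac{1}{m} \operatorname{Var}_{x \sim q}\left[ \frac{d(x; \mu)}{q(x)} \right]$$

The first step uses $\operatorname{Var}[aZ] = a^2 \operatorname{Var}[Z]$, the second uses independence (all covariance terms vanish), and the third uses that the $Z_i$ are identically distributed. Under that last assumption the subscript $i$ inside the variance is indeed redundant, and $x \sim q$ would do.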

## 8. Exam-specific (and/or for review session on Jan 20)

• Exam 2014 Problem 6 (Submodular functions)
• solved by Syd in The Notes. Yay!

