In [51]:
%%javascript
/**********************************************************************************************
Known Mathjax Issue with Chrome - a rounding issue adds a border to the right of mathjax markup
https://github.com/mathjax/MathJax/issues/1300
A quick hack to fix this based on stackoverflow discussions:
http://stackoverflow.com/questions/34277967/chrome-rendering-mathjax-equations-with-a-trailing-vertical-line
**********************************************************************************************/
$('.math>span').css("border-left-color","transparent")
In [52]:
%reload_ext autoreload
%autoreload 2
Course Lead: Dr James G. Shanahan (email Jimi via James.Shanahan AT gmail.com)
Name: Your Name Goes Here
Class: MIDS w261 (Section Your Section Goes Here, e.g., Summer 2016 Group 1)
Email: Your UC Berkeley Email Goes Here@iSchool.Berkeley.edu
Week: 11
HW Problems
HW11.0. Broadcast versus Caching in Spark
HW11.1. Loss Functions
HW11.2. Gradient descent
HW11.3. Logistic Regression
HW11.4. SVMs
HW11.5. OPTIONAL Distributed Perceptron algorithm
HW11.6. OPTIONAL Evaluation of perceptron algorithms on PennTreeBank POS corpus
HW11.7. OPTIONAL Kernel Adatron
HW11.8. OPTIONAL Create an animation of gradient descent for the Perceptron
MIDS UC Berkeley, Machine Learning at Scale DATSCIW261 ASSIGNMENT #11
Version 2016-07-27 (FINAL)
=== INSTRUCTIONS for SUBMISSIONS === Follow the instructions for submissions carefully.
https://docs.google.com/forms/d/1ZOr9RnIe_A06AcZDB6K1mJN4vrLeSmS2PD6Xm3eOiis/viewform?usp=send_form
=== IMPORTANT ===
TYPE-2 Fun option: Submit HW11 using a Zeppelin notebook (See Live slides for install instructions)
TYPE-1.5 Fun option: Complete HW11.8 only (no need to complete the rest of the questions)
HW11 can be completed locally on your computer
HW11.0
Q: What is the difference between broadcasting and caching data in Spark? Give an example (in the context of machine learning) of each mechanism (at a high level). Feel free to cut and paste code examples from the lectures to support your answer.
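A hedged illustration of the distinction (the variable names and file path are made up for this sketch, and sc is the notebook's SparkContext): caching keeps a large, distributed RDD in memory because an iterative learner scans it repeatedly, while broadcasting ships a small, read-only object, such as the current weight vector, to every executor once per iteration instead of once per task closure.

import numpy as np

def log_loss_grad(w, row):
    # Per-example logistic-loss gradient; row = [features..., label in {-1,+1}].
    x, y = row[:-1], row[-1]
    return -y * x / (1.0 + np.exp(y * w.dot(x)))

# CACHING: the training data is large and re-scanned on every iteration.
train_rdd = sc.textFile("train.csv") \
              .map(lambda line: np.array([float(v) for v in line.split(",")])) \
              .cache()

# BROADCASTING: the weight vector is small and read-only within an iteration.
d = len(train_rdd.first()) - 1
w = np.zeros(d)
n = train_rdd.count()
for _ in range(10):
    w_bc = sc.broadcast(w)
    grad = train_rdd.map(lambda row: log_loss_grad(w_bc.value, row)) \
                    .reduce(lambda a, b: a + b) / n
    w = w - 0.1 * grad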
Q: Review the following Spark-notebook-based implementation of KMeans and use the broadcast pattern to make this implementation more efficient. Please describe your changes in English first, then implement them; comment your code and highlight your changes:
Notebook https://www.dropbox.com/s/41q9lgyqhy8ed5g/EM-Kmeans.ipynb?dl=0
Notebook via NBViewer http://nbviewer.ipython.org/urls/dl.dropbox.com/s/41q9lgyqhy8ed5g/EM-Kmeans.ipynb
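The notebook itself is not copied here; as a hedged sketch of the broadcast pattern it asks for, the assignment (E) step below looks the centroids up through a broadcast variable rather than capturing the centroid list in every task closure. The names data, centroids and closest_centroid are illustrative, not the notebook's own.

import numpy as np

def closest_centroid(point, centroids):
    # Index of the nearest centroid under squared Euclidean distance.
    return int(np.argmin([np.sum((point - c) ** 2) for c in centroids]))

K = 3
# data: a cached RDD of numpy feature vectors (assumed to exist already).
centroids = [np.array(c) for c in data.takeSample(False, K, seed=42)]

for _ in range(20):
    cent_bc = sc.broadcast(centroids)   # ship the small centroid list once per iteration

    # E-step: assign each point to its nearest (broadcast) centroid.
    assigned = data.map(lambda p: (closest_centroid(p, cent_bc.value), (p, 1)))

    # M-step: recompute each centroid as the mean of its assigned points
    # (assumes every cluster receives at least one point).
    stats = assigned.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1])).collectAsMap()
    centroids = [stats[k][0] / float(stats[k][1]) for k in sorted(stats)]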
HW11.1 Loss Functions
Back to Table of Contents
In the context of binary classification problems, does the linear SVM learning algorithm yield the same result as an L2-penalized logistic regression learning algorithm?
In your response, please discuss the loss functions, the learnt models, and the separating surfaces between the two classes.
In the context of binary classification problems, does the linear SVM learning algorithm yield the same result as a perceptron learning algorithm?
[OPTIONAL]: Generate an artificial binary classification dataset with 2 input features and plot the learnt separating surface for both a linear SVM and for logistic regression. Comment on the learnt surfaces. Please feel free to do this in Python (no need to use Spark).
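A minimal local sketch for this optional part, assuming scikit-learn and matplotlib are available; it builds a 2-feature Gaussian dataset and overlays the two learnt linear boundaries, which is usually enough to see that the hinge loss and the L2-penalized logistic loss give similar but not identical separating lines.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Two Gaussian blobs, labels +1 / -1.
np.random.seed(0)
X = np.vstack([np.random.randn(100, 2) + [1, 1],
               np.random.randn(100, 2) - [1, 1]])
y = np.hstack([np.ones(100), -np.ones(100)])

svm = LinearSVC(C=1.0).fit(X, y)
lr = LogisticRegression(penalty='l2', C=1.0).fit(X, y)

# Plot each model's boundary w0*x1 + w1*x2 + b = 0.
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
for model, label in [(svm, 'linear SVM'), (lr, 'L2 logistic regression')]:
    w, b = model.coef_[0], model.intercept_[0]
    plt.plot(xs, -(w[0] * xs + b) / w[1], label=label)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', alpha=0.5)
plt.legend()
plt.show()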
In [53]:
## Code goes here
In [54]:
## Drivers & Runners
In [55]:
## Run Scripts, S3 Sync
HW11.2 Gradient descent
Back to Table of Contents
In the context of logistic regression, describe and define three flavors of penalized loss functions. Are these all supported in Spark MLLib (include online references to support your answers)?
Describe probabilistic interpretations of the L1 and L2 priors for penalized logistic regression (HINT: see synchronous slides for week 11 for details)
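As a hedged pointer (APIs as of the Spark 1.6 era; train_lp is an assumed RDD of LabeledPoints and train_df an assumed DataFrame with label/features columns): the RDD-based MLLib learners expose the L1 (LASSO) and L2 (ridge) penalties through regType, while the elastic-net mixture of the two is exposed through the DataFrame-based estimator's elasticNetParam.

from pyspark.mllib.classification import LogisticRegressionWithLBFGS
from pyspark.ml.classification import LogisticRegression

# RDD API (pyspark.mllib): pick the penalty with regType.
lr_l2 = LogisticRegressionWithLBFGS.train(train_lp, regType='l2', regParam=0.01)   # ridge
lr_l1 = LogisticRegressionWithLBFGS.train(train_lp, regType='l1', regParam=0.01)   # LASSO

# DataFrame API (pyspark.ml): elasticNetParam mixes the two penalties;
# 0.0 is pure L2, 1.0 is pure L1, anything in between is elastic net.
enet = LogisticRegression(regParam=0.01, elasticNetParam=0.5).fit(train_df)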
In [56]:
## Code goes here
In [57]:
## Drivers & Runners
In [58]:
## Run Scripts, S3 Sync
HW11.3 Logistic Regression
Back to Table of Contents
Generate 2 sets of linearly separable data with 100 data points each using the data generation code provided below and plot each in separate plots. Call one the training set and the other the testing set.
from numpy.random import rand   # rand assumed to come from numpy.random

def generateData(n):
    """
    Generates a 2D linearly separable dataset with n samples.
    The third element of each sample is the label (+1 or -1).
    """
    xb = (rand(n)*2-1)/2-0.5
    yb = (rand(n)*2-1)/2+0.5
    xr = (rand(n)*2-1)/2+0.5
    yr = (rand(n)*2-1)/2-0.5
    inputs = []
    for i in range(len(xb)):
        inputs.append([xb[i], yb[i], 1])
        inputs.append([xr[i], yr[i], -1])
    return inputs
Modify this data generation code to generate non-linearly separable training and testing datasets (with approximately 10% of the data falling on the wrong side of the separating hyperplane). Plot the resulting datasets.
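One hedged way to do this (among several): keep the same two clouds but flip roughly 10% of the labels, so no hyperplane separates the classes perfectly. The helper name and noise parameter below are illustrative.

import random

def generateDataNonSep(n, noise=0.10):
    """
    Same two clouds as generateData, but approximately a `noise` fraction of
    the points get the wrong label, so the classes are not linearly separable.
    """
    data = generateData(n)
    for example in data:
        if random.random() < noise:
            example[2] = -example[2]   # flip the label
    return data

train = generateDataNonSep(100)
test = generateDataNonSep(100)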
NOTE: For the remainder of this problem please use the non-linearly separable training and testing datasets.
Using MLLib, train up a LASSO logistic regression model with the training dataset and evaluate with the testing set. What is a good number of iterations for training the logistic regression model? Justify with plots and words.
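A hedged starter sketch for the MLLib call (not a full answer): examples are wrapped as LabeledPoints with labels remapped from {-1,+1} to {0,1}, MLLib's convention, and the iteration count is swept so test error can be plotted against it.

from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.mllib.regression import LabeledPoint

def to_lp(example):
    # [x1, x2, y] with y in {-1,+1}  ->  LabeledPoint(label in {0,1}, [x1, x2])
    x1, x2, y = example
    return LabeledPoint(0.0 if y < 0 else 1.0, [x1, x2])

train_lp = sc.parallelize(train).map(to_lp).cache()
test_lp = sc.parallelize(test).map(to_lp).cache()

lasso_errors = []
for iters in [5, 10, 25, 50, 100, 200]:
    model = LogisticRegressionWithSGD.train(train_lp, iterations=iters,
                                            regType='l1', regParam=0.01)
    err = test_lp.map(lambda p: 1.0 if model.predict(p.features) != p.label
                                else 0.0).mean()
    lasso_errors.append((iters, err))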
Derive and implement in Spark a weighted LASSO logistic regression. Implement a convergence test of your choice to check for termination within your training algorithm.
Weight the above training dataset as follows: weight each example using the inverse vector length (Euclidean norm):
weight(X) = 1/||X||,
where ||X|| = SQRT(X.X) = SQRT(X1^2 + X2^2)
Here X is the vector made up of X1 and X2.
Evaluate your homegrown weighted LASSO logistic regression on the test dataset. Report the misclassification error (1 - Accuracy) and how many iterations it took to converge.
Does Spark MLLib have a weighted LASSO logistic regression implementation? If so, use it and report your findings on the weighted training set and test set.
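A hedged sketch of one possible homegrown implementation, not a reference solution: each example's logistic-loss gradient is scaled by its weight 1/||x||, the L1 penalty is handled with a simple subgradient term lam*sign(w), and training stops when the weight update is smaller than a tolerance. Labels are assumed to be in {-1,+1} and a bias feature is prepended.

import numpy as np

def weighted_lasso_lr(data_rdd, lam=0.01, eta=0.5, max_iter=200, tol=1e-4):
    """
    data_rdd: RDD of [x1, x2, y] with y in {-1, +1}.
    Minimizes (1/n) * sum_i a_i * log(1 + exp(-y_i w.x_i)) + lam * ||w||_1,
    where a_i = 1/||x_i|| and x_i has a bias term of 1 prepended.
    """
    def prep(e):
        x = np.array([1.0, e[0], e[1]])        # bias + features
        a = 1.0 / np.linalg.norm(x[1:])        # inverse vector length weight
        return (x, e[2], a)

    pts = data_rdd.map(prep).cache()
    n = float(pts.count())
    w = np.zeros(3)
    for it in range(max_iter):
        w_bc = sc.broadcast(w)
        # Weighted logistic-loss gradient: a * (-y * x) / (1 + exp(y * w.x))
        grad = pts.map(lambda p: -p[2] * p[1] * p[0] /
                       (1.0 + np.exp(p[1] * w_bc.value.dot(p[0])))) \
                  .reduce(lambda u, v: u + v) / n
        grad = grad + lam * np.sign(w)         # L1 subgradient term
        w_new = w - eta * grad
        if np.linalg.norm(w_new - w) < tol:    # convergence test
            return w_new, it + 1
        w = w_new
    return w, max_iter

w, n_iter = weighted_lasso_lr(sc.parallelize(train))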
In [59]:
## Code goes here
In [60]:
## Drivers & Runners
In [61]:
## Run Scripts, S3 Sync
HW11.4 SVMs
Back to Table of Contents
Use the non-linearly separable training and testing datasets from HW11.3 in this problem.
Using MLLib train up a soft SVM model with the training dataset and evaluate with the testing set. What is a good number of iterations for training the SVM model? Justify with plots and words.
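As with the logistic model, a hedged starter sketch using MLLib's SVMWithSGD (a soft-margin linear SVM trained on the hinge loss), reusing the LabeledPoint RDDs from HW11.3 and sweeping the iteration count so test error can be plotted.

from pyspark.mllib.classification import SVMWithSGD

svm_errors = []
for iters in [5, 10, 25, 50, 100, 200]:
    model = SVMWithSGD.train(train_lp, iterations=iters, regParam=0.01)
    err = test_lp.map(lambda p: 1.0 if model.predict(p.features) != p.label
                                else 0.0).mean()
    svm_errors.append((iters, err))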
HW11.4.1 [Optional] Derive and implement in Spark a weighted hard linear SVM classification learning algorithm. Feel free to use the following notebook as a starting point
Evaluate your homegrown weighted linear SVM classification learning algorithm on the weighted training dataset and test dataset from HW11.3 (linearly separable dataset). Report the misclassification error (1 - Accuracy) and how many iterations it took to converge. How many support vectors do you end up with?
Does Spark MLLib have a weighted soft SVM learner? If so, use it and report your findings on the weighted training set and test set.
HW11.4.2 [Optional] Repeat HW11.4.1 using a soft SVM and the non-linearly separable datasets. Compare the error rates that you get here with the error rates you achieve using MLLib's soft SVM. Report the number of support vectors in both cases (this may not be available in the MLLib implementation).
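For the homegrown variants, one hedged starting point is a weighted hinge-loss subgradient descent (a soft-margin formulation; a nearly hard margin can be approximated by making the regularization term very small). The per-example weights are the same 1/||x|| weights as in HW11.3, labels are in {-1,+1}, and points on or inside the margin after training are counted as (approximate) support vectors.

import numpy as np

def weighted_linear_svm(data_rdd, lam=1e-3, eta=0.1, max_iter=200, tol=1e-4):
    """
    Weighted soft-margin linear SVM via subgradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i a_i * max(0, 1 - y_i w.x_i),
    where a_i = 1/||x_i||; data_rdd holds [x1, x2, y] with y in {-1, +1}.
    """
    def prep(e):
        x = np.array([1.0, e[0], e[1]])
        return (x, e[2], 1.0 / np.linalg.norm(x[1:]))

    pts = data_rdd.map(prep).cache()
    n = float(pts.count())
    w = np.zeros(3)
    for it in range(max_iter):
        w_bc = sc.broadcast(w)
        # Hinge-loss subgradient: -a*y*x where the margin is violated, else 0.
        grad = pts.map(lambda p: -p[2] * p[1] * p[0]
                       if p[1] * w_bc.value.dot(p[0]) < 1 else np.zeros(3)) \
                  .reduce(lambda u, v: u + v) / n
        grad = grad + lam * w
        w_new = w - eta * grad
        if np.linalg.norm(w_new - w) < tol:
            w = w_new
            break
        w = w_new
    # Approximate support vectors: points on or inside the margin.
    n_sv = pts.filter(lambda p: p[1] * float(w.dot(p[0])) <= 1.0).count()
    return w, n_sv

w, n_sv = weighted_linear_svm(sc.parallelize(train))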
In [65]:
## Code goes here
In [66]:
## Drivers & Runners
In [67]:
## Run Scripts, S3 Sync
HW11.5 [OPTIONAL] Distributed Perceptron algorithm.
Back to Table of Contents
Using the following papers as background: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf
https://www.dropbox.com/s/a5pdcp0r8ptudgj/gesmundo-tomeh-eacl-2012.pdf?dl=0
http://www.slideshare.net/matsubaray/distributed-perceptron
Implement each of the following flavors of perceptron learning algorithm:
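For reference, the McDonald, Hall and Mann paper linked above describes, among other strategies, iterative parameter mixing: each partition runs an ordinary perceptron pass over its shard starting from the current global weights, and the per-shard weights are averaged after every epoch. A hedged Spark sketch of that one flavor (2 input features plus a bias, labels in {-1,+1}) is given below as a starting point.

import numpy as np

def perceptron_shard(examples, w0):
    # Ordinary perceptron pass over one partition, starting from w0.
    w = np.array(w0)
    for x1, x2, y in examples:
        x = np.array([1.0, x1, x2])
        if y * w.dot(x) <= 0:
            w += y * x
    yield w

def iterative_parameter_mixing(data_rdd, epochs=10):
    """data_rdd: RDD of [x1, x2, y] examples with y in {-1, +1}."""
    num_parts = data_rdd.getNumPartitions()
    w = np.zeros(3)
    for _ in range(epochs):
        w_bc = sc.broadcast(w)
        # Train a perceptron independently on each shard, then average (mix).
        shard_weights = data_rdd.mapPartitions(
            lambda part: perceptron_shard(part, w_bc.value)).collect()
        w = sum(shard_weights) / float(num_parts)
    return w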
In [71]:
## Code goes here
In [72]:
## Drivers & Runners
In [73]:
## Run Scripts, S3 Sync
HW11.6 [OPTIONAL: consider doing this in a group] Evaluation of perceptron algorithms on PennTreeBank POS corpus
Back to Table of Contents
Reproduce the experiments reported in the following paper:
HadoopPerceptron: a Toolkit for Distributed Perceptron Training and Prediction with MapReduce - Andrea Gesmundo and Nadi Tomeh
http://www.aclweb.org/anthology/E12-2020
These experiments focus on the prediction accuracy on a part-of-speech (POS) task using the PennTreeBank corpus. They use sections 0-18 of the Wall Street Journal for training, and sections 22-24 for testing.
HW11.7 [OPTIONAL: consider doing this in a group] Kernel Adatron
Back to Table of Contents
Implement the Kernel Adatron in Spark (contact Jimi for details)
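No specification is reproduced here, but as a hedged, single-machine starting point, the classic Kernel Adatron update (gradient ascent on the SVM dual with the alphas clipped to [0, C]) is sketched below with an assumed RBF kernel; the per-example updates would still need to be mapped onto Spark for the actual assignment.

import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def kernel_adatron(X, y, eta=0.1, epochs=100, C=10.0):
    """Serial Kernel Adatron: coordinate-wise gradient ascent on the SVM dual,
    with each alpha clipped to [0, C].  X: (n, d) array, y: (n,) array of +/-1."""
    n = len(y)
    K = np.array([[rbf(X[i], X[j]) for j in range(n)] for i in range(n)])
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            z_i = np.sum(alpha * y * K[i])                       # margin of example i
            alpha[i] = np.clip(alpha[i] + eta * (1.0 - y[i] * z_i), 0.0, C)
    return alpha   # nonzero entries mark the support vectors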
HW11.8 [OPTIONAL] Create an animation of gradient descent for perceptron learning or for logistic regression
Back to Table of Contents
Train with the following 3 training examples. Present the progress in terms of the 2-dimensional input space as a contour plot and also as a 3D surface plot. See Live slides for an example. Here is a sample training dataset that can be used:
-2, 3, +1
-1, -1, -1
2, -3, +1
Please feel free to use
I am happy for folks to collaborate on HW11.8 also.
It would be great to get the 3D surface and contour lines (with the solution region and label-normalized data) all in the same graph.
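A hedged local sketch of the plotting side only, assuming the perceptron criterion E(w) = sum_i max(0, -w.z_i) over the label-normalized examples z_i = y_i*x_i and a 2-dimensional weight space so both the contour plot and the 3D surface can be drawn; feeding the per-iteration weights into matplotlib's FuncAnimation would then produce the animation, and shading the zero-loss set would show the solution region.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: registers the 3d projection

# Sample training data: [x1, x2, label]; label-normalize as z = y * x so that
# the solution region is simply { w : w.z > 0 for every z }.
data = np.array([[-2, 3, 1], [-1, -1, -1], [2, -3, 1]], dtype=float)
Z = data[:, :2] * data[:, 2:3]

def perceptron_criterion(w):
    return np.sum(np.maximum(0.0, -Z.dot(w)))

# Subgradient descent on the perceptron criterion with a decaying step size;
# `path` holds the weight vector after every iteration (for the animation).
w, path = np.array([2.0, 2.0]), []
for t in range(50):
    path.append(w.copy())
    misclassified = Z[Z.dot(w) <= 0]
    grad = -np.sum(misclassified, axis=0) if len(misclassified) else np.zeros(2)
    w = w - (0.5 / (1.0 + t)) * grad
path = np.array(path)

# Error surface over weight space.
W1, W2 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
E = np.vectorize(lambda a, b: perceptron_criterion(np.array([a, b])))(W1, W2)

fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(121)                   # contour plot with the descent path
ax1.contour(W1, W2, E, 30)
ax1.plot(path[:, 0], path[:, 1], 'ro-')
ax2 = fig.add_subplot(122, projection='3d')  # 3D surface of the criterion
ax2.plot_surface(W1, W2, E, cmap='viridis', alpha=0.7)
plt.show()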
In [77]:
## Code goes here
In [78]:
## Drivers & Runners
In [79]:
## Run Scripts, S3 Sync
In [ ]: