In this assignment you will use logistic regression and neural networks on the digits dataset from the Digit Recognizer competition on Kaggle. The first task is to train a logistic regression model from scikit-learn on the training set, predict the labels of the given test set, and submit the predictions to Kaggle. Next, vary the regularization parameter of logistic regression and check whether it has any effect on your results. Then train a neural network from scikit-learn on the same dataset, use the trained model to predict the labels of the test set, and submit those results to Kaggle as well. You need to report the neural network results too. Finally, create some handwritten digits of your own, either in drawing software such as MS Paint or by writing them on paper and taking a picture, and see how well your trained model performs on them.
The given images are grayscale, with the digits written in white; make sure the digits you create follow the same format.
The dataset used in this assignment is called Digit Recognizer and is available on Kaggle. To download it, go to the Data tab of the dataset page and download the files 'train.csv', 'test.csv' and 'sample_submission.csv.gz'. 'train.csv' is used to train the model. 'test.csv' is used to test the model, i.e. to check how well it generalizes. 'sample_submission.csv.gz' contains a sample of the submission file that you need to generate and submit to Kaggle.
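As a rough sketch of how these files fit together, the helpers below split a training frame into pixels and labels and wrap predictions in a submission-shaped frame. The column names `label`, `ImageId` and `Label`, and the assumption that `ImageId` starts at 1, are not stated above; check them against the downloaded 'train.csv' and sample submission. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def split_features_labels(train_df):
    """Separate the label column from the pixel columns of a train.csv-style frame.

    Assumes the label column is named "label" and all other columns are pixels.
    """
    labels = train_df["label"].to_numpy()
    pixels = train_df.drop(columns="label").to_numpy()
    return pixels, labels


def make_submission(predictions):
    """Build a frame in an ImageId,Label layout (ImageId assumed to start at 1)."""
    predictions = np.asarray(predictions)
    return pd.DataFrame({
        "ImageId": np.arange(1, len(predictions) + 1),
        "Label": predictions,
    })
```

With the real data you would load the frame with `pd.read_csv('train.csv')`, and write the result with `make_submission(preds).to_csv('submission.csv', index=False)`, where `preds` holds your model's predictions for 'test.csv'.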
There are some tutorials in the dataset's tutorial section that you can use as a starting point, especially 'A beginner’s approach to classification', which uses scikit-learn's SVM classifier. You can replace the SVM with logistic regression and a neural network. To download the notebook, click the fork notebook button first and then the download button.
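One possible way to swap in the two models is sketched below. It is not the tutorial's code: the function name `compare_models`, the particular `C` values, the 80/20 validation split and the single hidden layer of 100 units are all illustrative assumptions. It also assumes the pixels are raw 0-255 intensities; skip the scaling step if you have already binarized them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def compare_models(pixels, labels, c_values=(0.01, 1.0, 100.0), seed=0):
    """Return validation accuracy for several logistic-regression C values and one MLP."""
    # Assumption: raw 0-255 intensities, scaled here to [0, 1]
    X = np.asarray(pixels, dtype=float) / 255.0
    X_train, X_val, y_train, y_val = train_test_split(
        X, labels, test_size=0.2, random_state=seed, stratify=labels
    )

    scores = {}
    for c in c_values:
        # In scikit-learn, C is the inverse of the regularization strength:
        # a smaller C means a stronger penalty on the weights
        clf = LogisticRegression(C=c, max_iter=1000)
        clf.fit(X_train, y_train)
        scores[f"logreg C={c}"] = clf.score(X_val, y_val)

    # A small multi-layer perceptron as the neural-network counterpart
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=200, random_state=seed)
    mlp.fit(X_train, y_train)
    scores["mlp"] = mlp.score(X_val, y_val)
    return scores
```

Comparing the scores on a held-out validation split gives a quick local estimate before you spend a Kaggle submission; the final models can then be refit on the full training set.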
In [ ]:
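import cv2

# Read the hand-drawn digit as a single-channel grayscale image
img = cv2.imread('test.png', cv2.IMREAD_GRAYSCALE)
# Shrink it to the 28x28 size of the dataset images (INTER_AREA suits downscaling)
resized_image = cv2.resize(img, (28, 28), interpolation=cv2.INTER_AREA)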
In [ ]:
# Binarize the images: every non-zero pixel becomes 1 (same rule for both sets)
test_images[test_images > 0] = 1
train_images[train_images > 0] = 1
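Your own digits need the same treatment before the trained model can score them. The sketch below is one way to do that with NumPy alone; the function name `prepare_drawn_digit`, the `threshold` parameter and the mean-brightness test for deciding whether to invert are assumptions made for illustration, not part of the assignment.

```python
import numpy as np


def prepare_drawn_digit(gray_image, binarize=True, threshold=0):
    """Turn a 28x28 grayscale array into a (1, 784) row shaped like the dataset rows."""
    img = np.asarray(gray_image, dtype=np.uint8)
    if img.shape != (28, 28):
        raise ValueError("expected a 28x28 image, got shape %s" % (img.shape,))

    # Heuristic (assumption): a mostly bright image is a dark digit on a light
    # background, so invert it to get a white digit on a dark background
    if img.mean() > 127:
        img = 255 - img

    if binarize:
        # Same rule as the training data when threshold is 0:
        # every pixel above the threshold becomes 1, the rest 0
        img = (img > threshold).astype(np.uint8)

    # Flatten to a single row of 784 features
    return img.reshape(1, -1)
```

The array `resized_image` from the OpenCV cell can be passed in directly, and the returned row can go to your model's `predict` method. For photographs of paper the background is rarely exactly 0 after inversion, so a `threshold` above 0 will probably be needed.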