In [80]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pickle
import functools as ft

In [87]:
X = x_train[1]
X


Out[87]:
array([ 0.84021634, -0.3574927 ])

In [88]:
not ft.reduce(lambda old, new: old == new,X >= 0)


Out[88]:
True

XOR


In [90]:
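def xor(X):
    # X holds the two coordinates of one point; X >= 0 marks the non-negative ones.
    # Return 1 when the two coordinates fall on opposite sides of zero, else 0.
    if not ft.reduce(lambda old, new: old == new, X >= 0):
        return 1
    else:
        return 0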
    
x_train = np.array([(np.random.random_sample(5000) - 0.5) * 2 for dim in range(2)]).transpose()
x_test  = np.array([(np.random.random_sample(100)  - 0.5) * 2 for dim in range(2)]).transpose()
y_train = np.apply_along_axis(xor, 1, x_train)
y_test  = np.apply_along_axis(xor, 1, x_test)
with open('data/xor.tuple', 'wb') as xtuple:
    pickle.dump((x_train, y_train, x_test, y_test), xtuple)
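
The row-wise `xor` above labels a point 1 when its two coordinates lie on opposite sides of zero. The same rule can be sketched without `apply_along_axis`, assuming the input always has exactly two columns. `xor_labels` is a hypothetical helper name, not part of the notebook.

```python
import numpy as np

def xor_labels(points):
    """Label each 2-D point 1 if exactly one coordinate is non-negative, else 0."""
    points = np.asarray(points)
    nonneg = points >= 0                       # boolean array of shape (n, 2)
    return (nonneg[:, 0] != nonneg[:, 1]).astype(int)

# Small hand-picked sample (the first row is the point shown in Out[87]).
sample = np.array([[0.84021634, -0.3574927],
                   [0.5, 0.25],
                   [-0.5, -0.25],
                   [-0.1, 0.9]])
labels = xor_labels(sample)
```

On the full data this would be called as `xor_labels(x_train)` and should agree with `np.apply_along_axis(xor, 1, x_train)`.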

Multivariate Regression - Housing Data Set

https://archive.ics.uci.edu/ml/datasets/Housing

  1. CRIM: per capita crime rate by town
  2. ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
  3. INDUS: proportion of non-retail business acres per town
  4. CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
  5. NOX: nitric oxides concentration (parts per 10 million)
  6. RM: average number of rooms per dwelling
  7. AGE: proportion of owner-occupied units built prior to 1940
  8. DIS: weighted distances to five Boston employment centres
  9. RAD: index of accessibility to radial highways
  10. TAX: full-value property-tax rate per $10,000
  11. PTRATIO: pupil-teacher ratio by town
  12. B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
  13. LSTAT: % lower status of the population
  14. MEDV: Median value of owner-occupied homes in $1000's

In [16]:
!wget -P data/ https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data


--2016-07-13 19:47:59--  https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49082 (48K) [text/plain]
Saving to: 'data/housing.data'

housing.data        100%[===================>]  47.93K   133KB/s    in 0.4s

2016-07-13 19:48:00 (133 KB/s) - 'data/housing.data' saved [49082/49082]


In [31]:
# The file has no header row and is whitespace-delimited; column names follow the list above.
housing = pd.read_csv('data/housing.data', sep=r'\s+',
                   names=['CRIM', 
                          'ZN', 
                          'INDUS', 
                          'CHAS', 
                          'NOX', 
                          'RM', 
                          'AGE', 
                          'DIS', 
                          'RAD',
                          'TAX',
                          'PTRATIO',
                          'B',
                          'LSTAT',
                          'MEDV'])
housing.head()
with open('data/housing.dframe', 'wb') as dhousing:
    pickle.dump(housing, dhousing)
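
For regression, the frame is usually split into a feature matrix and the MEDV target. A minimal sketch of that step follows. `split_features_target` is a hypothetical helper, and the toy frame uses made-up values rather than rows from the real file.

```python
import numpy as np
import pandas as pd

def split_features_target(frame, target='MEDV'):
    """Return (X, y): every column except `target` as floats, plus the target column."""
    X = frame.drop(target, axis=1).values.astype(float)
    y = frame[target].values.astype(float)
    return X, y

# Toy frame with invented numbers, only to show the shapes involved.
toy = pd.DataFrame([[6.0, 5.0, 24.0],
                    [5.5, 12.0, 18.5],
                    [7.1, 3.5, 33.0]],
                   columns=['RM', 'LSTAT', 'MEDV'])
X, y = split_features_target(toy)
```

With the real data, `split_features_target(housing)` should give an `X` with 13 columns and a `y` holding the median home values.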

Binary Classification - Pima Indians Diabetes Data Set

https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-Hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)
  9. Class variable (0 or 1)

In [2]:
!wget -P data/ https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data


--2016-07-13 19:28:32--  https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23279 (23K) [text/plain]
Saving to: 'data/pima-indians-diabetes.data'

pima-indians-diabet 100%[===================>]  22.73K   129KB/s    in 0.2s

2016-07-13 19:28:33 (129 KB/s) - 'data/pima-indians-diabetes.data' saved [23279/23279]


In [72]:
data = pd.read_csv('data/pima-indians-diabetes.data',
                   names=['n_pregnant', 
                          'glucose', 
                          'mmHg', 
                          'triceps', 
                          'insulin', 
                          'BMI', 
                          'pedigree', 
                          'age', 
                          'class'])
data.head()
x = np.array(data)[:,:-1]
y = np.array(data)[:,-1]
n_train = int(len(x) * 0.70)
x_train = x[:n_train]
x_test  = x[n_train:]
y_train = y[:n_train]
y_test  = y[n_train:]
with open('data/pima-indians-diabetes.tuple', 'wb') as xtuple:
    pickle.dump((x_train, y_train, x_test, y_test), xtuple)
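
The split above takes the first 70% of rows in file order. If the rows are not already in random order, shuffling before splitting may give a more representative test set. The sketch below shows one way to do that with a fixed seed. `shuffled_split` and its parameters are hypothetical names, not part of the notebook.

```python
import numpy as np

def shuffled_split(x, y, train_fraction=0.70, seed=0):
    """Shuffle rows with a fixed seed, then split into train and test parts."""
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.random.RandomState(seed).permutation(len(x))
    n_train = int(len(x) * train_fraction)
    train_idx, test_idx = order[:n_train], order[n_train:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]

# Toy arrays standing in for the feature matrix and the class column.
x_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10) % 2
x_tr, y_tr, x_te, y_te = shuffled_split(x_demo, y_demo)
```

Features and labels are indexed with the same permutation, so each row keeps its own class value.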

Image Classification - MNIST dataset

http://deeplearning.net/data/mnist/mnist.pkl.gz


In [9]:
!wget -P data/ http://deeplearning.net/data/mnist/mnist.pkl.gz


--2016-07-13 15:39:19--  http://deeplearning.net/data/mnist/mnist.pkl.gz
Resolving deeplearning.net (deeplearning.net)... 132.204.26.28
Connecting to deeplearning.net (deeplearning.net)|132.204.26.28|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16168813 (15M) [application/x-gzip]
Saving to: 'data/mnist.pkl.gz'

mnist.pkl.gz        100%[===================>]  15.42M  3.95MB/s    in 5.7s

2016-07-13 15:39:25 (2.71 MB/s) - 'data/mnist.pkl.gz' saved [16168813/16168813]


In [11]:
import cPickle, gzip

# Load the dataset: the pickle holds three (images, labels) pairs
with gzip.open('data/mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = cPickle.load(f)
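
The cell above relies on `cPickle`, which exists only in Python 2. A hedged sketch of a loader that should also work on Python 3 follows. It assumes the file was pickled under Python 2, in which case `encoding='latin1'` is the usual way to decode its byte strings. `load_pickle_gz` is a hypothetical helper name.

```python
import gzip
import pickle
import sys

def load_pickle_gz(path):
    """Load a gzip-compressed pickle on either Python 2 or Python 3."""
    with gzip.open(path, 'rb') as handle:
        if sys.version_info[0] >= 3:
            # Python 2 pickles need an explicit encoding on Python 3.
            return pickle.load(handle, encoding='latin1')
        return pickle.load(handle)
```

Usage would then be `train_set, valid_set, test_set = load_pickle_gz('data/mnist.pkl.gz')`.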

In [26]:
plt.imshow(train_set[0][0].reshape((28,28)), cmap='gray', interpolation='nearest')


Out[26]:
<matplotlib.image.AxesImage at 0x7f5bd1e97850>

In [71]:
!wget -P data/ http://data.dmlc.ml/mxnet/data/mnist.zip
!unzip -d data/ -u data/mnist.zip


--2016-07-14 02:04:00--  http://data.dmlc.ml/mxnet/data/mnist.zip
Resolving data.dmlc.ml (data.dmlc.ml)... 128.2.209.42
Connecting to data.dmlc.ml (data.dmlc.ml)|128.2.209.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11595270 (11M) [application/zip]
Saving to: 'data/mnist.zip'

mnist.zip           100%[===================>]  11.06M  2.02MB/s    in 8.7s

2016-07-14 02:04:09 (1.26 MB/s) - 'data/mnist.zip' saved [11595270/11595270]

Archive:  data/mnist.zip
  inflating: data/t10k-images-idx3-ubyte  
  inflating: data/t10k-labels-idx1-ubyte  
  inflating: data/train-images-idx3-ubyte  
  inflating: data/train-labels-idx1-ubyte  
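
The four unzipped files use the IDX binary format: a 4-byte header (two zero bytes, a type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw values. A minimal reader for the unsigned-byte case might look like the sketch below. `read_idx` is a hypothetical helper, and the sketch assumes the files are uncompressed.

```python
import struct
import numpy as np

def read_idx(path):
    """Read an unsigned-byte IDX file into a NumPy array."""
    with open(path, 'rb') as handle:
        # Header: two zero bytes, a type code (0x08 = unsigned byte), number of dimensions.
        zeros, type_code, n_dims = struct.unpack('>HBB', handle.read(4))
        if zeros != 0 or type_code != 0x08:
            raise ValueError('not an unsigned-byte IDX file: %s' % path)
        # One big-endian unsigned 32-bit integer per dimension.
        shape = struct.unpack('>' + 'I' * n_dims, handle.read(4 * n_dims))
        data = np.frombuffer(handle.read(), dtype=np.uint8)
    return data.reshape(shape)
```

For example, `read_idx('data/train-images-idx3-ubyte')` should return a three-dimensional array of images, and `read_idx('data/train-labels-idx1-ubyte')` a one-dimensional array of labels.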

Image Classification - CIFAR-10 dataset

https://www.cs.toronto.edu/~kriz/cifar.html


In [1]:
!wget -P data/ https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
!tar -xzf data/cifar-10-python.tar.gz -C data/


--2016-07-13 15:27:29--  https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Resolving www.cs.toronto.edu (www.cs.toronto.edu)... 128.100.3.30
Connecting to www.cs.toronto.edu (www.cs.toronto.edu)|128.100.3.30|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 170498071 (163M) [application/x-gzip]
Saving to: 'data/cifar-10-python.tar.gz'

cifar-10-python.tar 100%[===================>] 162.60M  2.12MB/s    in 82s

2016-07-13 15:28:51 (1.99 MB/s) - 'data/cifar-10-python.tar.gz' saved [170498071/170498071]


In [27]:
with open('data/cifar-10-batches-py/data_batch_1', 'rb') as batch:
    cifar1 = cPickle.load(batch)

In [57]:
cifar1.keys()


Out[57]:
['data', 'labels', 'batch_label', 'filenames']

In [54]:
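# Each row stores the red, green and blue planes one after another (3 x 32 x 32).
# Move the channel axis last so imshow receives a (32, 32, 3) RGB image;
# cmap is omitted because matplotlib ignores it for RGB data.
img = cifar1['data'][0].reshape((3,32,32)).transpose(1, 2, 0)
plt.imshow(img)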


Out[54]:
<matplotlib.image.AxesImage at 0x7f5bcb9a7a90>
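
The same channel reordering can be applied to a whole batch at once. The sketch below assumes each row holds 3072 values laid out as three consecutive 32x32 colour planes, as in the cell above. `rows_to_images` is a hypothetical helper, and the toy rows are synthetic values rather than CIFAR-10 data.

```python
import numpy as np

def rows_to_images(rows):
    """Convert (n, 3072) channel-first rows into (n, 32, 32, 3) images."""
    rows = np.asarray(rows)
    return rows.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)

# Synthetic rows, only to check the index arithmetic.
toy_rows = (np.arange(2 * 3072).reshape(2, 3072) % 256).astype(np.uint8)
toy_images = rows_to_images(toy_rows)
```

With the loaded batch, `rows_to_images(cifar1['data'])[0]` should match the `img` array built in the previous cell.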