


In [80]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pickle
import functools as ft



In [87]:

    
# x_train is created by the XOR cell below (In [90]); run that cell first
X = x_train[1]
X









    Out[87]:





array([ 0.84021634, -0.3574927 ])



In [88]:

    
# XOR of the sign tests: True when exactly one of the two coordinates is non-negative
not ft.reduce(lambda old, new: old == new, X >= 0)









    Out[88]:





True

XOR



In [90]:

    
def xor(X):
    # 1 if the coordinates of X disagree in sign (x >= 0 vs. x < 0), else 0
    if not ft.reduce(lambda old, new: old == new, X >= 0):
        return 1
    else:
        return 0

# Points drawn uniformly from [-1, 1) x [-1, 1): 5000 for training, 100 for testing
x_train = np.array([(np.random.random_sample(5000) - 0.5) * 2 for dim in range(2)]).transpose()
x_test  = np.array([(np.random.random_sample(100)  - 0.5) * 2 for dim in range(2)]).transpose()
y_train = np.apply_along_axis(xor, 1, x_train)
y_test  = np.apply_along_axis(xor, 1, x_test)
with open('data/xor.tuple', 'wb') as xtuple:
    pickle.dump((x_train, y_train, x_test, y_test), xtuple)
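The same labels can be computed without a Python-level loop by comparing the sign tests column-wise. A minimal sketch for the two-dimensional case used here; `xor_labels` is a hypothetical helper, and the seeded `default_rng` generator is an assumption, not what the notebook uses:

```python
import numpy as np

def xor_labels(X):
    """Label each row 1 if its two coordinates differ in sign, else 0.

    Like `xor` above, 0 counts as non-negative.
    """
    nonneg = X >= 0                                   # boolean array, shape (n, 2)
    return (nonneg[:, 0] != nonneg[:, 1]).astype(int)

rng = np.random.default_rng(0)                        # hypothetical seed, for reproducibility
X = rng.uniform(-1.0, 1.0, size=(5000, 2))            # uniform points in [-1, 1) x [-1, 1)
y = xor_labels(X)
```

On the same points this should give exactly the labels that `np.apply_along_axis(xor, 1, X)` produces, since both treat 0 as non-negative.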

Multivariate Regression - Housing Data Set

https://archive.ics.uci.edu/ml/datasets/Housing

CRIM: per capita crime rate by town
ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS: proportion of non-retail business acres per town
CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX: nitric oxides concentration (parts per 10 million)
RM: average number of rooms per dwelling
AGE: proportion of owner-occupied units built prior to 1940
DIS: weighted distances to five Boston employment centres
RAD: index of accessibility to radial highways
TAX: full-value property-tax rate per $10,000
PTRATIO: pupil-teacher ratio by town
B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT: % lower status of the population
MEDV: Median value of owner-occupied homes in $1000's



In [16]:

    
!wget -P data/ https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data









    



--2016-07-13 19:47:59--  https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49082 (48K) [text/plain]
Saving to: 'data/housing.data'

housing.data        100%[===================>]  47.93K   133KB/s    in 0.4s    

2016-07-13 19:48:00 (133 KB/s) - 'data/housing.data' saved [49082/49082]



In [31]:

    
# Whitespace-separated file without a header row; names follow the attribute list above
housing = pd.read_csv('data/housing.data', sep=r'\s+',
                      names=['CRIM',
                             'ZN',
                             'INDUS',
                             'CHAS',
                             'NOX',
                             'RM',
                             'AGE',
                             'DIS',
                             'RAD',
                             'TAX',
                             'PTRATIO',
                             'B',
                             'LSTAT',
                             'MEDV'])
print(housing.head())
with open('data/housing.dframe', 'wb') as dhousing:
    pickle.dump(housing, dhousing)
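The notebook only prepares the data here. As a minimal sketch of the regression this section is named for, assuming an ordinary least-squares model of MEDV on the other 13 columns fitted with NumPy's `np.linalg.lstsq` (`fit_linear_regression` is a hypothetical helper):

```python
import numpy as np
import pandas as pd

def fit_linear_regression(df, target="MEDV"):
    """Ordinary least-squares fit of `target` on every other column.

    Returns a Series of coefficients; the intercept is stored under "intercept".
    """
    features = df.drop(columns=[target])
    # Prepend a column of ones so the first coefficient is the intercept
    A = np.column_stack([np.ones(len(df)), features.to_numpy(dtype=float)])
    b = df[target].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pd.Series(coef, index=["intercept"] + list(features.columns))

# Hypothetical usage with the pickled DataFrame:
# with open('data/housing.dframe', 'rb') as f:
#     housing = pickle.load(f)
# print(fit_linear_regression(housing))
```

No feature scaling or train/test split is done in this sketch; it only shows the shape of the linear model.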

Binary Classification - Pima Indians Diabetes Data Set

https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

Number of times pregnant
Plasma glucose concentration at 2 hours in an oral glucose tolerance test
Diastolic blood pressure (mm Hg)
Triceps skin fold thickness (mm)
2-Hour serum insulin (mu U/ml)
Body mass index (weight in kg/(height in m)^2)
Diabetes pedigree function
Age (years)
Class variable (0 or 1)



In [2]:

    
!wget -P data/ https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data









    



--2016-07-13 19:28:32--  https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.249
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.249|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23279 (23K) [text/plain]
Saving to: 'data/pima-indians-diabetes.data'

pima-indians-diabet 100%[===================>]  22.73K   129KB/s    in 0.2s    

2016-07-13 19:28:33 (129 KB/s) - 'data/pima-indians-diabetes.data' saved [23279/23279]



In [72]:

    
# Comma-separated file without a header row; the last column is the class label
data = pd.read_csv('data/pima-indians-diabetes.data',
                   names=['n_pregnant',
                          'glucose',
                          'mmHg',
                          'triceps',
                          'insulin',
                          'BMI',
                          'pedigree',
                          'age',
                          'class'])
print(data.head())
x = np.array(data)[:, :-1]   # the eight input features
y = np.array(data)[:, -1]    # class variable (0 or 1)
# Training set: first 70% of rows in file order; test set: the remaining rows
n_train = int(len(x) * 0.70)
x_train = x[:n_train]
x_test  = x[n_train:]
y_train = y[:n_train]
y_test  = y[n_train:]
with open('data/pima-indians-diabetes.tuple', 'wb') as xtuple:
    pickle.dump((x_train, y_train, x_test, y_test), xtuple)
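The split above takes the first 70% of rows in file order. If the rows are not in random order, a shuffled split avoids a biased test set. A minimal sketch, where `shuffled_split` and the default seed are hypothetical choices, not part of the notebook; it returns the arrays in the same order as the pickled tuple:

```python
import numpy as np

def shuffled_split(x, y, train_fraction=0.70, seed=0):
    """Randomly permute the rows, then split into (x_train, y_train, x_test, y_test)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))          # one shared permutation keeps x and y aligned
    n_train = int(len(x) * train_fraction)
    train_idx, test_idx = order[:n_train], order[n_train:]
    return x[train_idx], y[train_idx], x[test_idx], y[test_idx]
```

Usage: `x_train, y_train, x_test, y_test = shuffled_split(x, y)` before pickling.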

Image Classification - MNIST dataset

http://deeplearning.net/data/mnist/mnist.pkl.gz



In [9]:

    
!wget -P data/ http://deeplearning.net/data/mnist/mnist.pkl.gz









    



--2016-07-13 15:39:19--  http://deeplearning.net/data/mnist/mnist.pkl.gz
Resolving deeplearning.net (deeplearning.net)... 132.204.26.28
Connecting to deeplearning.net (deeplearning.net)|132.204.26.28|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16168813 (15M) [application/x-gzip]
Saving to: 'data/mnist.pkl.gz'

mnist.pkl.gz        100%[===================>]  15.42M  3.95MB/s    in 5.7s    

2016-07-13 15:39:25 (2.71 MB/s) - 'data/mnist.pkl.gz' saved [16168813/16168813]



In [11]:

    
import gzip, pickle

# Load the dataset; the file is a Python 2 pickle, so Python 3 needs encoding='latin1'
with gzip.open('data/mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
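Each of `train_set`, `valid_set` and `test_set` is a pair `(images, labels)` with one flattened 28x28 image per row of `images`, which is what the `reshape((28,28))` in the next cell relies on. A minimal sketch of two helpers often needed before training; `to_images` and `one_hot` are hypothetical names, and 10 classes (digits 0 to 9) is assumed:

```python
import numpy as np

def to_images(flat, side=28):
    """Reshape rows of shape (n, side*side) into an (n, side, side) image stack."""
    return flat.reshape(-1, side, side)

def one_hot(labels, n_classes=10):
    """Encode integer class labels as one-hot rows of length n_classes."""
    out = np.zeros((len(labels), n_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0   # one 1.0 per row, at the label's column
    return out
```

Usage: `images = to_images(train_set[0])` and `targets = one_hot(train_set[1])`.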



In [26]:

    
# First training image: row 0 of the image array, reshaped from 784 pixels to 28x28
plt.imshow(train_set[0][0].reshape((28, 28)), cmap='gray', interpolation='nearest')












In [71]:

    
!wget -P data/ http://data.dmlc.ml/mxnet/data/mnist.zip
!unzip -d data/ -u data/mnist.zip









    



--2016-07-14 02:04:00--  http://data.dmlc.ml/mxnet/data/mnist.zip
Resolving data.dmlc.ml (data.dmlc.ml)... 128.2.209.42
Connecting to data.dmlc.ml (data.dmlc.ml)|128.2.209.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11595270 (11M) [application/zip]
Saving to: 'data/mnist.zip'

mnist.zip           100%[===================>]  11.06M  2.02MB/s    in 8.7s    

2016-07-14 02:04:09 (1.26 MB/s) - 'data/mnist.zip' saved [11595270/11595270]

Archive:  data/mnist.zip
  inflating: data/t10k-images-idx3-ubyte  
  inflating: data/t10k-labels-idx1-ubyte  
  inflating: data/train-images-idx3-ubyte  
  inflating: data/train-labels-idx1-ubyte
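The extracted files use the IDX format as commonly documented for MNIST: a 4-byte magic number (two zero bytes, a data-type code, the number of dimensions), then each dimension size as a big-endian 32-bit integer, then the raw values. A minimal reader sketch, assuming uncompressed files; `read_idx` and `IDX_DTYPES` are hypothetical names:

```python
import struct
import numpy as np

# IDX data-type codes (third byte of the magic number) mapped to big-endian NumPy dtypes
IDX_DTYPES = {
    0x08: np.dtype(">u1"),  # unsigned byte
    0x09: np.dtype(">i1"),  # signed byte
    0x0B: np.dtype(">i2"),  # 2-byte integer
    0x0C: np.dtype(">i4"),  # 4-byte integer
    0x0D: np.dtype(">f4"),  # 4-byte float
    0x0E: np.dtype(">f8"),  # 8-byte float
}

def read_idx(path):
    """Read an uncompressed IDX file into a NumPy array of the stored shape."""
    with open(path, "rb") as f:
        zero, type_code, n_dims = struct.unpack(">HBB", f.read(4))
        if zero != 0:
            raise ValueError("not an IDX file: magic number must start with two zero bytes")
        dtype = IDX_DTYPES[type_code]
        # One big-endian unsigned 32-bit size per dimension
        shape = struct.unpack(">" + "I" * n_dims, f.read(4 * n_dims))
        data = np.frombuffer(f.read(), dtype=dtype)
    return data.reshape(shape)
```

Usage: `train_images = read_idx('data/train-images-idx3-ubyte')` should give a `(60000, 28, 28)` array of `uint8` pixels, and `read_idx('data/train-labels-idx1-ubyte')` the matching labels.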

Image Classification - CIFAR-10 dataset

https://www.cs.toronto.edu/~kriz/cifar.html



In [1]:

    
!wget -P data/ https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
!tar -xzf data/cifar-10-python.tar.gz -C data/









    



--2016-07-13 15:27:29--  https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Resolving www.cs.toronto.edu (www.cs.toronto.edu)... 128.100.3.30
Connecting to www.cs.toronto.edu (www.cs.toronto.edu)|128.100.3.30|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 170498071 (163M) [application/x-gzip]
Saving to: 'data/cifar-10-python.tar.gz'

cifar-10-python.tar 100%[===================>] 162.60M  2.12MB/s    in 82s     

2016-07-13 15:28:51 (1.99 MB/s) - 'data/cifar-10-python.tar.gz' saved [170498071/170498071]



In [27]:

    
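import pickle

# CIFAR-10 batches are Python 2 pickles; encoding='latin1' keeps the dict keys as str in Python 3
with open('data/cifar-10-batches-py/data_batch_1', 'rb') as batch:
    cifar1 = pickle.load(batch, encoding='latin1')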
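The archive contains five training batches, `data_batch_1` to `data_batch_5`, plus a `test_batch`. A minimal sketch that loads and concatenates the five training batches; `load_cifar10_train` is a hypothetical helper, and `encoding='latin1'` is assumed so that Python 3 reads these Python 2 pickles with `str` keys such as `'data'` and `'labels'`:

```python
import os
import pickle
import numpy as np

def load_cifar10_train(root="data/cifar-10-batches-py", n_batches=5):
    """Concatenate data_batch_1 ... data_batch_<n_batches> into (data, labels) arrays."""
    xs, ys = [], []
    for i in range(1, n_batches + 1):
        with open(os.path.join(root, "data_batch_%d" % i), "rb") as f:
            batch = pickle.load(f, encoding="latin1")
        xs.append(batch["data"])                  # one flattened image per row
        ys.append(np.asarray(batch["labels"]))    # labels are stored as a Python list
    return np.concatenate(xs), np.concatenate(ys)
```

Usage: `x_train, y_train = load_cifar10_train()`; the test batch can be read the same way from `test_batch`.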



In [57]:

    
cifar1.keys()









    Out[57]:





['data', 'labels', 'batch_label', 'filenames']



In [54]:

    
# Each row holds 1024 red, then 1024 green, then 1024 blue values:
# reshape to (3, 32, 32), then move the channel axis last for imshow
img = cifar1['data'][0].reshape((3, 32, 32)).transpose(1, 2, 0)
plt.imshow(img)  # RGB data, so no colormap is needed









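The per-image conversion for CIFAR-10 extends to a whole batch with one reshape and one transpose. A minimal sketch, assuming rows of length 3072 laid out as 1024 red, 1024 green, then 1024 blue values, each channel in row-major order; `cifar_rows_to_images` is a hypothetical name:

```python
import numpy as np

def cifar_rows_to_images(data):
    """Convert CIFAR-10 rows of shape (n, 3072) into an (n, 32, 32, 3) RGB array."""
    # (n, 3072) -> (n, 3, 32, 32): channel-first planes
    # transpose -> (n, 32, 32, 3): channel-last, as plt.imshow expects
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
```

Usage: `images = cifar_rows_to_images(cifar1['data'])`, then `plt.imshow(images[0])` shows the same picture as the single-image cell.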