In [1]:
from IPython.html.services.config import ConfigManager
from IPython.utils.path import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
    'width': 1024,
    'height': 768,
})
Out[1]:
In [2]:
from IPython.html.services.config import ConfigManager
from IPython.utils.path import locate_profile
cm = ConfigManager(profile_dir=locate_profile(get_ipython().profile))
cm.update('livereveal', {
    'theme': 'simple',
    'transition': 'linear',
    'start_slideshow_at': 'selected',
    'center': False,  # a boolean, not the string 'False'
})
Out[2]:
This conference is different: the audience is mostly people working in science, but it also includes many people from tech and software development. It's a very diverse, friendly crowd.
The full tutorial materials can be found here: https://github.com/amueller/scipy_2015_sklearn_tutorial
This was a two-session (8-hour) tutorial that covered a host of topics; a few highlights follow.
In [3]:
from sklearn.datasets import load_digits
digits = load_digits()
%matplotlib inline
import matplotlib.pyplot as plt
In [4]:
fig = plt.figure(figsize=(6, 6))  # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

# plot the digits: each image is 8x8 pixels
for i in range(24, 48):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(digits.images[i], cmap=plt.cm.binary, interpolation='nearest')
    # label the image with the target value
    ax.text(0, 7, str(digits.target[i]))
We can train, for example, a Gaussian Naive Bayes classifier to identify digits from a test set.
In [5]:
from sklearn.naive_bayes import GaussianNB
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer scikit-learn
In [6]:
digits.data.shape
Out[6]:
In [7]:
# split the data into training and validation sets
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, random_state=5)
# train the model
clf = GaussianNB()
clf.fit(X_train, y_train)
# use the model to predict the labels of the test data
predicted = clf.predict(X_test)
expected = y_test
How did we do?
In [8]:
fig = plt.figure(figsize=(6, 6))  # figure size in inches
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

# plot the digits: each image is 8x8 pixels
for i in range(24, 48):
    ax = fig.add_subplot(8, 8, i + 1, xticks=[], yticks=[])
    ax.imshow(X_test.reshape(-1, 8, 8)[i], cmap=plt.cm.binary,
              interpolation='nearest')
    # label the image with the predicted value: green if correct, red if not
    if predicted[i] == expected[i]:
        ax.text(0, 7, str(predicted[i]), color='green')
    else:
        ax.text(0, 7, str(predicted[i]), color='red')
In [9]:
print(clf.score(X_test, y_test))
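The accuracy score alone hides *which* digits get confused with which. As a sketch (not part of the tutorial), tallying (true, predicted) label pairs gives a poor-man's confusion matrix in plain Python; `sklearn.metrics.confusion_matrix` does the same at scale. The toy labels below are illustrative, not the tutorial's data:

```python
from collections import Counter

def confusion_counts(expected, predicted):
    """Count (true, predicted) label pairs: rows of a confusion matrix."""
    return Counter(zip(expected, predicted))

# toy labels standing in for digits.target values
expected = [0, 1, 1, 2, 2, 2]
predicted = [0, 1, 2, 2, 2, 0]

counts = confusion_counts(expected, predicted)
print(counts[(2, 2)])  # 2 -- twos classified correctly
print(counts[(1, 2)])  # 1 -- one '1' misread as a '2'
```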
In [10]:
from sklearn import datasets
lfw_people = datasets.fetch_lfw_people(min_faces_per_person=70, resize=0.4,
                                       data_home='../tutorials/scipy_2015_sklearn_tutorial/notebooks/datasets/')
lfw_people.data.shape
Out[10]:
In [11]:
fig = plt.figure(figsize=(14, 4))
# plot several images
for i in range(20):
    ax = fig.add_subplot(2, 10, i + 1, xticks=[], yticks=[])
    ax.imshow(lfw_people.images[i], cmap=plt.cm.bone)
We can perform PCA to extract features from the set of images:
In [12]:
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    lfw_people.data,
    lfw_people.target,
    random_state=0)
print(X_train.shape, X_test.shape)
In [13]:
from sklearn import decomposition
pca = decomposition.RandomizedPCA(n_components=150, whiten=True)  # decomposition.PCA(svd_solver='randomized') in newer scikit-learn
pca.fit(X_train)
Out[13]:
And we can view the mean face:
In [14]:
plt.imshow(pca.mean_.reshape((50, 37)), cmap=plt.cm.bone)
Out[14]:
As well as the eigenfaces, if we like:
In [15]:
fig = plt.figure(figsize=(16, 6))
for i in range(30):
    ax = fig.add_subplot(3, 10, i + 1, xticks=[], yticks=[])
    ax.imshow(pca.components_[i].reshape((50, 37)), cmap=plt.cm.bone)
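A rough illustration of what eigenfaces are doing under the hood (a schematic sketch on random data, not `RandomizedPCA` itself): the components are the leading right singular vectors of the centered data, and each "face" can be approximated from its first k projection coefficients:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(20, 6)  # 20 tiny "images", 6 pixels each

mean = X.mean(axis=0)
# principal components = right singular vectors of the centered data
_, _, components = np.linalg.svd(X - mean, full_matrices=False)

k = 3                                       # keep the top-k "eigenfaces"
coeffs = (X - mean) @ components[:k].T      # project onto the components
X_approx = mean + coeffs @ components[:k]   # reconstruct from k components

# reconstruction error shrinks as k grows toward the full rank (6 here)
print(np.linalg.norm(X - X_approx))
```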
And plenty more examples:
- Application: image classification
- Model complexity, learning curves, and validation curves
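The model-complexity topic can be illustrated without scikit-learn at all. Below is a minimal sketch of my own (not tutorial material) using `numpy.polyfit`: as polynomial degree grows, training error keeps falling, while validation error on fresh noisy samples eventually turns up — the overfitting signature a validation curve plots:

```python
import numpy as np

rng = np.random.RandomState(0)

def make_data(n):
    """Noisy samples of a sine wave on [0, 1]."""
    x = np.linspace(0, 1, n)
    return x, np.sin(2 * np.pi * x) + 0.3 * rng.randn(n)

x_train, y_train = make_data(30)
x_val, y_val = make_data(30)

def mse(x, y, coeffs):
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# sweep model complexity: higher-degree polynomials fit the training data
# ever more closely, while validation error eventually worsens
for degree in (1, 3, 6, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    print(degree, mse(x_train, y_train, coeffs), mse(x_val, y_val, coeffs))
```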
The full tutorial materials can be found here: https://github.com/chdoig/scipy2015-blaze-bokeh
This session mainly covered Bokeh's API, which makes it easy to build interactive plots rendered in a browser.
In [16]:
import pandas as pd
from bokeh.plotting import figure, show, output_notebook
output_notebook()
# Get data
df = pd.read_csv('../tutorials/scipy2015-blaze-bokeh/data/Land_Ocean_Monthly_Anomaly_Average.csv')
# Process data
df['datetime'] = pd.to_datetime(df['datetime'])
df.head()
Out[16]:
In [17]:
# Create plot
f = figure(plot_width=800, plot_height=400)
f.line(df['datetime'], df['anomaly'], color='skyblue', legend='Temp')
f.line(df['datetime'], pd.rolling_mean(df['anomaly'], window=10),  # df['anomaly'].rolling(10).mean() in newer pandas
       color='grey', legend='Rolling mean')
f.title = 'Temperature anomaly with time'
# Show plot
show(f)
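The `pd.rolling_mean` call above computes a trailing window average. For clarity, here is a plain-Python sketch of the same semantics (pandas emits NaN where this version emits `None`):

```python
def rolling_mean(values, window):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

print(rolling_mean([1, 2, 3, 4, 5], 3))  # [None, None, 2.0, 3.0, 4.0]
```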
In [18]:
from datetime import timedelta
from IPython.display import YouTubeVideo
start = int(timedelta(minutes=7, seconds=45).total_seconds())
YouTubeVideo("1kkFZ4P-XHg", start=start, autoplay=0, theme="light",
             color="blue", height=400, width=700)
Out[18]:
In [19]:
from datetime import timedelta
from IPython.display import YouTubeVideo
start = int(timedelta(minutes=0, seconds=40).total_seconds())
YouTubeVideo("TBBtOeY2Q78", start=start, autoplay=0, theme="light",
             color="red", height=400, width=700)
Out[19]:
A talk by the authors of Mesa, a new package that fills a gap in the Python ecosystem: a framework for building agent-based simulations.
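Mesa provides Agent and Model base classes plus schedulers; the skeleton below is a plain-Python sketch of the agent-based pattern (modeled loosely on Mesa's introductory wealth-exchange example), not Mesa's actual API:

```python
import random

class Agent:
    """A minimal agent holding some wealth; each step it gives one unit
    to a randomly chosen agent."""
    def __init__(self, model):
        self.model = model
        self.wealth = 1

    def step(self):
        if self.wealth > 0:
            other = random.choice(self.model.agents)
            other.wealth += 1
            self.wealth -= 1

class Model:
    """Holds the agents and activates each one per tick, in random order."""
    def __init__(self, n, seed=0):
        random.seed(seed)
        self.agents = [Agent(self) for _ in range(n)]

    def step(self):
        for agent in random.sample(self.agents, len(self.agents)):
            agent.step()

model = Model(50)
for _ in range(100):
    model.step()

# total wealth is conserved, but its distribution becomes skewed
print(sum(a.wealth for a in model.agents))  # 50
```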
Great talk by Luke Campagnola on the capabilities of VisPy to generate complex visualizations in real time using the GPU and OpenGL.
In [20]:
from datetime import timedelta
from IPython.display import YouTubeVideo
start = int(timedelta(minutes=11, seconds=43).total_seconds())
YouTubeVideo("_3YoaeoiIFI", start=start, autoplay=0, theme="light",
             color="red", height=400, width=700)
Out[20]:
There were three keynote talks, one given each day of the main conference. All three are worth watching: