Multiple languages are supported through the concept of kernels: interpreters that execute small snippets of code one by one, on demand, while maintaining a runtime environment. In essence, a kernel is a REPL that the web UI calls into. The list of available kernels currently spans dozens of languages.
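To make the kernel idea concrete, here is a minimal sketch of driving a kernel programmatically with the jupyter_client package, which the Notebook server itself builds on (treat the exact calls as illustrative; the API has shifted a bit between versions):

from jupyter_client.manager import KernelManager

# Start a Python kernel, roughly what the Notebook server does behind the scenes
km = KernelManager(kernel_name="python3")
km.start_kernel()

client = km.client()
client.start_channels()
client.wait_for_ready()

# Each execute() call sends one snippet; the kernel keeps its state between calls
client.execute("x = 40")
client.execute("print(x + 2)")   # the output travels back on the IOPub channel (not read here)

client.stop_channels()
km.shutdown_kernel()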
Python and R are also popular for data science and machine learning, so people made sure they integrate well with Jupyter Notebooks. This means that many objects render nicely on the Notebook UI:
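For example, pandas DataFrames implement IPython's _repr_html_ hook, which is why they show up as nicely formatted HTML tables. Any object can opt in the same way; here is a tiny, made-up class as a sketch:

class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    def _repr_html_(self):
        # The Notebook renders this HTML instead of the plain repr()
        return "<b>{} &deg;C</b>".format(self.celsius)

Temperature(21)   # shown as bold HTML when it's the last expression in a cell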
For machine learning, three types of libraries always pop up, and the first cell below imports one of each:
In [21]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Make charts a bit prettier
plt.style.use('ggplot')
In [43]:
titanic = pd.read_csv('titanic.csv', sep = ',')
In [44]:
# What are the dimensions
titanic.shape
In [45]:
# What are the column names
titanic.columns
In [46]:
# New columns can be derived from existing ones; this particular sum is just a demo
titanic["MySum"] = titanic["Survived"] + titanic["Pclass"]
In [47]:
# What do the first few rows look like
titanic.head()
In [48]:
# Let's clean up the data a bit
city_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}
titanic["EmbarkedCode"] = titanic["Embarked"]
titanic["Embarked"] = titanic["EmbarkedCode"].apply(lambda value: city_names.get(value))
In [49]:
# Check if it worked
titanic.head()
In [50]:
# Tell matplotlib to render graphs inside this notebook
%matplotlib inline
In [53]:
# Let's create a contingency table
pd.crosstab(titanic.Pclass, titanic.Survived, margins = True)
In [30]:
# Let's do the same but as percentages
pd.crosstab(titanic.Pclass, titanic.Survived, margins = True).apply(lambda row: row/len(titanic))
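Recent pandas versions can also do this normalization for you via the normalize argument of pd.crosstab:

pd.crosstab(titanic.Pclass, titanic.Survived, margins=True, normalize="all")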
In [57]:
# Count passengers by sex and survival status, one column per outcome
titanic.groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"]
In [31]:
# Let's create a stacked bar chart for sex vs. survivability
titanic.groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"].plot(kind="bar", stacked=True)
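The same chart can also be produced straight from a crosstab, which arguably reads more naturally:

pd.crosstab(titanic.Sex, titanic.Survived).plot(kind="bar", stacked=True)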
In [64]:
# Boolean indexing: keep only the female passengers
titanic[titanic.Sex == 'female']
In [32]:
# Do the same graph, but only for people older than 18 years old
titanic[titanic.Age >= 18].groupby(["Sex", "Survived"]).count().unstack("Survived")["PassengerId"].plot(kind="bar", stacked=True)
In [33]:
# Switch to a second dataset: video game sales and review scores
games = pd.read_csv("videogames.csv", sep=",")
In [34]:
games.head()
In [83]:
# Total sales per region and average critic score, aggregated by publisher
by_publisher = games.groupby("Publisher").agg({"NA_Sales": sum,
                                               "EU_Sales": sum,
                                               "JP_Sales": sum,
                                               "Global_Sales": sum,
                                               "Critic_Score": np.mean})
# The rows are indexed by publisher, so select one by label with .loc
by_publisher.loc["Nintendo"]
In [69]:
by_publisher.columns
In [36]:
# Top 15 publishers by global sales, keeping only the regional sales columns
top_publishers = by_publisher.sort_values("Global_Sales", ascending=False)[0:15][["NA_Sales", "EU_Sales", "JP_Sales"]]
top_publishers
In [37]:
# Regional sales for the top publishers, as grouped bars
top_publishers.plot(kind="bar", figsize=(12,5))
In [38]:
# And again, but as a stacked bar chart
top_publishers.plot(kind="bar", stacked=True, figsize=(12,5))
If you're running Linux, macOS or Windows 10, you can get native Docker at docker.com.
If you're running Windows 8, you'll need Docker Toolbox instead.
Open a terminal (or the Docker Quickstart Terminal, if you're on Docker Toolbox) and run the following:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
You're probably seeing an empty list. That's OK: Docker is running, you just don't have any containers yet.
$ docker run -p 8888:8888 -v /home/jovyan/work --name jupyternb jupyter/scipy-notebook start-notebook.sh --NotebookApp.token=''
A bit about what this does:
docker run starts a new container.
-p 8888:8888 tells Docker to map port 8888 from the container to the host machine (or the docker-machine VM).
-v /home/jovyan/work tells Docker to create a persistent volume for the directory where the notebooks are stored. Without this, all work would be lost when the container is removed.
--name jupyternb specifies the name of the container. Without it, Docker generates a random name.
jupyter/scipy-notebook is the name of the image from Docker Hub to run.
start-notebook.sh --NotebookApp.token='' is totally optional, but specific to the Jupyter Notebook Docker image: it disables authentication. Otherwise, you would have to get the initial configuration token from the Docker logs (see the note below).
If you're using native Docker, the app will be available at http://localhost:8888.
If you're using docker-machine, you'll need to find out its IP address first with docker-machine inspect default | grep IPAddress (usually 192.168.99.100). The app will then be available at e.g. http://192.168.99.100:8888.
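If you'd rather keep authentication enabled, drop the --NotebookApp.token='' part and fetch the login token from the container's logs instead:
$ docker logs jupyternb
The startup output contains a URL with a ?token=... parameter that you can paste straight into the browser.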
If the container is stopped (e.g. after a reboot), it can be started again with:
$ docker start jupyternb
You should now be able to upload or create notebooks, as well as upload the datasets that the notebooks load.
Note: all work is persisted to the Docker volume, but you are encouraged to keep copies of your files elsewhere anyway. Notebooks can be downloaded by choosing File > Download as > Notebook from the menu.