Altair Basic Charting

This notebook walks through the basic chart types you're likely to build with Altair, such as line charts, bar charts, and histograms.


In [12]:
import random

from IPython.display import HTML, display
import numpy as np
import pandas as pd

import altair.api as alt
from altair import html

We'll start by generating some data that might be representative of datasets you're using. Altair works with Pandas DataFrames, so our examples will move from regular Python data structures to DataFrames.

Next we'll create a simple chart from a dict of lists. We'll encode the col_1 and col_2 data to x and y, and then specify widths and heights. Vega-Lite has faceting built in, so it has two sets of dimension specifications:

  • width/height: The width/height of the entire chart
  • singleWidth/singleHeight: The width/height of a single facet

In the following example we have only one facet, so the single-facet dimensions are set slightly smaller than the overall chart dimensions.


In [13]:
# Dict of lists
list_data = {'col_1': [1, 2, 3, 4, 5, 6, 7], 
             'col_2': [10, 20, 30, 20, 15, 30, 45]}

df = pd.DataFrame(list_data)
fig_1 = (alt.Viz(list_data)
            .encode(x="col_1", y="col_2")
            .configure(width=600, height=400, 
                       singleWidth=500, singleHeight=300))

Let's take a quick look at our data encoding:


In [14]:
fig_1.encoding.x, fig_1.encoding.y


Out[14]:
({'name': 'col_1', 'type': 'Q', 'bin': False},
 {'name': 'col_2', 'type': 'Q', 'bin': False})
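
The `type` field in this output is a one-letter code. As a rough picture of how such a code could be guessed from raw values, here is a minimal sketch using only the standard library. `infer_type_code` is a hypothetical helper written for this walkthrough, and its rules are an assumption; Altair's real inference may work differently.

```python
from datetime import date, datetime
from numbers import Number


def infer_type_code(values):
    """Guess a one-letter type code for a column of plain Python values.

    Hypothetical helper for illustration only; not Altair's inference logic.
    """
    values = list(values)
    if values and all(isinstance(v, (date, datetime)) for v in values):
        return "T"  # temporal
    # bool is a Number subclass in Python, so exclude it explicitly
    if values and all(isinstance(v, Number) and not isinstance(v, bool)
                      for v in values):
        return "Q"  # quantitative
    return "N"  # nominal: fallback for strings and mixed content


list_data = {"col_1": [1, 2, 3, 4, 5, 6, 7],
             "col_2": [10, 20, 30, 20, 15, 30, 45]}
codes = {name: infer_type_code(column) for name, column in list_data.items()}
print(codes)  # {'col_1': 'Q', 'col_2': 'Q'}
```

A column of strings such as `['A', 'B', 'C']` falls through to `'N'` in this sketch.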

As you can see, Altair figured out that both data types were Quantitative, or Q. Let's make a chart:


In [15]:
# Render once and reuse the resulting HTML string
out = html.render(fig_1)
display(HTML(out))


By default Altair created a scatterplot with a Quantitative axis. Let's take a look at similar data, but with categorical types in the first column.


In [16]:
list_data = {'col_1': ['A', 'B', 'C', 'D', 'E', 'F', 'G'], 
             'col_2': [10, 20, 30, 20, 15, 30, 45]}
df = pd.DataFrame(list_data)
fig_2 = (alt.Viz(list_data)
            .encode(x="col_1:O", y="col_2:Q")
            .configure(width=600, height=400, 
                       singleWidth=500, singleHeight=300)
            .bar())
fig_2.encoding.x.band = alt.Band(size=60)
display(HTML(html.render(fig_2)))


You'll see a couple of things above. First, we supplied type hints for the x and y encodings, which Altair parsed. Second, the band on the x encoding matters: it dictates the width of each categorical element on that axis. For Ordinal or Nominal data (O or N), the band width determines the total chart width.
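
To make the type-hint notation concrete, the sketch below splits strings like `col_1:O` into a field name and a type code. It also accepts an aggregate wrapper such as `avg(values):Q`, a form that appears later in this notebook. The grammar is an assumption inferred from the strings used here, and `SHORTHAND` and `parse_shorthand` are hypothetical names, not part of Altair.

```python
import re

# Assumed grammar: an optional aggregate wrapper around a field name,
# followed by an optional one-letter type code (Q, O, N, or T).
SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<agg_field>\w+)\)|(?P<field>\w+))"
    r"(?::(?P<type>[QONT]))?$"
)


def parse_shorthand(text):
    """Split an encoding shorthand string into name, aggregate, and type."""
    match = SHORTHAND.match(text)
    if match is None:
        raise ValueError("unrecognised shorthand: {!r}".format(text))
    return {
        "name": match.group("agg_field") or match.group("field"),
        "aggregate": match.group("aggregate"),  # None when no wrapper is given
        "type": match.group("type"),            # None when no hint is given
    }


for text in ("col_1:O", "col_2:Q", "avg(values):Q"):
    print(text, "->", parse_shorthand(text))
```

Returning `None` for a missing type hint leaves room for a separate inference step to fill it in.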

However, you might not want to spend a lot of time fiddling with band widths. If you just want to create a single chart and adjust its dimensions, you can use the set_single_dims helper:


In [17]:
df = pd.DataFrame(list_data)
fig_3 = (alt.Viz(list_data)
            .encode(x="col_1", y="col_2")
            .set_single_dims(width=500, height=300)
            .circle())
display(HTML(html.render(fig_3)))


Altair supports some higher-level charts, such as histograms. For these high-level charts, you can pass your x-encoding straight into the hist method call:


In [18]:
normal = np.random.normal(size=1000)
df = pd.DataFrame({"normal": normal})
fig_4 = (alt.Viz(df)
            .hist(x="normal:O", bins=20)
            .configure(width=700, height=500, 
                       singleHeight=300))
# Band width will largely depend on your number of bins
fig_4.encoding.x.band = alt.Band(size=30)
display(HTML(html.render(fig_4)))


/Users/robstory/src/altair/env/lib/python3.4/site-packages/pandas/core/internals.py:956: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  return self._try_coerce_result(func(values, other))
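
To make the `bins=20` argument concrete, the sketch below does equal-width binning by hand with the standard library. It is an assumption about what a 20-bin histogram computes, not the code Altair or Vega-Lite actually runs, and `bin_counts` is a hypothetical helper. The last line multiplies the bin count by the band size of 30 used above, ignoring any padding, to show why the bars fit inside the 700-pixel chart width.

```python
import random


def bin_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        # All values identical: put everything in the first bin.
        return [len(values)] + [0] * (bins - 1)
    counts = [0] * bins
    for v in values:
        # The maximum value would land at index `bins`; clamp it into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts


random.seed(0)  # fixed seed so reruns give the same sample
sample = [random.gauss(0, 1) for _ in range(1000)]
counts = bin_counts(sample, bins=20)

print(len(counts), sum(counts))  # 20 1000
print(len(counts) * 30)          # 600: 20 bands at 30 pixels each
```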

You can also facet histograms by a second column/dimension, using the color keyword in the hist call:


In [19]:
normal = np.random.normal(size=1000)
cats = np.random.choice(["A", "B", "C", "D"], size=1000)
df = pd.DataFrame({"normal": normal, "cats": cats})
fig_5 = (alt.Viz(df)
            .hist(x="normal:O", color="cats", bins=20)
            .configure(width=700, height=500, 
                       singleHeight=300))
# Band width will largely depend on your number of bins
fig_5.encoding.x.band = alt.Band(size=30)
display(HTML(html.render(fig_5)))



We can also build line charts with Time dimension types. The following example also uses the aggregation shorthand to average the y-values.


In [20]:
cats = ['y1', 'y2', 'y3', 'y4']
# Daily timestamps for 2015, converted to strings
date_cats = pd.date_range('1/1/2015', periods=365, freq='D').astype(str)
date_data = {"date": date_cats,
             "values": np.random.randint(0, 100, size=365),
             "categories": np.random.choice(cats, size=365)}
df = pd.DataFrame(date_data)
fig_6 = (alt.Viz(df)
            .encode(x="date:T", y="avg(values):Q")
            .configure(width=900, height=500, singleHeight=400)
            .line())
fig_6.encoding.x.band = alt.Band(size=50)
fig_6.encoding.x.timeUnit = "month"
display(HTML(html.render(fig_6)))
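
As a rough picture of what the cell above asks for, the sketch below groups a year of daily values by calendar month and averages each group. It uses only the standard library and its own random data. It reflects an assumption about how `avg(values)` combines with a `timeUnit` of `"month"`, not Vega-Lite's actual implementation.

```python
import random
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

random.seed(0)  # fixed seed so reruns give the same values
start = date(2015, 1, 1)
# One (day, value) row per day of 2015, with values in 0..99
rows = [(start + timedelta(days=offset), random.randint(0, 99))
        for offset in range(365)]

# Group the daily values by calendar month, then average each group.
by_month = defaultdict(list)
for day, value in rows:
    by_month[day.month].append(value)

monthly_avg = {month: mean(values) for month, values in sorted(by_month.items())}
for month, average in monthly_avg.items():
    print(month, round(average, 1))
```

The result has twelve points, one per month, which is what the line chart connects.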

