In [12]:
import random
from IPython.display import HTML, display
import numpy as np
import pandas as pd
import altair.api as alt
from altair import html
First we're going to generate some data that might be representative of datasets you're using. Altair works with Pandas DataFrames, so our examples will move from regular Python data structures to DataFrames.
We'll start by creating a simple chart from a dict of lists holding the x and y values. We'll first encode the col_1 and col_2 data to x and y, and then specify widths/heights. Vega-lite has faceting built in, so it has two dimension specifications:
width/height: The width/height of the entire chart
singleWidth/singleHeight: The width/height of a single facet
In the following example, we only have one facet, so we'll bound our box appropriately.
In [13]:
# Dict of lists
list_data = {'col_1': [1, 2, 3, 4, 5, 6, 7],
'col_2': [10, 20, 30, 20, 15, 30, 45]}
df = pd.DataFrame(list_data)
fig_1 = (alt.Viz(list_data)
.encode(x="col_1", y="col_2")
.configure(width=600, height=400,
singleWidth=500, singleHeight=300))
Let's take a quick look at our data encoding:
In [14]:
fig_1.encoding.x, fig_1.encoding.y
Out[14]:
As you can see, Altair figured out that both data types were Quantitative, or Q. Let's make a chart:
In [15]:
# Render once and reuse the resulting HTML string
out = html.render(fig_1)
display(HTML(out))
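As an aside on the type inference noted above, here is a minimal sketch of how a one-letter type code could be guessed from a pandas dtype. The helper guess_type_code is hypothetical and is not Altair's actual implementation; it only illustrates the idea that numeric columns map to Q, datetime columns to T, and everything else to O.

```python
import pandas as pd


def guess_type_code(series):
    """Hypothetical helper: guess a one-letter type code from a pandas dtype."""
    # Check datetimes first so they are never treated as plain numbers.
    if pd.api.types.is_datetime64_any_dtype(series):
        return "T"  # Time
    if pd.api.types.is_numeric_dtype(series):
        return "Q"  # Quantitative
    return "O"  # Ordinal (fallback for strings and other objects)


df = pd.DataFrame({"col_1": [1, 2, 3],
                   "col_2": [10.0, 20.0, 30.0],
                   "label": ["A", "B", "C"]})
codes = {name: guess_type_code(df[name]) for name in df.columns}
print(codes)  # {'col_1': 'Q', 'col_2': 'Q', 'label': 'O'}
```

When the guess is not what you want, an explicit type hint in the encoding string overrides it.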
By default Altair created a scatterplot with a Quantitative axis. Let's take a look at similar data, but with categorical types in the first column.
In [16]:
list_data = {'col_1': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
'col_2': [10, 20, 30, 20, 15, 30, 45]}
df = pd.DataFrame(list_data)
fig_2 = (alt.Viz(list_data)
.encode(x="col_1:O", y="col_2:Q")
.configure(width=600, height=400,
singleWidth=500, singleHeight=300)
.bar())
fig_2.encoding.x.band = alt.Band(size=60)
display(HTML(html.render(fig_2)))
You'll see a couple of things above. First, we presented some type hints for the x and y encoding that were parsed by Altair. Second, the band on the x encoding above was important: it dictates the width of each categorical element on that axis. In the case of Ordinal or Nominal data (O or N), the band width determines the total chart width.
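To make the relationship between band size and chart width concrete, here is a rough back-of-the-envelope sketch. It assumes the plot area is simply the band size multiplied by the number of distinct categories, ignoring padding, axes, and margins; approx_plot_width is a hypothetical helper, not part of Altair.

```python
def approx_plot_width(categories, band_size):
    """Hypothetical estimate: one band per distinct category, no padding."""
    distinct = len(set(categories))
    return distinct * band_size


col_1 = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
width = approx_plot_width(col_1, band_size=60)
print(width)  # 420
```

Under this assumption, the seven categories above with a band size of 60 need about 420 pixels, which fits inside the singleWidth of 500 used in the example.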
However, you might not want to spend a lot of time fiddling with band widths and so on. If you just want to create a single chart and fiddle with the dimensions, you can use the set_single_dims helper:
In [17]:
df = pd.DataFrame(list_data)
fig_3 = (alt.Viz(list_data)
.encode(x="col_1", y="col_2")
.set_single_dims(width=500, height=300)
.circle())
display(HTML(html.render(fig_3)))
Altair supports some higher-level charts, such as histograms. For these high-level charts, you can pass your x-encoding straight into the hist method call:
In [18]:
normal = np.random.normal(size=1000)
df = pd.DataFrame({"normal": normal})
fig_4 = (alt.Viz(df)
.hist(x="normal:O", bins=20)
.configure(width=700, height=500,
singleHeight=300))
# Band width will largely depend on your number of bins
fig_4.encoding.x.band = alt.Band(size=30)
display(HTML(html.render(fig_4)))
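For intuition about what a 20-bin histogram summarizes, the binning can be reproduced with NumPy alone. This is a sketch assuming equal-width bins spanning the data range, which may differ from how the hist method bins internally; the seeded generator is only there to make the example repeatable.

```python
import numpy as np

# Seeded generator so the example is reproducible (an addition for illustration)
rng = np.random.default_rng(0)
normal = rng.normal(size=1000)

# 20 equal-width bins over the data range: 20 counts and 21 bin edges
counts, edges = np.histogram(normal, bins=20)
bin_width = edges[1] - edges[0]

print(counts.sum(), len(counts), len(edges))
print(round(float(bin_width), 3))
```

Each bin becomes one band on the x axis, which is why the band size you need depends on the number of bins.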
You can also facet histograms by a second column/dimension, using the color keyword in the hist call:
In [19]:
normal = np.random.normal(size=1000)
cats = np.random.choice(["A", "B", "C", "D"], size=1000)
df = pd.DataFrame({"normal": normal, "cats": cats})
fig_5 = (alt.Viz(df)
.hist(x="normal:O", color="cats", bins=20)
.configure(width=700, height=500,
singleHeight=300))
# Band width will largely depend on your number of bins
fig_5.encoding.x.band = alt.Band(size=30)
display(HTML(html.render(fig_5)))
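The counts behind a histogram split by category can be tabulated directly in pandas. The sketch below assumes equal-width bins via pd.cut and uses pd.crosstab to count rows per bin and category; it illustrates the underlying data, not how Altair or Vega-lite computes it.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an addition for illustration)
rng = np.random.default_rng(0)
df = pd.DataFrame({"normal": rng.normal(size=1000),
                   "cats": rng.choice(["A", "B", "C", "D"], size=1000)})

# Assign each value to one of 20 equal-width bins
binned = pd.cut(df["normal"], bins=20)

# Rows are bins, columns are categories, cells are counts
table = pd.crosstab(binned, df["cats"])
print(table.head())
```

Each column of the table corresponds to one color in the chart, and each row to one band on the x axis.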
We can also build line charts with Time dimension types. The following example also uses the aggregation shorthand to average the y-values.
In [20]:
cats = ['y1', 'y2', 'y3', 'y4']
# Daily dates as strings (pd.Categorical.from_array is not available in recent pandas releases)
date_cats = pd.date_range('1/1/2015', periods=365, freq='D').strftime('%Y-%m-%d')
date_data = {"date": date_cats,
"values": np.random.randint(0, 100, size=365),
"categories": np.random.choice(cats, size=365)}
df = pd.DataFrame(date_data)
fig_6 = (alt.Viz(df)
.encode(x="date:T", y="avg(values):Q")
.configure(width=900, height=500, singleHeight=400)
.line())
fig_6.encoding.x.band = alt.Band(size=50)
fig_6.encoding.x.timeUnit = "month"
display(HTML(html.render(fig_6)))
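The shorthand strings used throughout, such as col_1:O and avg(values):Q, pack a field name, an optional aggregate, and an optional type code into one string. The following is a minimal sketch of how such a string could be split apart with a regular expression. The parse_shorthand function and its pattern are hypothetical and are not Altair's own parser.

```python
import re

# Either "aggregate(field)" or a bare "field", optionally followed by ":X"
SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<agg_field>\w+)\)|(?P<field>\w+))"
    r"(?::(?P<type>[A-Za-z]))?$"
)


def parse_shorthand(text):
    """Hypothetical parser: split a shorthand string into its parts."""
    match = SHORTHAND.match(text)
    if match is None:
        raise ValueError("Unrecognized shorthand: %r" % text)
    return {
        "field": match.group("agg_field") or match.group("field"),
        "aggregate": match.group("aggregate"),  # None when no aggregate is given
        "type": match.group("type"),            # None when no type hint is given
    }


print(parse_shorthand("avg(values):Q"))
print(parse_shorthand("col_1:O"))
print(parse_shorthand("col_1"))
```

A missing type hint comes back as None, which is the case where the type would have to be inferred from the data instead.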