In an earlier blog post, I showed figures from a recent paper and invited readers to redesign them to communicate their message more effectively.

This notebook shows one way we might redesign the figures. At the same time, it demonstrates a simple use of a Pandas MultiIndex.

In [1]:
```python
import matplotlib.pyplot as plt
import pandas as pd
```

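The paper's figures break down the results at three levels. At the top level, they distinguish between evaluations on a 10-point scale and a 6-point scale.

In [2]:
```python
scale = ['10-point', '6-point']
```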

At the next level, they distinguish fields of study as "least" or "most" male-dominated.

In [3]:
```python
area = ['LeastMaleDominated', 'MostMaleDominated']
```

And they distinguish between male and female instructors.

In [4]:
```python
instructor = ['Male', 'Female']
```

We can assemble those levels into a MultiIndex like this:

In [5]:
```python
index = pd.MultiIndex.from_product([scale, area, instructor],
                                   names=['Scale', 'Area', 'Instructor'])
index
```

```
Out[5]:
```
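
As a quick sanity check, here is a small self-contained sketch that rebuilds the same index and confirms that `from_product` generates one entry for every combination of the three lists, 2 × 2 × 2 = 8 in all. It repeats the lists defined above so that it can run on its own.

```python
import pandas as pd

scale = ['10-point', '6-point']
area = ['LeastMaleDominated', 'MostMaleDominated']
instructor = ['Male', 'Female']

index = pd.MultiIndex.from_product([scale, area, instructor],
                                   names=['Scale', 'Area', 'Instructor'])

# One entry per combination of the three two-element lists.
print(len(index))         # 8
print(index.nlevels)      # 3
print(list(index.names))  # ['Scale', 'Area', 'Instructor']

# The last list varies fastest, so the first two entries differ
# only in the instructor.
print(index[0])
print(index[1])
```

The order of the entries matters later, because the list of data values has to line up with it.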

For each of these eight conditions, the original paper reports the entire distribution of student evaluation scores. To make a simpler and clearer visualization of the results, I am going to present a summary of these distributions.

I could take the mean of each distribution, and that would show the effect. But to make it even clearer, I will use the fraction of "top" scores, meaning a 9 or 10 on the 10-point scale and a 6 on the 6-point scale.
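
To make that summary statistic concrete, here is a minimal sketch of how the percentage of top scores could be computed from a distribution of ratings. The counts are invented purely for illustration and are not taken from the paper, and `top_score_pct` is a hypothetical helper name.

```python
import pandas as pd

def top_score_pct(counts, top_scores):
    """Return the percentage of ratings that fall in `top_scores`.

    counts: Series mapping each score to the number of ratings with that score.
    top_scores: list of the scores that count as "top" on this scale.
    """
    return 100 * counts.loc[top_scores].sum() / counts.sum()

# Made-up counts for a 10-point scale (NOT data from the paper).
ten_point = pd.Series([1, 1, 2, 3, 5, 8, 10, 20, 25, 25],
                      index=range(1, 11))

# Made-up counts for a 6-point scale (NOT data from the paper).
six_point = pd.Series([2, 3, 10, 20, 25, 40], index=range(1, 7))

print(top_score_pct(ten_point, [9, 10]))  # 50.0
print(top_score_pct(six_point, [6]))      # 40.0
```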

Now, to get the data, I used the figures from the paper and estimated numbers by eye. **So these numbers are only approximate!**

In [6]:
```python
data = [60, 60, 54, 38, 43, 42, 41, 41]
df = pd.DataFrame(data, columns=['TopScore%'], index=index)
df
```

```
Out[6]:
```

To extract the subset of the data on a 10-point scale, we can use `loc` in the usual way.

In [7]:
```python
df.loc['10-point']
```

```
Out[7]:
```
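
Passing a tuple to `loc` selects on several leading levels at once. The sketch below rebuilds the DataFrame so that it runs on its own; the values are the same approximate numbers used above.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['10-point', '6-point'],
     ['LeastMaleDominated', 'MostMaleDominated'],
     ['Male', 'Female']],
    names=['Scale', 'Area', 'Instructor'])
df = pd.DataFrame([60, 60, 54, 38, 43, 42, 41, 41],
                  columns=['TopScore%'], index=index)

# A tuple selects on the first two levels; the remaining level
# (Instructor) becomes the index of the result.
subset = df.loc[('10-point', 'MostMaleDominated'), :]
print(subset)

# A full tuple plus a column label picks out a single value.
value = df.loc[('10-point', 'MostMaleDominated', 'Female'), 'TopScore%']
print(value)  # 38
```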

To select on a level other than the first, we can use `xs`. This example takes a cross-section of the second level.

In [8]:
```python
df.xs('MostMaleDominated', level='Area')
```

```
Out[8]:
```

This example takes a cross-section of the third level.

In [9]:
```python
df.xs('Male', level='Instructor')
```

```
Out[9]:
```
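
`xs` can also fix more than one level at a time if it is given a tuple of keys and a matching list of level names. The following sketch, which rebuilds the DataFrame so it is self-contained, selects male instructors on the 10-point scale and also shows the `drop_level` option.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['10-point', '6-point'],
     ['LeastMaleDominated', 'MostMaleDominated'],
     ['Male', 'Female']],
    names=['Scale', 'Area', 'Instructor'])
df = pd.DataFrame([60, 60, 54, 38, 43, 42, 41, 41],
                  columns=['TopScore%'], index=index)

# Fix two levels at once: the 10-point scale and male instructors.
# The remaining level, Area, indexes the result.
male_ten = df.xs(('10-point', 'Male'), level=['Scale', 'Instructor'])
print(male_ten)

# drop_level=False keeps the selected level in the index of the result.
male_all = df.xs('Male', level='Instructor', drop_level=False)
print(male_all)
```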

In [10]:
```python
ten = df.loc['10-point']
ten
```

```
Out[10]:
```

Now, the primary thing I want the reader to see is a discrepancy in percentages. For comparison of two or more values, a bar plot is often a good choice.

As a starting place, I'll try the Pandas default for showing a bar plot of this data.

In [11]:
```python
ten.unstack().plot(kind='bar');
```

As defaults go, that's not bad. From this figure it is immediately clear that there is a substantial difference in scores between male and female instructors in the most male-dominated areas, and no difference in the least male-dominated areas.
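
The same comparison can be made numerically. As a rough sketch, again using the approximate values estimated from the paper's figures, `unstack` moves the instructor level into the columns so the gap can be computed as a simple difference in percentage points.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['10-point', '6-point'],
     ['LeastMaleDominated', 'MostMaleDominated'],
     ['Male', 'Female']],
    names=['Scale', 'Area', 'Instructor'])
df = pd.DataFrame([60, 60, 54, 38, 43, 42, 41, 41],
                  columns=['TopScore%'], index=index)

# Move the Instructor level into the columns: one column per gender,
# one row per (Scale, Area) pair.
wide = df['TopScore%'].unstack('Instructor')

# Gap between male and female instructors, in percentage points.
gap = wide['Male'] - wide['Female']
print(gap)
```

With these estimates, the gap is 16 percentage points in the most male-dominated areas on the 10-point scale and at most 1 point in the other three conditions.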

The following function cleans up some of the details in the presentation.

In [12]:
```python
def make_bar_plot(df):
    """Plot top-score percentages by area, with one bar per instructor gender."""
    # make the plot (and set the rotation of the x-axis labels)
    df.unstack().plot(kind='bar', rot=0, alpha=0.7)
    # clean up the legend
    plt.gca().legend(['Female', 'Male'])
    # label the y-axis
    plt.ylabel('Fraction of instructors getting top scores')
    # set limits on the y-axis (in part to make room for the legend)
    plt.ylim([0, 75])
```

Here are the results for the 10-point scale.

In [13]:
```python
make_bar_plot(ten)
plt.title('10-point scale');
```

In [14]:
```python
six = df.loc['6-point']
make_bar_plot(six)
plt.title('6-point scale');
```

Presenting two figures might be the best option, but in my challenge I asked for a single figure.

Here's a version that uses Pandas defaults with minimal customization.

In [17]:
```python
df.unstack().plot(kind='barh', xlim=[0, 65], alpha=0.7);
plt.gca().legend(['Female', 'Male'])
plt.gca().invert_yaxis()
plt.xlabel('Fraction of instructors getting top scores')
plt.tight_layout()
plt.savefig('gender_bias.png')
```
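
Another way to get everything into a single figure is to give each scale its own panel. This is only a sketch of that alternative and not part of the original notebook: it uses `plt.subplots` with a shared y-axis so the panels are directly comparable, and the figure size, axis label, and y-limit are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

index = pd.MultiIndex.from_product(
    [['10-point', '6-point'],
     ['LeastMaleDominated', 'MostMaleDominated'],
     ['Male', 'Female']],
    names=['Scale', 'Area', 'Instructor'])
df = pd.DataFrame([60, 60, 54, 38, 43, 42, 41, 41],
                  columns=['TopScore%'], index=index)

# One panel per scale, sharing the y-axis so bar heights are comparable.
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

for ax, scale_name in zip(axes, ['10-point', '6-point']):
    # Rows: Area; columns: Instructor (Female, Male).
    wide = (df['TopScore%']
            .xs(scale_name, level='Scale')
            .unstack('Instructor'))
    wide.plot(kind='bar', rot=0, alpha=0.7, ax=ax)
    ax.set_title(f'{scale_name} scale')
    ax.set_xlabel('')

axes[0].set_ylabel('Percentage of top scores')
axes[0].set_ylim([0, 75])
fig.tight_layout()
```

The shared y-axis makes it easy to see that the gap appears in only one of the four groups of bars.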
