CASE - Biodiversity data - analysis

DS Data manipulation, analysis and visualisation in Python
December, 2019

© 2016, Joris Van den Bossche and Stijn Van Hoey (mailto:jorisvandenbossche@gmail.com, mailto:stijnvanhoey@gmail.com). Licensed under CC BY 4.0 Creative Commons



In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn-whitegrid')

Reading in the enriched survey data set

EXERCISE:
  • Read in the 'survey_data_completed.csv' file and save the resulting DataFrame as variable survey_data_processed (if you did not complete the previous notebook, a version of the csv file is available in the `../data` folder).
  • Interpret the 'eventDate' column directly as python datetime object and make sure the 'occurrenceID' column is used as the index of the resulting DataFrame (both can be done at once when reading the csv file using parameters of the `read_csv` function)
  • Inspect the resulting frame (remember `.head()` and `.info()`) and check that the 'eventDate' indeed has a datetime data type.

In [2]:
survey_data_processed = pd.read_csv("../data/survey_data_completed.csv", 
                                    parse_dates=['eventDate'], index_col="occurrenceID")

In [3]:
survey_data_processed.head()


Out[3]:
verbatimLocality verbatimSex wgt datasetName sex eventDate decimalLongitude decimalLatitude genus species taxa name class kingdom order phylum scientificName status usageKey
occurrenceID
1 2 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081975 31.938887 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 3 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081208 31.938896 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 2 F NaN Ecological Archives E090-118-D1. female 1977-07-16 -109.081975 31.938887 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
4 7 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.082816 31.938113 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
5 3 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081208 31.938896 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0

In [4]:
survey_data_processed.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 35550 entries, 1 to 35550
Data columns (total 19 columns):
verbatimLocality    35550 non-null int64
verbatimSex         33042 non-null object
wgt                 32283 non-null float64
datasetName         35550 non-null object
sex                 33041 non-null object
eventDate           35550 non-null datetime64[ns]
decimalLongitude    35550 non-null float64
decimalLatitude     35550 non-null float64
genus               33535 non-null object
species             33535 non-null object
taxa                33535 non-null object
name                33535 non-null object
class               33448 non-null object
kingdom             33448 non-null object
order               33448 non-null object
phylum              33448 non-null object
scientificName      33448 non-null object
status              33448 non-null object
usageKey            33448 non-null float64
dtypes: datetime64[ns](1), float64(4), int64(1), object(13)
memory usage: 5.4+ MB

Tackle missing values (NaN) and duplicate values

EXERCISE: How many records are in the data set without information on the 'species' name?

In [5]:
survey_data_processed['species'].isnull().sum()


Out[5]:
2015
EXERCISE: How many duplicate records are present in the dataset? _Tip_: Pandas has a function to find `duplicated` values...

In [6]:
survey_data_processed.duplicated().sum()


Out[6]:
1577
EXERCISE: Extract a list of all duplicates, sort on the columns `eventDate` and `verbatimLocality` and show the first 10 records _Tip_: Check documentation of `duplicated`

In [7]:
survey_data_processed[survey_data_processed.duplicated(keep=False)].sort_values(["eventDate", "verbatimLocality"]).head(10)


Out[7]:
verbatimLocality verbatimSex wgt datasetName sex eventDate decimalLongitude decimalLatitude genus species taxa name class kingdom order phylum scientificName status usageKey
occurrenceID
5 3 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081208 31.938896 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
14 3 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081208 31.938896 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
4 7 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.082816 31.938113 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
13 7 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.082816 31.938113 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
34 11 F NaN Ecological Archives E090-118-D1. female 1977-07-17 -109.079307 31.938056 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
38 11 F NaN Ecological Archives E090-118-D1. female 1977-07-17 -109.079307 31.938056 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
40 11 F NaN Ecological Archives E090-118-D1. female 1977-07-17 -109.079307 31.938056 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
27 15 M NaN Ecological Archives E090-118-D1. male 1977-07-17 -109.081036 31.937059 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
28 15 M NaN Ecological Archives E090-118-D1. male 1977-07-17 -109.081036 31.937059 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
45 18 F NaN Ecological Archives E090-118-D1. female 1977-07-18 -109.078633 31.937126 Dipodomys merriami Rodent Dipodomys merriami Mammalia Animalia Rodentia Chordata Dipodomys merriami Mearns, 1890 ACCEPTED 2439521.0
EXERCISE: Exclude the duplicate values from the survey data set and save the result as survey_data_unique __Tip__: Next to finding `duplicated` values, Pandas has a function to `drop duplicates`...

In [8]:
survey_data_unique = survey_data_processed.drop_duplicates()

In [9]:
len(survey_data_unique)


Out[9]:
33973
EXERCISE: For how many records (rows) we have all the information available (i.e. no NaN values in any of the columns)? __Tip__: Just counting the nan (null) values won't work, maybe `dropna` can help you?

In [10]:
len(survey_data_unique.dropna())


Out[10]:
29777
EXERCISE: Select the subset of records without a species name, while having information on the sex and store the result as variable not_identified __Tip__: next to `isnull`, also `notnull` exists...

In [11]:
mask = survey_data_unique['species'].isnull() & survey_data_unique['sex'].notnull()
not_identified = survey_data_unique[mask]

In [12]:
not_identified.head()


Out[12]:
verbatimLocality verbatimSex wgt datasetName sex eventDate decimalLongitude decimalLatitude genus species taxa name class kingdom order phylum scientificName status usageKey
occurrenceID
1 2 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081975 31.938887 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 3 M NaN Ecological Archives E090-118-D1. male 1977-07-16 -109.081208 31.938896 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23 15 F NaN Ecological Archives E090-118-D1. female 1977-07-17 -109.081036 31.937059 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
39 17 M NaN Ecological Archives E090-118-D1. male 1977-07-17 -109.079415 31.937117 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
73 2 M NaN Ecological Archives E090-118-D1. male 1977-08-19 -109.081975 31.938887 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
EXERCISE: Select only those records that do have species information and save them as the variable survey_data. Make sure survey_data is a copy of the original DataFrame. This is the DataFrame we will use in the further analyses.

In [13]:
survey_data = survey_data_unique.dropna(subset=['species']).copy()
NOTE: For biodiversity studies, absence values (knowing that someting is not present) are useful as well to normalize the observations, but this is out of scope for these exercises.

Observations over time

EXERCISE: Make a plot visualizing the evolution of the number of observations for each of the individual years (i.e. annual counts). __Tip__: In the `pandas_04_time_series_data.ipynb` notebook, a powerful command to resample a time series

In [14]:
survey_data.resample('A', on='eventDate').size().plot()


Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbfec81ead0>

To evaluate the intensity or number of occurrences during different time spans, a heatmap is an interesting representation. We can actually use the plotnine library as well to make heatmaps, as it provides the geom_tile geometry. Loading the library:


In [15]:
import plotnine as pn
EXERCISE: Create a table, called heatmap_prep_plotnine, based on the survey_data DataFrame with a column for the years, a column for the months a column with the counts (called `count`). __Tip__: You have to count for each year/month combination. Also `reset_index` could be useful.

In [16]:
heatmap_prep_plotnine = survey_data.groupby([survey_data['eventDate'].dt.year, 
                                             survey_data['eventDate'].dt.month]).size()
heatmap_prep_plotnine.index.names = ["year", "month"]
heatmap_prep_plotnine = heatmap_prep_plotnine.reset_index(name='count')

In [17]:
heatmap_prep_plotnine.head()


Out[17]:
year month count
0 1977 7 47
1 1977 8 74
2 1977 9 76
3 1977 10 88
4 1977 11 59
EXERCISE: Based on heatmap_prep_plotnine, make a heatmap using the plotnine package. __Tip__: When in trouble, check [this section of the documentation](http://plotnine.readthedocs.io/en/stable/generated/plotnine.geoms.geom_tile.html#Annotated-Heatmap)

In [18]:
(pn.ggplot(heatmap_prep_plotnine, pn.aes(x="factor(month)", y="year", fill="count"))
    + pn.geom_tile()
    + pn.scale_fill_cmap("Reds")
    + pn.scale_y_reverse()
    + pn.theme( 
     axis_ticks=pn.element_blank(),
     panel_background=pn.element_rect(fill='white'))
)


Out[18]:
<ggplot: (8778890668909)>

Remark that we started from a tidy data format (also called long format).

The heatmap functionality is also provided by the plotting library seaborn (check the docs!). Based on the documentation, seaborn uses the short format with in the row index the years, in the column the months and the counts for each of these year/month combinations as values.

Let's reformat the heatmap_prep_plotnine data to be useable for the seaborn heatmap function:

EXERCISE: Create a table, called heatmap_prep_sns, based on the heatmap_prep_plotnine DataFrame with in the row index the years, in the column the months and as values of the table, the counts for each of these year/month combinations. __Tip__: The `pandas_07_reshaping_data.ipynb` notebook provides all you need to know

In [19]:
heatmap_prep_sns = heatmap_prep_plotnine.pivot_table(index='year', columns='month', values='count')
EXERCISE: Using the seaborn documentation make a heatmap starting from the heatmap_prep_sns variable.

In [20]:
fig, ax = plt.subplots(figsize=(10, 8))
ax = sns.heatmap(heatmap_prep_sns, cmap='Reds')


EXERCISE: Based on the heatmap_prep_sns DataFrame, return to the long format of the table with the columns `year`, `month` and `count` and call the resulting variable heatmap_tidy. __Tip__: The `pandas_07_reshaping_data.ipynb` notebook provides all you need to know, but a `reset_index` could be useful as well

In [21]:
heatmap_tidy = heatmap_prep_sns.reset_index().melt(id_vars=["year"], value_name="count")
heatmap_tidy.head()


Out[21]:
year month count
0 1977 1 NaN
1 1978 1 63.0
2 1979 1 55.0
3 1980 1 124.0
4 1981 1 157.0

Species abundance for each of the plots

The name of the observed species consists of two parts: the 'genus' and 'species' columns. For the further analyses, we want the combined name. This is already available as the 'name' column if you completed the previous notebook, otherwise you can add this again in the following exercise.

EXERCISE: Make a new column 'name' that combines the 'Genus' and 'species' columns (with a space in between). __Tip__: You are aware you can count with strings in Python 'a' + 'b' = 'ab'?

In [22]:
survey_data['name'] = survey_data['genus'] + ' ' + survey_data['species']
EXERCISE: Which 8 species have been observed most of all? __Tip__: Pandas provide a function to combine sorting and showing the first n records, see [here](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.nlargest.html)...

In [23]:
survey_data.groupby("name").size().nlargest(8)


Out[23]:
name
Dipodomys merriami           10025
Dipodomys ordii               2966
Chaetodipus penicillatus      2928
Chaetodipus baileyi           2696
Reithrodontomys megalotis     2485
Dipodomys spectabilis         2481
Onychomys torridus            2220
Perognathus flavus            1475
dtype: int64

In [24]:
survey_data['name'].value_counts()[:8]


Out[24]:
Dipodomys merriami           10025
Dipodomys ordii               2966
Chaetodipus penicillatus      2928
Chaetodipus baileyi           2696
Reithrodontomys megalotis     2485
Dipodomys spectabilis         2481
Onychomys torridus            2220
Perognathus flavus            1475
Name: name, dtype: int64
EXERCISE: How many records are available of each of the species in each of the plots (called `verbatimLocality`)? How would you visualize this information with seaborn?

In [25]:
species_per_plot = survey_data.reset_index().pivot_table(index="name", 
                                                         columns="verbatimLocality", 
                                                         values="occurrenceID", 
                                                         aggfunc='count')

# alternative ways to calculate this
#species_per_plot =  survey_data.groupby(['name', 'plot_id']).size().unstack(level=-1)
#species_per_plot = pd.crosstab(survey_data['name'], survey_data['plot_id'])

In [26]:
fig, ax = plt.subplots(figsize=(8,8))
sns.heatmap(species_per_plot, ax=ax, cmap='Reds')


Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbfea62b210>
EXERCISE: What is the number of different species in each of the plots? Make a bar chart, using Pandas `plot` function, providing for each plot the diversity of species, by defining a matplotlib figure and ax to make the plot. Change the y-label to 'plot number' __Tip__: next to `unique`, Pandas also provides a function `nunique`...

In [27]:
n_species_per_plot = survey_data.groupby(["verbatimLocality"])["name"].nunique()

fig, ax = plt.subplots(figsize=(6, 6))
n_species_per_plot.plot(kind="barh", ax=ax, color="lightblue")
ax.set_ylabel("plot number")

# Alternative option:
# inspired on the pivot table we already had:
# species_per_plot = survey_data.reset_index().pivot_table(
#     index="name", columns="verbatimLocality", values="occurrenceID", aggfunc='count')
# n_species_per_plot = species_per_plot.count()


Out[27]:
Text(0, 0.5, 'plot number')
EXERCISE: What is the number of plots each species have been observed? Make an horizontal bar chart using Pandas `plot` function providing for each species the spread amongst the plots for which the species names are sorted to the number of plots

In [28]:
n_plots_per_species = survey_data.groupby(["name"])["verbatimLocality"].nunique().sort_values()

fig, ax = plt.subplots(figsize=(8, 8))
n_plots_per_species.plot(kind="barh", ax=ax, color='0.4')

# Alternatives
# species_per_plot2 = survey_data.reset_index().pivot_table(index="verbatimLocality",
#                                                           columns="name",
#                                                           values="occurrenceID",
#                                                           aggfunc='count')
# nplots_per_species = species_per_plot2.count().sort_values(ascending=False)
# or
# species_per_plot.count(axis=1).sort_values(ascending=False).plot(kind='bar')


Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbfea4c9d90>
EXERCISE: First, exclude the NaN-values from the `sex` column and save the result as a new variable called `subselection_sex`. Based on this variable `subselection_sex`, calculate the amount of males and females present in each of the plots. Save the result (with the verbatimLocality as index and sex as column names) as a variable n_plot_sex. __Tip__: Release the power of `unstack`...

In [31]:
subselection_sex = survey_data.dropna(subset=["sex"])
#subselection_sex = survey_data[survey_data["sex"].notnull()]

In [32]:
n_plot_sex = subselection_sex.groupby(["sex", "verbatimLocality"]).size().unstack(level=0)
n_plot_sex.head()


Out[32]:
sex female male
verbatimLocality
1 792 1027
2 838 1018
3 810 742
4 825 972
5 495 553

As such, we can use the variable n_plot_sex to plot the result:


In [33]:
n_plot_sex.plot(kind='bar', figsize=(12, 6), rot=0)


Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbfe88a3c10>
EXERCISE: Create the previous plot with the plotnine library, directly from the variable subselection_sex. __Tip__: When in trouble, check these [docs](http://plotnine.readthedocs.io/en/stable/generated/plotnine.geoms.geom_col.html#Two-Variable-Bar-Plot).

In [34]:
(pn.ggplot(subselection_sex, pn.aes(x="verbatimLocality", fill="sex"))
     + pn.geom_bar(position='dodge')
     + pn.scale_x_discrete(breaks=np.arange(1, 25, 1), limits=np.arange(1, 25, 1))
)


Out[34]:
<ggplot: (8778888519333)>

Select subsets according to taxa of species


In [35]:
survey_data["taxa"].unique()


Out[35]:
array(['Rodent', 'Rodent-not censused', 'Rabbit', 'Bird', 'Reptile'],
      dtype=object)

In [36]:
survey_data['taxa'].value_counts()
#survey_data.groupby('taxa').size()


Out[36]:
Rodent                 30939
Rodent-not censused      595
Bird                     354
Rabbit                    59
Reptile                   14
Name: taxa, dtype: int64
EXERCISE: Select the records for which the `taxa` is equal to 'Rabbit', 'Bird' or 'Reptile'. Call the resulting variable `non_rodent_species`. __Tip__: You do not have to combine three different conditions, as Pandas has a function to check if something is in a certain list of values

In [37]:
non_rodent_species = survey_data[survey_data['taxa'].isin(['Rabbit', 'Bird', 'Reptile'])]

In [38]:
len(non_rodent_species)


Out[38]:
427
EXERCISE: Select the records for which the `taxa` starts with an 'ro' (make sure it does not matter if a capital character is used in the 'taxa' name). Call the resulting variable r_species. __Tip__: Remember the `.str.` construction to provide all kind of string functionalities?

In [39]:
r_species = survey_data[survey_data['taxa'].str.lower().str.startswith('ro')]

In [40]:
len(r_species)


Out[40]:
31534
EXERCISE: Select the records that are not Birds. Call the resulting variable non_bird_species.

In [41]:
non_bird_species = survey_data[survey_data['taxa'] != 'Bird']

In [42]:
len(non_bird_species)


Out[42]:
31607

(OPTIONAL SECTION) Evolution of species during monitoring period

In this section, all plots can be made with the embedded Pandas plot function, unless specificly asked

EXERCISE: Plot using Pandas `plot` function the number of records for `Dipodomys merriami` on yearly basis during time

In [43]:
merriami = survey_data[survey_data["name"] == "Dipodomys merriami"]

In [44]:
fig, ax = plt.subplots()
merriami.groupby(merriami['eventDate'].dt.year).size().plot(ax=ax)
ax.set_xlabel("")
ax.set_ylabel("number of occurrences")


Out[44]:
Text(0, 0.5, 'number of occurrences')
NOTE: Check the difference between the following two graphs? What is different? Which one would you use?

In [45]:
merriami = survey_data[survey_data["species"] == "merriami"]
fig, ax = plt.subplots(2, 1, figsize=(14, 8))
merriami.groupby(merriami['eventDate']).size().plot(ax=ax[0], style="-") # top graph
merriami.resample("D", on="eventDate").size().plot(ax=ax[1], style="-") # lower graph


Out[45]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbfe84f5190>
EXERCISE: Plot, for the species 'Dipodomys merriami', 'Dipodomys ordii', 'Reithrodontomys megalotis' and 'Chaetodipus baileyi', the monthly number of records as a function of time for the whole monitoring period. Plot each of the individual species in a separate subplot and provide them all with the same y-axis scale __Tip__: have a look at the documentation of the pandas plot function.

In [46]:
subsetspecies = survey_data[survey_data["name"].isin(['Dipodomys merriami', 'Dipodomys ordii',
                                                      'Reithrodontomys megalotis', 'Chaetodipus baileyi'])]

In [47]:
month_evolution = subsetspecies.groupby("name").resample('M', on='eventDate').size()

In [48]:
species_evolution = month_evolution.unstack(level=0)
axs = species_evolution.plot(subplots=True, figsize=(14, 8), sharey=True)


EXERCISE: Reproduce the previous plot using the plotnine package.

In [49]:
subsetspecies = survey_data[survey_data["name"].isin(['Dipodomys merriami', 'Dipodomys ordii',
                                                      'Reithrodontomys megalotis', 'Chaetodipus baileyi'])]
month_evolution = subsetspecies.groupby("name").resample('M', on='eventDate').size()

In [50]:
(pn.ggplot(month_evolution.reset_index(name='count'), 
           pn.aes(x='eventDate', y='count', color='name'))
    + pn.geom_line()
    + pn.facet_wrap('name', nrow=4)
    + pn.theme_light()
)


Out[50]:
<ggplot: (8778888171901)>
EXERCISE: Evaluate the yearly amount of occurrences for each of the 'taxa' as a function of time.

In [51]:
year_evolution = survey_data.groupby("taxa").resample('A', on='eventDate').size()
species_evolution = year_evolution.unstack(level=0)
axs = species_evolution.plot(subplots=True, figsize=(16, 8), sharey=False)


EXERCISE: Calculate the number of occurrences for each weekday, grouped by each year of the monitoring campaign, without using the `pivot` functionality. Call the variable count_weekday_years

In [52]:
count_weekday_years = survey_data.groupby([survey_data["eventDate"].dt.year, survey_data["eventDate"].dt.dayofweek]).size().unstack()

In [53]:
# Alternative
#years = survey_data["eventDate"].dt.year.rename('year')
#dayofweaks = survey_data["eventDate"].dt.dayofweek.rename('dayofweak')
#count_weekday_years = pd.crosstab(index=years, columns=dayofweaks)

In [54]:
count_weekday_years.head()


Out[54]:
eventDate 0 1 2 3 4 5 6
eventDate
1977 89.0 35.0 NaN NaN 19.0 106.0 160.0
1978 121.0 14.0 16.0 71.0 148.0 277.0 275.0
1979 39.0 79.0 121.0 58.0 NaN 165.0 161.0
1980 234.0 162.0 94.0 62.0 81.0 160.0 489.0
1981 446.0 63.0 87.0 126.0 74.0 83.0 427.0

In [55]:
count_weekday_years.plot()


Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fbfe3e99050>
EXERCISE: Based on the variable `count_weekday_years`, calculate for each weekday the median amount of records based on the yearly count values. Modify the labels of the plot to indicate the actual days of the week (instead of numbers)

In [56]:
fig, ax = plt.subplots()
count_weekday_years.median(axis=0).plot(kind='barh', ax=ax, color='#66b266')
xticks = ax.set_yticklabels(['Monday', 'Tuesday', 'Wednesday', "Thursday", "Friday", "Saturday", "Sunday"])


Nice work!