ADS-DV

Data exploration with Tableau

Summary

This assignment lets you practice data exploration from scratch using Tableau.

Data exploration workflow

To guide you in this assignment you can follow this workflow:

  1. First you have to find an interesting dataset. Tableau links to a set of interesting and compatible datasets: https://public.tableau.com/s/resources
  2. Connect with your data in Tableau. You can import and join multiple datasets.
  3. Browse your data and make sure you have an overview of the dimensions and measures in your dataset.
  4. Formulate questions about the data, or at least one to start with.
  5. Try to gain insight into the answer to your question by creating a visualization. Start by identifying the dimensions and measures that could help you reach the answer, then decide which measures to assign to size, color and labels. Do you need to filter the data?
  6. Did you answer the question? Then pose and answer the next question.
  7. Create a Dashboard that gives an overview of your visualizations.

Assignment A

Download one of the datasets from the link above that interests you. Alternatively, you can use a different dataset. In that case, make sure you include the link or dataset in this iPython Notebook.

As you have seen in the tutorial workbooks of the previous assignment, the visualizations start with a question such as: 'Which artists sell the most?'. When you pose a question, think of searching for outliers, trends or clusters.

Ask 3 questions about your dataset and include them here. Then start the exploration workflow and include the resulting dashboard in this notebook.

Dataset used

I used the following dataset: Cat vs Dog Popularity in the US

Questions about the dataset

  1. What is the most popular animal per state: Cats or Dogs?
  2. Which 10 states have the most pets?
  3. Which 10 states have the highest percentage of pet-owning households?

Dashboard

Assignment B (optional)

Tableau lets you create beautiful visualizations that are extremely useful in the data exploration phase. If you want to use them as part of reproducible research, however, you have to be able to produce the same visualizations using Python.

This assignment is an optional challenge. Recreate your dashboard above using iPython. This means you must:

  1. Import the data into iPython.
  2. Depending on your Tableau plots, clean or filter your data.
  3. Plot using a Python library such as Matplotlib.

You do not have to reproduce exactly the style and colors of the Tableau plots, but they should be similar enough to give the same insights. Good luck!
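
As a starting point, the three steps above could be sketched roughly as follows. This is only a minimal sketch under assumptions: the column names `state`, `cat_population`, `dog_population` and `pet_households_pct`, and the helper functions themselves, are hypothetical and will need to be adapted to the columns of the dataset you actually downloaded.

```python
import pandas as pd
import matplotlib.pyplot as plt


def top_states(df, column, n=10):
    """Return the n rows with the largest values in `column`, largest first."""
    return df.nlargest(n, column)


def preferred_pet(df, cat_col="cat_population", dog_col="dog_population"):
    """Label each row 'Cats' or 'Dogs' by the larger population (ties go to 'Dogs').

    The default column names are assumptions, not taken from the real dataset.
    """
    return df[cat_col].gt(df[dog_col]).map({True: "Cats", False: "Dogs"})


def plot_top_states(df, column, title, n=10):
    """Draw a horizontal bar chart of the top n states for `column`."""
    top = top_states(df, column, n)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(top["state"], top[column])
    ax.invert_yaxis()  # put the largest bar at the top, as a sorted Tableau chart would
    ax.set_xlabel(column)
    ax.set_title(title)
    fig.tight_layout()
    return fig
```

With the data loaded into a DataFrame `df` (for example via `pd.read_csv` on your own file), `preferred_pet(df)` addresses the first question, and `plot_top_states(df, "pet_households_pct", "Top 10 states by pet households")` addresses the third. For the second question you could first add a total column, for example `df["total_pets"] = df["cat_population"] + df["dog_population"]`, and plot that instead.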


In [ ]:
# your iPython version of the above visualization