Exercises week 3: What are Digital Methods?

1. Install Tableau Desktop

Students are given a license for this software.

2. Open the DAMD data in Tableau

What shape is the data? What variables are there? Is the data complete, or are there missing values somewhere? How is the phenomenon represented? What is the data even about?

3. How is the data distributed over time?

Create a histogram of the number of tweets per year. What new insight into the data does it give? What are we looking at? What is this thing called "time"?

3.1 A reproduction with Python

To demonstrate programming, a similar visualization can be produced computationally. Can you follow the steps below and get a general idea of what each one does?


In [3]:
import pandas as pd
%matplotlib inline

Open the data file and list its columns.


In [4]:
damd = pd.read_csv("20170718 hashtag_damd uncleaned.csv")
damd.columns


Out[4]:
Index(['Unnamed: 0', 'tweet_id', 'user_id', 'user_name', 'reply_to_id',
       'created', 'message', 'geodata', 'place_id', 'place_type', 'place_name',
       'place_country', 'language', 'retweet_count', 'hashtags',
       'user_mentions_name', 'user_mentions_id', 'urls', 'media_id',
       'media_type', 'media_url'],
      dtype='object')
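The questions from exercise 2 (shape, variables, missing values) can also be asked in Python. The sketch below is only an illustration: `toy` is a made-up three-row table that borrows a few column names from the output above, not the real DAMD file. The same three calls should work on `damd` as well.

```python
import pandas as pd

# Made-up stand-in for the DAMD table; only the column names
# are borrowed from the real data, the values are invented.
toy = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "user_name": ["anna", "ben", "anna"],
    "message": ["first tweet", "second tweet", "third tweet"],
    "place_name": [None, "Copenhagen", None],
})

# Shape: (number of rows, number of columns)
print(toy.shape)

# Data type of each variable
print(toy.dtypes)

# Number of missing values per variable
missing = toy.isna().sum()
print(missing)
```

A column with many missing values (here `place_name`) is a hint that the variable is optional on the platform, which is worth keeping in mind when interpreting the data.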

Let's look at the created variable.


In [5]:
damd['created'].head(3)


Out[5]:
0    Thu Jul 13 07:33:03 +0000 2017
1    Mon Sep 05 16:13:07 +0000 2016
2    Sun Feb 05 06:04:58 +0000 2017
Name: created, dtype: object

Looks like dates, great. Let's set the data type.


In [13]:
damd['created'] = pd.to_datetime(damd['created'])
damd['created'].head(3)


Out[13]:
0   2017-07-13 07:33:03
1   2016-09-05 16:13:07
2   2017-02-05 06:04:58
Name: created, dtype: datetime64[ns]
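Above, pandas guessed the date format on its own. As a hedged alternative, the format can be spelled out explicitly, which makes the conversion less dependent on the pandas version. The sketch uses the three sample values shown earlier; the name `twitter_format` is introduced here for illustration, and `utc=True` is an assumption that makes the result timezone-aware, so its dtype differs from the output above.

```python
import pandas as pd

# The three sample values from the 'created' column shown above
sample = pd.Series([
    "Thu Jul 13 07:33:03 +0000 2017",
    "Mon Sep 05 16:13:07 +0000 2016",
    "Sun Feb 05 06:04:58 +0000 2017",
], name="created")

# weekday, month, day, time, UTC offset, year
twitter_format = "%a %b %d %H:%M:%S %z %Y"

# utc=True keeps the timezone information from the +0000 offset
parsed = pd.to_datetime(sample, format=twitter_format, utc=True)
print(parsed)
```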

Let's group the data by year and plot the number of tweets per year as a vertical bar chart.


In [30]:
damd['created'].groupby(by=damd['created'].dt.year).count().plot.bar(figsize=(5, 6), title="Tweet activity over years").grid(True, axis="y")
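The one-liner above does several things at once. Broken into steps, and run on just the three sample dates shown earlier instead of the full file, the counting part looks roughly like this (the variable names `years` and `per_year` are introduced here for illustration):

```python
import pandas as pd

# Three sample timestamps, already converted to datetime
created = pd.Series(pd.to_datetime([
    "2017-07-13 07:33:03",
    "2016-09-05 16:13:07",
    "2017-02-05 06:04:58",
]), name="created")

# Step 1: extract the year of each tweet
years = created.dt.year

# Step 2: group the tweets by year and count the items in each group
per_year = created.groupby(years).count()
print(per_year)
```

Calling `.plot.bar()` on `per_year` then draws one bar per year, which is the last step of the one-liner.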


Did the program above produce the same output as your histogram in Tableau?

4. Produce an interactive visualization for tweet exploration with Tableau

In Tableau, create a timeline in which tweets are coloured by username and the details show the tweet content. Use this to explore the topics in the DAMD data.

Compare Twitter and Facebook data.

5. Submit a data visualisation

Hand in a visualization on LearnIT.