Exercises week 2: Qualitative Approaches to Quantitative Data

1. We have seen network graphs

We saw network graphs in the lecture. Where else have you seen them? What purpose did they serve there?

Talk with your neighbour for 10 minutes: what do you think is necessary for making one? What is in a network graph, and what is needed to build it?

2. DAMD data

We have received the DAMD data. Let's explore it as a graph.


In [ ]:
import pandas as pd

If you place the data file in the same directory as this Notebook, it can be read into Python.


In [ ]:
damd = pd.read_csv("20170718 hashtag_damd uncleaned.csv")
damd.head(3)

Describe in words: what is the shape of the data? What is an item of data? What do we know about each item? Do you already have an idea of how you would like to start analyzing such data?
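As a starting point for the shape question, a few pandas calls give the row and column counts and the column names. This is a minimal sketch on a toy table: the values are invented, and only the column names tweet_id and hashtags are borrowed from the exercise text. The real file may contain other columns.

```python
import pandas as pd

# Toy stand-in for the DAMD table; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "tweet_id": [101, 102, 103],
        "hashtags": ["damd;data", "damd", "damd;maps;data"],
    }
)

# .shape is a (rows, columns) tuple; .columns holds the column names.
n_rows, n_cols = toy.shape
column_names = list(toy.columns)

print(n_rows, "rows and", n_cols, "columns")
print("columns:", column_names)
```

Running the same three lines on damd instead of toy should give the corresponding numbers for the real data.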

3. DAMD data as a graph

To explore the DAMD data, let's conceptualize how the rows are related to one another. Let's imagine a graph. Don't hesitate to grab pen and paper or the whiteboards.

We can use Table2Net to build such a graph. The tool will give us a .gexf graph file. Build a bipartite graph with tweet_id and hashtags as the two types of nodes, splitting the latter on ;. Open the .gexf file with your browser. What does it look like? Does it have a different shape than the .csv file?
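To see what "splitting on ;" does to the table, it can help to turn each row into one tweet-hashtag pair per hashtag, since each pair becomes an edge in the bipartite graph. The sketch below does this with pandas on an invented toy table. It only illustrates the idea and does not reproduce how Table2Net works internally.

```python
import pandas as pd

# Toy stand-in for the DAMD table; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "tweet_id": [101, 102, 103],
        "hashtags": ["damd;data", "damd", "damd;maps;data"],
    }
)

# Split "a;b" into ["a", "b"], then give every hashtag its own row.
# Each resulting row is one edge between a tweet and a hashtag.
edges = (
    toy.assign(hashtag=toy["hashtags"].str.split(";"))
    .explode("hashtag")[["tweet_id", "hashtag"]]
    .reset_index(drop=True)
)

print(edges)
```

Three rows in the table become six edges here, which is one way to see that the graph has a different shape from the .csv file.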

3.1 Alternative graph creation in Python

As it happens, ETHOS Lab has a small piece of code that turns a matrix into a graph. Please take a look. If you copy and paste the buildHashtagCooccurrenceGraph function definition below, and have run the earlier code in this notebook, you can create the graph in Python.


In [ ]:
import networkx as nx

# copy+paste the function definition below
def buildHashtagCooccurrenceGraph(tweets):
    g = ....
    .
    .
    return g

In [ ]:
damd_graph = buildHashtagCooccurrenceGraph(damd)
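# nx.info() was removed in NetworkX 3.0, so we print the counts directly
print(damd_graph.number_of_nodes(), "nodes,", damd_graph.number_of_edges(), "edges")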

In [ ]:
nx.write_gexf(damd_graph, "damd_graph.gexf")
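If you want to experiment before pasting the ETHOS Lab function, a rough stand-in could look like the sketch below. It is not the ETHOS Lab code: the function name build_bipartite_sketch, its parameters, the "tweet:" and "#" node prefixes, and the Type values "tweet" and "hashtag" are all assumptions made for illustration, and the toy table is invented.

```python
import networkx as nx
import pandas as pd


def build_bipartite_sketch(tweets, id_col="tweet_id", tags_col="hashtags", sep=";"):
    """Hypothetical stand-in, NOT the ETHOS Lab function.

    Builds a graph with one node per tweet, one node per hashtag,
    and an edge wherever a tweet uses a hashtag.
    """
    g = nx.Graph()
    for tweet_id, tags in zip(tweets[id_col], tweets[tags_col]):
        if not isinstance(tags, str):
            continue  # rows without hashtags show up as NaN after read_csv
        tweet_node = f"tweet:{tweet_id}"  # prefix keeps ids and tags apart
        g.add_node(tweet_node, Type="tweet")
        for tag in tags.split(sep):
            tag = tag.strip()
            if tag:
                tag_node = f"#{tag}"
                g.add_node(tag_node, Type="hashtag")
                g.add_edge(tweet_node, tag_node)
    return g


# Toy table with invented values.
toy = pd.DataFrame(
    {
        "tweet_id": [101, 102, 103],
        "hashtags": ["damd;data", "damd", "damd;maps;data"],
    }
)
toy_graph = build_bipartite_sketch(toy)
print(toy_graph.number_of_nodes(), "nodes,", toy_graph.number_of_edges(), "edges")
```

The resulting graph can be passed to nx.write_gexf in the same way as damd_graph above.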

4. Work in Gephi

Once we have created the graph, we use Gephi to interactively and visually explore it. Gephi is a popular and powerful tool to visualize and analyze graphs. There are other tools too, of course.

Most of Thursday is spent in Gephi.

4.1 Example graph visualization

Here is an example visualization of the DAMD data. It is a hashtag co-occurrence graph, with tweets in red and hashtags in green. Labels are shown for the top hashtags.

The central node, the one for the hashtag #damd, has been removed. Can you tell why?

The process for producing the above visualization, in Gephi, was approximately as follows:

  1. Run ForceAtlas2 for a while to position the nodes
  2. Set gravity to 0.01, and enable stronger gravity
  3. Filter out the central node, using the NOT operator and its degree value
  4. Color the nodes by attribute Type
  5. Show labels, and increase label size
  6. Filter nodes with degree over 15, and hide the labels of the others
  7. Run ForceAtlas2 for a while to reposition the nodes
  8. Tweak visual settings, e.g. set node opacity to 50%
  9. Export to PNG file
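The two filtering steps (3 and 6) can also be tried out in Python to get a feel for what they do. The sketch below is a NetworkX approximation on an invented toy graph, not a description of how Gephi's filters work. The node names and the threshold of 2 are assumptions chosen to suit the tiny example.

```python
import networkx as nx

# Toy graph: one hub connected to every other node, plus a few extra links.
g = nx.Graph()
g.add_edges_from(("hub", f"n{i}") for i in range(6))
g.add_edges_from([("n0", "n1"), ("n0", "n2"), ("n0", "n3"), ("n4", "n5")])

# Step 3: find the highest-degree node and drop it from a copy of the graph.
central, central_degree = max(g.degree, key=lambda pair: pair[1])
trimmed = g.copy()
trimmed.remove_node(central)

# Step 6: keep labels only for nodes above a degree threshold.
threshold = 2  # the walkthrough uses 15; 2 suits this tiny graph
labelled = sorted(node for node, degree in trimmed.degree if degree > threshold)

print("removed:", central, "with degree", central_degree)
print("nodes that keep their label:", labelled)
```

Note that degrees are recomputed after the hub is removed, so the threshold applies to the trimmed graph.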

5. Homework: help John

What qualitative questions came up when you explored the graphs? What quantitative questions came up?

Think about how what you did today answers what John is trying to research. How would you tweak it (in your mind)? How would you sketch it? What would your tweaks look like? Tweaks are conceptual, not code.