Iterations and Relationships in Data

We have seen in previous tutorials Playing with python a bit about sequential data structures such as lists and how we can use them to store a set of numbers, words and other type of object. In this tutorial we will examine how to go through sequential data structures in order to combine their contents to obtain a result.

Loops

Loops are statements which allow us to iterate (go through) the elements insice a list. By this I mean say I have the following attendance list for a school

Student   Absence
-----------------------
John        yes
Paul        no
Kate        yes
Catriona    yes
Peter       no
Alex        yes
Jordan      no
William     yes

And I want to find out the percentage of students absent for the record given above. To do that I have to scan through the list and count the number of "no's" out of the total number of students. The process of scanning through the list is a form of looping. In this example we have 3 out of 8 students absent.

For Loops

These are the most simple and used type of loops. They have a general structure of:

# loop statement:
for element in list_of_elements:
    # Loop body (things done to the element like counting the
    # no's happen here

The idea is that they pull out every element in a list of elements in our example above (the table of students) the first iteration of the loop would pull out John, yes the third iteration would pull out Kate, yes and the sixth iteration would pull out ?


In [7]:
days_of_the_week = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# This is a for loop
for day in days_of_the_week:
    print(day)


Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday

In [8]:
# If we would like to know the place(index) the element
# has in the list we can modify our for loop slighty
for index, day in enumerate(days_of_the_week):
    print(index, day)

# What enumerate does
print("\nenumerate(days_of_the_week): \n\t" + str(list(enumerate(days_of_the_week))))
# Enumerate gives you a list of tuples which the for loop statement
# can split using the syntactic sugar we used above for element1, element 2 in list_of_tuples


0 Monday
1 Tuesday
2 Wednesday
3 Thursday
4 Friday
5 Saturday
6 Sunday

enumerate(days_of_the_week): 
	[(0, 'Monday'), (1, 'Tuesday'), (2, 'Wednesday'), (3, 'Thursday'), (4, 'Friday'), (5, 'Saturday'), (6, 'Sunday')]

In [9]:
# List of ages in a DDS tutorial
age_list = [18, 17, 30, 20, 22]
# This is a function declaration
def my_sum(list_of_numbers):
    # This is the body of the function
    output_sum = 0
    for number in list_of_numbers:
        # updates output sum:
        output_sum = output_sum + number
        # alternatively:
        # output += number
    return output_sum

# Test that my_sum works correctly:
assert(my_sum(age_list) == sum(age_list))
print(my_sum(age_list) == sum(age_list))


True

Your Turn:

  • Fill in the body of the script and function bellow in order to automate the process of finding the number of students which are absent.

In [17]:
# Find a way way of representing this table suitably in python
# Hint : a list of something....
student_absence_table = []
def count_absent_students(list_of_students):
    # think of how this was used in my_sum
    number_of_absence = 0
    for student, status in list_of_students:
        # check if status is yes or no ?
        # how would you check this ?
        pass
    return number_of_absence

While loops

Another very popular way of looping which has the following structure:

# loop statement:
condition = True
while(condition):
    # Loop body (things done to the element like counting the
    # no's happen here
    # condition may be updated in body

The idea is that the body of the loop keeps getting executed untill the condition turns false. Lets rewrite sum using a while loop:


In [11]:
# List of ages in a DDS tutorial
age_list = [18, 17, 30, 20, 22]
# This is a function declaration
def my_while_sum(list_of_numbers):
    # This is the body of the function
    copy_of_list = list(list_of_numbers)
    output_sum = 0
    # Checks that the list of numbers is not empty
    while(len(copy_of_list) > 0):
        # some_list.pop() pops out an element form some_list
        # reducing its size by 1 each time
        # updates output sum:
        output_sum = output_sum + copy_of_list.pop()
        # alternatively:
        # output += list_of_numbers.pop()
    return output_sum

# Test that my_sum works correctly:
assert(my_while_sum(age_list) == sum(age_list))
print(my_while_sum(age_list) == sum(age_list))


True

Relationships in Data

Lets say I want to represent the tutoring structure of this course in some form of table:

Member   Information   Tutors Directly
---------------------------------------
Ewan        Lecturer   Gavin, Francisco
Gavin       TA         Kate, Paul
Francisco   TA         John, Catriona
John        Student    
Catriona    Student    William, Jordan
Kate        Student 
Paul        Student
William     Student
Jordan      Student

William and Jordan joined the course late so they are getting more direct help from Catriona through the Universities P@LS scheeme in order to catch up.

It is quite clear that there is a hierarchichal structure in this table yet the table representation does not allow us to make much out of the strucutre. There are also relationships in the data members of the DDS course who are directly linked to each other yet this is not so easy to see when looking at this table.

Instead of a table a network like structure with nodes(vertices or dots) can be used to represent the students and edges in between the nodes to represent their direct connection. This kind of network diagram is called a graph (dont confuse with a cartesian plot). For this data set we will be dealing with a special type of graph that is very important in computer science called a binary tree. We will be representing our binary trees with a data structure used to represent graphs called an adjecency list (since it is more or less the same as our table):

Member(Nodes)      Tutors Directly (Connected to)
---------------------------------------------------
Ewan               Gavin, Francisco
Gavin              Kate, Paul
Francisco          John, Catriona    
Catriona           William, Jordan

NOTE: This is not the best way of representing binary trees but that is out of the scope of the course. You may visit the wikipedia page for binary trees if you want some additional information as to how to encode them. We can do this in python using a dictionary lets have a look at what the tree for this table looks like:


In [1]:
# Tree support functions
from dds_lab.graphdat import *
# Graphing facilities
from bokeh.io import output_notebook
from bokeh.plotting import show
output_notebook()


BokehJS successfully loaded.

In [2]:
names = ["Ewan", "Francisco", "Gavin", "John", "Catriona", "Kate", "Paul", "William", "Jordan"]
info = ["Lecturer", "TA", "TA", "Student", "Student", "Student", "Student", "Student", "Student"]
# the adjacency list is more or less exactly the same as our table without the information collumn:
adjacency_list = {"EK": ["FV", "GG"],
                  "FV": ["S3", "S4"],
                  "GG": ["S1", "S2"],
                  "S4": ["S5","S6"]}

#Generate tree plot
fig, ad_list = get_tree_plot(adjacency_list,
                            names=names,
                            info=info,
                            title="DDS Help Tree")
show(fig)


Now You Try

  • For the tree diagrame bellow fill in the adjecency list like table:
Nodes      Adjecent to
------------------------
  • Convert this tabel to the python adjecency list data structre.
  • [Optional\Advance] Using a for loop travel down from EK to S3 in the DDS tree

For more information on parse trees check here


In [14]:
show(adjacency_list_exercise())


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-6b1639216348> in <module>()
----> 1 show(adjacency_list_exercise())

NameError: name 'show' is not defined

In [15]:
# Fill in:
adjacency_list2 = {}
names = ["Sentence",
         "Noun phrase",
         "Verb phrase",
         "Verb",
         "Noun phrase",
         "Delimiter",
         "Noun"]
inf = ["John hit the ball",
       "John",
       "hit the ball",
       "hit",
       "the ball",
       "the",
       "ball"]
# This is what your attempt will look like (second plot titled your attempt)
show(get_tree_plot(adjecency_list2,
                   names=names,
                   info=inf,
                   title="Your attempt")[0])


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-9001b04d29bc> in <module>()
     16        "ball"]
     17 # This is what your attempt will look like (second plot titled your attempt)
---> 18 show(get_tree_plot(adjecency_list2,
     19                    names=names,
     20                    info=inf,

NameError: name 'show' is not defined

In [16]:
# Reminder:
print("Adjecency list: " + str(adjecency_list))
root = 'EK'
current_level = [root]
# While there are still children
while(len(current_level) != 0):
    # Hints :
    print("Current node: " + str(current_level))
    print("Left child: " + str(adjecency_list[current_level[0]][0]))
    print("Right child: " + str(adjecency_list[current_level[0]][1]))
    # Think how to update current level to go down the correct path.
    # Consider using conditionals to cater for edge cases and
    # index out of bounds errors.
    current_level = []
# You should be printing : EK, FV , S3


Adjecency list: {'FV': ['S3', 'S4'], 'EK': ['FV', 'GG'], 'GG': ['S1', 'S2'], 'S4': ['S5', 'S6']}
Current node: ['EK']
Left child: FV
Right child: GG