To read the data, we're going to use the csv module. First, we need to import it:
In [1]:
import csv
Then, we need to go through all the rows in the file and, for each row, add the RecombinantFraction to the right genetic Line and InfectionStatus. To do so, we need to choose a data structure. Here we use a dictionary whose keys are the values of Line; each value is itself a dictionary whose keys, W and I, index lists of RecombinantFraction values.
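As a rough illustration of the shape this structure takes, the sketch below builds a small version by hand. The line names LineA and LineB and all the numbers are invented for the example; they are not values from the data file.

```python
# Hypothetical example of the nested structure: line names and values
# are invented for illustration, not taken from the data file.
example_data = {
    'LineA': {'W': [0.18, 0.21], 'I': [0.25]},
    'LineB': {'W': [0.15], 'I': [0.19, 0.22]},
}

# The outer key selects the genetic line, the inner key the infection status
line_a_infected = example_data['LineA']['I']
print(line_a_infected)

# Adding a new measurement is a plain list append
example_data['LineB']['W'].append(0.17)
print(example_data['LineB']['W'])
```

Two lookups are enough to reach any list of measurements, which is what makes the per-row insertion later on a single line.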
In [2]:
my_data = {}
In [3]:
with open('../data/Singh2015_data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        my_line = row['Line']
        my_status = row['InfectionStatus']
        my_recomb = float(row['RecombinantFraction'])
        # Test by printing the values
        print(my_line, my_status, my_recomb)
        # just print the first row
        break
Now we need to perform operations for each row. First, we check whether my_data already contains the Line for that row. If not, we create the key-value pair in the dictionary. Then, we add the value to the appropriate list.
In [4]:
with open('../data/Singh2015_data.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        my_line = row['Line']
        my_status = row['InfectionStatus']
        my_recomb = float(row['RecombinantFraction'])
        # if my_line is not present in the dictionary:
        if my_line not in my_data:
            # create and initialize with a dictionary containing
            # two empty lists
            my_data[my_line] = {'W': [], 'I': []}
        # Now insert the value in the right list
        my_data[my_line][my_status].append(my_recomb)
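The check-then-create step can also be written with dict.setdefault, which is one possible alternative rather than the approach used above. The sketch below runs on a few invented rows held in memory with io.StringIO, so it does not depend on the real file; the names sample_csv and sketch_data, and all the values, are assumptions made for the example.

```python
import csv
import io

# Invented sample rows standing in for the real file (assumption)
sample_csv = io.StringIO(
    "Line,InfectionStatus,RecombinantFraction\n"
    "LineA,W,0.2\n"
    "LineA,I,0.3\n"
    "LineB,W,0.1\n"
    "LineA,W,0.4\n"
)

sketch_data = {}
for row in csv.DictReader(sample_csv):
    # setdefault returns the existing inner dictionary, or stores
    # and returns the new one if this line has not been seen yet
    by_status = sketch_data.setdefault(row['Line'], {'W': [], 'I': []})
    by_status[row['InfectionStatus']].append(float(row['RecombinantFraction']))

print(sketch_data)
```

The explicit if test in the cell above is easier to read for a first pass; setdefault just folds the same logic into one call.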
Now we should have the data organized in a nice structure:
In [5]:
my_data
Out[5]:
Time to calculate the means and print the results:
In [6]:
for line in my_data:
    print('Line', line, 'Average Recombination Rate:')
    # extract the relevant data
    my_subset = my_data[line]
    for status in ['W', 'I']:
        print(status, ':', end='')  # to prevent new line
        my_mean = sum(my_subset[status])
        my_num_elements = len(my_subset[status])
        my_mean = my_mean / my_num_elements
        print(' ', round(my_mean, 3))
    print('')  # to separate the lines
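Note that dividing by the number of elements raises a ZeroDivisionError if a line has no measurements for one status. A minimal sketch of one way to guard against that, using mean from the standard statistics module, is shown below. The helper safe_mean and the toy values are hypothetical and only illustrate the idea.

```python
from statistics import mean

# Invented values for illustration (assumption, not real data)
toy_data = {
    'LineA': {'W': [0.2, 0.4], 'I': [0.3]},
    'LineB': {'W': [0.1], 'I': []},
}

def safe_mean(values):
    """Return the mean of values, or None if the list is empty."""
    return mean(values) if values else None

toy_means = {}
for line, by_status in toy_data.items():
    # One mean per infection status; empty lists map to None
    toy_means[line] = {status: safe_mean(vals)
                       for status, vals in by_status.items()}

for line, means in toy_means.items():
    print('Line', line, means)
```

Returning None for an empty list is one choice among several; skipping the status or reporting a placeholder would work equally well, depending on how the results are used.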