Question 1: What is our independent variable? What is our dependent variable?
Answer: The word condition (congruent or incongruent) is our independent variable. The time taken to name the ink color is our dependent variable.
Question 2: What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.
Answer:
Hypothesis:
The null hypothesis is that the mean time for congruent words and the mean time for incongruent words do not differ; the alternative hypothesis is that the two means differ.
We can perform a t test on the data because only sample data are available.
In a t test we have no information about the population, but we want to make an inference about it from the sample data that we do have. For example, suppose we want to know the election outcome in India. We can't survey the whole population, so we survey some random people across demographics, religion, caste, sex and age. This data acts as a sample, and from it we want to infer who is going to win across the whole population.
Here we have data from a Stroop task conducted on some people, and the t test lets us infer from this sample whether the congruent and incongruent means differ. Each row of the data holds one congruent time and one incongruent time, so the test is run on the per-row differences.
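In symbols, the hypotheses for this task can be sketched as follows. The names $\mu_C$ and $\mu_I$ are introduced here for illustration and stand for the population mean times in the congruent and incongruent conditions; the two-sided alternative is an assumption based on the wording "a difference" in the answer.

$$
H_0: \mu_C - \mu_I = 0 \qquad\qquad H_A: \mu_C - \mu_I \neq 0
$$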
In [144]:
import csv
from pprint import pprint
import math

# One list of timings per condition, plus the per-row difference.
stat = {'Congruent': { 'data': [] }, 'Incongruent': { 'data': [] }, 'Difference': { 'data': [] }}

with open('./stroopdata.csv', 'r') as st_data:
    reader = csv.DictReader(st_data)
    for row in reader:
        cong = float(row['Congruent'])
        incong = float(row['Incongruent'])
        diff = cong - incong
        stat['Congruent']['data'].append(cong)
        stat['Incongruent']['data'].append(incong)
        stat['Difference']['data'].append(diff)

for k in stat:
    print(k + ": ")
    pprint(stat[k]['data'])
Question 3: Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.
Answer: Some descriptive statistics from the sample data are as follows:
Data | Mean | Median | Variance | Standard Deviation |
---|---|---|---|---|
Congruent | 14.051 | 14.3565 | 12.669 | 3.559 |
Incongruent | 22.016 | 21.0175 | 23.012 | 4.797 |
Difference | -7.965 | -7.666 | 23.667 | 4.865 |
In [145]:
def variance(data):
    """
    This function returns variance of given sample data.
    """
    mean = sum(data)/len(data)
    squared_diff = 0
    for d in data:
        squared_diff += pow((d - mean), 2)
    return squared_diff/(len(data) - 1)
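As a sanity check, the hand-written sample variance can be compared with `statistics.variance` from the Python standard library, which also divides by n - 1. The sketch below restates the function so it runs on its own; the `sample` list is made up for illustration and is not the Stroop data.

```python
import math
import statistics


def variance(data):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    mean = sum(data) / len(data)
    return sum((d - mean) ** 2 for d in data) / (len(data) - 1)


# Hypothetical timings in seconds, invented for this check only.
sample = [12.0, 15.5, 9.8, 14.2, 18.1]

hand_rolled = variance(sample)
library = statistics.variance(sample)

print(hand_rolled, library)
# The two results should agree up to floating-point rounding.
print(math.isclose(hand_rolled, library))
```

If the two values ever disagree by more than rounding error, the usual cause is dividing by n instead of n - 1.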
In [146]:
for k in stat:
    list_data = sorted(stat[k]['data'])
    count = len(list_data)
    mean = sum(list_data)/count
    # Middle element for an odd count; mean of the two middle elements for an even count.
    median = list_data[count//2]
    if count%2 == 0:
        median = (list_data[count//2] + list_data[count//2 - 1])/2
    var = variance(list_data)
    std = math.sqrt(var)
    stat[k]['mean'] = mean
    stat[k]['median'] = median
    stat[k]['variance'] = var
    stat[k]['std'] = std
    print('\n' + k + ': ')
    print('Mean: ', mean)
    print('Median: ', median)
    print('Variance: ', var)
    print('Standard Deviation: ', std)
Question 4: Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.
Answer: Data is visualized below:
Observations:
In [153]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
colors = {'Congruent': 'red', 'Incongruent': 'green', 'Difference':'blue'}
for k in stat:
    ax.hist(stat[k]['data'], color = colors[k], label=k, rwidth = 0.9)
legend = ax.legend(loc='upper left')
plt.title('Distribution of Time')
plt.xlabel('Time Taken')
plt.show()
Question 5: Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?
Answer:
The t value after the t test is -8.021, which has a probability of less than 0.0001. The confidence interval for the mean difference lies between -9.667 and -6.263 at $\alpha = 0.10$. The value -8.021 is statistically significant, hence we reject the null hypothesis.
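The reported numbers can be checked against each other. As a sketch, write $\bar{d}$ for the mean of the differences and $SE$ for its standard error (symbols introduced here for illustration). The critical value $t^{*}$ is inferred from the two interval bounds, not stated in the answer.

$$
t = \frac{\bar{d}}{SE} \quad\Rightarrow\quad SE = \frac{-7.965}{-8.021} \approx 0.993
$$

$$
\bar{d} \pm t^{*}\,SE = (-9.667,\ -6.263) \quad\Rightarrow\quad t^{*} \approx \frac{1.702}{0.993} \approx 1.714
$$

A critical value of about 1.714 is what a two-tailed t table gives at $\alpha = 0.10$ for 23 degrees of freedom, which would correspond to 24 participants; the sample size itself is not stated in the text.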
In [156]:
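list_d = stat['Difference']['data']
# Standard error of the mean difference: sample std of the differences over sqrt(n).
se = stat['Difference']['std']/math.sqrt(len(list_d))
# t statistic: mean difference (congruent minus incongruent) divided by its standard error.
t = (stat['Congruent']['mean'] - stat['Incongruent']['mean'])/se
print(se)
print(t)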
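The same calculation can be wrapped in a small reusable function. This is a minimal sketch using only the standard library; `paired_t` is a hypothetical helper name, and the two short lists are invented values, not the Stroop measurements.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # Work on the per-pair differences, as the cell above does.
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    # statistics.stdev is the sample standard deviation (n - 1 denominator).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Invented timings in seconds, for illustration only.
congruent = [10.0, 12.0, 11.0, 13.0]
incongruent = [15.0, 18.0, 15.0, 20.0]

t_stat, dof = paired_t(congruent, incongruent)
print("t =", t_stat, "with", dof, "degrees of freedom")
```

The function stops at the t statistic; looking up the critical value or p-value for the returned degrees of freedom still needs a t table or a statistics library.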
Question 6: Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!
Answer:
Resources Used: