Statistics: The Science of Decisions Project Instructions

Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.

Questions For Investigation

As a general note, be sure to keep a record of any resources that you use or refer to in the creation of your project. You will need to report your sources as part of the project submission.

What is our independent variable? What is our dependent variable?

R: Independent: the word congruency condition (congruent or incongruent). Dependent: the time taken to name the ink colors.

What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

R: Let $\mu_{congruent}$ and $\mu_{incongruent}$ stand for the congruent and incongruent population mean naming times, respectively.

$H_0: \mu_{congruent} = \mu_{incongruent}$ (the time to name the ink colors doesn't change with the congruency condition)

$H_A: \mu_{congruent} \neq \mu_{incongruent}$ (the time to name the ink colors changes with the congruency condition)

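To perform the test I will use a two-tailed paired t-test. A t-test is appropriate since we don't know the standard deviation of the population. We compare two samples against each other, rather than one sample against a known value, since we don't know the population mean; the test is paired because each participant records a time in both conditions, so the two samples are not independent. The sample size is below 30 (N=24), which is compatible with a t-test. I am also assuming that the population of time differences is approximately normally distributed.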
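Because each participant contributes one time per condition, the paired test reduces to a one-sample test on the per-participant differences. As a sketch in symbols (the names $d_i$, $\bar{d}$, $s_d$ and $n$ are notation introduced here for illustration, not taken from the project text), with $x_{congruent,i}$ and $x_{incongruent,i}$ the two times recorded for participant $i$:

$$d_i = x_{incongruent,i} - x_{congruent,i}, \qquad \bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}$$

$$t = \frac{\bar{d}}{s_d/\sqrt{n}}, \qquad \text{with } n - 1 \text{ degrees of freedom}$$

Under $H_0$ the mean difference is zero, so a value of $t$ far from zero in either direction counts as evidence against $H_0$.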

Now it’s your chance to try out the Stroop task for yourself. Go to this link, which has a Java-based applet for performing the Stroop task. Record the times that you received on the task (you do not need to submit your times to the site.) Now, download this dataset which contains results from a number of participants in the task. Each row of the dataset contains the performance for one participant, with the first number their results on the congruent task and the second number their performance on the incongruent task.

Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.

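R: Central tendency: the mean is 14.05 for the congruent condition and 22.02 for the incongruent condition. Variability: the standard deviation is 3.56 and 4.80, respectively.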



In [9]:

    
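%matplotlib inline
import pandas
import matplotlib.pyplot as plt

# Use a larger default figure size for the plots below
plt.rcParams['figure.figsize'] = (16.0, 8.0)

# One row per participant; columns: Congruent, Incongruent
df = pandas.read_csv('./stroopdata.csv')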



In [5]:

    
df.describe()









    Out[5]:

           Congruent  Incongruent
    count  24.000000    24.000000
    mean   14.051125    22.015917
    std     3.559358     4.797057
    min     8.630000    15.687000
    25%    11.895250    18.716750
    50%    14.356500    21.017500
    75%    16.200750    24.051500
    max    22.328000    35.255000

Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.



In [12]:

    
df.hist()









    Out[12]:





array([[<matplotlib.axes._subplots.AxesSubplot object at 0x1133a1890>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x11720e310>]], dtype=object)

These histograms show that, in this sample, times are longer in the incongruent condition than in the congruent condition.
In the congruent condition, the bin with the most values lies approximately between 14 and 16. In the incongruent condition, the bin with the most values is approximately (20, 22).

Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

R: I'm going to perform the test at a confidence level of 95% ($\alpha = 0.05$). For a two-tailed test with 23 degrees of freedom, this means that our t-critical values are $\pm 2.069$.
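The critical value is normally read from a t-table. As a rough cross-check, the sketch below recovers it numerically using only the standard library: it integrates the Student's t density with Simpson's rule and bisects for the point that leaves 2.5% in each tail. The helper names `t_pdf`, `t_cdf` and `t_critical` are hypothetical, introduced only for this illustration; if SciPy is available, `scipy.stats.t.ppf(0.975, 23)` would be the more usual route.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    coef = math.gamma((dof + 1) / 2.0) / (
        math.sqrt(dof * math.pi) * math.gamma(dof / 2.0))
    return coef * (1.0 + x * x / dof) ** (-(dof + 1) / 2.0)

def t_cdf(x, dof, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the 0.5 mass below 0."""
    h = x / steps
    total = t_pdf(0.0, dof) + t_pdf(x, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 + total * h / 3.0

def t_critical(confidence, dof):
    """Two-tailed critical value: the t for which P(|T| <= t) = confidence."""
    target = 1.0 - (1.0 - confidence) / 2.0
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection on the monotone CDF
        mid = (lo + hi) / 2.0
        if t_cdf(mid, dof) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

t_crit = t_critical(0.95, 23)
print("t-critical: +/-%.3f" % t_crit)
```

The result should land close to the tabulated 2.069 quoted above; small differences in later decimals come from the numerical integration.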



In [31]:

    
import math

# Per-participant difference in naming time (incongruent minus congruent)
df['differences'] = df['Incongruent'] - df['Congruent']
N = df['differences'].count()
dof = N - 1
mean = df['differences'].mean()
std = df['differences'].std()  # sample standard deviation (ddof=1)
# Paired t statistic: mean difference divided by its standard error
tscore = mean / (std / math.sqrt(N))

print("Sample size:\t\t%d" % N)
print("DoF:\t\t\t%d" % dof)
print("Differences Mean:\t%.3f" % mean)
print("Differences Std:\t%.3f" % std)
print("t-score:\t\t%.3f" % tscore)









    



Sample size:		24
DoF:			23
Differences Mean:	7.965
Differences Std:	4.865
t-score:		8.021

We can reject the null hypothesis, since the t-score (8.021) is greater than the critical value of 2.069. In this case I have used $\alpha=0.05$, but a higher confidence level (for example 99%, where the critical value is about 2.807) would also lead to rejecting $H_0$. This means that incongruency affects the naming time, which is consistent with what the histograms show.
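The decision rule can also be spelled out from the summary statistics alone. The sketch below is an illustration rather than part of the original analysis: it starts from the rounded values printed above, so later decimals may differ slightly from the cell output, and it adds a 95% confidence interval for the mean difference as a complementary view of the same result. All variable names are chosen here for the example.

```python
import math

# Summary statistics copied from the printed output above (rounded to 3 decimals)
n = 24
mean_diff = 7.965
std_diff = 4.865
t_critical = 2.069  # two-tailed, alpha = 0.05, 23 degrees of freedom

# Standard error of the mean difference
std_err = std_diff / math.sqrt(n)

# Paired t statistic, recomputed from the rounded values
t_score = mean_diff / std_err

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value
reject_h0 = abs(t_score) > t_critical

# 95% confidence interval for the mean difference
margin = t_critical * std_err
ci_low, ci_high = mean_diff - margin, mean_diff + margin

print("t-score: %.3f" % t_score)
print("Reject H0: %s" % reject_h0)
print("95%% CI for the mean difference: (%.2f, %.2f)" % (ci_low, ci_high))
```

An interval that excludes zero leads to the same conclusion as comparing the t-score with the critical value.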

Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

The effects observed are related to the reaction time of our brain. When there is congruency, our brain does not need to perform a conscious operation and the participant can trust the first response the brain provides. When there is incongruency, the participant has to consciously go through the process of finding the color, which results in a longer response time. A similar experiment would be typing on different types of keyboard layouts (e.g., QWERTY, AZERTY, etc.)


