Questions for Investigation
This experiment will require the use of a standard deck of playing cards. This is a deck of fifty-two cards divided into four suits (spades, hearts, diamonds, and clubs), each suit containing thirteen cards (Ace, numbers 2-10, and face cards Jack, Queen, and King). You can use either a physical deck of cards for this experiment or you may use a virtual deck of cards such as that found on random.org (http://www.random.org/playing-cards/). For the purposes of this task, assign each card a value: The Ace takes a value of 1, numbered cards take the value printed on the card, and the Jack, Queen, and King each take a value of 10.
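The value rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in the task; the names `RANKS`, `SUITS`, `card_value`, and `cards` are hypothetical and are not used elsewhere in this notebook.

```python
# Ranks in one suit; the value rule follows the task description above.
RANKS = ['A'] + [str(n) for n in range(2, 11)] + ['J', 'Q', 'K']
SUITS = ['S', 'H', 'D', 'C']  # spades, hearts, diamonds, clubs

def card_value(rank):
    """Return the task's value for a rank: Ace=1, 2-10 face value, J/Q/K=10."""
    if rank == 'A':
        return 1
    if rank in ('J', 'Q', 'K'):
        return 10
    return int(rank)

# Hypothetical flat representation: one (card label, value) pair per card.
cards = [(rank + suit, card_value(rank)) for suit in SUITS for rank in RANKS]
values = [value for _, value in cards]

# Average value of a single card under this rule.
population_mean = sum(values) / float(len(values))
```

Under this rule each suit sums to 85, so the whole deck sums to 340 and the average card value is 340/52, roughly 6.54.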
Criteria: Responses to Project Questions
Question 1: Plotting a histogram of card values
Does Not Meet Specifications: The histogram does not accurately reflect the card values’ relative frequency distribution or no histogram is provided.
Meets Specifications: A histogram is provided that accurately reflects the card values’ relative frequency distribution.
Question 2: Obtain samples from a deck of cards
Does Not Meet Specifications: Sampled data is not provided, insufficient, or does not reflect the experiment being performed for the project.
Meets Specifications: At least thirty samples have been performed and the summed values from each sample have been reported in a submitted spreadsheet.
Question 3: Report descriptive statistics regarding sample taken
Does Not Meet Specifications: Two measures of central tendency and variability are not reported to describe the sample or are not computed correctly.
Meets Specifications: At least two measures of central tendency and two measures of variability are accurately reported to summarize and describe the samples taken for Question 2.
Question 4: Plotting a histogram of sampled values
Does Not Meet Specifications: The histogram does not accurately reflect sampled values or no histogram is provided. No discussion of the shape of the distribution is provided or comparison is not well-justified.
Meets Specifications: A histogram accurately reflecting the sampled data is provided. Discussion of the shape is provided, including a comparison to that of the histogram of the original card values.
Question 5: Making estimates based on the sampled distribution
Does Not Meet Specifications: Estimates made for the prompted questions do not reflect the values obtained from the sample.
Meets Specifications: Estimates are made for the prompted questions that reflect the samples taken and their distribution.
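As a rough illustration of what Question 3 asks for, the sketch below reports two measures of central tendency and two measures of variability. The list `example_sums` is made-up placeholder data, shortened to ten entries, and stands in for the thirty recorded sums; the code assumes Python 3 and its standard `statistics` module, unlike the Python 2 cells in this notebook.

```python
import statistics

# Hypothetical sums of three cards each; replace with the recorded sample.
example_sums = [21, 17, 25, 12, 19, 23, 28, 15, 20, 22]

summary = {
    # Central tendency
    "mean": statistics.mean(example_sums),
    "median": statistics.median(example_sums),
    # Variability
    "sample_std": statistics.stdev(example_sums),  # n - 1 denominator
    "range": max(example_sums) - min(example_sums),
}
```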
Let's start with a rough draft of generating a deck of cards. Let's create the 4 suits first, then build a dictionary named deck that maps each suit to its cards.
In [32]:
import random
import matplotlib
%matplotlib inline
from matplotlib.pylab import hist
import numpy as np
from numpy import mean, std, var
from pprint import pprint
In [13]:
heart_suite = {"H": [str(i) + 'H' for i in ['A']+range(2,11)+['J', 'Q', 'K']]}
diamond_suite = {"D": [str(i) + 'D' for i in ['A']+range(2,11)+['J', 'Q', 'K']]}
club_suite = {"C": [str(i) + 'C' for i in ['A']+range(2,11)+['J', 'Q', 'K']]}
spade_suite = {"S": [str(i) + 'S' for i in ['A']+range(2,11)+['J', 'Q', 'K']]}
deck = (heart_suite, diamond_suite, club_suite, spade_suite)
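# Map each card to its value: Ace = 1, 2-10 at face value, J/Q/K capped at 10 as the task requires.
deck = {k:{value:min(num+1, 10) for num, value in enumerate(cardict[k])} for cardict in deck for k in cardict.iterkeys()}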
print deck
Let's create a function that picks a pseudo-random card: it first selects one random card from each of the 4 suits in the dictionary, then picks one of those 4 cards at random.
In [14]:
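def random_deck_card(deck):
    # Pick one random card from each suit and keep its value, keyed by suit letter.
    selection = {suit: deck[suit][card]
                 for suit, card in
                 [(k, random.choice(deck[k].keys()))
                  for k in deck.iterkeys()]}
    # Then pick one of the four suits at random and return (suit, value).
    res = random.choice(selection.keys())
    return res, selection[res]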
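Because every call to random_deck_card is independent, the same card can turn up twice within one sample, which amounts to drawing with replacement. If the experiment is meant to mimic dealing cards from a physical deck (an assumption about the intent, not something the task text here states), a without-replacement variant could look like the sketch below. The function `draw_without_replacement` and the small `demo_deck` are hypothetical; `demo_deck` only copies the nested shape of `deck`.

```python
import random

def draw_without_replacement(deck, n_cards, rng=random):
    """Deal n_cards distinct cards from a nested {suit: {card: value}} deck."""
    # Flatten the nested dictionary into a list of (card label, value) pairs.
    flat = [(card, value)
            for suit_cards in deck.values()
            for card, value in suit_cards.items()]
    # random.sample never returns the same list element twice.
    return rng.sample(flat, n_cards)

# Small stand-in deck with the same nested shape as `deck` above.
demo_deck = {
    "H": {"AH": 1, "2H": 2, "KH": 10},
    "S": {"AS": 1, "2S": 2, "KS": 10},
}
hand = draw_without_replacement(demo_deck, 3, random.Random(0))
hand_total = sum(value for _, value in hand)
```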
Let's create a function that generates, as asked in the question, a list of 30 samples, each paired with its sum, where each sample consists of 3 random selections made with random_deck_card from deck.
In [15]:
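def sum_shuffled_cards(deck, mini_range, grand_range):
    # Build grand_range samples; each entry is (list of mini_range draws, sum of their values).
    samples = []
    for _ in xrange(grand_range):
        draws = [random_deck_card(deck) for _ in xrange(mini_range)]
        samples.append((draws, sum(value for _, value in draws)))
    return samples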
In [26]:
# Histogram of the values of all 52 cards; each rank contributes four cards.
all_cards = [value for dct in deck.itervalues() for value in dct.itervalues()]
print hist(all_cards)
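matplotlib's hist uses 10 equal-width bins by default, so integer card values do not always get a bin each. As a cross-check on the plot, the exact relative frequency of each value can be counted directly. The sketch below rebuilds the values from the task's rule rather than from `deck`, so the names `ranks`, `rank_value`, and `relative_frequency` are illustrative only.

```python
from collections import Counter

ranks = ['A'] + [str(n) for n in range(2, 11)] + ['J', 'Q', 'K']

def rank_value(rank):
    # Value rule from the task description: Ace = 1, face cards = 10.
    if rank == 'A':
        return 1
    if rank in ('J', 'Q', 'K'):
        return 10
    return int(rank)

# Four suits, so every rank appears four times in the deck.
values = [rank_value(r) for r in ranks for _ in range(4)]
counts = Counter(values)

# Fraction of the 52 cards that carry each value.
relative_frequency = {v: counts[v] / float(len(values)) for v in sorted(counts)}
```

With this rule the values 1 through 9 each cover 4 of the 52 cards, while the value 10 covers 16 of them (the 10, Jack, Queen, and King of every suit).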
Let's create a histogram of the sums from 30 samples of 3 randomly picked cards each.
In [33]:
shuffled_cards = sum_shuffled_cards(deck, 3, 30)
print hist(zip(*shuffled_cards)[1])
Let's see the shuffled cards.
In [34]:
pprint(shuffled_cards)
Let's analyze the distribution of the sample sums.
In [47]:
sum_samples = zip(*shuffled_cards)[1]
#standard error of the mean of the sums, computed the same way as mean_se for the sample means
all_cards_std = np.std(all_cards)
se = np.std(sum_samples)/np.sqrt(len(sum_samples))
#mean
sum_samples_mean = mean(sum_samples)
print "Average of list of Sum of all samples: {}\n".format(sum_samples_mean)
print hist(sum_samples)
#std deviation (n-1 denominator); the square root turns the sample variance into a standard deviation
sum_sample_std = np.sqrt(sum([(x - sum_samples_mean)**2 for x in sum_samples])/float(len(sum_samples)-1))
print "\nUnbiased STD DEV of list of sum of all samples std: {}\n".format(sum_sample_std)
print "Standard Error of distribution of 30 sample size of 3 samples each: {}\n".format(se)
print "Average of list of Sum of all cards: {}\n".format(mean(all_cards))
print hist(all_cards)
print "STD DEV of all cards in the deck: {}\n".format(all_cards_std)
#the histogram plot - averaging at around 20.9666666667
print hist(zip(*shuffled_cards)[1])
There are 2 histograms. Green: the distribution of all cards in the deck. It is uniform across ranks because each rank appears exactly 4 times in the deck (once per suit). Red: the distribution of the sample sums, which looks almost like a normal curve. This is because we took 30 samples of 3 random card picks each, and the histogram shows the most frequent values the sums take, averaging around 20.96 ~ 21, which means that on average the 3 cards we picked summed to about 21. Had we taken the mean of each sample instead of the sum, the histogram would have the same shape but be skinnier, since each mean is just the sum divided by 3.
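One way to check how the sums and the means of the same samples relate is to simulate both side by side. The sketch below is an assumption-laden stand-in for the notebook's functions: it draws with replacement from a flat list of card values (face cards valued at 10), uses a fixed seed, and relies on Python 3's `statistics` module; `card_values`, `samples`, and `spread_ratio` are hypothetical names.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# 13 ranks per suit with face cards valued at 10, repeated for four suits.
card_values = ([1] + list(range(2, 11)) + [10, 10, 10]) * 4

# 30 samples of 3 cards each, drawn with replacement.
samples = [[rng.choice(card_values) for _ in range(3)] for _ in range(30)]
sums = [sum(s) for s in samples]
means = [sum(s) / 3.0 for s in samples]

# Each mean is its sum divided by 3, so the spread shrinks by the same factor.
spread_ratio = statistics.stdev(sums) / statistics.stdev(means)
```

Here `spread_ratio` comes out as 3 up to floating-point rounding: dividing every sum by 3 rescales the horizontal axis of the histogram without changing its shape.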
Now let's calculate the SE, mean, and standard deviation of the sample means and compare them with the SE, mean, and standard deviation of the sample sums. The graph below shows what happens if, instead of summing, we take the mean of each 3-card sample and then look at the 30 resulting means and their average.
In [48]:
shuffled_cards_sample_mean = [mean([i[1] for i in each_stack[0]]) for each_stack in shuffled_cards]
print hist(shuffled_cards_sample_mean)
print hist(sum_samples)
print "\nAverage of list of Sum of all samples: {}\n".format(sum_samples_mean)
print "\nUnbiased STD DEV of list of sum of all samples std: {}\n".format(sum_sample_std)
print "Standard Error of distribution of 30 sample size of 3 samples each: {}\n".format(se)
#mean and std deviation
mean_shuffled_cards_sample_mean = mean(shuffled_cards_sample_mean)
print "Average of list of Mean of all samples: {}\n".format(mean_shuffled_cards_sample_mean)
shuffled_cards_sample_mean_std = np.sqrt(sum([(x - mean_shuffled_cards_sample_mean)**2 for x in shuffled_cards_sample_mean])/float(len(shuffled_cards_sample_mean)-1))
print "\nUnbiased STD DEV of list of mean of all samples std: {}\n".format(shuffled_cards_sample_mean_std)
mean_se = std(shuffled_cards_sample_mean)/np.sqrt(len(shuffled_cards_sample_mean))
print "Standard Error of distribution of 30 sample size of 3 samples each: {}\n".format(mean_se)
As we can see from the output above, the SE of the mean samples is lower than that of the sum samples, and the average has come down from ~21 to 6.98888888889, which is a lot closer to the value printed as "Average of list of Sum of all cards".
However, if we raise mini_range from 3 to, say, 30 and grand_range from 30 to 1000 in the function sum_shuffled_cards, the curve will be even smoother and skinnier relative to its mean, with narrower tails, meaning the average becomes more sharply defined and precise.
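The effect of a larger sample size can be illustrated with a small simulation. This is a sketch under stated assumptions, not the notebook's own code: it works with sample means rather than sums, draws with replacement from a flat list of card values (face cards valued at 10), fixes the seed, and targets Python 3. The helper `sample_means` is hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable
card_values = ([1] + list(range(2, 11)) + [10, 10, 10]) * 4

def sample_means(sample_size, n_samples):
    """Means of n_samples samples, each of sample_size cards drawn with replacement."""
    means = []
    for _ in range(n_samples):
        draws = [rng.choice(card_values) for _ in range(sample_size)]
        means.append(sum(draws) / float(sample_size))
    return means

small = sample_means(3, 1000)    # plays the role of mini_range=3
large = sample_means(30, 1000)   # plays the role of mini_range=30

# Spread of the sample means; it shrinks roughly like 1 / sqrt(sample_size).
spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
```

In this setup `spread_large` is noticeably smaller than `spread_small`, and the average of `large` sits close to the deck's average card value of 340/52.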