In [ ]:
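%matplotlib inline
# Explicit imports instead of %pylab, which floods the namespace
import numpy as np
import matplotlib.pyplot as plt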
import pandas as pd
from pandas import DataFrame, Series

import json
import seaborn as sns

import IPython
from IPython.display import Image, display

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

sns.set_style("darkgrid")
sns.set_palette("bright")

plt.rcParams['figure.figsize'] = (13, 9)
plt.rcParams['font.family'] = 'sans-serif'

Facebook Narcissism

Earlier in the chapter we mentioned a study that looked at ratings of Facebook profile pictures (rated on coolness, fashion, attractiveness and glamour) and predicting them from how highly the person posting the picture scores on narcissism (Ong et al., 2011).

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 133). SAGE Publications. Kindle Edition.

The data structure is:

  1. id: a number indicating from which participant the profile photo came.
  2. NPQC_R_Total: the total score on the narcissism questionnaire.
  3. Rating_Type: whether the rating was for coolness, glamour, fashion or attractiveness (stored as strings of text).
  4. Rating: the rating given (on a scale from 1 to 5).

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 133). SAGE Publications. Kindle Edition.
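
The column description above suggests a "long" layout, with one row per participant and rating type. The following is a minimal sketch of how that layout could be inspected and pivoted to one row per participant. The rows and the rating-type labels are made up for illustration and are not taken from FacebookNarcissism.dat.

```python
import pandas as pd

# Made-up rows that mimic the four documented columns (NOT the real data);
# the Rating_Type labels are assumed spellings, check the file for the actual strings.
toy = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 2],
    "NPQC_R_Total": [31, 31, 31, 31, 44, 44, 44, 44],
    "Rating_Type": ["Coolness", "Glamour", "Fashion", "Attractive"] * 2,
    "Rating": [2, 3, 2, 4, 4, 5, 3, 4],
})

# How many rows exist for each kind of rating
rows_per_type = toy["Rating_Type"].value_counts()

# Reshape from long to wide: one row per participant, one column per rating type
wide = toy.pivot(index="id", columns="Rating_Type", values="Rating")

print(rows_per_type)
print(wide)
```

Running `value_counts()` on the real `facebookdata["Rating_Type"]` column would be a quick way to confirm the actual labels before using them in `hue=` or `col=`.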


In [ ]:
facebookdata = pd.read_table('../../DSUR/04/FacebookNarcissism.dat')

In [ ]:
facebookdata.head(10)

In [ ]:
sns.lmplot(data=facebookdata, x="NPQC_R_Total", y="Rating", fit_reg=False)

In [ ]:
sns.lmplot(data=facebookdata, x="NPQC_R_Total", y="Rating", col="Rating_Type", y_jitter=0.25, fit_reg=False)

In [ ]:
sns.lmplot(data=facebookdata, x="NPQC_R_Total", y="Rating", markers=["x", "o", "s", "v"], hue="Rating_Type", y_jitter=0.25, fit_reg=False)

Exam Anxiety

For example, a psychologist was interested in the effects of exam stress on exam performance. So, she devised and validated a questionnaire to assess state anxiety relating to exams (called the Exam Anxiety Questionnaire, or EAQ). This scale produced a measure of anxiety scored out of 100. Anxiety was measured before an exam, and the percentage mark of each student on the exam was used to assess the exam performance.

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 136). SAGE Publications. Kindle Edition.


In [ ]:
examdata = pd.read_table('../../DSUR/04/Exam Anxiety.dat')

In [ ]:
examdata.head(10)

In [ ]:
sns.lmplot(data=examdata, x="Anxiety", y="Exam", fit_reg=False)

In [ ]:
sns.lmplot(data=examdata, x="Anxiety", y="Exam")

In [ ]:
sns.lmplot(data=examdata, x="Anxiety", y="Exam", order=2)

In [ ]:
sns.lmplot(data=examdata, x="Anxiety", y="Exam", order=3, ci=None)
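
According to the seaborn documentation, an `order` greater than 1 makes the regression plot fit a polynomial with `numpy.polyfit`. A minimal sketch of that underlying fit is shown here on synthetic, noise-free points, so the recovered coefficients can be checked by eye. The curve and its coefficients are invented for illustration and say nothing about the exam data.

```python
import numpy as np

# Synthetic x values on a 0-100 scale (an assumed stand-in for an anxiety score)
x = np.linspace(0, 100, 50)

# Invented quadratic relationship without noise: y = 2 + 0.5*x - 0.01*x^2
y = 2.0 + 0.5 * x - 0.01 * x**2

# Degree-2 least-squares fit; coefficients come back highest power first
coefs = np.polyfit(x, y, deg=2)

# Evaluate the fitted curve at the original x values
fitted = np.polyval(coefs, x)

print(coefs)
```

With real, noisy data the coefficients would only approximate any underlying curve, and a higher `order` can follow noise rather than structure.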

In [ ]:
sns.lmplot(data=examdata, x="Anxiety", y="Exam", hue="Gender")

In [ ]:
sns.lmplot(data=examdata, x="Anxiety", y="Exam", col="Gender", hue="Gender")

Festival Data

Hygiene was measured using a standardized technique (don’t worry, it wasn’t licking the person’s armpit) that results in a score ranging between 0 (you smell like a corpse that’s been left to rot up a skunk’s arse) and 4 (you smell of sweet roses on a fresh spring day).

Field, Andy; Miles, Jeremy; Field, Zoe. Discovering Statistics Using R (p. 142). SAGE Publications. Kindle Edition.


In [ ]:
festivaldata = pd.read_table('../../DSUR/04/DownloadFestival.dat')

In [ ]:
festivaldata.head(10)

In [ ]:
# distplot is deprecated in recent seaborn releases; histplot draws the same histogram
sns.histplot(festivaldata.day1)

In [ ]:
d1 = festivaldata.day1
print("Stdev:", d1.std())
print("3x Stdev:", 3 * d1.std())
print("Distance from mean:", (d1 - d1.mean()).abs().head(5))

In [ ]:
# Drop scores more than 3 standard deviations from the mean
cleand1 = d1[~((d1 - d1.mean()).abs() > 3 * d1.std())]

In [ ]:
sns.histplot(cleand1)
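
The filter used above can be wrapped in a small reusable function. This is a sketch under assumptions: `drop_outliers` is a hypothetical helper name, and the demo scores are made up rather than read from DownloadFestival.dat. Unlike the `~(... > ...)` form in the cell above, the `<=` comparison also drops missing values.

```python
import pandas as pd

def drop_outliers(scores: pd.Series, n_std: float = 3.0) -> pd.Series:
    """Keep values within n_std sample standard deviations of the mean.

    Hypothetical helper for illustration; NaN entries are dropped because
    a comparison with NaN evaluates to False.
    """
    distance = (scores - scores.mean()).abs()
    return scores[distance <= n_std * scores.std()]

# Made-up hygiene-like scores (NOT the festival data) plus one implausible entry
typical = [1.5, 2.0, 1.8, 2.2, 1.9, 2.1, 1.7, 2.3, 1.6, 2.4, 2.0, 1.9] * 2
demo = pd.Series(typical + [20.0])

cleaned = drop_outliers(demo)
print(len(demo), "->", len(cleaned))
```

Note that the mean and standard deviation are themselves inflated by the outlier, so in very small samples a single extreme value may not exceed the 3-standard-deviation cutoff at all.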